Interlaced Greedy Algorithm for Maximization of Submodular Functions in Nearly Linear Time
Alan Kuhnle
Department of Computer Science, Florida State University
[email protected]
Abstract
A deterministic approximation algorithm is presented for the maximization of non-monotone submodular functions over a ground set of size n subject to a cardinality constraint k; the algorithm is based upon the idea of interlacing two greedy procedures. The algorithm uses interlaced, thresholded greedy procedures to obtain the tight ratio 1/4 − ε in O((n/ε) log(k/ε)) queries of the objective function, which improves upon both the ratio and the quadratic time complexity of the previously fastest deterministic algorithm for this problem. The algorithm is validated in the context of two applications of non-monotone submodular maximization, on which it outperforms the fastest deterministic and randomized algorithms in prior literature.

Introduction

A nonnegative function f defined on subsets of a ground set U of size n is submodular iff for all A, B ⊆ U and x ∈ U \ B such that A ⊆ B, it holds that f(B ∪ {x}) − f(B) ≤ f(A ∪ {x}) − f(A). Intuitively, the property of submodularity captures diminishing returns. Because of a rich variety of applications, the maximization of a nonnegative submodular function with respect to a cardinality constraint (MCC) has a long history of study (Nemhauser et al., 1978). Applications of MCC include viral marketing (Kempe et al., 2003), network monitoring (Leskovec et al., 2007), video summarization (Mirzasoleiman et al., 2018), and MAP inference for determinantal point processes (Gillenwater et al., 2012), among many others. In recent times, the amount of data generated by many applications has been increasing exponentially; therefore, linear or sublinear-time algorithms are needed.

If a submodular function f is monotone, greedy approaches for MCC have proven effective and nearly optimal, both in terms of query complexity and approximation factor: subject to a cardinality constraint k, a simple greedy algorithm gives a (1 − 1/e) approximation ratio in O(kn) queries (Nemhauser et al., 1978), where n is the size of the instance.
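To make the diminishing-returns property concrete, the following sketch checks it exhaustively for a small coverage function, a standard example of a submodular objective. The sets and names here are illustrative, not from the paper.

```python
from itertools import combinations

# A small coverage function: f(A) = size of the union of the sets indexed by A.
# Coverage functions are a standard example of submodular objectives.
SETS = {0: {1, 2}, 1: {2, 3}, 2: {3, 4, 5}, 3: {1, 5}}
U = set(SETS)

def f(A):
    covered = set()
    for i in A:
        covered |= SETS[i]
    return len(covered)

def marginal(x, A):
    """Marginal gain of adding x to A: f(A ∪ {x}) − f(A)."""
    return f(set(A) | {x}) - f(set(A))

def violates_diminishing_returns():
    """Search for x and A ⊆ B ⊆ U \\ {x} with f_x(B) > f_x(A)."""
    for x in U:
        others = sorted(U - {x})
        for rb in range(len(others) + 1):
            for B in combinations(others, rb):
                for ra in range(len(B) + 1):
                    for A in combinations(B, ra):
                        if marginal(x, B) > marginal(x, A):
                            return True
    return False
```

Running `violates_diminishing_returns()` over all triples (x, A, B) confirms that this coverage function satisfies the diminishing-returns inequality.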
Furthermore, this ratio is optimal under the value oracle model (Nemhauser and Wolsey, 1978). Badanidiyuru and Vondrák (2014) sped up the greedy algorithm to require O((n/ε) log(n/ε)) queries while sacrificing only a small ε > 0 in the approximation ratio, while Mirzasoleiman et al. (2015) developed a randomized (1 − 1/e − ε) approximation in O(n log(1/ε)) queries.

When f is non-monotone (f is monotone if f(A) ≤ f(B) for all A ⊆ B), the situation is very different; no subquadratic deterministic algorithm has yet been developed. Although a linear-time, randomized (1/e − ε)-approximation has been developed by Buchbinder et al. (2015), which requires O((n/ε²) log(1/ε)) queries, the performance guarantee of this algorithm holds only in expectation. A derandomized version of the algorithm with ratio 1/e has been developed by Buchbinder and Feldman (2018a), but it has time complexity O(k³n). Therefore, in this work, an emphasis is placed upon the development of nearly linear-time, deterministic approximation algorithms.

Preprint. Under review.

Table 1: Fastest algorithms for cardinality constraint

Algorithm                       Ratio      Time complexity        Deterministic?
FastInterlaceGreedy (Alg. 2)    1/4 − ε    O((n/ε) log(k/ε))      Yes
Gupta et al. (2010)             1/6 − ε    O(nk + n/ε)            Yes
Buchbinder et al. (2015)        1/e − ε    O((n/ε²) log(1/ε))     No

Contributions
The deterministic approximation algorithm InterlaceGreedy (Alg. 1) is provided for maximization of a submodular function subject to a cardinality constraint (MCC). InterlaceGreedy achieves ratio 1/4 in O(kn) queries to the objective function. A faster version of the algorithm is formulated in FastInterlaceGreedy (Alg. 2), which achieves ratio (1/4 − ε) in O((n/ε) log(k/ε)) queries. In Table 1, the relationship is shown to the fastest deterministic and randomized algorithms for MCC in prior literature.

Both algorithms operate by interlacing two greedy procedures together in a novel manner; that is, the two greedy procedures alternately select elements into disjoint sets and are disallowed from selection of the same element. This technique is demonstrated first with the interlacing of two standard greedy procedures in InterlaceGreedy, before interlacing thresholded greedy procedures developed by Badanidiyuru and Vondrák (2014) for monotone submodular functions to obtain the algorithm FastInterlaceGreedy.

The algorithms are validated in the context of cardinality-constrained maximum cut and social network monitoring, which are both instances of MCC. In this evaluation, FastInterlaceGreedy is more than an order of magnitude faster than the fastest deterministic algorithm (Gupta et al., 2010) and is both faster and obtains better solution quality than the fastest randomized algorithm (Buchbinder et al., 2015). The source code for all implementations is available at https://gitlab.com/kuhnle/non-monotone-max-cardinality .

Organization
The rest of this paper is organized as follows. Related work and preliminaries on submodular optimization are discussed in the rest of this section. In Section 2, InterlaceGreedy and FastInterlaceGreedy are presented and analyzed. Experimental validation is provided in Section 4.
Related Work
The literature on submodular optimization comprises many works. In this section, a short review of relevant techniques is given for MCC; that is, maximization of non-monotone, submodular functions over a ground set of size n with cardinality constraint k. For further information on other types of submodular optimization, interested readers are directed to the survey of Buchbinder and Feldman (2018b) and references therein.

A deterministic local search algorithm was developed by Lee et al. (2010), which achieves ratio 1/4 − ε in O(n⁴ log n) queries. This algorithm runs two approximate local search procedures in succession. By contrast, the algorithm FastInterlaceGreedy employs interlacing of greedy procedures to obtain the same ratio in O((n/ε) log(k/ε)) queries. In addition, a randomized local search algorithm was formulated by Vondrák (2013), which achieves ratio ≈ 0.309 in expectation.

Gupta et al. (2010) developed a deterministic, iterated greedy approach, wherein two greedy procedures are run in succession and an algorithm for unconstrained submodular maximization is employed. This approach requires O(nk) queries and has ratio 1/(4 + α), where α is the inverse ratio of the employed subroutine for unconstrained, non-monotone submodular maximization; under the value query model, the smallest possible value for α is 2, as shown by Feige et al. (2011), so this ratio is at most 1/6. The iterated greedy approach of Gupta et al. (2010) first runs one standard greedy algorithm to completion, then starts a second standard greedy procedure; this differs from the interlacing procedure, which runs two greedy procedures concurrently and alternates between the selection of elements. The algorithm of Gupta et al. (2010) is experimentally compared to FastInterlaceGreedy in Section 4. The iterated greedy approach of Gupta et al. (2010) was extended and analyzed under more general constraints by a series of works: Mirzasoleiman et al. (2016); Feldman et al.
(2017); Mirzasoleiman et al. (2018).

An elegant randomized greedy algorithm of Buchbinder et al. (2014) achieves expected ratio 1/e in O(kn) queries for MCC; this algorithm was derandomized by Buchbinder and Feldman (2018a), but the derandomized version requires O(k³n) queries. The randomized version was sped up in Buchbinder et al. (2015) to achieve expected ratio 1/e − ε and require O((n/ε²) log(1/ε)) queries. Although this algorithm has better time complexity than FastInterlaceGreedy, the ratio of 1/e − ε holds only in expectation, which is much weaker than a deterministic approximation ratio. The algorithm of Buchbinder et al. (2015) is experimentally evaluated in Section 4.

Recently, an improvement in the adaptive complexity of MCC was made by Balkanski et al. (2018). Their algorithm, BLITS, requires O(log² n) adaptive rounds of queries to the objective, where the queries within each round are independent of one another and thus can be parallelized easily. Previously, the best adaptivity was the trivial O(n). However, each round requires Ω(OPT) samples to approximate expectations, which for the applications evaluated in Section 4 is Ω(n). For this reason, BLITS is evaluated as a heuristic in comparison with the proposed algorithms in Section 4. Further improvements in adaptive complexity have been made by Fahrbach et al. (2019) and Ene and Nguyen (2019).

Streaming algorithms for MCC make only one or a few passes through the ground set. Streaming algorithms for MCC include those of Chekuri et al. (2015); Feldman et al. (2018); Mirzasoleiman et al. (2018). A streaming algorithm with low adaptive complexity has recently been developed by Kazemi et al. (2019). In the following, the algorithms are allowed to make an arbitrary number of passes through the data.

Currently, the best approximation ratio of any algorithm for MCC is the 0.385 of Buchbinder and Feldman (2016).
Their algorithm also works under a more general constraint than a cardinality constraint, namely a matroid constraint. This algorithm is the latest in a series of works (e.g., Naor and Schwartz, 2011; Ene and Nguyen, 2016) using the multilinear extension of a submodular function, which is expensive to evaluate.

Preliminaries
Given n ∈ ℕ, the notation [n] is used for the set {0, 1, ..., n − 1}. In this work, functions f with domain all subsets of a finite set are considered; hence, without loss of generality, the domain of the function f is taken to be 2^[n], the set of all subsets of [n]. An equivalent characterization of submodularity is that for each A, B ⊆ [n], f(A ∪ B) + f(A ∩ B) ≤ f(A) + f(B). For brevity, the notation f_x(A) is used to denote the marginal gain f(A ∪ {x}) − f(A) of adding element x to set A.

In the following, the problem studied is to maximize a submodular function under a cardinality constraint (MCC), which is formally defined as follows. Let f : 2^[n] → R+ be submodular; let k ∈ [n]. Then the problem is to determine arg max_{A ⊆ [n] : |A| ≤ k} f(A). An instance of MCC is the pair (f, k); however, rather than an explicit description of f, the function f is accessed by a value oracle; the value oracle may be queried on any set A ⊆ [n] to yield f(A). The efficiency or runtime of an algorithm is measured by the number of queries made to the oracle for f.

Finally, without loss of generality, instances of MCC considered in the following satisfy n ≥ 2k. If this condition does not hold, the function may be extended to [m], for some m ≥ 2k, by adding dummy elements to the domain which do not change the function value. That is, the function g : 2^[m] → R+ is defined as g(A) = f(A ∩ [n]); it may be easily checked that g remains submodular, and any possible solution to the MCC instance (g, k) maps to a solution of (f, k) of the same value, where the mapping is to discard all elements at least n. Hence, the ratio of any solution to (g, k) to the optimal is the same as the ratio of the mapped solution to the optimal on (f, k).

Approximation Algorithms
In this section, the approximation algorithms based upon interlacing greedy procedures are presented. In Section 2.1, the technique is demonstrated with standard greedy procedures in the algorithm InterlaceGreedy. In Section 2.2, the nearly linear-time algorithm FastInterlaceGreedy is introduced.
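Throughout, the algorithms interact with f only through its value oracle, paying one query per evaluation. The sketch below (illustrative names, not the paper's implementation) mirrors the preliminaries: a value oracle for a cut function, the marginal gain f_x(A), and the dummy-element extension g(A) = f(A ∩ [n]).

```python
def make_cut_oracle(edges):
    """Value oracle for the (submodular, non-monotone) cut function:
    f(S) = number of edges with exactly one endpoint in S."""
    def f(S):
        return sum(1 for (u, v) in edges if (u in S) != (v in S))
    return f

def marginal_gain(f, x, A):
    """The marginal gain f_x(A) = f(A ∪ {x}) − f(A)."""
    return f(A | {x}) - f(A)

def extend_with_dummies(f, n):
    """g(A) = f(A ∩ [n]); dummy elements >= n never change the value,
    so g is submodular whenever f is."""
    ground = set(range(n))
    def g(A):
        return f(A & ground)
    return g
```

On the path graph 0–1–2, for example, f({1}) = 2, and g ignores any dummy element at least n.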
2.1 The InterlaceGreedy Algorithm

In this section, the InterlaceGreedy algorithm (InterlaceGreedy, Alg. 1) is introduced. InterlaceGreedy takes as input an instance of MCC and outputs a set C.

Algorithm 1
InterlaceGreedy(f, k): The InterlaceGreedy Algorithm

Input: f : 2^[n] → R+, k ∈ [n]
Output: C ⊆ [n], such that |C| ≤ k.
A_0 ← B_0 ← ∅
for i ← 0 to k − 1 do
    a_i ← arg max_{x ∈ [n] \ (A_i ∪ B_i)} f_x(A_i)
    A_{i+1} ← A_i + a_i
    b_i ← arg max_{x ∈ [n] \ (A_{i+1} ∪ B_i)} f_x(B_i)
    B_{i+1} ← B_i + b_i
D_1 ← E_1 ← {a_0}
for i ← 1 to k − 1 do
    d_i ← arg max_{x ∈ [n] \ (D_i ∪ E_i)} f_x(D_i)
    D_{i+1} ← D_i + d_i
    e_i ← arg max_{x ∈ [n] \ (D_{i+1} ∪ E_i)} f_x(E_i)
    E_{i+1} ← E_i + e_i
return C ← arg max {f(A_i), f(B_i), f(D_i), f(E_i) : i ∈ [k + 1]}

InterlaceGreedy operates by interlacing two standard greedy procedures. This interlacing is accomplished by maintaining two disjoint sets A and B, which are initially empty. For k iterations, the element a ∉ B with the highest marginal gain with respect to A is added to A, followed by an analogous greedy selection for B; that is, the element b ∉ A with the highest marginal gain with respect to B is added to B. After the first pair of interlaced greedy procedures completes, a modified version is repeated with sets D, E, which are initialized to the maximum-value singleton {a_0}. Finally, the algorithm returns the set with the maximum f-value of any query the algorithm has made to f.

If f is submodular, InterlaceGreedy has an approximation ratio of 1/4 and query complexity O(kn); the deterministic algorithm of Gupta et al. (2010) has the same time complexity to achieve ratio 1/6. The full proof of Theorem 1 is provided in Appendix A.

Theorem 1.
Let f : 2^[n] → R+ be submodular, let k ∈ [n], let O = arg max_{|S| ≤ k} f(S), and let C = InterlaceGreedy(f, k). Then f(C) ≥ f(O)/4, and InterlaceGreedy makes O(kn) queries to f.

Proof sketch. The argument of Fisher et al. (1978) shows that the greedy algorithm is a (1/2)-approximation for monotone submodular maximization with respect to a matroid constraint. This argument also applies to non-monotone, submodular functions, but it shows only that 2f(S) ≥ f(O ∪ S), where S is returned by the greedy algorithm. Since f is non-monotone, it is possible for f(O ∪ S) < f(O), so this bound alone gives no guarantee. The main idea of the InterlaceGreedy algorithm is to exploit the fact that if S and T are disjoint,

f(O ∪ S) + f(O ∪ T) ≥ f(O) + f(O ∪ S ∪ T) ≥ f(O),    (1)

which is a consequence of the submodularity and nonnegativity of f. Therefore, by interlacing two greedy procedures, two disjoint sets A, B are obtained, which can be shown to almost satisfy 2f(A) ≥ f(O ∪ A) and 2f(B) ≥ f(O ∪ B), after which the result follows from (1). There is a technicality wherein the element a_0 must be handled separately, which requires the second round of interlacing to address.

2.2 The FastInterlaceGreedy Algorithm

In this section, a faster interlaced greedy algorithm (FastInterlaceGreedy (
FIG), Alg. 2) is formulated, which requires O(n log k) queries for fixed δ. As input, an instance (f, k) of MCC is taken, as well as a parameter δ > 0.

Algorithm 2
FIG(f, k, δ): The FastInterlaceGreedy Algorithm

Input: f : 2^[n] → R+, k ∈ [n], δ > 0
Output: C ⊆ [n], such that |C| ≤ k.
A ← B ← ∅
M ← τ_A ← τ_B ← max_{x ∈ [n]} f({x})
i ← −1, a_{−1} ← 0, b_{−1} ← 0
while τ_A ≥ δM/k or τ_B ≥ δM/k do
    (a_{i+1}, τ_A) ← ADD(A, B, a_i, τ_A)
    (b_{i+1}, τ_B) ← ADD(B, A, b_i, τ_B)
    i ← i + 1
D ← E ← {a_0}, τ_D ← τ_E ← M
i ← 0, d_0 ← 0, e_0 ← 0
while τ_D ≥ δM/k or τ_E ≥ δM/k do
    (d_{i+1}, τ_D) ← ADD(D, E, d_i, τ_D)
    (e_{i+1}, τ_E) ← ADD(E, D, e_i, τ_E)
    i ← i + 1
return C ← arg max {f(A), f(B), f(D), f(E)}

Algorithm 3
ADD(S, T, j, τ): The ADD subroutine

Input: Two sets S, T ⊆ [n], element j ∈ [n], τ ∈ R+
Output: (i, τ), such that i ∈ [n], τ ∈ R+
if |S| = k then
    return (0, (1 − δ)τ)
while τ ≥ δM/k do
    for (x ← j; x < n; x ← x + 1) do
        if x ∉ T then
            if f_x(S) ≥ τ then
                S ← S ∪ {x}
                return (x, τ)
    τ ← (1 − δ)τ
    j ← 0
return (0, τ)

The algorithm
FIG works as follows. As in InterlaceGreedy, there is a repeated interlacing of two greedy procedures. However, to ensure a faster query complexity, these greedy procedures are thresholded: a separate threshold τ is maintained for each of the greedy procedures. The interlacing is accomplished by alternating calls to the ADD subroutine (Alg. 3), which adds a single element and is described below. When all of the thresholds fall below the value δM/k, the maximum of the greedy solutions is returned; here, δ > 0 is the input parameter, M is the maximum value of a singleton, and k ≤ n is the cardinality constraint.

The ADD subroutine is responsible for adding a single element above the input threshold and decreasing the threshold. It takes as input four parameters: two sets
S, T, element j, and threshold τ; furthermore, ADD is given access to the oracle f, the budget k, and the parameter δ of FIG. As an overview,
ADD adds the first element x ≥ j (with respect to the natural ordering on [n] = {0, ..., n − 1}), such that x ∉ T and such that the marginal gain f_x(S) is at least τ. If no such element exists, the threshold is decreased by a factor of (1 − δ) and the process is repeated (with j set to 0). When such an element x is found, the element x is added to S, and the new threshold value and position x are returned. Finally, ADD ensures that the size of S does not exceed k. In the remainder of this section, the performance guarantee of FIG is proven.
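Assembling the pieces, a compact Python sketch of FIG with its ADD subroutine is given below. It follows the description above (interlaced thresholded greedy procedures, with thresholds decaying by a (1 − δ) factor down to δM/k), but it is an illustrative reimplementation, not the author's code; in particular, the helper names and the guard for M ≤ 0 are additions.

```python
def fig(f, n, k, delta=0.1):
    """Sketch of FastInterlaceGreedy: interlaced, thresholded greedy procedures."""
    ground = range(n)
    M = max(f({x}) for x in ground)
    if M <= 0:
        return set()  # degenerate instance: no singleton has positive value (added guard)
    floor = delta * M / k  # stopping threshold

    def add(S, T, j, tau):
        """ADD subroutine: put into S the first x >= j with x not in T and
        f_x(S) >= tau; on failure, decay tau by (1 - delta) and rescan from 0."""
        if len(S) >= k:
            return 0, (1 - delta) * tau
        while tau >= floor:
            for x in range(j, n):
                if x not in T and x not in S and f(S | {x}) - f(S) >= tau:
                    S.add(x)
                    return x, tau
            tau = (1 - delta) * tau
            j = 0
        return 0, tau

    def interlace(A, B):
        """Alternate ADD calls on A and B until both thresholds fall below floor."""
        tau_a = tau_b = M
        a = b = 0
        while tau_a >= floor or tau_b >= floor:
            a, tau_a = add(A, B, a, tau_a)
            b, tau_b = add(B, A, b, tau_b)
        return A, B

    A, B = interlace(set(), set())          # first round: disjoint sets A, B
    a0 = max(ground, key=lambda x: f({x}))  # maximum-value singleton
    D, E = interlace({a0}, {a0})            # second round: D and E intersect only in a0
    return max((A, B, D, E), key=f)

# Example instance: max-cut on the path 0-1-2-3 with k = 2 (optimal cut value 3).
EDGES = [(0, 1), (1, 2), (2, 3)]
def cut(S):
    return sum(1 for (u, v) in EDGES if (u in S) != (v in S))
```

On this instance, fig(cut, 4, 2) recovers a cut of value 3; the stealing post-processing used later in the experiments is omitted here.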
Theorem 2.
Let f : 2^[n] → R+ be submodular, let k ∈ [n], and let ε > 0. Let O = arg max_{|S| ≤ k} f(S). Choose δ such that (1 − 2δ)/(4 + 4δ) > 1/4 − ε, and let C = FIG(f, k, δ). Then f(C) ≥ (1 − 2δ) f(O)/(4 + 4δ) ≥ (1/4 − ε) f(O).

Proof.
Let
A, B, C, D, E, M have their values at termination of
FIG(f, k, δ). Let A = {a_0, ..., a_{|A|−1}} be ordered by addition of elements by FIG into A. The proof requires the following four inequalities:

f(O ∪ A) ≤ (2 + 2δ) f(A) + δM,    (2)
f((O \ {a_0}) ∪ B) ≤ (2 + 2δ) f(B) + δM,    (3)
f(O ∪ D) ≤ (2 + 2δ) f(D) + δM,    (4)
f(O ∪ E) ≤ (2 + 2δ) f(E) + δM.    (5)

Once these inequalities have been established, Inequalities 2, 3, submodularity of f, and A ∩ B = ∅ imply

f(O \ {a_0}) ≤ (2 + 2δ)(f(A) + f(B)) + 2δM.    (6)

Similarly, from Inequalities 4, 5, submodularity of f, and D ∩ E = {a_0}, it holds that

f(O ∪ {a_0}) ≤ (2 + 2δ)(f(D) + f(E)) + 2δM.    (7)

Hence, from the fact that either a_0 ∈ O or a_0 ∉ O and the definition of C, it holds that

f(O) ≤ (4 + 4δ) f(C) + 2δM.

Since f(C) ≤ f(O) and M ≤ f(O), the theorem is proved.

The proofs of Inequalities 2–5 are similar. The proof of Inequality 3 is given here, while the proofs of the others are provided in Appendix B.

Proof of Inequality 3.
Let A = {a_0, ..., a_{|A|−1}} be ordered as specified by FIG. Likewise, let B = {b_0, ..., b_{|B|−1}} be ordered as specified by FIG.

Lemma 1. O \ (B ∪ {a_0}) = {o_0, ..., o_{l−1}} can be ordered such that

f_{o_i}(B_i) ≤ (1 + 2δ) f_{b_i}(B_i),    (8)

for any i ∈ [|B|].

Proof. For each i ∈ [|B|], define τ_{B_i} to be the value of τ when b_i was added into B by the ADD subroutine. Order o ∈ (O \ (B ∪ {a_0})) ∩ A = {o_0, ..., o_{ℓ−1}} by the order in which these elements were added into A. Order the remaining elements of O \ (B ∪ {a_0}) arbitrarily. Then, when b_i was chosen by ADD, it holds that o_i ∉ A_{i+1}, since A_1 = {a_0} and a_0 ∉ O \ (B ∪ {a_0}). Also, it holds that o_i ∉ B_i since B_i ⊆ B; hence o_i was not added into some (possibly non-proper) subset B′_i of B_i at the previous threshold value τ_{B_i}/(1 − δ). By submodularity, f_{o_i}(B_i) ≤ f_{o_i}(B′_i) < τ_{B_i}/(1 − δ). Since f_{b_i}(B_i) ≥ τ_{B_i} and δ < 1/2, inequality (8) follows.

Order Ô = O \ (B ∪ {a_0}) = {o_0, ..., o_{l−1}} as defined in the proof of Lemma 1, and let Ô_i = {o_0, ..., o_{i−1}} if i ≥ 1, and let Ô_0 = ∅. Then

f(Ô ∪ B) − f(B) = Σ_{i=0}^{l−1} f_{o_i}(Ô_i ∪ B)
  = Σ_{i=0}^{|B|−1} f_{o_i}(Ô_i ∪ B) + Σ_{i=|B|}^{l−1} f_{o_i}(Ô_i ∪ B)
  ≤ Σ_{i=0}^{|B|−1} f_{o_i}(B_i) + Σ_{i=|B|}^{l−1} f_{o_i}(B)
  ≤ Σ_{i=0}^{|B|−1} (1 + 2δ) f_{b_i}(B_i) + Σ_{i=|B|}^{l−1} f_{o_i}(B)
  ≤ (1 + 2δ) f(B) + δM,

where any empty sum is defined to be 0; the first inequality follows by submodularity, the second follows from Lemma 1, and the third follows from the definition of B and the facts that, for any i such that |B| ≤ i < l, max_{x ∈ [n] \ A_{|B|+1}} f_x(B) < δM/k, l − |B| ≤ k, and o_i ∉ A_{|B|+1}.

Theorem 3.
Let f : 2^[n] → R+ be submodular, let k ∈ [n], and let δ > 0. Then the number of queries to f by FIG(f, k, δ) is at most O((n/δ) log(k/δ)).

Proof. Recall [n] = {0, 1, ..., n − 1}. Let S ∈ {A, B, D, E}, and let S = {s_0, ..., s_{|S|−1}} in the order in which elements were added to S. When ADD is called by
FIG to add an element s_i ∈ [n] to S, if the value of τ is the same as the value when s_{i−1} was added to S, then s_i > s_{i−1}. Finally, once ADD queries the marginal gain of adding n − 1, the threshold is revised downward by a factor of (1 − δ).

Therefore, there are at most O(n) queries of f at each distinct value of τ_A, τ_B, τ_D, τ_E. Since at most O((1/δ) log(k/δ)) values are assumed by each of these thresholds, the theorem follows.

In this section, examples are provided showing that InterlaceGreedy or FastInterlaceGreedy may achieve performance ratio at most 1/4 + ε on specific instances, for each ε > 0. These examples show that the analysis in the preceding sections is tight.

Let ε > 0 and choose k such that 1/k < ε. Let O and D be disjoint sets, each of k distinct elements, and let U = O ∪̇ {a, b} ∪̇ D. A submodular function f will be defined on subsets of U as follows. Let C ⊆ U.

• If both a ∈ C and b ∈ C, then f(C) = 0.
• If a ∈ C xor b ∈ C, then f(C) = |C ∩ O|/(2k) + 1/k.
• If a ∉ C and b ∉ C, then f(C) = |C ∩ O|/k.

The following proposition is proved in Appendix D.

Proposition 1.
The function f is submodular.

Next, observe that for any o ∈ O, f_a(∅) = f_b(∅) = f_o(∅) = 1/k. Hence InterlaceGreedy or FastInterlaceGreedy may choose a_0 = a and b_0 = b; after this choice, the only way to increase f is by choosing elements of O. Hence the a_i, b_i will be chosen in O until the elements of O are exhausted, which results in k/2 elements of O added to each of A and B. Thereafter, elements of D will be chosen, which do not affect the function value. This yields f(A) = f(B) ≤ 1/k + 1/4. In the second round of interlacing, both sets are initialized to {a}, and a similar situation arises, in which k/2 elements of O are added to each set, yielding final values equal to f(A). Hence InterlaceGreedy or FastInterlaceGreedy may return A, while f(O) = 1. So

f(A)/f(O) ≤ 1/k + 1/4 ≤ 1/4 + ε.

Experimental Validation

In this section, the performance of FastInterlaceGreedy (
FIG) is compared with that of state-of-the-art algorithms on two applications of submodular maximization: cardinality-constrained maximum cut and network monitoring.
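Both objectives are easy to state in code. The sketch below (stdlib-only; the graphs and weights in the accompanying example are illustrative stand-ins, not the paper's datasets) mirrors the definitions given under Applications below.

```python
def cut_value(edges, S):
    """Cardinality-constrained max-cut objective: the number of edges
    crossing from S to V \\ S (submodular and non-monotone)."""
    return sum(1 for (u, v) in edges if (u in S) != (v in S))

def monitoring_value(weights, S):
    """Network monitoring objective: f(S) = sum of w(u, v) over monitored
    users u in S and non-monitored users v outside S."""
    return sum(w for (u, v), w in weights.items() if u in S and v not in S)
```

For instance, on the triangle graph with edges (0, 1), (1, 2), (0, 2), the set {0} cuts two edges, while the full vertex set cuts nothing, illustrating non-monotonicity.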
The following algorithms are compared. Source code for the evaluated implementations of all algorithms is available at https://gitlab.com/kuhnle/non-monotone-max-cardinality .

• FastInterlaceGreedy (Alg. 2): FIG is implemented as specified in the pseudocode, with the following addition: a stealing procedure is employed at the end, which uses submodularity to quickly steal elements from A, B, D, E into C in O(k) queries. This does not impact the performance guarantee, as the value of C can only increase. The parameter δ is set to 0.1, yielding an approximation ratio of ≈ 0.18.

• Gupta et al. (2010): The algorithm of Gupta et al. (2010) for cardinality constraint; as the subroutine for the unconstrained maximization subproblems, the deterministic, linear-time 1/3-approximation algorithm of Buchbinder et al. (2012) is employed. This yields an overall approximation ratio of 1/7 for the implementation used herein. This algorithm is the fastest deterministic approximation algorithm in prior literature.

• FastRandomGreedy (FRG): The O((n/ε²) ln(1/ε)) randomized algorithm of Buchbinder et al. (2015) (Alg. 4 of that paper), with expected ratio 1/e − ε; the parameter ε was set to 0.3, yielding an expected ratio of ≈ 0.07 as evaluated herein. This algorithm is the fastest randomized approximation algorithm in prior literature.

• BLITS: The O(log² n)-adaptive algorithm recently introduced in Balkanski et al. (2018); the algorithm is employed as a heuristic without a performance ratio, with the same parameter choices as in Balkanski et al. (2018). In particular, 30 samples are used to approximate the expectations, and a bound on OPT is guessed in logarithmically many iterations as described in Balkanski et al. (2018) and references therein.

Results for randomized algorithms are the mean of 10 trials, and the standard deviation is represented in plots by a shaded region.

Applications
Many applications with non-monotone, submodular objective functions exist. In this section, two applications are chosen to demonstrate the performance of the evaluated algorithms.

• Cardinality-Constrained Maximum Cut: The archetype of a submodular, non-monotone function is the maximum cut objective: given a graph G = (V, E) and S ⊆ V, f(S) is defined to be the number of edges crossing from S to V \ S. The cardinality-constrained version of this problem is considered in the evaluation.

• Social Network Monitoring: Given an online social network, suppose it is desired to choose k users to monitor, such that the maximum amount of content is propagated through these users. Suppose the amount of content propagated between two users u, v is encoded as a weight w(u, v). Then f(S) = Σ_{u ∈ S, v ∉ S} w(u, v).

Results are now presented for the algorithms on the two applications. In overview: in terms of objective value,
FIG and Gupta et al. (2010) were about the same and outperformed BLITS and FRG. In terms of efficiency, FIG was the fastest algorithm by the metric of queries to the objective and was faster than Gupta et al. (2010) by at least an order of magnitude. (Details of the stealing procedure are given in Appendix C.)
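As a rough sanity check on the size of this gap, one can compare the leading terms of the two deterministic algorithms' query counts at an illustrative scale. Constants and lower-order terms are ignored, and the numbers below are not measurements from the experiments.

```python
import math

# Back-of-envelope comparison of the leading query-count terms of the two
# deterministic algorithms (constants ignored; illustrative scale only).
n, k, delta = 10_000, 500, 0.1

fig_queries = (n / delta) * math.log(k / delta)  # O((n/delta) log(k/delta))
iterated_greedy_queries = n * k                  # O(nk) greedy passes in Gupta et al.

# The O(nk) term grows linearly in k, while FIG's count grows only
# logarithmically in k, so the gap widens with the budget.
ratio = iterated_greedy_queries / fig_queries
```

At this scale the asymptotic ratio is already several-fold; the empirical gap is larger still because of constant factors.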
Figure 1: (a) ER, Cut Value. (b) ER, Function Queries. (c) BA, Cut Value. (d) BA, Function Queries. (e) Total content monitored versus budget k. (f) Number of Queries versus budget k. Panels (a)–(d): objective value and runtime for cardinality-constrained maxcut on random graphs. Panels (e)–(f): objective value and runtime for cardinality-constrained maxcut on ca-AstroPh with simulated amounts of content between users. In all plots, the x-axis shows the budget k.

Cardinality Constrained MaxCut
For these experiments, two random graph models were employed: an Erdős–Rényi (ER) random graph with 1,000 nodes, and a Barabási–Albert (BA) graph with n = 10,000 and m = 100.

On the ER graph, results are shown in Figs. 1(a) and 1(b); the results on the BA graph are shown in Figs. 1(c) and 1(d). In terms of cut value, the algorithm of Gupta et al. (2010) performed the best, although the value produced by FIG was nearly the same. On the ER graph, the next best was FRG, followed by BLITS; whereas on the BA graph, BLITS outperformed FRG in cut value. In terms of efficiency of queries,
FIG used the smallest number on every evaluated instance, although the number did increase logarithmically with the budget. The number of queries used by FRG was higher, but after a certain budget remained constant. The next most efficient was Gupta et al. (2010), followed by BLITS.
Social Network Monitoring
For the social network monitoring application, the citation network ca-AstroPh from the SNAP dataset collection was used, with n = 18,772 users and 198,110 edges. Edge weights, which represent the amount of content shared between users, were generated uniformly at random. The results were qualitatively similar to those for the unweighted MaxCut problem presented previously. FIG is the most efficient in terms of number of queries, and
FIG is only outperformed in solution quality by Gupta et al. (2010), which required more than an order of magnitude more queries.
Effect of Stealing Procedure
In Fig. 2, the effect of removing the stealing procedure is shown on the random graph instances. Let C_FIG be the solution returned by FIG, and C_FIG* be the solution returned by FIG with the stealing procedure removed. Fig. 2(a) shows that on the ER instance, the stealing procedure adds at most ≈ 1.5% to the solution value; however, on the BA instance, Fig. 2(b) shows that the stealing procedure contributes up to a 40% increase in solution value, although this effect degrades with larger k. This behavior may be explained by the interlaced greedy process being forced to leave good elements out of its solution, which are then recovered during the stealing procedure.
Figure 2: Effect of the stealing procedure on solution quality of FIG; the y-axis shows f(C_FIG)/f(C_FIG*). (a) ER instance, n = 1000. (b) BA instance, n = 10000.
Acknowledgements
The work of A. Kuhnle was partially supported by Florida State University and the Informatics Institute of the University of Florida. Victoria G. Crawford and the anonymous reviewers provided helpful feedback which improved the paper.
References
Ashwinkumar Badanidiyuru and Jan Vondrák. Fast algorithms for maximizing submodular functions. In ACM-SIAM Symposium on Discrete Algorithms (SODA), 2014.

Eric Balkanski, Adam Breuer, and Yaron Singer. Non-monotone Submodular Maximization in Exponentially Fewer Iterations. In Advances in Neural Information Processing Systems (NeurIPS), 2018.

Niv Buchbinder and Moran Feldman. Constrained Submodular Maximization via a Non-symmetric Technique. arXiv preprint arXiv:1611.03253v1, 2016.

Niv Buchbinder and Moran Feldman. Deterministic Algorithms for Submodular Maximization. ACM Transactions on Algorithms, 14(3), 2018a.

Niv Buchbinder and Moran Feldman. Submodular Functions Maximization Problems – A Survey. In Teofilo F. Gonzalez, editor, Handbook of Approximation Algorithms and Metaheuristics. Second edition, 2018b.

Niv Buchbinder, Moran Feldman, Joseph Seffi Naor, and Roy Schwartz. A Tight Linear Time (1/2)-Approximation for Unconstrained Submodular Maximization. In Symposium on Foundations of Computer Science (FOCS), 2012.

Niv Buchbinder, Moran Feldman, Joseph (Seffi) Naor, and Roy Schwartz. Submodular Maximization with Cardinality Constraints. In ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1433–1452, 2014.

Niv Buchbinder, Moran Feldman, and Roy Schwartz. Comparing Apples and Oranges: Query Trade-off in Submodular Maximization. In ACM-SIAM Symposium on Discrete Algorithms (SODA), 2015.

Chandra Chekuri, Shalmoli Gupta, and Kent Quanrud. Streaming Algorithms for Submodular Function Maximization. In International Colloquium on Automata, Languages, and Programming (ICALP), 2015.

Alina Ene and Huy L. Nguyen. Constrained Submodular Maximization: Beyond 1/e. In Symposium on Foundations of Computer Science (FOCS), 2016.

Alina Ene and Huy L. Nguyen. Parallel Algorithm for Non-Monotone DR-Submodular Maximization. arXiv preprint arXiv:1905.13272, 2019.

Matthew Fahrbach, Vahab Mirrokni, and Morteza Zadimoghaddam. Non-monotone Submodular Maximization with Nearly Optimal Adaptivity Complexity. In International Conference on Machine Learning (ICML), 2019.

Uriel Feige, Vahab S. Mirrokni, and Jan Vondrák. Maximizing Non-Monotone Submodular Functions. SIAM Journal on Computing, 40(4):1133–1153, 2011.

Moran Feldman, Christopher Harshaw, and Amin Karbasi. Greed is Good: Near-Optimal Submodular Maximization via Greedy Optimization. In Conference on Learning Theory (COLT), 2017.

Moran Feldman, Amin Karbasi, and Ehsan Kazemi. Do Less, Get More: Streaming Submodular Maximization with Subsampling. In Advances in Neural Information Processing Systems (NeurIPS), 2018.

M.L. Fisher, G.L. Nemhauser, and L.A. Wolsey. An analysis of approximations for maximizing submodular set functions—II. Mathematical Programming, 8:73–87, 1978.

Jennifer Gillenwater, Alex Kulesza, and Ben Taskar. Near-Optimal MAP Inference for Determinantal Point Processes. In Advances in Neural Information Processing Systems (NeurIPS), 2012.

Anupam Gupta, Aaron Roth, Grant Schoenebeck, and Kunal Talwar. Constrained non-monotone submodular maximization: Offline and secretary algorithms. In International Workshop on Internet and Network Economics (WINE), 2010.

Ehsan Kazemi, Marko Mitrovic, Morteza Zadimoghaddam, Silvio Lattanzi, and Amin Karbasi. Submodular Streaming in All its Glory: Tight Approximation, Minimum Memory and Low Adaptive Complexity. In International Conference on Machine Learning (ICML), 2019.

David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2003.

Jon Lee, Vahab Mirrokni, Viswanath Nagarajan, and Maxim Sviridenko. Maximizing Nonmonotone Submodular Functions under Matroid or Knapsack Constraints. SIAM Journal on Discrete Mathematics, 23(4):2053–2078, 2010.

Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, and Natalie Glance. Cost-effective Outbreak Detection in Networks. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2007.

Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, Amin Karbasi, Jan Vondrak, and Andreas Krause. Lazier Than Lazy Greedy. In AAAI Conference on Artificial Intelligence (AAAI), 2015.

Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, and Amin Karbasi. Fast Constrained Submodular Maximization: Personalized Data Summarization. In
International Conference on Ma-chine Learning (ICML) , 2016.Baharan Mirzasoleiman, Stefanie Jegelka, and Andreas Krause. Streaming Non-Monotone Sub-modular Maximization: Personalized Video Summarization on the Fly. In
AAAI Conference onArtificial Intelligence , 2018.Joseph Seffi Naor and Roy Schwartz. A Unified Continuous Greedy Algorithm for SubmodularMaximization. In
Symposium on Foundations of Computer Science (FOCS) , 2011.G L Nemhauser and L A Wolsey. Best Algorithms for Approximating the Maximum of a Submod-ular Set Function.
Mathematics of Operations Research , 3(3):177–188, 1978.G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizingsubmodular set functions-I.
Mathematical Programming , 14(1):265–294, 1978.Jan Vondrák. Symmetry and Approximability of Submodular Maximization Problems.
SIAM Jour-nal on Computing , 42(1):265–304, 2013. 12
A Proof of Theorem 1

Proof of Theorem 1.
Lemma 2. $4f(C) \ge f(O \setminus \{a\})$.

Proof. Let $A = \arg\max_{i \in [k+1]} f(A_i)$. Let $\hat O = O \setminus A_k = \{o_0, \ldots, o_{l-1}\}$ be ordered such that for each $i \in [l]$, $o_i \notin B_i$; this ordering is possible since $B_0 = \emptyset$ and $l \le k$. Also, for each $i \in [l]$, let $\hat O_i = \{o_0, \ldots, o_{i-1}\}$, and let $\hat O_0 = \emptyset$. Then
$$f(O \cup A_k) - f(A_k) = \sum_{i=0}^{l-1} f_{o_i}(\hat O_i \cup A_k) \le \sum_{i=0}^{l-1} f_{o_i}(A_i) \le \sum_{i=0}^{l-1} f_{a_i}(A_i) = f(A_l),$$
where the first inequality follows from submodularity, and the second inequality follows from the greedy choice $a_i = \arg\max_{x \in [n] \setminus (A_i \cup B_i)} f_x(A_i)$ and the fact that $o_i \notin B_i$. Hence
$$f(O \cup A_k) \le f(A_l) + f(A_k) \le 2f(A). \tag{9}$$

Let $B = \arg\max_{i \in [k+1]} f(B_i)$. Let $\hat O = O \setminus (\{a\} \cup B_k) = \{o_0, \ldots, o_{l-1}\}$ be ordered such that for each $i \in [l]$, $o_i \notin A_{i+1}$; this ordering is possible since $A_1 = \{a\}$, $a \notin \hat O$, and $l \le k$. Also, for each $i \in [l]$, let $\hat O_i = \{o_0, \ldots, o_{i-1}\}$, and let $\hat O_0 = \emptyset$. Then
$$f((O \setminus \{a\}) \cup B_k) - f(B_k) = \sum_{i=0}^{l-1} f_{o_i}(\hat O_i \cup B_k) \le \sum_{i=0}^{l-1} f_{o_i}(B_i) \le \sum_{i=0}^{l-1} f_{b_i}(B_i) = f(B_l),$$
where the first inequality follows from submodularity, and the second inequality follows from the greedy choice $b_i = \arg\max_{x \in [n] \setminus (A_{i+1} \cup B_i)} f_x(B_i)$ and the fact that $o_i \notin A_{i+1}$. Hence
$$f((O \setminus \{a\}) \cup B_k) \le f(B_l) + f(B_k) \le 2f(B). \tag{10}$$

By inequalities (9), (10), the fact that $A_k \cap B_k = \emptyset$, and submodularity, it holds that
$$f(O \setminus \{a\}) \le f(O \cup A_k) + f((O \setminus \{a\}) \cup B_k) \le 2f(A) + 2f(B) \le 4f(C).$$

Lemma 3. $4f(C) \ge f(O \cup \{a\})$.

Proof.
Let $D = \arg\max_{i \in [k+1]} f(D_i)$. Let $\hat O = O \setminus D_k = \{o_0, \ldots, o_{l-1}\}$ be ordered such that for each $i \in [l]$, $o_i \notin E_i$; this ordering is possible since $E_0 = \emptyset$ and $l \le k$. Also, for each $i \in [l]$, let $\hat O_i = \{o_0, \ldots, o_{i-1}\}$, and let $\hat O_0 = \emptyset$. Then
$$f(O \cup D_k) - f(D_k) = \sum_{i=0}^{l-1} f_{o_i}(\hat O_i \cup D_k) \le \sum_{i=0}^{l-1} f_{o_i}(D_i) \le \sum_{i=0}^{l-1} f_{d_i}(D_i) = f(D_l),$$
where the first inequality follows from submodularity, and the second inequality follows from the greedy choice $d_i = \arg\max_{x \in [n] \setminus (D_i \cup E_i)} f_x(D_i)$ and the fact that $o_i \notin E_i$. Hence
$$f(O \cup D_k) \le f(D_l) + f(D_k) \le 2f(D). \tag{11}$$

Let $E = \arg\max_{i \in [k+1]} f(E_i)$. Let $\hat O = O \setminus E_k = \{o_0, \ldots, o_{l-1}\}$ be ordered such that for each $i \in [l]$, $o_i \notin D_{i+1}$; this ordering is possible since $D_1 = \{a\}$, $a \notin \hat O$ (since $a \in E_k$), and $l \le k$. Also, for each $i \in [l]$, let $\hat O_i = \{o_0, \ldots, o_{i-1}\}$, and let $\hat O_0 = \emptyset$. Then
$$f(O \cup E_k) - f(E_k) = \sum_{i=0}^{l-1} f_{o_i}(\hat O_i \cup E_k) \le \sum_{i=0}^{l-1} f_{o_i}(E_i) \le \sum_{i=0}^{l-1} f_{e_i}(E_i) = f(E_l),$$
where the first inequality follows from submodularity, and the second inequality follows from the greedy choices $e_0 = \arg\max_{x \in [n]} f(x)$ and, for $i > 0$, $e_i = \arg\max_{x \in [n] \setminus (D_{i+1} \cup E_i)} f_x(E_i)$, together with the fact that $o_i \notin D_{i+1}$. Hence
$$f(O \cup E_k) \le f(E_l) + f(E_k) \le 2f(E). \tag{12}$$

By inequalities (11), (12), the fact that $D_k \cap E_k = \{a\}$, and submodularity, it holds that
$$f(O \cup \{a\}) \le f(O \cup D_k) + f(O \cup E_k) \le 2f(D) + 2f(E) \le 4f(C).$$

The proof of the theorem follows from Lemmas 2 and 3 and the fact that one of the statements $a \in O$ or $a \notin O$ must hold; hence, either $O \cup \{a\} = O$ or $O \setminus \{a\} = O$.

B Proofs for Theorem 2
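Before the proofs for Theorem 2, it may help to see the interlacing structure analyzed in Appendix A in executable form: two greedy sequences are grown alternately, each selection maximizing the marginal gain outside both sequences, once starting from the empty set and once from the best singleton, with the best prefix returned. This is a minimal sketch reconstructed from the proofs above, not the paper's reference implementation; the names `interlaced_greedy` and `gain` are assumptions.

```python
def interlaced_greedy(f, ground, k):
    """Sketch of interlaced greedy: grow two disjointly-supported sequences
    A and B by alternating greedy selections; return the best prefix seen.
    f maps a frozenset to a nonnegative number."""
    def gain(x, S):
        S = frozenset(S)
        return f(S | {x}) - f(S)

    def interlace(A, B):
        prefixes = [frozenset(A), frozenset(B)]
        for _ in range(k):
            for S, T in ((A, B), (B, A)):  # alternate the two sequences
                if len(S) < k:
                    # candidates must lie outside BOTH sequences
                    cands = [x for x in ground if x not in S and x not in T]
                    if cands:
                        x = max(cands, key=lambda y: gain(y, S))
                        S.add(x)
                        prefixes.append(frozenset(S))
        return max(prefixes, key=f)

    # Run 1: both sequences start empty (the A_i, B_i of Lemma 2).
    sol1 = interlace(set(), set())
    # Run 2: both start from the best singleton a (the D_i, E_i of Lemma 3).
    a = max(ground, key=lambda x: f(frozenset({x})))
    sol2 = interlace({a}, {a})
    return max(sol1, sol2, key=f)
```

On a small coverage instance (a standard submodular objective), the sketch returns a feasible set of size at most $k$ attaining the best available coverage.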
Proof of Inequality 2.
Let $A = \{a_0, \ldots, a_{|A|-1}\}$ be ordered as specified by FIG. Likewise, let $B = \{b_0, \ldots, b_{|B|-1}\}$ be ordered as specified by FIG.

Lemma 4. $O \setminus A = \{o_0, \ldots, o_{l-1}\}$ can be ordered such that
$$f_{o_i}(A_i) \le (1 + 2\delta) f_{a_i}(A_i), \tag{13}$$
if $i \in [|A|]$.

Proof. Order the elements $o \in (O \setminus A) \cap B = \{o_0, \ldots, o_{\ell-1}\}$ by the order in which these elements were added into $B$. Order the remaining elements of $O \setminus A$ arbitrarily. Then, when $a_i$ was chosen by ADD, it holds that $o_i \notin B_i$. Also, it is true that $o_i \notin A_i$; hence $o_i$ was not added into some (possibly non-proper) subset $A_i'$ of $A_i$ at the previous threshold value $\tau_i^A/(1-\delta)$. Hence $f_{o_i}(A_i) \le f_{o_i}(A_i') < \tau_i^A/(1-\delta)$, since $o_i \notin B_i$. Since $f_{a_i}(A_i) \ge \tau_i^A$ and $\delta < 1/2$, inequality (13) follows.

Order $\hat O = O \setminus A = \{o_0, \ldots, o_{l-1}\}$ as indicated in the proof of Lemma 4, and let $\hat O_i = \{o_0, \ldots, o_{i-1}\}$ if $i \ge 1$, $\hat O_0 = \emptyset$. Then
$$f(O \cup A) - f(A) = \sum_{i=0}^{l-1} f_{o_i}(\hat O_i \cup A) = \sum_{i=0}^{|A|-1} f_{o_i}(\hat O_i \cup A) + \sum_{i=|A|}^{l-1} f_{o_i}(\hat O_i \cup A) \le \sum_{i=0}^{|A|-1} f_{o_i}(A_i) + \sum_{i=|A|}^{l-1} f_{o_i}(A) \le \sum_{i=0}^{|A|-1} (1+2\delta) f_{a_i}(A_i) + \sum_{i=|A|}^{l-1} f_{o_i}(A) \le (1+2\delta) f(A) + \delta M,$$
where any empty sum is defined to be 0; the first inequality follows by submodularity, the second follows from Lemma 4, and the third follows from the definition of $A$ and the facts that $\max_{x \in [n] \setminus B_{|A|}} f_x(A) < \delta M / n$ and $l - |A| \le k \le n$.

Proof of Inequality 4.
As in the proof of Inequality 2, it suffices to establish the following lemma.
Lemma 5. $O \setminus D = \{o_0, \ldots, o_{l-1}\}$ can be ordered such that
$$f_{o_i}(D_i) \le (1 + 2\delta) f_{d_i}(D_i), \tag{14}$$
for $i \in [|D|]$.

Proof. Order the elements $o \in (O \setminus D) \cap E = \{o_0, \ldots, o_{\ell-1}\}$ by the order in which these elements were added into $E$. Order the remaining elements of $O \setminus D$ arbitrarily. Then, when $d_i$ was chosen by ADD, it holds that $o_i \notin E_i$. Also, it is true that $o_i \notin D_i$; hence $o_i$ was not added into some (possibly non-proper) subset $D_i'$ of $D_i$ at the previous threshold value $\tau_i^D/(1-\delta)$. Hence $f_{o_i}(D_i) \le f_{o_i}(D_i') < \tau_i^D/(1-\delta)$, since $o_i \notin E_i$. Since $f_{d_i}(D_i) \ge \tau_i^D$ and $\delta < 1/2$, inequality (14) follows.

Proof of Inequality 5.
As in the proof of Inequality 2, it suffices to establish the following lemma.
Lemma 6. $O \setminus E = \{o_0, \ldots, o_{l-1}\}$ can be ordered such that
$$f_{o_i}(E_i) \le (1 + 2\delta) f_{e_i}(E_i), \tag{15}$$
for $i \in [|E|]$.

Proof. Order the elements $o \in (O \setminus E) \cap D = \{o_0, \ldots, o_{\ell-1}\}$ by the order in which these elements were added into $D$. Order the remaining elements of $O \setminus E$ arbitrarily. Then, when $e_i$ was chosen by ADD, it holds that $o_i \notin D_{i+1}$, since $D_1 = \{a\}$ and $a = d_0 \notin O \setminus E$. Also, it is true that $o_i \notin E_i$; hence $o_i$ was not added into some (possibly non-proper) subset $E_i'$ of $E_i$ at the previous threshold value $\tau_i^E/(1-\delta)$. Hence $f_{o_i}(E_i) \le f_{o_i}(E_i') < \tau_i^E/(1-\delta)$, since $o_i \notin D_{i+1}$. Since $f_{e_i}(E_i) \ge \tau_i^E$ and $\delta < 1/2$, inequality (15) follows.

C Stealing Procedure for FastInterlaceGreedy
In this section, a procedure requiring $O(k \log k)$ time and $O(k)$ queries of $f$ is described, which may improve the quality of the solution found by FastInterlaceGreedy (a similar procedure could also be employed for InterlaceGreedy). Let $A, B, C, D, E$ have their values at the termination of FastInterlaceGreedy. Then calculate the sets
$$G = \{B_c = f(C) - f(C \setminus \{c\}) : c \in C\} \quad \text{and} \quad H = \{A_x = f(C \cup \{x\}) - f(C) : x \in A \cup B \cup D \cup E\}.$$
Then sort $G = (B_{c_1}, \ldots, B_{c_k})$ in non-decreasing order and sort $H = (A_{x_1}, \ldots, A_{x_l})$ in non-increasing order. Computing and sorting these sets requires $O(k \log k)$ time (and only $O(k)$ queries to $f$). Finally, iterate through the elements of $G$ in the sorted order; if $B_{c_i} < A_{x_i}$, then $C$ is assigned $(C \setminus \{c_i\}) \cup \{x_i\}$ if this assignment increases the value $f(C)$.

D Proof for Tight Examples
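The stealing step of Appendix C can be sketched as follows. This is a hypothetical reconstruction of the procedure described above; the names `steal` and `others`, and the representation of sets, are assumptions rather than the paper's implementation.

```python
def steal(f, C, others):
    """Sketch of the stealing step: precompute removal losses B_c and
    addition gains A_x with O(k) queries, sort them (O(k log k) time),
    then greedily swap a low-loss element for a high-gain element
    whenever the swap increases f(C)."""
    C = set(C)
    base = f(frozenset(C))
    # B_c = f(C) - f(C \ {c}) for c in C; A_x = f(C ∪ {x}) - f(C) for x outside C
    loss = {c: base - f(frozenset(C - {c})) for c in C}
    gain = {x: f(frozenset(C | {x})) - base for x in others}
    G = sorted(loss, key=loss.get)                # non-decreasing loss
    H = sorted(gain, key=gain.get, reverse=True)  # non-increasing gain
    for c, x in zip(G, H):
        if loss[c] < gain[x]:
            cand = (C - {c}) | {x}
            if f(frozenset(cand)) > f(frozenset(C)):  # keep only improving swaps
                C = cand
    return C
```

Note that the marginals are computed once, against the terminal solution; only the improvement checks query $f$ again, so the total query count stays linear in $k$.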
Proof of Prop. 1.
Submodularity will be verified by checking the inequality
$$f(S) + f(T) \ge f(S \cup T) + f(S \cap T) \tag{16}$$
for all $S, T \subseteq U$.

• Case $a \in S \cap T$, $b \notin S \cup T$. Then Ineq. (16) becomes
$$\frac{|S \cap O|}{k} + \frac{|T \cap O|}{k} + \frac{2}{k} \ge \frac{|S \cap T \cap O|}{k} + \frac{|(S \cup T) \cap O|}{k} + \frac{2}{k},$$
which holds.

• Case $a \in S \setminus T$, $b \in T \setminus S$. Then Ineq. (16) becomes
$$\frac{|S \cap O|}{k} + \frac{|T \cap O|}{k} + \frac{2}{k} \ge \frac{|S \cap T \cap O|}{k},$$
which holds.

• Case $a \in S \setminus T$, $b \in S \setminus T$. Then Ineq. (16) becomes
$$\frac{|T \cap O|}{k} \ge \frac{|S \cap T \cap O|}{k},$$
which holds.

• Case $a \in S \setminus T$, $b \in S \cap T$. Then Ineq. (16) becomes
$$\frac{|T \cap O|}{k} + \frac{1}{k} \ge \frac{|S \cap T \cap O|}{k} + \frac{1}{k},$$
which holds.

• Case $a \in S \cap T$, $b \in S \cap T$. Then Ineq. (16) becomes $0 \ge 0$, which holds.

• Case $a \notin S \cup T$, $b \notin S \cup T$. Then Ineq. (16) becomes
$$|S \cap O| + |T \cap O| \ge |(S \cup T) \cap O| + |(S \cap T) \cap O|,$$
which holds.

• Case $a \in S \setminus T$, $b \notin S \cup T$. Then Ineq. (16) becomes
$$\frac{|S \cap O|}{k} + \frac{1}{k} + \frac{|T \cap O|}{k} \ge \frac{|(S \cup T) \cap O|}{k} + \frac{1}{k} + \frac{|(S \cap T) \cap O|}{k},$$
which holds.
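The case analysis above can also be cross-checked by brute force. The sketch below assumes the reconstruction $f(S) = 0$ if $\{a, b\} \subseteq S$ and $f(S) = (|S \cap O| + \mathbf{1}[a \in S] + \mathbf{1}[b \in S])/k$ otherwise, which is consistent with the cases checked above but is an inference, not the paper's exact definition; values are scaled by $k$ in the code so the arithmetic stays integral.

```python
from itertools import combinations

def check_submodular(n=5, k=3):
    """Brute-force check of f(S) + f(T) >= f(S ∪ T) + f(S ∩ T) over all
    pairs of subsets of a small ground set, for the reconstructed
    tight-example function (scaled by k)."""
    U = range(n)
    a, b = 0, 1
    O = set(range(2, 2 + k))  # O is disjoint from {a, b}; requires n >= 2 + k

    def f(S):
        if a in S and b in S:  # both special elements present: value collapses to 0
            return 0
        return len(S & O) + (a in S) + (b in S)

    subsets = [set(c) for r in range(n + 1) for c in combinations(U, r)]
    return all(f(S) + f(T) >= f(S | T) + f(S & T)
               for S in subsets for T in subsets)
```

Running the check over all $2^n \times 2^n$ pairs confirms inequality (16) for this reconstruction on small instances.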