Adaptive Sampling for Fast Constrained Maximization of Submodular Functions

Francesco Quinzan, Vanja Doskoč, Andreas Göbel, Tobias Friedrich
Hasso Plattner Institute, Potsdam, Germany
Abstract
Several large-scale machine learning tasks, such as data summarization, can be approached by maximizing functions that satisfy submodularity. These optimization problems often involve complex side constraints, imposed by the underlying application. In this paper, we develop an algorithm with poly-logarithmic adaptivity for non-monotone submodular maximization under general side constraints. The adaptive complexity of a problem is the minimal number of sequential rounds required to achieve the objective. Our algorithm is suitable to maximize a non-monotone submodular function under a p-system side constraint, and it achieves a (p + O(√p))-approximation for this problem, after only poly-logarithmic adaptive rounds and polynomially many queries to the valuation oracle function. Furthermore, our algorithm achieves a (p + O(1))-approximation when the given side constraint is a p-extendible system. This algorithm yields an exponential speed-up, with respect to the adaptivity, over any other known constant-factor approximation algorithm for this problem. It also competes with previously known results in terms of the query complexity. We perform experiments on various real-world applications, and find that, in comparison with commonly used heuristics, our algorithm performs better on these instances.

1 Introduction

Several machine learning optimization problems consist of maximizing submodular functions. Examples include subset selection [Das and Kempe, 2018], data summarization [Lin and Bilmes, 2010, Mirzasoleiman et al., 2016], and Bayesian experimental design [Chaloner and Verdinelli, 1995, Krause et al., 2008]. These problems often involve constraints imposed by the underlying application. For instance, in video summarization tasks several constraints on the solution space arise based on qualitative features and contextual information [Mirzasoleiman et al., 2016].

The problem of maximizing a submodular function is NP-hard [Feige, 1998]. However, several approximation algorithms for this problem have been discovered over the years. For monotone submodular functions, the classical result of Nemhauser et al. [1978] shows that a simple greedy algorithm provides a (1 − 1/e)-approximation guarantee for the maximization of a monotone submodular function under a uniform constraint. If an additional matroid constraint is imposed on the solution space, then greedy achieves a (1/2)-approximation guarantee on this problem [Fisher et al., 1978]. A constant-factor approximation guarantee can also be achieved in the case of a knapsack constraint [Sviridenko, 2004].

More complex constraints require more complex heuristics. Several algorithms have been discovered to maximize a monotone submodular function under general side constraints such as p-systems and multiple knapsacks [Badanidiyuru and Vondrák, 2014, Chekuri and Pál, 2005]. These algorithms include streaming algorithms [Badanidiyuru et al., 2014, Chekuri et al., 2015, Chakrabarti and Kale, 2015], centralized algorithms [Badanidiyuru and Vondrák, 2014, Mirzasoleiman et al., 2015], and distributed algorithms [Mirzasoleiman et al., 2013, Kumar et al., 2015]. Many algorithms have also been proposed to maximize non-monotone submodular functions under a variety of constraints [Feldman et al., 2011, Chekuri et al., 2014, Gupta et al., 2010, Lee et al., 2009, Feige et al., 2011, Buchbinder et al., 2015]. These algorithms yield good approximation guarantees, but their run time is polynomial in the number of data points and in the number of additional side constraints.
Table 1: Results for non-monotone submodular maximization with a p-system side constraint. Here, n is the problem size, r is the maximum size of a feasible solution, and p is the parameter for the side constraint. The results on the adaptivity for previously known algorithms follow from the adaptivity of the greedy algorithm. Note also that all bounds on the adaptivity and query complexity for p-systems are parameterized by p. Whether it is possible to obtain bounds independent of p for this problem remains an open question.

p-systems:
  Algorithm                                     | Approx.           | Adaptivity                  | Query Complexity
  rep-sampling [this work]                      | ≈ p + O(√p)       | O(√p log n log(r/p) log r)  | O(√p n log n log(r/p) log r)
  FastSGS [Feldman et al., 2020]                | ≈ p + O(√p)       | O(pn log n)                 | O(pn log n)
  simultaneousGreedys [Feldman et al., 2020]    | p + O(√p)         | O(√p r)                     | O(prn)
  repeatedGreedy [Feldman et al., 2017]         | p + O(√p)         | O(√p r)                     | O(√p nr)
  fantom [Mirzasoleiman et al., 2016]           | ≈ p               | O(pr)                       | O(pnr)
  repeatedGreedy [Gupta et al., 2010]           | ≈ p               | O(pr)                       | O(pnr)

p-extendible systems:
  rep-sampling [this work]                      | ≈ p + O(1)        | O(log n log r)              | O(n log n log r)
  FastSGS [Feldman et al., 2020]                | ≈ p + O(1)        | O(pn log n)                 | O(pn log n)
  simultaneousGreedys [Feldman et al., 2020]    | p + O(1)          | O(pr)                       | O(prn)
  sampleGreedy [Feldman et al., 2017]           | p + O(1)          | O(r)                        | O(nr)

p-matchoids:
  Parallel Greedy† [Chekuri and Quanrud, 2019]  | ≈ (p + 4)(1 + o(1)) | O(log n log r)            | O(n log n log r)

† The Parallel Greedy algorithm requires access to the rank oracle for the underlying p-matchoid system. This oracle is strictly less general than the independence oracle required by all other algorithms in Table 1.

Recently, algorithms were discovered to maximize a non-monotone submodular function under very general side constraints [Mirzasoleiman et al., 2016, Feldman et al., 2017]. These constant-factor approximation algorithms scale polynomially in the number of data points, but also in the number of additional side constraints. In some cases, approximation algorithms do not exhibit increasingly worse run time in the number of constraints. This is the case when maximizing a submodular function under p-extendible system or p-matchoid side constraints [Feldman et al., 2017, Chekuri and Quanrud, 2019]. These side constraints are strictly less general than those studied in Mirzasoleiman et al. [2016], but they are general enough to capture a variety of interesting applications.

Submodular functions are learnable in the standard PAC and
PMAC models [Valiant, 1984, Balcan and Harvey, 2011]: given a collection of sampled sets and their submodular function values, it is possible to produce a surrogate that mimics the behavior of that function on samples drawn from the same distribution. However, submodular objectives cannot be optimized from the training data we use to learn them [Balkanski et al., 2017, Rosenfeld et al., 2018]. The reason is that, when learning from samples, the resulting surrogate functions can be inapproximable, and their global optima can be far away from the true optimum.

Using an adaptive sampling framework [Thompson, 1990], it is possible to design algorithms that reach a constant-factor approximation guarantee in poly-logarithmic adaptive rounds for submodular maximization, both in the monotone [Balkanski and Singer, 2018a,b] and in the non-monotone setting [Balkanski et al., 2018].
Our contribution.
Focusing on sampling techniques, we study the problem of maximizing a non-monotone submodular function, to which we have oracle access. Furthermore, we consider general p-system and p-extendible system side constraints for this problem.

Our algorithm has access to the side constraint structure via an oracle. Standard oracle models in the literature are: the independence oracle, which takes as input a set and returns whether that set is a feasible solution; the rank oracle, which returns the maximum cardinality of any feasible solution contained in a given input set; and the span oracle, which for an input set S and a point e returns whether or not S ∪ {e} has a higher rank than S. In this work, we assume access to the independence oracle, which is the most general oracle model of the three.

We develop the first algorithm with poly-logarithmic adaptivity suitable to maximize a non-monotone submodular function under a p-system or a p-extendible system side constraint. In contrast to all previous algorithms with low adaptivity, our algorithm only requires access to the independence oracle for the side constraints. This algorithm achieves strong approximation guarantees and run time, competing with known algorithms for this problem (see Table 1). We study the performance of our algorithm in two real-world applications, video summarization and Bayesian experimental design. We test our algorithm against other commonly used heuristics for this problem, and show that our algorithm comes out on top.

Our paper is organized as follows. We define the problem in Section 2, and we describe our algorithm in Section 3. Our theoretical analysis is presented in Sections 4-6. Applications and experiments are discussed in Sections 7-9. We conclude in Section 10.
2 Preliminaries

Submodularity. In this paper, we study optimization problems that can be approached by maximizing an oracle function that, given a solution set, estimates its quality. We assume that oracle functions are submodular.

Definition 1 (Submodularity). Given a finite set V, we call a set function f : 2^V → R≥0 submodular if for all S, U ⊆ V we have that f(S) + f(U) ≥ f(S ∪ U) + f(S ∩ U).

Note that we only consider functions that do not attain negative values. This is because submodular functions with negative values cannot be maximized, even approximately (see Feige et al. [2011]).

p-Systems. We study the problem of maximizing a submodular function under additional side constraints, defined as a p-system side constraint. As discussed, e.g., in Mirzasoleiman et al. [2016], Gupta et al. [2010], these constraints are significantly more general than standard matroid intersections, and they arise in various domains, such as movie recommendation, video summarization, and revenue maximization.

Given a collection of feasible solutions I over a ground set V and a set T ⊆ V, we denote with I|_T the collection consisting of all sets S ⊆ T that are feasible in I. Furthermore, a base for I is any maximal feasible set U ∈ I. We define p-systems as follows.

Definition 2. A p-system I over a ground set V is a collection of subsets of V fulfilling the following three axioms:
• ∅ ∈ I;
• for any two sets S ⊆ Ω ⊆ V, if Ω ∈ I then S ∈ I;
• for any set T ⊆ V and any bases S, U ∈ I|_T it holds |S| ≤ p|U|.

The second defining axiom is referred to as the subset-closure or downward-closed property.
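To make these definitions concrete, the following minimal Python sketch (our illustration, not part of the paper's experiments) checks the submodularity inequality by brute force for a small coverage function, and implements a cardinality constraint as an independence oracle; a cardinality constraint is a matroid, hence a 1-system.

    import itertools

    # Coverage: f(S) = number of elements covered by the subsets indexed by S.
    def coverage(S, subsets):
        return len(set().union(*(subsets[i] for i in S))) if S else 0

    # Brute-force check of f(S) + f(U) >= f(S | U) + f(S & U) on all pairs.
    def is_submodular(f, ground):
        sets = [frozenset(c) for r in range(len(ground) + 1)
                for c in itertools.combinations(ground, r)]
        return all(f(S) + f(U) >= f(S | U) + f(S & U)
                   for S in sets for U in sets)

    subsets = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c"}}
    print(is_submodular(lambda S: coverage(S, subsets), set(subsets)))  # True

    # The simplest independence oracle: a cardinality constraint |S| <= k.
    def independent(S, k=2):
        return len(S) <= k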
With this notation, we study the following problem.

Problem 1.
Given a submodular function f : 2^V → R≥0 and a p-system I, find a set S ⊆ V maximizing f(S) such that S ∈ I.

p-extendible Systems. We also consider a family of side constraints of intermediate generality, commonly referred to as p-extendible systems. These side constraints are strictly less general than p-systems, but they capture various types of constraints found in practical applications. Our main motivation in studying these constraints is that they admit algorithms that obtain strong approximation guarantees in much less time than in the case of p-systems. Hence, algorithms for p-extendible systems scale much better than for general p-systems. These p-extendible systems were first studied by Mestre [2006], and they are defined as follows.

Algorithm 1: rand-sequence(X, S, I)
  1: A ← ∅;
  2: while X ≠ ∅ do
  3:   sort the points {x_i}_i = X randomly;
  4:   η ← max{j : S ∪ A ∪ {x_i}_{i≤j} ∈ I};
  5:   A ← A ∪ {x_1, ..., x_η};
  6:   X ← {e ∈ X \ (S ∪ A) : S ∪ A ∪ e ∈ I};
  7: end while
  8: return A;

Definition 3. A p-extendible system I over a ground set V is a p-system that fulfills the following additional axiom: for every pair of sets S, Ω ∈ I with S ⊂ Ω, and for every element e ∉ S with S ∪ {e} ∈ I, there exists a set U ⊆ Ω \ S of size |U| ≤ p such that (Ω \ U) ∪ {e} ∈ I.

These side constraints generalize matroid intersections and p-matchoids. While being strictly less general than p-systems, this definition captures many interesting constraints, such as the intersection of matroids [Mestre, 2006].
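As a concrete instance of Definition 3 (our own sketch, with hypothetical helper names), recall that the intersection of p matroids is a p-extendible system; the snippet below builds the independence oracle for an intersection of two partition matroids, which is 2-extendible.

    # Independence oracle for a partition matroid: at most caps[i] points
    # may be chosen from each block parts[i] of a partition of the ground set.
    def partition_matroid(parts, caps):
        def independent(S):
            return all(len(S & block) <= cap for block, cap in zip(parts, caps))
        return independent

    # The intersection of several independence oracles.
    def intersection(*oracles):
        return lambda S: all(ok(S) for ok in oracles)

    m1 = partition_matroid([{0, 1}, {2, 3}], [1, 1])
    m2 = partition_matroid([{0, 2}, {1, 3}], [1, 1])
    is_feasible = intersection(m1, m2)
    print(is_feasible({0, 3}))  # True
    print(is_feasible({0, 1}))  # False: two points from block {0, 1} of m1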
In this paper, we also study the following problem.

Problem 2. Given a submodular function f : 2^V → R≥0 and a p-extendible system I, find a set S ⊆ V maximizing f(S) such that S ∈ I.
Adaptivity. An algorithm is T-adaptive if every query f(S) for the f-value of a solution S occurs at a round i ∈ [T] such that S is independent of the values f(S′) of all other queries at round i, with at most polynomially many queries at each round in the problem size. The query complexity is the number of calls to the evaluation oracle function.
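The following schematic sketch (an illustration of the definition, not code from the paper) shows what a single adaptive round looks like: all queries issued in the same round are independent of one another, so they may be evaluated in parallel, and only the number of such rounds counts towards the adaptivity T.

    from concurrent.futures import ThreadPoolExecutor

    def one_adaptive_round(f, candidate_sets):
        # The queries in one round may not depend on each other's answers,
        # so they can all be issued concurrently against the value oracle f.
        with ThreadPoolExecutor() as pool:
            return list(pool.map(f, candidate_sets))

    # A T-adaptive algorithm issues T such batches, where the batch at
    # round i may depend only on the answers from rounds 1, ..., i - 1.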
Notation. For any submodular evaluation oracle function f : 2^V → R and sets S, U ⊆ V, we define the marginal value of U with respect to S as f(U | S) = f(S ∪ U) − f(S).

Throughout the paper, we always use the notation introduced in Problem 1: we denote with f the evaluation oracle function, with V the ground set, and with I the p-system side constraint. We denote with opt a solution to Problem 1, and we denote with n the size of the ground set V, i.e., n is the number of singletons in our solution space. We also denote with r the maximum size of a feasible solution. The notation introduced in Algorithms 1-3 is used consistently throughout the paper.

3 The Algorithms

Our method consists of three parts (see Algorithms 1-3). We call these algorithms rand-sequence, rand-sampling, and rep-sampling respectively. These algorithms also call the binary-search and unif-sampling sub-routines.

Algorithm 2: rand-sampling(f, V, I, λ, ε, ϕ)
  1: S ← ∅;
  2: X ← arg max_e {f(e) : e ∈ V ∧ {e} ∈ I};
  3: δ ← f(X), δ₀ ← λf(X);
  4: while δ ≥ δ₀ do
  5:   while X ≠ ∅ do
  6:     {a_j}_{j∈J} ← rand-sequence(X, S, I);
  7:     η ← binary-search(J, min{j ∈ J : |X_j| < (1 − ε)|X|}), with
         X_j = {e ∈ X : f(e | S ∪ {a_1, ..., a_{j−1}}) ≥ δ ∧ S ∪ {a_1, ..., a_{j−1}} ∪ e ∈ I};
  8:     A ← unif-sampling({a_1, ..., a_{η−1}}, ϕ);
  9:     X ← X_η;
 10:     S ← S ∪ A;
 11:   end while
 12:   δ ← (1 − ε)δ;
 13:   X ← {e ∈ V : f(e | S) ≥ δ ∧ S ∪ e ∈ I};
 14: end while
 15: return S;

Algorithm 3: rep-sampling(f, V, I, ε, ϕ₁, ϕ₂, m)
  1: λ ← ε(p + 1)/m;
  2: for j ≤ m iterations do
  3:   Ω_j ← rand-sampling(f, V, I, λ, ε, ϕ₁);
  4:   Λ_j ← unif-sampling(Ω_j, ϕ₂);
  5:   V ← V \ Ω_j;
  6: end for
  7: return arg max_j {f(Ω_j), f(Λ_j)};

The following is a description of each algorithm and sub-routine.

rand-sequence. This algorithm is based on the work of Karp et al. [1988]. Given as input a ground set X, a current solution S, and a p-system I, it finds a random set A such that S ∪ A is a base for I.
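A minimal Python sketch of rand-sequence follows (our illustration; the analysis finds the prefix length η with parallel oracle queries, while this sketch scans sequentially for clarity).

    import random

    def rand_sequence(X, S, is_independent):
        # Algorithm 1: repeatedly shuffle the remaining candidates, keep the
        # longest random prefix that leaves S + A feasible, then discard all
        # points that can no longer be added individually.
        X, S, A = set(X), set(S), []
        while X:
            order = list(X)
            random.shuffle(order)
            eta = 0
            while (eta < len(order)
                   and is_independent(S | set(A) | set(order[:eta + 1]))):
                eta += 1
            A.extend(order[:eta])
            X = {e for e in X if e not in S and e not in A
                 and is_independent(S | set(A) | {e})}
        return A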
rand-sampling. This algorithm generalizes a sampling algorithm proposed in Balkanski et al. [2019] to non-monotone submodular maximization. It requires as input an oracle function f, a ground set V, a p-system or p-extendible system I, and parameters λ, ε, ϕ. The parameter λ determines the total number of iterations of the rand-sampling, the parameter ε determines the rate with which the threshold δ decreases, whereas ϕ determines the distribution for the unif-sampling sub-routine. For a constant δ, points are added to the current solution if their marginal contribution reaches the threshold δ. Note that at each adaptive step, the rand-sampling uses the binary-search and the unif-sampling sub-routines. If Algorithm 2 reaches an iteration with X = ∅, then it decreases the value of δ, so that points with lower marginal contribution can be added to the current solution.

binary-search. This sub-routine is the standard binary search algorithm. It is used to locate the index η = min{j : |X_j| ≤ (1 − ε)|X|}, with X_j as in Algorithm 2, where the index j spans over the set J. This sub-routine uses the fact that, due to submodularity, it holds |X_{j+1}| ≤ |X_j| for all j ∈ J.

unif-sampling. For a given input set and probability ϕ, this algorithm samples points of the input set independently, with probability ϕ.

rep-sampling. This algorithm requires as input an oracle function f, a ground set V, a p-system or p-extendible system I, and parameters m, ε and ϕ₁, ϕ₂. At each step, the rep-sampling calls Algorithm 2 to find a partial solution Ω_j. Then, Algorithm 3 samples a subset Λ_j of Ω_j, where each point is drawn independently with probability ϕ₂. Afterwards, the rep-sampling removes all points of Ω_j from the ground set, and it repeats the procedure on the resulting ground set. This procedure is iterated m times.
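To fix ideas, here is a sketch of the outer loop of Algorithm 3 in Python (illustrative only: rand_sampling stands in for Algorithm 2, and the signatures are our own).

    import random

    def rep_sampling(f, V, eps, p, phi1, phi2, m, rand_sampling):
        lam = eps * (p + 1) / m        # threshold parameter, as in Algorithm 3
        V, best = set(V), frozenset()
        for _ in range(m):
            omega = rand_sampling(f, V, lam, eps, phi1)         # Omega_j
            Lam = {e for e in omega if random.random() < phi2}  # Lambda_j
            best = max((omega, Lam, best), key=lambda S: f(frozenset(S)))
            V -= omega                 # remove Omega_j from the ground set
        return best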
4 Analysis for p-Systems

In this section, we present our theoretical analysis for Problem 1. We remark that all proofs are deferred to the appendix. Approximation guarantees for Algorithm 3 follow from the following general theorem.

Theorem 1.
Fix constants ε ∈ (0, 1), m ≥ 2, ϕ₁ = 1, and ϕ₂ = 1/2. Denote with Ω* the output of Algorithm 3. Then,

    f(opt) ≤ m ( (1 + ε)(p + 1) / ((1 − ε)²(m − 1)) + 2 ) E[f(Ω*)].

A proof of this theorem is given in Appendix B-C. We estimate the number of adaptive rounds until Algorithm 3 reaches the desired approximation guarantee. The following lemma holds.
Lemma 1.
Fix constants ε ∈ (0, 1), ϕ₁, ϕ₂ ∈ [0, 1], and m ≥ 1. Then Algorithm 3 terminates after O((m/ε²) log(r/(pε)) log r log n) rounds of adaptivity. Furthermore, Algorithm 3 has query complexity O((mn/ε²) log(r/(pε)) log r log n).

A proof of this result is given in Appendix D. The following lemma follows from Theorem 1 and Lemma 1.
Lemma 2.
Fix a constant ε ∈ (0, 1), and define parameters m = 1 + ⌈√((p + 1)/2)⌉, ϕ₁ = 1, and ϕ₂ = 1/2. Denote with Ω* the solution found by Algorithm 3. Then,

    f(opt) ≤ (1 + ε)/(1 − ε)² (p + 2√(2(p + 1)) + 5) E[f(Ω*)].

Furthermore, with this parameter choice Algorithm 3 terminates after O((√p/ε²) log n log(r/(pε)) log r) rounds of adaptivity, and its query complexity is O((√p n/ε²) log n log(r/(pε)) log r).

A proof is given in Appendix E. We remark that there exists an algorithm with constant adaptivity for unconstrained non-monotone submodular maximization that achieves an approximation guarantee arbitrarily close to 1/2 (see Chen et al. [2019]). Using this algorithm as a sub-routine in line 4 of Algorithm 3 yields a constant-factor improvement over the approximation guarantee of Lemma 2, without affecting the upper-bound on the adaptivity. However, this algorithm requires access to a continuous extension of the value oracle f, whereas Algorithm 3 only requires access to f.

5 Analysis for p-Extendible Systems

In this section, we perform the theoretical analysis for the rep-sampling when maximizing a non-monotone submodular function under a p-extendible system side constraint, as in Problem 2. We prove that, with a different set of input parameters, our algorithm has adaptivity and query complexity that do not depend on p. Again, all proofs are deferred to the appendix. The following lemma holds.

Lemma 3.
Fix parameters ε ∈ (0, 1), m = 1, ϕ₁ = (p + 1)⁻¹, and ϕ₂ ∈ [0, 1]. Denote with Ω* the output of Algorithm 3. Then,

    f(opt) ≤ (1 + ε)(p + 1)² / (p(1 − ε)²) E[f(Ω*)].

With this parameter choice, Algorithm 3 terminates after O(ε⁻² log n log(r/ε) log r) rounds of adaptivity, and it requires O((n/ε²) log n log(r/ε) log r) function evaluations.

For a proof of this result see Appendix G. The proof of this lemma is based on the work of Feldman et al. [2017], together with the fact that Algorithm 2 yields an expected marginal increase lower-bounded by the best possible greedy improvement, up to a multiplicative constant. We remark that Lemma 3 also holds when the side constraints are p-matchoids or intersections of matroids, since p-extendible systems are a generalization of both.

6 Calls to the Independence Oracle

We conclude our analysis with a general discussion on the performance of Algorithm 3 in the number of calls to the independence oracle for the p-system constraint. The independence oracle takes as input a set S, and returns as output a Boolean value: true if the given set is independent in I, and false otherwise. The following lemma holds.

Lemma 4.
Fix parameters ε ∈ (0, 1), m ≥ 1, and ϕ₁, ϕ₂ ∈ [0, 1]. Then Algorithm 3 requires an expected O((m√n/ε²) log(r/(pε)) log r log n) rounds of independent calls to the oracle for the p-system constraint. Furthermore, the total number of calls to the independence oracle is O((mn^{3/2}/ε²) log(r/(pε)) log r log n).

A proof of this result is given in Appendix F, and it follows from the work of Karp et al. [1988]. Note that the rounds of independent calls to the oracle are sub-linear, but not poly-logarithmic, in the problem size. The reason is that Algorithm 1 requires O(√n) rounds of independent calls to the oracle for the p-system. We are not aware of any algorithm that finds a base in less than O(√n) rounds. Furthermore, it is well-known that no algorithm obtains an approximation guarantee for Problem 1 that is constant in the problem size in fewer than Ω̃(n^{1/3}) rounds of independent calls to the oracle for the p-system constraint (see Karp et al. [1988], Balkanski et al. [2019]).

For a p-system I, the rank of a set S is the maximum cardinality of its intersection with a maximum independent set in I. Given access to an oracle that returns the rank of a set in I, it is possible to design an algorithm that finds a maximum independent set of a p-system in O(log n) rounds of independent calls to the rank oracle (see Karp et al. [1988]). However, this work focuses on general constraints where the rank of a set is not known.

7 Experiments

In our set of experiments, we implement the rep-sampling as described in Algorithm 3. We always test our algorithm against the following algorithms:

• fantom. This algorithm, which iterates a density greedy algorithm multiple times, is studied in Gupta et al. [2010] and Mirzasoleiman et al. [2016].

• repeatedGreedy. This algorithm, studied in Feldman et al. [2017], consists of iterating a greedy algorithm multiple times. It uses Algorithm 1 of Buchbinder et al. [2015] as a sub-routine.
• FastSGS. This algorithm is studied in Feldman et al. [2020], and it is essentially a fast implementation of the simultaneousGreedys algorithm of Feldman et al. [2020]. This algorithm updates multiple solutions concurrently, and it picks the best of them.

• sampleGreedy. This algorithm is specifically designed to handle p-extendible systems (see Feldman et al. [2017]). This algorithm samples points independently at random, and then it builds a greedy solution over the resulting set (a sketch follows at the end of this section).

Note that these algorithms only require access to the independence oracle for the side constraints. In our experiments we do not consider algorithms that require access to the rank oracle, since they are impractical for our applications. We perform two sets of experiments, on the following applications:

• Video Summarization. This problem asks to find a set of representative frames for a given video. We use Determinantal Point Processes to select a diverse set of frames. In order to get better summaries, we employ a face-recognition tool to identify faces in each segment. This experiment is described in Section 8, and the results are displayed in Figure 1.
• Bayesian D-Optimality. Here, the goal is to design an experiment that maximizes the expected utility of the outcome, using preliminary observations. We use observations from the Berkeley Earth data-set to select thermal stations around the world with which to measure the temperature. This experiment is described in Section 9, and the results are displayed in Figure 2.

The code and the datasets are available upon request.
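The following is a short sketch of the sampleGreedy baseline mentioned above (our rendering; the sampling probability q is the tuning parameter of the method, and q = 1/(p + 1) is the value matching the parameter choice of Lemma 3).

    import random

    def sample_greedy(f, V, is_independent, q):
        # Keep each point independently with probability q, then run plain
        # greedy on the sampled ground set, subject to the independence oracle.
        sampled = [e for e in V if random.random() < q]
        S = set()
        while True:
            gains = [(f(S | {e}) - f(S), e) for e in sampled
                     if e not in S and is_independent(S | {e})]
            if not gains:
                return S
            gain, e = max(gains, key=lambda t: t[0])
            if gain <= 0:
                return S
            S.add(e)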
8 Video Summarization

We study an application of our setting to a data summarization task: given a video consisting of ordered frames, choose a subset of frames that gives a descriptive overview of the video. An effective way to select a diverse set of items is to apply Determinantal Point Processes [Macchi, 1975]. For a thorough survey on Determinantal Point Processes and their applications, we refer the reader to Kulesza and Taskar [2012].

For a set of items V = {1, ..., n}, a Determinantal Point Process (DPP) defines a discrete probability distribution over all subsets S ⊆ V as Pr(S) = det(L_S)/det(L + I), where L is a positive semi-definite kernel matrix, L_S is its restriction to the rows and columns indexed by S, and I is the identity matrix.
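As an illustration (our sketch, assuming a precomputed positive definite kernel matrix L), the log-probability of a subset under a DPP can be computed as follows; the induced set function log det(L_S) is a well-known non-monotone submodular objective, which is what makes DPP-based summarization fit our setting.

    import numpy as np

    def dpp_log_prob(L, S):
        # log Pr(S) = log det(L_S) - log det(L + I) for a DPP with kernel L.
        S = sorted(S)
        logdet_S = np.linalg.slogdet(L[np.ix_(S, S)])[1] if S else 0.0
        logdet_norm = np.linalg.slogdet(L + np.eye(len(L)))[1]
        return logdet_S - logdet_norm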
[Figure 1: results on the video summarization task, comparing fantom, repeatedGreedy, FastSGS, rep-sampling (1), sampleGreedy, and rep-sampling (2).]

On these instances, the solution quality of the rep-sampling is comparable to that of fantom, repeatedGreedy, and FastSGS, and it has better adaptivity than the greedy algorithms. The solution quality for the sampleGreedy and the rep-sampling with parameters as in Lemma 3 is worse on these instances.
9 Bayesian Experimental Design

Bayesian experimental design provides a general framework to select a set of experiments that maximize the expected utility of the outcome. Formally, we want to estimate the parameter θ of a function y = f_θ(x) + w, where w is an error term. In this framework, the input x is generated by a set of experiments. Assuming that parameters are equipped with a prior, Bayesian optimality criteria are useful in identifying the right experiments to perform, in order to generate the input x.

We focus on linear regressions of the form y = θ^T X + w, with y, w ∈ R^n, θ ∈ R^m and X ∈ R^{m×n}. Furthermore, we assume independent and homoscedastic noise. We approach experimental design with the D-optimality criterion, although other methods can be used to this end [Krause et al., 2008]. This criterion consists of maximizing the determinant, equivalently the log-determinant, of the information matrix of the selected experiments.
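Under the linear model above with a standard Gaussian prior on θ, the D-optimality score of a sub-design reduces to a log-determinant. The sketch below (our illustration, with assumed names and noise variance sigma2, not the paper's exact implementation) evaluates it for a set S of selected experiments, i.e., selected columns of X.

    import numpy as np

    def d_optimality(X, S, sigma2=1.0):
        # Information matrix of the sub-design X_S under homoscedastic noise:
        # I + sigma2^{-1} X_S X_S^T; D-optimality maximizes its log-determinant.
        Xs = X[:, sorted(S)]
        M = np.eye(X.shape[0]) + (Xs @ Xs.T) / sigma2
        return np.linalg.slogdet(M)[1]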
[Figure 2: results on the Bayesian D-optimality task, comparing fantom, repeatedGreedy, FastSGS, rep-sampling (1), sampleGreedy, and rep-sampling (2).]

10 Conclusion
In this paper, we develop the first algorithm for non-monotone submodular maximization under p-system and p-extendible system side constraints with poly-logarithmic adaptivity (see Lemma 2 and Lemma 3). This algorithm also competes with previously known results in terms of the query complexity and approximation guarantee (see Table 1). We consider two applications and study the performance of our algorithm against several other algorithms suitable for this problem. We observe that our algorithm has superior adaptivity, and that it competes in terms of the query complexity (see Figures 1-2).
11 Acknowledgements
We would like to thank Christopher Weyand for helping the authors develop some of the code used for the experiments. We would like to thank Martin Schirneck for useful discussions on previous related work. This research has been partly funded by the Federal Ministry of Education and Research of Germany in the framework of KI-LAB-ITSE (project number 01IS19066).
References
Ashwinkumar Badanidiyuru and Jan Vondrák. Fast algorithms for maximizing submodular functions. In Proc. of SODA, pages 1497-1514, 2014.
Ashwinkumar Badanidiyuru, Baharan Mirzasoleiman, Amin Karbasi, and Andreas Krause. Streaming submodular maximization: massive data summarization on the fly. In Proc. of KDD, pages 671-680, 2014.
Maria-Florina Balcan and Nicholas J. A. Harvey. Learning submodular functions. In Proc. of STOC, pages 793-802, 2011.
Eric Balkanski and Yaron Singer. Approximation guarantees for adaptive sampling. In Proc. of ICML, pages 393-402, 2018a.
Eric Balkanski and Yaron Singer. The adaptive complexity of maximizing a submodular function. In Proc. of STOC, pages 1138-1151, 2018b.
Eric Balkanski, Aviad Rubinstein, and Yaron Singer. The limitations of optimization from samples. In Proc. of STOC, pages 1016-1027, 2017.
Eric Balkanski, Adam Breuer, and Yaron Singer. Non-monotone submodular maximization in exponentially fewer iterations. In Proc. of NeurIPS, pages 2359-2370, 2018.
Eric Balkanski, Aviad Rubinstein, and Yaron Singer. An optimal approximation for submodular maximization under a matroid constraint in the adaptive complexity model. In Proc. of STOC, pages 66-77, 2019.
Niv Buchbinder, Moran Feldman, Joseph Naor, and Roy Schwartz. Submodular maximization with cardinality constraints. In Proc. of SODA, pages 1433-1452, 2014.
Niv Buchbinder, Moran Feldman, Joseph Naor, and Roy Schwartz. A tight linear time (1/2)-approximation for unconstrained submodular maximization. SIAM Journal of Computing, 44(5):1384-1402, 2015.
Amit Chakrabarti and Sagar Kale. Submodular maximization meets streaming: matchings, matroids, and more. Mathematical Programming, 154(1-2):225-247, 2015.
Kathryn Chaloner and Isabella Verdinelli. Bayesian experimental design: A review. Statistical Science, 10:273-304, 1995.
Chandra Chekuri and Martin Pál. A recursive greedy algorithm for walks in directed graphs. In Proc. of FOCS, pages 245-253, 2005.
Chandra Chekuri and Kent Quanrud. Parallelizing greedy for submodular set function maximization in matroids and beyond. In Proc. of STOC, pages 78-89, 2019.
Chandra Chekuri, Jan Vondrák, and Rico Zenklusen. Submodular function maximization via the multilinear relaxation and contention resolution schemes. SIAM Journal of Computing, 43(6):1831-1879, 2014.
Chandra Chekuri, Shalmoli Gupta, and Kent Quanrud. Streaming algorithms for submodular function maximization. In Proc. of ICALP, pages 318-330, 2015.
Lin Chen, Moran Feldman, and Amin Karbasi. Unconstrained submodular maximization with constant adaptive complexity. In Proc. of STOC, pages 102-113, 2019.
Abhimanyu Das and David Kempe. Approximate submodularity and its applications: Subset selection, sparse approximation and dictionary selection. Journal of Machine Learning Research, 19:3:1-3:34, 2018.
Michal Derezinski, Feynman Liang, and Michael W. Mahoney. Bayesian experimental design using regularized determinantal point processes. In Proc. of AISTATS, volume 108, pages 3197-3207, 2020.
Alina Ene and Huy L. Nguyen. Submodular maximization with nearly-optimal approximation and adaptivity in nearly-linear time. In Proc. of SODA, pages 274-282, 2019.
Alina Ene, Huy L. Nguyen, and Adrian Vladu. Submodular maximization with matroid and packing constraints in parallel. In Proc. of STOC, pages 90-101, 2019.
Matthew Fahrbach, Vahab S. Mirrokni, and Morteza Zadimoghaddam. Non-monotone submodular maximization with nearly optimal adaptivity and query complexity. In Proc. of ICML, pages 1833-1842, 2019a.
Matthew Fahrbach, Vahab S. Mirrokni, and Morteza Zadimoghaddam. Submodular maximization with nearly optimal approximation, adaptivity and query complexity. In Proc. of SODA, pages 255-273, 2019b.
Uriel Feige. A threshold of ln n for approximating set cover. Journal of the ACM, 45(4):634-652, 1998.
Uriel Feige, Vahab S. Mirrokni, and Jan Vondrák. Maximizing non-monotone submodular functions. SIAM Journal of Computing, 40(4):1133-1153, 2011.
Moran Feldman, Joseph Naor, and Roy Schwartz. Non-monotone submodular maximization via a structural continuous greedy algorithm (extended abstract). In Proc. of ICALP, pages 342-353, 2011.
Moran Feldman, Christopher Harshaw, and Amin Karbasi. Greed is good: Near-optimal submodular maximization via greedy optimization. In Proc. of COLT, pages 758-784, 2017.
Moran Feldman, Amin Karbasi, and Ehsan Kazemi. Do less, get more: Streaming submodular maximization with subsampling. In Proc. of NeurIPS, pages 730-740, 2018.
Moran Feldman, Christopher Harshaw, and Amin Karbasi. Simultaneous greedys: A swiss army knife for constrained submodular maximization. CoRR, abs/2009.13998, 2020.
M. Fisher, George Nemhauser, and Laurence Wolsey. An analysis of approximations for maximizing submodular set functions - II. Mathematical Programming, 14:73-87, 1978.
Tobias Friedrich, Andreas Göbel, Frank Neumann, Francesco Quinzan, and Ralf Rothenberger. Greedy maximization of functions with bounded curvature under partition matroid constraints. In Proc. of AAAI, pages 2272-2279, 2019.
Boqing Gong, Wei-Lun Chao, Kristen Grauman, and Fei Sha. Diverse sequential subset selection for supervised video summarization. In Proc. of NIPS, pages 2069-2077, 2014.
Anupam Gupta, Aaron Roth, Grant Schoenebeck, and Kunal Talwar. Constrained non-monotone submodular maximization: Offline and secretary algorithms. In Proc. of WINE, pages 246-257, 2010.
Richard M. Karp, Eli Upfal, and Avi Wigderson. The complexity of parallel search. Journal of Computer and System Sciences, 36(2):225-253, 1988.
Andreas Krause, Ajit Paul Singh, and Carlos Guestrin. Near-optimal sensor placements in gaussian processes: Theory, efficient algorithms and empirical studies. Journal of Machine Learning Research, 9:235-284, 2008.
Alex Kulesza and Ben Taskar. k-DPPs: Fixed-size determinantal point processes. In Proc. of ICML, pages 1193-1200, 2011.
Alex Kulesza and Ben Taskar. Determinantal point processes for machine learning. Foundations and Trends in Machine Learning, 5(2-3):123-286, 2012.
Ravi Kumar, Benjamin Moseley, Sergei Vassilvitskii, and Andrea Vattani. Fast greedy algorithms in mapreduce and streaming. ACM Transactions on Parallel Computing, 2(3):14:1-14:22, 2015.
Jon Lee, Vahab S. Mirrokni, Viswanath Nagarajan, and Maxim Sviridenko. Non-monotone submodular maximization under matroid and knapsack constraints. In Proc. of STOC, pages 323-332, 2009.
Wenzheng Li, Paul Liu, and Jan Vondrák. A polynomial lower bound on adaptive complexity of submodular maximization. CoRR, abs/2002.09130, 2020.
Hui Lin and Jeff Bilmes. Multi-document summarization via budgeted maximization of submodular functions. In Proc. of NAACL-HLT, pages 912-920, 2010.
Odile Macchi. The coincidence approach to stochastic point processes. Advances in Applied Probability, 7(1):83-122, 1975.
Julián Mestre. Greedy in approximation algorithms. In Proc. of ESA, pages 528-539, 2006.
Baharan Mirzasoleiman, Amin Karbasi, Rik Sarkar, and Andreas Krause. Distributed submodular maximization: Identifying representative elements in massive data. In Proc. of NIPS, pages 2049-2057, 2013.
Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, Amin Karbasi, Jan Vondrák, and Andreas Krause. Lazier than lazy greedy. In Proc. of AAAI, pages 1812-1818, 2015.
Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, and Amin Karbasi. Fast constrained submodular maximization: Personalized data summarization. In Proc. of ICML, pages 1358-1367, 2016.
Baharan Mirzasoleiman, Stefanie Jegelka, and Andreas Krause. Streaming non-monotone submodular maximization: Personalized video summarization on the fly. In Proc. of AAAI, pages 1379-1386, 2018.
George L. Nemhauser, Laurence A. Wolsey, and Marshall L. Fisher. An analysis of approximations for maximizing submodular set functions - I. Mathematical Programming, 14(1):265-294, 1978.
Nir Rosenfeld, Eric Balkanski, Amir Globerson, and Yaron Singer. Learning to optimize combinatorial functions. In Proc. of ICML, pages 4371-4380, 2018.
Benjamin Sapp and Ben Taskar. MODEC: Multimodal decomposable models for human pose estimation. In Proc. of CVPR, 2013.
P. Sebastiani and H. P. Wynn. Maximum entropy sampling and optimal bayesian experimental design. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62(1):145-157, 2000.
Maxim Sviridenko. A note on maximizing a submodular set function subject to a knapsack constraint. Operations Research Letters, 32(1):41-43, 2004.
Steven K. Thompson. Adaptive cluster sampling. Journal of the American Statistical Association, 85(412):1050-1059, 1990.
Leslie G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134-1142, 1984.
Appendix

A Notation
Here, we discuss the notation used throughout the proofs in the appendix, which is also discussed in the main body of the paper.

For any submodular function f : 2^V → R and sets S, U ⊆ V, we define the marginal value of U with respect to S as f(U | S) = f(S ∪ U) − f(S). Note that, if f only attains non-negative values, it holds that f(U) ≥ f(U | S) for all S, U ⊆ V.

We always use the notation introduced in Problem 1, and we always denote with opt a solution to Problem 1. Furthermore, we always denote with n the size of the ground set V, i.e., n is the number of singletons in our solution space. We denote with r the maximum size of any feasible solution in I. This quantity is sometimes called the rank.

We denote with m, ε the parameters as in Algorithm 3. The sets Ω_j and Λ_j are as in Algorithm 3, and we denote with V_j the ground set for the rand-sampling algorithm during the j-th iteration of the for-loop in lines 2-6 of Algorithm 3. Furthermore, δ, δ₀, and λ are as in Algorithm 2 and Algorithm 3.

B Proof of Lemma 5
In this section, we prove a lemma that is useful to prove the desired approximation guarantee. Throughout the section, we always implicitly assume that the parameters for Algorithm 2 and Algorithm 3 are as in Theorem 1, i.e., ϕ₁ = 1 and ϕ₂ = 1/2. The following lemma holds.

Lemma 5.
It holds

    (p + 1) E[f(Ω_j)] + λr E[f(Ω_j)] ≥ (1 − ε)² E[f(Ω_j ∪ (opt ∩ V_j))],

for all sets Ω_j.

On a high level, we prove that δ is an upper-bound for the best possible improvement up to a multiplicative constant, and that the marginal contribution of any point added to the current solution is lower-bounded by δ in expected value, up to a multiplicative constant. We then combine this fact with the defining properties of the p-system to prove the claim, which holds for non-monotone functions. With this lemma, we prove that Algorithm 3 yields, in expectation, a constant-factor approximation guarantee for Problem 1.

B.1 Preliminary Results
In order to prove Lemma 5, we need the following technical proposition.
Proposition 1 (Proposition 2.2 in Fisher et al. [1978]). Let {x_1, ..., x_m}, {y_1, ..., y_m} be two sequences of non-negative real numbers. Suppose that it holds

    Σ_{j=1}^{i} x_j ≤ i for all i ∈ [m], and y_i ≥ y_{i+1} for all i ∈ [m − 1].

Then

    Σ_{i=1}^{m} y_i ≥ Σ_{i=1}^{m} x_i y_i.
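As a quick illustrative instance of Proposition 1 (our example, not from Fisher et al. [1978]), take m = 3:

    % x = (1, 1, 0) satisfies the partial-sum condition:
    %   x_1 = 1 \le 1, \quad x_1 + x_2 = 2 \le 2, \quad x_1 + x_2 + x_3 = 2 \le 3,
    % and y = (3, 2, 1) is non-increasing. Indeed,
    \sum_{i=1}^{3} y_i = 6 \;\ge\; \sum_{i=1}^{3} x_i y_i
      = 1 \cdot 3 + 1 \cdot 2 + 0 \cdot 1 = 5 .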
B.2 Additional Lemmas

We first prove the following result, which follows from Lemma 2 in Balkanski et al. [2019].
Lemma 6.
Fix an index j, and let δ, Ω be as in Algorithm 2, as it runs over the set V_j. Then it holds

    δ ≥ (1 − ε) sup_{e ∈ V_j \ Ω : Ω ∪ e ∈ I} f(e | Ω).

Proof.
We prove the claim by induction on the iterations of Algorithm 2. The base case trivially holds, due to the definition of δ. Suppose that at some point the current solution Ω is updated to Ω ∪ {a_1, ..., a_η}. We have that

    sup_{e ∈ V_j \ (Ω ∪ {a_1,...,a_η}) : Ω ∪ {a_1,...,a_η} ∪ e ∈ I} f(e | Ω ∪ {a_1, ..., a_η})
      ≤ sup_{e ∈ V_j \ (Ω ∪ {a_1,...,a_η}) : Ω ∪ {a_1,...,a_η} ∪ e ∈ I} f(e | Ω)
      ≤ sup_{e ∈ V_j \ Ω : Ω ∪ e ∈ I} f(e | Ω) ≤ (1 − ε)⁻¹ δ,

where the first inequality uses the submodularity of f; the second inequality holds since {e ∈ V_j \ (Ω ∪ {a_1, ..., a_η}) : Ω ∪ {a_1, ..., a_η} ∪ e ∈ I} ⊆ {e ∈ V_j \ Ω : Ω ∪ e ∈ I}; the last inequality follows from the inductive hypothesis.

We now show that the claim holds when δ is updated to δ′ = (1 − ε)δ. At this point, it holds X = ∅, and each point e* ∈ V_j \ Ω such that Ω ∪ e* ∈ I was discarded during a previous iteration. Denote with Ω′ the solution at the iteration when e* was discarded, and let {a_1, ..., a_η} be the next set of points added to Ω′. Since e* was discarded, one of the following two conditions must hold:

1. Ω′ ∪ {a_1, ..., a_η} ∪ e* ∉ I;
2. f(e* | Ω′ ∪ {a_1, ..., a_η}) ≤ δ.

In the first case, due to the downward-closed property of I it holds Ω ∪ e* ∉ I, which contradicts the definition of e*. In the latter case it holds

    f(e* | Ω) ≤ f(e* | Ω′ ∪ {a_1, ..., a_η}) ≤ δ = (1 − ε)⁻¹ δ′,

where the first inequality uses the fact that f is submodular. The claim follows.

We also need an additional lemma, to prove that Algorithm 3 gives a constant-factor approximation for Problem 1.

Lemma 7.
Fix an index j, and let δ, X be as in Algorithm 2, as it optimizes the function f over the set V_j. Denote with {a_i}_i the points of Ω_j, sorted as they were added to it. It holds

    E_{a_i}[f(a_i | {a_1, ..., a_{i−1}})] ≥ (1 − ε)δ.

Proof.
First, suppose that the constant δ is updated after the point a_{i−1} is added to the current solution. In that case, by definition of the set X, every point e ∈ X \ {a_1, ..., a_{i−1}} such that {a_1, ..., a_{i−1}} ∪ e ∈ I yields f(e | {a_1, ..., a_{i−1}}) ≥ δ. Hence, the claim holds.

Suppose now that the constant δ is not updated after the point a_{i−1} is added to the current solution. Define the set X^I_{i−1} := {e ∈ X : {a_1, ..., a_{i−1}} ∪ e ∈ I}. We first claim that the point a_i is chosen uniformly at random over the set X^I_{i−1}. In fact, if a_i is added to the current solution, then there exists a set {x_1, ..., x_η} as in Line 5 of Algorithm 1 such that a_i ∈ {x_1, ..., x_η}. Let j ≤ η be an index such that a_i = x_j. Due to the downward-closed property of I it holds {e : A ∪ {x_1, ..., x_{j−1}, e} ∈ I} ⊆ X \ {x_1, ..., x_{j−1}}. Hence, a_i is chosen uniformly at random among all points e such that A ∪ {x_1, ..., x_{j−1}, e} = {a_1, ..., a_{i−1}, e} ∈ I.

Then

    Pr_{a_i}(f(a_i | {a_1, ..., a_{i−1}}) ≥ δ) ≥ |X_{i−1}| / |X^I_{i−1}| ≥ |X_{i−1}| / |X|,

where we have used that X_{i−1} ⊆ X^I_{i−1} ⊆ X and that the point a_i is chosen uniformly at random over the set X^I_{i−1}.

We now prove that |X_{i−1}|/|X| ≥ (1 − ε). To this end, we first note that |X_i| ≥ |X_{i+1}| for all indices i. Fix a point e ∈ X_{i+1}. Then, this point yields {a_1, ..., a_i} ∪ e ∈ I and f(e | {a_1, ..., a_i}) ≥ δ. By the downward-closed property of I we get {a_1, ..., a_{i−1}} ∪ e ∈ I, and by submodularity we get f(e | {a_1, ..., a_{i−1}}) ≥ δ. Hence, X_{i+1} ⊆ X_i and |X_i| ≥ |X_{i+1}| as claimed. It follows that the binary-search sub-routine of Algorithm 2 terminates with η = min{i : |X_i| ≤ (1 − ε)|X|}, which implies that |X_{i−1}| > (1 − ε)|X|. The claim follows since it holds E_{a_i}[f(a_i | {a_1, ..., a_{i−1}})] ≥ Pr_{a_i}(f(a_i | {a_1, ..., a_{i−1}}) ≥ δ) · δ.

B.3 Proof of Lemma 5
Using Proposition 1 and Lemmas 6-7, we can prove Lemma 5. This proof uses ideas from Gupta et al. [2010].
Proof of Lemma 5.
Denote with {a_i}_i the points of Ω_j in the order that they were added to Ω_j. Define the set

    W_δ := {e ∈ opt ∩ V_j : f(e | Ω_j) ≥ δ},

where δ denotes the final value of the threshold in Algorithm 2. Note that this set consists of all points of opt ∩ V_j whose marginal contribution is above δ, when added to the current solution. First, fix a set Ω_j, and suppose that f(a_i | {a_1, ..., a_{i−1}}) ≥ δ for all indices i. Define the sets A_i := {e ∈ W_δ \ {a_1, ..., a_i} : {a_1, ..., a_i} ∪ e ∈ I}. Since the system I is downward-closed, A_i ⊆ A_{i−1}. Define the sets D_i := A_{i−1} \ A_i. Note that these sets consist of all points in W_δ that yield a feasible solution when added to {a_1, ..., a_{i−1}}, but that violate the side constraints when added to {a_1, ..., a_i}.

We now claim that the set {a_1, ..., a_i} is a maximal independent set for

    {a_1, ..., a_i} ∪ (D_1 ∪ ... ∪ D_i) = {a_1, ..., a_i} ∪ (W_δ \ A_i).

To this end, note that the set {a_1, ..., a_i} is independent by definition, and that any point e ∈ (W_δ \ A_i) \ {a_1, ..., a_i} is such that {a_1, ..., a_i} ∪ e ∉ I. Hence {a_1, ..., a_i} is maximal as claimed. Note also that D_1 ∪ ... ∪ D_i ⊆ W_δ ⊆ opt is an independent set, due to the subset-closure of I. Since I is a p-system, it holds

    |D_1| + ... + |D_i| = |D_1 ∪ ... ∪ D_i| ≤ p |{a_1, ..., a_i}| = pi.    (1)

Furthermore, using submodularity and Lemma 6 it holds

    |D_i| δ ≥ (1 − ε)|D_i| sup_e f(e | {a_1, ..., a_{i−1}}) ≥ (1 − ε) f(D_i | {a_1, ..., a_{i−1}}) ≥ (1 − ε) f(D_i | Ω_j),

where the last inequality follows from submodularity. Combining this with (1) and Proposition 1, we get

    p Σ_i f(a_i | {a_1, ..., a_{i−1}}) ≥ Σ_i |D_i| δ ≥ (1 − ε) Σ_i f(D_i | Ω_j) ≥ (1 − ε) f(W_δ | Ω_j),

where the last inequality uses submodularity. Hence, telescoping and rearranging yields

    p f(Ω_j) ≥ (1 − ε) f(W_δ | Ω_j) = (1 − ε)(f(Ω_j ∪ W_δ) − f(Ω_j)).

If we unfix the set Ω_j, take the expected value, and use Lemma 7 to account for the fact that the marginal contribution of the sampled points is only lower-bounded by (1 − ε)δ in expectation, we get

    p E[f(Ω_j)] ≥ (1 − ε)² E[f(W_δ | Ω_j)].    (2)

Using Lemma 7 again, we have that E_{a_i}[f(a_i | {a_1, ..., a_{i−1}})] ≥ 0 for all points a_i added to Ω_j. It follows that E[f(Ω_j)] ≥ E[f(a_1)], with a_1 the first point added to Ω_j. Then, from the definition of δ₀ it follows that λ E[f(Ω_j)] ≥ λ E[f(a_1)] = E[δ₀]. Hence, using submodularity and the linearity of the expected value, we get

    λr E[f(Ω_j)] ≥ r E[δ₀] ≥ E[ Σ_{e ∈ (opt ∩ V_j) \ W_δ} f(e | Ω_j) ] ≥ E[f((opt ∩ V_j) \ W_δ | Ω_j)],    (3)

where we have used submodularity.

Combining (2) with (3) and using submodularity again, we get

    p E[f(Ω_j)] + λr E[f(Ω_j)] ≥ (1 − ε)² E[f(opt ∩ V_j | Ω_j)].

The claim follows by rearranging.

C Proof of Theorem 1
C.1 Preliminary Results
In our analysis we consider the following well-known result. rancesco Quinzan, Vanja Doskoč, Andreas Göbel, Tobias Friedrich
Lemma 8 (Theorem 2.1 in Feige et al. [2011]). Let U ⊆ V be a set chosen uniformly at random. Then it holds E[f(U)] ≥ f(O)/4, with O ⊆ V the subset attaining the maximum f-value.

Furthermore, we also consider the following properties of submodular functions.
Lemma 9 (Lemma 10 in Feldman et al. [2017]). For any fixed m-tuple of mutually disjoint sets Ω_j it holds (m − 1) f(opt) ≤ Σ_{j≤m} f(Ω_j ∪ opt).

Lemma 10 (Lemma 11 in Feldman et al. [2017]). Let f : 2^V → R≥0 be a non-negative submodular function. For every three sets A, B, C ⊆ V it holds f(A ∪ (B ∩ C)) + f(B \ C) ≥ f(A ∪ B).

C.2 Proof of Theorem 1
Using Lemmas 8-10 together with Lemma 5, we can prove Theorem 1.
Proof of Theorem 1.
Fix an m-tuple of sets Ω_j for Algorithm 3, and consider the sets opt \ V_j, for all j ≤ m. Note that it holds V_j = V \ ∪_{i<j} Ω_i. Hence,

    f(opt \ V_j) = f(opt \ (V \ ∪_{i<j} Ω_i)) = f(∪_{i<j} (opt ∩ Ω_i)) ≤ Σ_{i<j} f(opt ∩ Ω_i),

where the last inequality uses submodularity. Unfixing the sets Ω_j and taking the expected value yields

    E[f(opt \ V_j)] ≤ Σ_{i<j} E[f(opt ∩ Ω_i)],    (4)

for all indices j ≤ m. We then have that

    (m − 1) f(opt) ≤ Σ_{j≤m} E[f(opt ∪ Ω_j)]
      ≤ Σ_{j≤m} E[f(Ω_j ∪ (opt ∩ V_j))] + Σ_{j≤m} E[f(opt \ V_j)]
      ≤ (p + 1 + λr)/(1 − ε)² Σ_{j≤m} E[f(Ω_j)] + Σ_{j≤m} Σ_{i<j} E[f(opt ∩ Ω_i)]
      ≤ m (1 + ε)(p + 1)/(1 − ε)² E[f(Ω*)] + Σ_{j≤m} Σ_{i<j} E[f(opt ∩ Ω_i)]
      ≤ m (1 + ε)(p + 1)/(1 − ε)² E[f(Ω*)] + 4 Σ_{j≤m} Σ_{i<j} E[f(Λ_i)]
      ≤ m (1 + ε)(p + 1)/(1 − ε)² E[f(Ω*)] + 2m(m − 1) E[f(Ω*)],

where the first inequality follows from Lemma 9; the second inequality follows from Lemma 10; the third inequality follows from (4) and Lemma 5; the fourth inequality follows from the choice of λ in Algorithm 3; the fifth inequality follows from Lemma 8, since each Λ_i is a uniformly random subset of Ω_i; the last inequality holds since f(Ω*) is maximum over the f(Ω_j) and f(Λ_j). The claim follows by rearranging the inequality above.

D Proof of Lemma 1
We now prove an upper-bound on the run time for Algorithm 3.
Lemma 1.
Fix constants ε ∈ (0, 1), ϕ₁, ϕ₂ ∈ [0, 1], and m ≥ 1. Then Algorithm 3 terminates after O((m/ε²) log(r/(pε)) log r log n) rounds of adaptivity. Furthermore, Algorithm 3 has query complexity O((mn/ε²) log(r/(pε)) log r log n).

Proof. First, note that Algorithm 1 requires no function evaluations, and it always returns a sequence {a_i}_i of length at most r.

At each step of Algorithm 2, the binary-search sub-routine requires O(log r) iterations, since the set J has size at most r. Each iteration of this sub-routine requires O(1) rounds of adaptivity and O(n) function evaluations.

Note also that the inner while-loop, lines 5-10 of Algorithm 2, terminates after at most O(ε⁻¹ log n) iterations. In fact, we have that |X| ≤ n, and at each iteration the size of the new set X decreases by a multiplicative factor of (1 − ε). Similarly, the outer while-loop, lines 4-13 of Algorithm 2, terminates after at most O(ε⁻¹ log(r/(pε))) iterations.

Hence, the m calls to Algorithm 2 require O((m/ε²) log(r/(pε)) log r log n) rounds of adaptivity overall. Similarly, since each round of the binary-search sub-routine requires O(n) function evaluations, the query complexity is O((mn/ε²) log(r/(pε)) log r log n).

E Proof of Lemma 2
We perform the run time analysis for an optimal choice of the parameter m. The following lemma holds.

Lemma 2.
Fix a constant ε ∈ (0, 1), and define parameters m = 1 + ⌈√((p + 1)/2)⌉, ϕ₁ = 1, and ϕ₂ = 1/2. Denote with Ω* the solution found by Algorithm 3. Then,

    f(opt) ≤ (1 + ε)/(1 − ε)² (p + 2√(2(p + 1)) + 5) E[f(Ω*)].

Furthermore, with this parameter choice Algorithm 3 terminates after O((√p/ε²) log n log(r/(pε)) log r) rounds of adaptivity, and its query complexity is O((√p n/ε²) log n log(r/(pε)) log r).

Proof. We start with the approximation guarantee. Denote with Ω* an approximate solution found by Algorithm 3, and let opt be the optimal solution for Problem 1. Then, from Theorem 1 we get

    f(opt) ≤ m ( (1 + ε)(p + 1) / ((1 − ε)²(m − 1)) + 2 ) E[f(Ω*)].

Substituting m = 1 + ⌈√((p + 1)/2)⌉ and rearranging yields

    f(opt) ≤ (1 + ε)/(1 − ε)² ( p + 2⌈√((p + 1)/2)⌉ + (p + 1)⌈√((p + 1)/2)⌉⁻¹ + 3 ) E[f(Ω*)]
           ≤ (1 + ε)/(1 − ε)² ( p + 2(√((p + 1)/2) + 1) + (p + 1)(√((p + 1)/2))⁻¹ + 3 ) E[f(Ω*)]
           = (1 + ε)(p + 2√(2(p + 1)) + 5)/(1 − ε)² E[f(Ω*)].

Hence, the claim on the approximation guarantee follows. The upper-bounds on the adaptivity and on the total number of calls to the valuation oracle function follow directly from Lemma 1.
F Proof of Lemma 4
In this section, we perform the run time analysis for Algorithm 3 with respect to the calls to the independence oracle for the p-system. The following lemma holds.

Lemma 4.
Fix parameters ε ∈ (0, 1), m ≥ 1, and ϕ₁, ϕ₂ ∈ [0, 1]. Then Algorithm 3 requires an expected O((m√n/ε²) log(r/(pε)) log r log n) rounds of independent calls to the oracle for the p-system constraint. Furthermore, the total number of calls to the independence oracle is O((mn^{3/2}/ε²) log(r/(pε)) log r log n).

In order to prove this lemma, we use the following well-known result.
Theorem 2 (Theorem 6 in Karp et al. [1988]). Algorithm 1 terminates after O(√r) steps.

Combining Theorem 2 with Lemma 1, we prove Lemma 4 as follows.
Proof.
We first observe that the independence oracle for the p-system is called by Algorithm 1, and also by Algorithm 2.

Since at each iteration of Algorithm 1 the queries to the oracle for the p-system are independent, it follows from Theorem 2 that Algorithm 1 requires O(√r) rounds of independent calls to the oracle for the p-system. Furthermore, all calls to the independence oracle for the p-system within one iteration of Algorithm 2 are independent. Combining these observations with Lemma 1, it follows that Algorithm 3 requires O((m√n/ε²) log(r/(pε)) log r log n) rounds of independent calls to the oracle for the p-system.

The claim follows, since at each round at most O(n) calls to the oracle for the p-system are executed in parallel.

G Proof of Lemma 3
G.1 Preliminary Results.
In order to prove Lemma 3, we use the following well-known result.
Lemma 11 (Lemma 2.2 in Buchbinder et al. [2014]). Let Ω ⊆ V be a random set such that each element appears in Ω with probability at most q. Then it holds E[f(Ω)] ≥ (1 − q) f(∅).

G.2 Additional Lemmas
In this section, we prove the following lemma.
Lemma 3.
Fix parameters ε ∈ (0, 1), m = 1, ϕ₁ = (p + 1)⁻¹, and ϕ₂ ∈ [0, 1]. Denote with Ω* the output of Algorithm 3. Then,

    f(opt) ≤ (1 + ε)(p + 1)² / (p(1 − ε)²) E[f(Ω*)].

With this parameter choice, Algorithm 3 terminates after O(ε⁻² log n log(r/ε) log r) rounds of adaptivity, and it requires O((n/ε²) log n log(r/ε) log r) function evaluations.

In order to prove this lemma, we introduce additional notation. First of all, since m = 1, we need not specify the index on the input search space V_i and solution Ω_i of Algorithm 2, and we simply write V = V_1 and Ω = Ω_1. Again we define |V| = n. Furthermore, we define an ordering of the points {v_i}_i = V, with v_i the i-th point sampled by Algorithm 2 during run time. All points of V that are not sampled during run time are placed at the end of the sequence {v_i}_i in random order. Furthermore, we define the sets T_i = {v_1, ..., v_i} ∩ Ω, and we define the sequence {v*_i} as

    v*_i := arg max_{v ∈ V \ T_{i−1} : T_{i−1} ∪ v ∈ I} f(v | T_{i−1}).

For each point v ∈ V, denote with X_v an indicator such that X_v = 1 if v is sampled as part of a random feasible sequence {a_1, ..., a_η}, and X_v = 0 otherwise. We also consider a sequence {O_i}_{i=0}^{n} of sets O_i ⊆ V defined recursively as follows:

• O_0 := ∅;
• if v_i ∈ Ω, then O_i ⊆ opt \ (T_{i−1} ∪ ∪_{j=0}^{i−1} O_j) is a set of minimum size such that (opt \ (∪_{j=0}^{i} O_j)) ∪ (T_{i−1} ∪ v_i) ∈ I;
• if v_i ∉ Ω, X_{v_i} = 1, and v_i ∈ opt \ (∪_{j=0}^{i−1} O_j), then O_i = {v_i};
• if v_i ∉ Ω and v_i ∉ opt \ (∪_{j=0}^{i−1} O_j), or if X_{v_i} = 0, then O_i = ∅.

Finally, we define the set O := (opt \ (∪_{i=0}^{n} O_i)) ∪ T_n = (opt \ (∪_{i=0}^{n} O_i)) ∪ Ω.

Following this notation, we first prove the following lemma.

Lemma 12.
Fix all random decisions of Algorithm 2. Then it holds

    f(Ω) + |O \ Ω| δ ≥ f(Ω ∪ opt) − Σ_{i=0}^{n} |O_i \ Ω| f(v*_i | T_{i−1}).

Proof.
First, we prove that it holds

    f(Ω) + |O \ Ω| δ ≥ f(O).    (5)

To this end, note that since (opt \ (∪_{i=0}^{n} O_i)) ∪ Ω ∈ I, it holds {v} ∪ Ω ∈ I for all v ∈ O \ Ω. Hence, by the termination criterion of Algorithm 2, we have that f(v | Ω) ≤ δ. It follows that

    f(O) ≤ f(Ω) + Σ_{v ∈ O \ Ω} f(v | Ω) ≤ f(Ω) + |O \ Ω| δ,

where we have used submodularity. Then (5) follows.

Next, we prove that it holds

    f(O) ≥ f(Ω ∪ opt) − Σ_{i=1}^{n} |O_i \ Ω| f(v*_i | T_{i−1}).    (6)

Note that the claim of this lemma follows by combining (5) and (6).

To prove (6), we first observe that the sets O_i \ Ω are mutually disjoint, and we can write O = (Ω ∪ opt) \ (∪_{i=1}^{n} (O_i \ Ω)). Using this equality, we have that

    f(O) = f(Ω ∪ opt) − Σ_{i=1}^{n} f(O_i \ Ω | (Ω ∪ opt) \ (∪_{j≤i} (O_j \ Ω)))
         ≥ f(Ω ∪ opt) − Σ_{i=1}^{n} f(O_i \ Ω | T_{i−1})
         ≥ f(Ω ∪ opt) − Σ_{i=1}^{n} Σ_{v ∈ O_i \ Ω} f(v | T_{i−1}),

where the first equation is a telescopic sum, the first inequality uses submodularity together with the fact that T_{i−1} ⊆ (Ω ∪ opt) \ (∪_{j≤i} (O_j \ Ω)), and the second inequality uses submodularity again. Then (6) follows from the definition of v*_i.

Next, using Lemma 12 we prove the following result.

Lemma 13.
It holds

    E_{v_i}[|O_i \ Ω| f(v*_i | T_{i−1})] ≤ p / ((1 − ε)²(p + 1)) E_{v_i}[X_{v_i} f(v_i | T_{i−1})].

Proof.
We first observe that if X_{v_i} = 0, then the claim holds, since O_i \ Ω = ∅. Hence, we prove the claim by conditioning on the event {X_{v_i} = 1}. In this case, the point v_i is added to Ω with probability (p + 1)⁻¹.

If v_i is not added to the current solution, then |O_i| ≤ 1, and conditioning on the event {v_i ∉ Ω} we get

    E_{v_i}[|O_i \ Ω| f(v*_i | T_{i−1}) | X_{v_i} = 1, v_i ∉ Ω] ≤ E_{v_i}[X_{v_i} f(v*_i | T_{i−1}) | X_{v_i} = 1, v_i ∉ Ω];

since Pr(v_i ∉ Ω) = 1 − (p + 1)⁻¹ = p/(p + 1), this event contributes at most (p/(p + 1)) E_{v_i}[X_{v_i} f(v*_i | T_{i−1}) | X_{v_i} = 1]. Furthermore, if the point v_i is added to Ω, then the set O_i has size at most |O_i| ≤ p, since I is a p-extendible system. Hence,

    E_{v_i}[|O_i \ Ω| f(v*_i | T_{i−1}) | X_{v_i} = 1, v_i ∈ Ω] ≤ p E_{v_i}[X_{v_i} f(v*_i | T_{i−1}) | X_{v_i} = 1, v_i ∈ Ω];

since this event occurs with probability (p + 1)⁻¹, it also contributes at most (p/(p + 1)) E_{v_i}[X_{v_i} f(v*_i | T_{i−1}) | X_{v_i} = 1]. The claim follows by combining the two chains of inequalities above with Lemma 6 and Lemma 7, which allow us to replace f(v*_i | T_{i−1}) with f(v_i | T_{i−1}) at the cost of a factor (1 − ε)².

We also need the following lemma, to prove the main result.
Lemma 14.
It holds

    Σ_{i=1}^{n} E[|O_i \ Ω| f(v*_i | T_{i−1})] ≤ p/(1 − ε)² E[f(Ω)].

Proof.
For each v_i ∈ V, let G_{v_i} be a random variable whose value equals the increase in the value of the current solution when v_i is added to it. Note that if v_i yields X_{v_i} = 0, then G_{v_i} = 0, because v_i cannot be added to the current solution. Then,

    E_{v_i}[G_{v_i}] = Pr(v_i ∈ Ω) E_{v_i}[X_{v_i} f(v_i | T_{i−1})] = (1/(p + 1)) E_{v_i}[X_{v_i} f(v_i | T_{i−1})].    (7)

The claim follows using Lemma 13, together with the law of total probability and the linearity of the expected value, since Σ_i E[G_{v_i}] = E[f(Ω)].

G.3 Proof of Lemma 3
We now have all necessary tools to prove Lemma 3.
Proof of Lemma 3.
Taking the expected value in Lemma 12 and combining with Lemma 14, we get

    (p + 1)/(1 − ε)² E[f(Ω)] + E[|O \ Ω| δ] ≥ E[f(Ω ∪ opt)].    (8)

Denote with {a_i}_i the points of Ω in the order that they were added to Ω. From Lemma 7 and submodularity, we have that E_{a_i}[f(a_i | {a_1, ..., a_{i−1}})] ≥ 0 for all points a_i added to Ω. It follows that E[f(Ω)] ≥ E[f(a_1)], with a_1 the first point added to Ω. Hence, from the definition of δ₀, and since |O \ Ω| ≤ r due to feasibility, we get

    ε E[f(Ω)] ≥ E[|O \ Ω| δ].

Substituting in (8) we get

    (p + 1)(1 + ε)/(1 − ε)² E[f(Ω)] ≥ E[f(Ω ∪ opt)].

To conclude the proof, we observe that the function g(S) = f(S ∪ opt) is a submodular function. Since each element of V appears in Ω with probability at most (p + 1)⁻¹, by Lemma 11 we get

    E[f(Ω ∪ opt)] = E[g(Ω)] ≥ (p/(p + 1)) g(∅) = (p/(p + 1)) f(opt).

The claim follows by rearranging.