Maximum Coverage in the Data Stream Model: Parameterized and Generalized

Abstract

We present algorithms for the Max-Cover and Max-Unique-Cover problems in the data stream model. The input to both problems are m subsets of a universe of size n and a value k\in [m]. In Max-Cover, the problem is to find a collection of at most k sets such that the number of elements covered by at least one set is maximized. In Max-Unique-Cover, the problem is to find a collection of at most k sets such that the number of elements covered by exactly one set is maximized. Our goal is to design single-pass algorithms that use space that is sublinear in the input size. Our main algorithmic results are: If the sets have size at most d, there exist single-pass algorithms using \tilde{O}(d^{d+1} k^d) space that solve both problems exactly. This is optimal up to polylogarithmic factors for constant d. If each element appears in at most r sets, we present single pass algorithms using \tilde{O}(k^2 r/\epsilon^3) space that return a 1+\epsilon approximation in the case of Max-Cover. We also present a single-pass algorithm using slightly more memory, i.e., \tilde{O}(k^3 r/\epsilon^{4}) space, that 1+\epsilon approximates Max-Unique-Cover. In contrast to the above results, when d and r are arbitrary, any constant pass 1+\epsilon approximation algorithm for either problem requires \Omega(\epsilon^{-2}m) space but a single pass O(\epsilon^{-2}mk) space algorithm exists. In fact any constant-pass algorithm with an approximation better than e/(e-1) and e^{1-1/k} for Max-Cover and Max-Unique-Cover respectively requires \Omega(m/k^2) space when d and r are unrestricted. En route, we also obtain an algorithm for a parameterized version of the streaming Set-Cover problem.


Andrew McGregor

University of Massachusetts, Amherst

David Tench

Stony Brook University

Hoa T. Vu

San Diego State University

Abstract

We present algorithms for the Max Coverage and Max Unique Coverage problems in the data stream model. The input to both problems consists of $m$ subsets of a universe of size $n$ and a value $k \in [m]$. In Max Coverage, the problem is to find a collection of at most $k$ sets such that the number of elements covered by at least one set is maximized. In Max Unique Coverage, the problem is to find a collection of at most $k$ sets such that the number of elements covered by exactly one set is maximized. These problems are closely related to a range of graph problems including matching, partial vertex cover, and capacitated maximum cut. In the data stream model, we assume $k$ is given and the sets are revealed online. Our goal is to design single-pass algorithms that use space that is sublinear in the input size. Our main algorithmic results are:

- If the sets have size at most $d$, there exist single-pass algorithms using $\tilde{O}(d^{d+1} k^d)$ space that solve both problems exactly. This is optimal up to polylogarithmic factors for constant $d$.
- If each element appears in at most $r$ sets, we present single-pass algorithms using $\tilde{O}(k^2 r/\epsilon^3)$ space that return a $1+\epsilon$ approximation in the case of Max Coverage. We also present a single-pass algorithm using slightly more memory, i.e., $\tilde{O}(k^3 r/\epsilon^4)$ space, that $1+\epsilon$ approximates Max Unique Coverage.

In contrast to the above results, when $d$ and $r$ are arbitrary, any constant-pass $1+\epsilon$ approximation algorithm for either problem requires $\Omega(\epsilon^{-2} m)$ space but a single-pass $O(\epsilon^{-2} mk)$ space algorithm exists. In fact any constant-pass algorithm with an approximation better than $e/(e-1)$ and $e^{1-1/k}$ for Max Coverage and Max Unique Coverage respectively requires $\Omega(m/k^2)$ space when $d$ and $r$ are unrestricted. En route, we also obtain an algorithm for a parameterized version of the streaming Set Cover problem.

Theory of computation → Sketching and sampling; Theory of computation → Approximation algorithms analysis; Theory of computation → Parameterized complexity and exact algorithms

Keywords and phrases

Data streams, maximum coverage, maximum unique coverage, set cover

Funding

This work was partially supported by NSF grants CCF-1934846, CCF-1908849, and CCF-1637536.

Problem Description.

We consider the Max Coverage and Max Unique Coverage problems. Max Coverage is an NP-hard problem that has a wide range of applications including facility and sensor allocation [52], information retrieval [5], influence maximization in marketing strategy design [48], and the blog monitoring problem [64]. It is well-known that the greedy algorithm, which repeatedly picks the set that covers the largest number of uncovered elements, is an $e/(e-1)$ approximation and that, unless $P = NP$, this approximation factor is the best possible in polynomial time [30].

Max Unique Coverage was first studied in the offline setting by Demaine et al. [25]. A motivating application for this problem was in the design of wireless networks where we want to place base stations that cover mobile clients. Each station could cover multiple clients, but unless a client is covered by a unique station the client would experience too much interference. Demaine et al. [25] gave a polynomial time $O(\log k)$ approximation. Furthermore, they showed that Max Unique Coverage is hard to approximate within a factor $O(\log^{\sigma} n)$ for some constant $\sigma$ under reasonable complexity assumptions. Erlebach and van Leeuwen [29] and Ito et al. [40] considered a geometric variant of the problem and Misra et al. [62] considered the parameterized complexity of the problem. This problem is also closely related to Minimum Membership Set Cover where one has to cover every element and minimize the maximum overlap on any element [26, 53].

In the streaming set model, Max Coverage and the related Set Cover problem have both received a significant amount of attention [7, 15, 27, 36, 38, 39, 61, 64]. The most relevant result is a single-pass $2+\epsilon$ approximation using $\tilde{O}(k \cdot \mathrm{poly}(1/\epsilon))$ space [8, 61], although a better approximation is possible in a similar amount of space if multiple passes are permitted [61] or if the stream is randomly ordered [2, 63]. In this paper, we almost exclusively consider single-pass algorithms where the sets arrive in an arbitrary order.

The unique coverage problem has not been studied in the data stream model although it, and Max Coverage, are closely related to various graph problems that have been studied.
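To make the two objectives and the offline greedy rule from the problem description concrete, the following is a minimal Python sketch. The function names (`coverage`, `unique_coverage`, `greedy_max_coverage`, `brute_force`) are illustrative choices and do not come from the paper; the brute-force routine takes exponential time and is only meant for tiny instances.

```python
from itertools import combinations


def coverage(sets):
    """Number of elements covered by at least one of the given sets."""
    return len(set().union(*sets))


def unique_coverage(sets):
    """Number of elements covered by exactly one of the given sets."""
    counts = {}
    for s in sets:
        for x in s:
            counts[x] = counts.get(x, 0) + 1
    return sum(1 for c in counts.values() if c == 1)


def greedy_max_coverage(sets, k):
    """Offline greedy: repeatedly pick the set covering the most uncovered elements."""
    chosen, covered = [], set()
    remaining = list(sets)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda s: len(s - covered))
        if not best - covered:
            break  # no remaining set adds a new element
        chosen.append(best)
        covered |= best
        remaining.remove(best)
    return chosen


def brute_force(sets, k, objective):
    """Exact optimum over all collections of at most k sets (exponential time)."""
    return max(
        objective(combo)
        for size in range(k + 1)
        for combo in combinations(sets, size)
    )
```

For example, on the sets `{1,2,3}`, `{3,4}`, `{4,5,6}`, `{1}` with `k = 2`, the first two sets cover four elements but uniquely cover only three, since element 3 lies in both.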

Relationship to Graph Streaming.

There are two main variants of the graph stream model. In the arbitrary order model, the stream consists of the edges of the graph in arbitrary order. In the adjacency list model, all edges that include the same node are grouped together. Both models generalize naturally to hypergraphs where each edge could consist of more than two nodes. The arbitrary order model has been more heavily studied than the adjacency list model but there has still been a significant amount of work in the latter model [6,7,11,36,42,50,57–59]. For further details, see a recent survey on the graph stream model [56].

To explore the relationship between Max Coverage and Max Unique Coverage and various graph stream problems, it makes sense to introduce two additional parameters beyond $m$ (the number of sets) and $n$ (the size of the universe). Specifically, throughout the paper we let $d$ denote the maximum cardinality of a set in the input and let $r$ denote the maximum multiplicity of an element in the universe, where the multiplicity is the number of sets in which an element appears. Then an input to Max Coverage and Max Unique Coverage can define a (hyper)graph in one of the following two natural ways:

First Interpretation: A sequence of (hyper-)edges on a graph with $n$ nodes of maximum degree $r$ (where the degree of a node $v$ corresponds to how many hyperedges include that node) and $m$ hyperedges where each hyperedge has size at most $d$. In the case where every set has size $d = 2$, the hypergraph is an ordinary graph, i.e., a graph where every edge just has two endpoints. With this interpretation, the graph is being presented in the arbitrary order model.

Second Interpretation: A sequence of adjacency lists (where the adjacency list for a given node includes all the hyperedges that include that node) on a graph with $m$ nodes of maximum degree $d$ and $n$ hyperedges of maximum size $r$. In this interpretation, if every element appears in exactly $r = 2$ sets, then this corresponds to an ordinary graph where each element corresponds to an edge and each set corresponds to a node. With this interpretation, the graph is being presented in the adjacency list model.

Footnote: The Set Cover problem asks for the minimum number of sets that cover the entire universe.

Footnote: Note that $d$ and $r$ are dual parameters in the sense that if the input is $\{S_1, \ldots, S_m\}$ and we define $T_i = \{j : i \in S_j\}$ then $d = \max_j |S_j|$ and $r = \max_i |T_i|$.

Under the first interpretation, the Max Coverage problem and the Max Unique Coverage problem when all sets have size exactly 2 naturally generalize the problem of finding a maximum matching in an ordinary graph in the sense that if there exists a matching with at least $k$ edges, the optimum solution to either Max Coverage or Max Unique Coverage will be a matching. There is a large body of work on graph matchings in the data stream model [3, 12, 23, 24, 28, 31, 34, 35, 43, 44, 49–51, 55, 66] including work specifically on solving the problem exactly if the matching size is bounded [18, 20]. More precisely, Max Coverage corresponds to the partial vertex cover problem [54]: what is the maximum number of edges that can be covered by selecting $k$ nodes. For larger sets, Max Coverage and Max Unique Coverage are at least as hard as finding partial vertex covers and matchings in hypergraphs.

Under the second interpretation, when all elements have multiplicity 2, the problem Max Unique Coverage corresponds to finding the capacitated maximum cut, i.e., a set of at most $k$ vertices such that the number of edges with exactly one endpoint in this set is maximized. In the offline setting, Ageev and Sviridenko [1] and Gaur et al. [33] presented a 2 approximation for this problem using linear programming and local search respectively. The (uncapacitated) maximum cut problem has been studied in the data stream model by Kapralov et al. [45–47]; a 2-approximation is trivial in logarithmic space but improving on this requires space that is polynomial in the size of the graph. The capacitated problem is a special case of the problem of maximizing a non-monotone submodular function subject to a cardinality constraint. This general problem has been considered in the data stream model [8, 13, 16, 37] but in that line of work it is assumed that there is oracle access to the function being optimized, e.g., given any set of nodes, the oracle will return the number of edges cut. Alaluf et al. [4] presented a $2+\epsilon$ approximation in this setting, assuming exponential post-processing time. In contrast, our algorithm does not assume an oracle while obtaining a $1+\epsilon$ approximation (and also works for the more general problem Max Unique Coverage).

Footnote: For the trivial 2-approximation, it suffices to count the number of edges $M$ since there is always a cut whose size is at least $M/2$.

Our main results are the following single-pass streaming algorithms:

(A) Bounded Set Cardinality. If all sets have size at most $d$, there exists a $\tilde{O}(d^{d+1} k^d)$ space data stream algorithm that solves Max Unique Coverage and Max Coverage exactly. We show that this is nearly optimal in the sense that any exact algorithm requires $\Omega(k^d)$ space for constant $d$.

(B) Bounded Multiplicity. If every element appears in at most $r$ sets, we present the following algorithms:

(B1) Max Unique Coverage: There exists a $1+\epsilon$ approximation algorithm using $\tilde{O}(\epsilon^{-4} k^3 r)$ space.

(B2) Max Coverage: There exists a $1+\epsilon$ approximation algorithm using $\tilde{O}(\epsilon^{-3} k^2 r)$ space.

Footnote: Throughout we use $\tilde{O}$ to denote that logarithmic factors of $m$ and $n$ are being omitted.

In contrast to the above results, when $d$ and $r$ are arbitrary, any constant-pass $1+\epsilon$ approximation algorithm for either problem requires $\Omega(\epsilon^{-2} m)$ space [6]. We also generalize the lower bound for Max Coverage [61] to Max Unique Coverage to show that any constant-pass algorithm with an approximation better than $e^{1-1/k}$ requires $\Omega(m/k^2)$ space. We also present a single-pass algorithm with an $O(\log \min(k, r))$ approximation for Max Unique Coverage using $\tilde{O}(k^2)$ space, i.e., the space is independent of $r$ and $d$ but the approximation factor depends on $r$. This algorithm is a simple combination of a Max Coverage algorithm due to McGregor and Vu [61] and an algorithm for Max Unique Coverage in the offline setting due to Demaine et al. [25]. Finally, our Max Coverage algorithm from result (B2) also yields a new multi-pass result for a parameterized version of the streaming Set Cover problem. We will also show that results (A) and (B2) can be made to handle stream deletions.

Footnote: The generalization for result (A) that we present requires space that scales with $k^{2d}$ rather than $k^d$. However, in subsequent work we have shown that space that scales with $k^d$ is also sufficient in the insert/delete setting.

Technical Summary.

Our results are essentially streamable kernelization results, i.e., the algorithm "prunes" the input (in the case of Max Unique Coverage and Max Coverage this corresponds to ignoring some of the input sets) to produce a "kernel" in such a way that a) solving the problem optimally on the kernel yields a solution that is as good (or almost as good) as the optimal solution on the original input and b) the kernel can be constructed in the data stream model and is sufficiently smaller than the original input such that it is possible to find an optimal solution for the kernel in significantly less time than it would take to solve on the original input. In the field of fixed parameter tractability, the main requirement is that the kernel can be produced in polynomial time. In the growing body of work on streaming kernelization [17–19] the main requirement is that the kernel can be constructed using small space in the data stream model. Our results fit in with this line of work and the analysis requires numerous combinatorial insights into the structure of the optimum solution for Max Unique Coverage and Max Coverage.

Our technical contributions can be outlined as follows.

- Result (A) relies on a key combinatorial lemma. This lemma provides a rule to discard sets such that there is an optimum solution that does not contain any of the discarded sets. Furthermore, the number of stored sets can be bounded in terms of $k$ and $d$.
- Result (B1) uses the observation that each set of any optimal solution intersects some maximal collection of disjoint sets. The main technical step is to demonstrate that storing a small number of intersecting sets, in terms of $k$ and $r$, suffices to preserve the optimal solution.
- Result (B2) is based on a very simple idea of first collecting the largest $O(rk/\epsilon)$ sets and then solving the problem optimally on these sets. This can be done in a space-efficient manner using an existing sketch for $F_0$ estimation in the case of Max Coverage. While the approach is simple, showing that it yields the required approximation requires some work that builds on a recent result by Manurangsi [54]. We also extend the algorithm to the model where sets can be inserted and deleted.

Footnote: The lower bound result by Assadi [6] was for the case of Max Coverage but we will explain that it also applies in the case of Max Unique Coverage.
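The idea behind result (B2), keeping only the largest sets and then solving exactly on what was kept, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: it stores the retained sets explicitly instead of their $F_0$ sketches (so it does not achieve the space bound claimed for (B2)), and the names `largest_sets_kernel` and `best_cover` are hypothetical.

```python
import heapq
import math
from itertools import combinations


def largest_sets_kernel(stream, k, r, eps):
    """Single pass: keep only the ceil(r*k/eps) largest sets seen so far."""
    budget = math.ceil(r * k / eps)
    heap = []  # min-heap of (size, arrival index, set); the index breaks ties
    for idx, s in enumerate(stream):
        item = (len(s), idx, frozenset(s))
        if len(heap) < budget:
            heapq.heappush(heap, item)
        else:
            # Push the new set, then evict the smallest of the budget + 1 sets.
            heapq.heappushpop(heap, item)
    return [s for _, _, s in heap]


def best_cover(sets, k):
    """Best coverage achievable with at most k of the sets (exponential time)."""
    best = 0
    for size in range(1, min(k, len(sets)) + 1):
        for combo in combinations(sets, size):
            best = max(best, len(frozenset().union(*combo)))
    return best
```

A min-heap keyed on set size is one natural way to maintain the largest sets in a single pass, since the smallest retained set is always the one to evict; ties are broken by arrival order, which is an arbitrary choice.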

Comparison to Related Work.

In the context of streaming algorithms, for the Max Coverage problem, McGregor and Vu [60] showed that any approximation better than $e/(e-1)$ requires $\Omega(m/k^2)$ space. For the more general problem of streaming submodular maximization subject to a cardinality constraint, Feldman et al. [32] very recently showed a stronger lower bound that any approximation better than 2 requires $\Omega(m)$ space. Our results provide a route to circumvent these bounds via parameterization on $k$, $r$, and $d$.

Result (B2) also leads to a parameterized algorithm for streaming Set Cover. This new algorithm uses $\tilde{O}(r k^2 n^{\delta} + n)$ space which improves upon the algorithm by Har-Peled et al. [36] that uses $\tilde{O}(m n^{\delta} + n)$ space, where $k$ is an upper bound for the size of the minimum set cover, in the case $r k^2 \ll m$. Both algorithms use $O(1/\delta)$ passes and yield an $O(1/\delta)$ approximation.

In the context of offline parameterized algorithms, Bonnet et al. [10] showed that Max Coverage is fixed-parameter tractable in terms of $k$ and $d$. However, their branching-search algorithm cannot be implemented in the streaming setting. Misra et al. [62] showed that the maximum unique coverage problem in which the aim is to maximize the number of uniquely covered elements $u$ (without any restriction on the number of sets) admits a kernel of size $4^u$. On the other hand, they showed that the budgeted version of this problem (where each element has a profit and each set has a cost and the goal is to maximize the profit subject to a budget constraint) is $W[1]$-hard when parameterized by the budget. In this context, our result shows that a parameterization on both the maximum set size $d$ and the budget $k$ is possible (at least when all costs and profits are unit).

Footnote: In the Max Unique Coverage problem that we consider, all costs and profits are one and the budget is $k$.

Throughout the paper, $m$ will denote the number of sets, $n$ will denote the size of the universe, and $k$ will denote the maximum number of sets that can be used in the solution. Given input sets $S_1, S_2, \ldots, S_m \subseteq [n]$, let $d = \max_i |S_i|$ be the maximum set size and let $r = \max_j |\{i : j \in S_i\}|$ be the maximum number of sets that contain the same element.

Suppose $C$ is a collection of sets. We let $F(C)$ (and $G(C)$) be the set of elements covered (and uniquely covered) by an optimal solution in $C$. Furthermore, let $f(C) = |F(C)|$ and $g(C) = |G(C)|$. In other words, $f(C)$ is the maximum number of elements that can be covered by $k$ sets in $C$. Similarly, $g(C)$ is the maximum number of elements that can be uniquely covered by $k$ sets in $C$. Furthermore, let $\psi(C)$ and $\tilde{\psi}(C)$ be the set of elements covered and uniquely covered respectively by the sets in $C$.

To ease the notation, if $C$ is a collection of sets and $S$ is a set, we often use $C - S$ to denote $C \setminus \{S\}$ and $C + S$ to denote $C \cup \{S\}$.

We use $M$ to denote the collection of all sets in the stream. Therefore, the optimal values of Max Coverage and Max Unique Coverage are $f(M)$ and $g(M)$ respectively.

Throughout this paper, we say an algorithm is correct with high probability if the probability of failure is inversely polynomial in $m$.

Coverage Sketch.

Given a vector $x \in \mathbb{R}^n$, $F_0(x)$ is defined as the number of elements of $x$ which are non-zero. If, given a subset $S \subset \{1, \ldots, n\}$, we define $x_S \in \{0,1\}^n$ to be the characteristic vector of $S$ (i.e., $x_i = 1$ iff $i \in S$), then given sets $S_1, S_2, \ldots$ note that $F_0(x_{S_1} + x_{S_2} + \ldots)$ is exactly the number of elements covered by $S_1 \cup S_2 \cup \ldots$. We will use the following result for estimating $F_0$.

Theorem 1 ($F_0$ Sketch [9, 21]). Given a set $S \subseteq [n]$, there exists an $\tilde{O}(\epsilon^{-2} \log \delta^{-1})$-space algorithm that constructs a data structure $M(S)$ (called an $F_0$ sketch of $S$). The sketch has the property that the number of distinct elements in a collection of sets $S_1, S_2, \ldots, S_t$ can be approximated up to a $1+\epsilon$ factor with probability at least $1-\delta$ provided the collection of $F_0$ sketches $M(S_1), M(S_2), \ldots, M(S_t)$.

Note that if we set $\delta \ll 1/(\mathrm{poly}(m) \cdot \binom{t}{k})$ in the above result we can try each collection of $k$ sets amongst $S_1, S_2, \ldots, S_t$ and get a $1+\epsilon$ approximation for the coverage of each collection with high probability.

Unique Coverage Sketch.

For unique coverage, our sketch of a set corresponds to subsampling the universe via some hash function $h : [n] \to \{0, 1\}$ where $h$ is chosen randomly such that for each $i$, $\Pr[h(i) = 1] = p$ for some appropriate value $p$. Specifically, rather than processing an input set $S$, we process $S' = \{i \in S : h(i) = 1\}$. Note that $|S'|$ has size $p|S|$ in expectation. This approach was used by McGregor and Vu [61] in the context of Max Coverage and it extends easily to Max Unique Coverage; see Section 7. The consequence is that if there is a streaming algorithm that finds a $t$ approximation, we can turn that algorithm into a $t(1+\epsilon)$ approximation algorithm in which we can assume that $\mathrm{OPT} = O(\epsilon^{-2} k \log m)$ with high probability by running the algorithm on the subsampled sets rather than the original sets. Note that this also allows us to assume input sets have size $O(\epsilon^{-2} k \log m)$ since $|S'| \le \mathrm{OPT}$. Hence each "sketched" set can be stored using $B = O(\epsilon^{-2} k \log m \log n)$ bits.

An Algorithm with $\tilde{O}(\epsilon^{-2} mk)$ Memory.

We will use the above sketches in a more interesting context later in the paper, but note that they immediately imply a trivial algorithmic result. Consider the naive algorithm that stores every set and finds the best solution; note that this requires exponential time. Since we can assume $\mathrm{OPT} = O(\epsilon^{-2} k \log m)$, each set has size at most $O(\epsilon^{-2} k \log m)$. Hence, we need $\tilde{O}(\epsilon^{-2} mk)$ memory to store all the sets. This approach was noted in [61] in the context of Max Coverage but also applies to Max Unique Coverage. We will later show that for a $1+\epsilon$ approximation, the above trivial algorithm is optimal up to polylogarithmic factors for constant $k$.

Algorithm.

Our algorithm, though perhaps non-intuitive, is simple to state:

1. Initialize $X$ to be an empty collection of sets. Let $b = d(k-1)$.
2. Let $X_a$ be the sub-collection of $X$ that contains sets of size $a$.
3. For each set $S$ in the stream: Suppose $|S| = a$. Add $S$ to $X$ if there does not exist $T \subseteq S$ that occurs as a subset of $(b+1)^{d-|T|}$ sets of $X_a$.
4. Post-processing: Return the best solution $C$ in $X$.

Analysis.
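As a concrete companion to the algorithm just stated, the following is a minimal Python sketch of its storage rule and brute-force post-processing. It assumes $d$ is known in advance and upper-bounds every set size; the counter dictionary, the helper names (`subsets`, `build_kernel`, `best_value`), and the exhaustive post-processing are illustrative choices, not the paper's implementation, and no attempt is made to match the stated space bound.

```python
from collections import Counter
from itertools import combinations


def subsets(s):
    """Yield every subset of s (including the empty set) as a frozenset."""
    items = list(s)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def build_kernel(stream, k, d):
    """One pass: store S unless some T subset of S already lies in
    (b + 1) ** (d - |T|) stored sets of the same size as S."""
    b = d * (k - 1)
    stored = []   # the collection X
    counts = {}   # counts[(a, T)] = number of stored sets of size a containing T
    for raw in stream:
        s = frozenset(raw)
        a = len(s)
        blocked = any(
            counts.get((a, t), 0) >= (b + 1) ** (d - len(t)) for t in subsets(s)
        )
        if blocked:
            continue
        stored.append(s)
        for t in subsets(s):
            counts[(a, t)] = counts.get((a, t), 0) + 1
    return stored


def coverage(sets):
    return len(frozenset().union(*sets))


def unique_coverage(sets):
    counts = Counter(x for s in sets for x in s)
    return sum(1 for c in counts.values() if c == 1)


def best_value(sets, k, objective):
    """Post-processing: exhaustive search over at most k sets (exponential time)."""
    return max(
        objective(combo)
        for size in range(k + 1)
        for combo in combinations(sets, size)
    )
```

On a stream consisting of all 15 pairs over a six-element universe with $k = 2$ and $d = 2$, this rule discards more than half of the pairs while the best pair of stored sets still covers four elements, which is the optimum for the full input.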

Our algorithm relies on the following combinatorial lemma.

Lemma 2. Let $W = \{S_1, S_2, \ldots\}$ be a collection of distinct sets where each $S_i \subseteq [n]$ and $|S_i| = a$. Suppose for all $T \subseteq \psi(W)$ with $|T| \le a$ there exist at most $\ell_{|T|} := (b+1)^{a-|T|}$ sets in $W$ that contain $T$. Furthermore, suppose there exists a set $T^*$ such that this inequality is tight. Then, for all $B \subseteq \psi(W)$ disjoint from $T^*$ with $|B| \le b$ there exists a set $Y \in W$ such that $T^* \subseteq Y$ and $|Y \cap B| = 0$.

Proof. If $|T^*| = a$ then $T^* \in W$ and we can simply set $Y = T^*$. Henceforth, assume $|T^*| < a$. Consider the $\ell_{|T^*|}$ sets in $W$ that are supersets of $T^*$. Call this collection $W'$. For any $x \in B$, there are at most $\ell_{|T^*|+1}$ sets that include $T^* \cup \{x\}$. Since there are at most $b$ choices for $x$, at most

$$b\,\ell_{|T^*|+1} = b(b+1)^{a-|T^*|-1} < (b+1)^{a-|T^*|} = \ell_{|T^*|}$$

sets in $W'$ contain an element in $B$. Hence, at least one set $Y$ in $W'$ does not contain any element in $B$.

We show that the algorithm indeed obtains an exact kernel for the problems. Recall that $M$ is the collection of all sets in the stream, i.e., the optimal solution has size $f(M)$.

Theorem 3.

The output of the algorithm is optimal. In particular, $f(C) = f(M)$ and $g(C) = g(M)$.

Proof. Recall that $X$ is the collection of all stored sets. We define

$$C_i = M \setminus \{\text{the first } i \text{ sets in the stream that are not stored in } X\}.$$

Clearly, $f(C_0) = f(M)$. Now, suppose there exists $i \ge 1$ such that $f(C_i) < f(M)$. Let $i$ be the smallest such index. Let $\mathcal{O}$ be an optimal solution of $C_{i-1}$ (note that $\mathcal{O}$ is also an overall optimal solution by the minimality of $i$). Let $S$ be the $i$-th set that was not stored in $X$. If $S \notin \mathcal{O}$ then we have a contradiction since $f(C_i) = f(C_{i-1}) = f(M)$. Thus, assume $S \in \mathcal{O}$. Suppose $|S| = a$.

Claim 4. There exists $Y$ in $X_a$ such that $f(\mathcal{O} - S + Y) \ge f(\mathcal{O})$.

Proof. Note that $S$ was not stored because there existed $T^* \subseteq S$ such that $T^*$ was a subset of $(b+1)^{d-|T^*|}$ sets in $X_a$. Consider the set $B = \psi(\mathcal{O}) \setminus S$. Clearly, $B \cap T^* = \emptyset$ and $|B| \le d(k-1) = b$. By Lemma 2, there exists $Y$ in $X_a$ such that $Y \cap B = \emptyset$. Let $Y' = Y \setminus S$ and $S' = S \setminus Y$. Note that $|Y'| = |S'|$ since $|Y| = |S|$. Define indicator variables $\alpha_z = 1$ iff $z \in \psi(\mathcal{O} - S + Y)$ and $\beta_z = 1$ iff $z \in \psi(\mathcal{O})$. Note that

$$(z \in Y \cap S \text{ or } z \notin Y \cup S) \implies (\alpha_z = \beta_z), \qquad (z \in Y') \implies (\alpha_z = 1), \qquad (z \in Y') \implies (\beta_z = 0),$$

where the last implication uses the fact that $Y'$ is disjoint from $\psi(\mathcal{O})$. Then

$$
\begin{aligned}
|\psi(\mathcal{O} - S + Y)| &= \sum_{z \in Y'} \alpha_z + \sum_{z \in Y \cap S} \alpha_z + \sum_{z \in S'} \alpha_z + \sum_{z \notin Y \cup S} \alpha_z \\
&\ge \Big(|Y'| + \sum_{z \in Y'} \beta_z\Big) + \sum_{z \in Y \cap S} \beta_z + \Big(-|S'| + \sum_{z \in S'} \beta_z\Big) + \sum_{z \notin Y \cup S} \beta_z \\
&= \sum_{z \in Y'} \beta_z + \sum_{z \in Y \cap S} \beta_z + \sum_{z \in S'} \beta_z + \sum_{z \notin Y \cup S} \beta_z = |\psi(\mathcal{O})|.
\end{aligned}
$$

This proves the claim.

Thus, $f(C_i) \ge f(\mathcal{O}) = f(M)$, which is a contradiction. Hence, there is no such $i$ and the theorem follows for Max Coverage. The proof for unique coverage is almost identical: for the analogous claim we define indicator variables $\tilde{\alpha}_z = 1$ iff $z \in \tilde{\psi}(\mathcal{O} - S + Y)$ and $\tilde{\beta}_z = 1$ iff $z \in \tilde{\psi}(\mathcal{O})$. The proof goes through with $\alpha$ and $\beta$ replaced by $\tilde{\alpha}$ and $\tilde{\beta}$ since it is still the case that

$$(z \in Y \cap S \text{ or } z \notin Y \cup S) \implies (\tilde{\alpha}_z = \tilde{\beta}_z), \qquad (z \in Y') \implies (\tilde{\alpha}_z = 1), \qquad (z \in Y') \implies (\tilde{\beta}_z = 0),$$

where now the last two implications use the fact that $Y'$ is disjoint from $\psi(\mathcal{O})$.

Lemma 5.

The space used by the algorithm is $\tilde{O}(d^{d+1} k^d)$.

Proof. Recall that one of the requirements for a set $S$ to be added to $X$ is that the number of sets in $X_{|S|}$ that are supersets of any subset of $S$ of size $t$ is at most $(b+1)^{d-t}$. This includes the empty subset and, since every set in $X_{|S|}$ is a superset of the empty set, we deduce that $|X_{|S|}| \le (b+1)^d = O((dk)^d)$. Since each set needs $\tilde{O}(d)$ bits to store, and $|X| = \sum_{a=1}^{d} |X_a| \le O(d^d k^d)$, the total space is $\tilde{O}(d^{d+1} k^d)$.

We summarize the above as a theorem.

Theorem 6. There exist deterministic single-pass algorithms using $\tilde{O}(k^d d^{d+1})$ space that yield an exact solution to Max Coverage and Max Unique Coverage.

Handling Insertion-Deletion Streams.

We outline another exact algorithm that works for insertion-deletion streams, however with a worse space bound $\tilde{O}((kd)^{2d})$, in Section 6.1.

Theorem 7. There exist randomized single-pass algorithms using $\tilde{O}(d^d k^d)$ space and allowing deletions that w.h.p. yield an exact solution to Max Coverage and Max Unique Coverage.

Footnote: This improves upon our earlier result in the ICDT version of the paper that uses $\tilde{O}((dk)^{2d})$ space.

In this section, we present a variety of different approximation algorithms where the space used by the algorithm is independent of $d$ but, in some cases, may depend on $r$. The first algorithm uses $\tilde{O}(\epsilon^{-4} k^3 r)$ memory and obtains a $1+\epsilon$ approximation to both problems. The second algorithm uses $\tilde{O}(\epsilon^{-3} k^2 r)$ memory and obtains a $1+\epsilon$ approximation to Max Coverage and a $2+\epsilon$ approximation to Max Unique Coverage; it can also be extended to streams with deletions.

$1+\epsilon$ Approximation

Given a collection of sets $C = \{S_1, S_2, \ldots, S_m\}$, we say a sub-collection $C' \subset C$ is a matching if the sets in $C'$ are mutually disjoint. $C'$ is a maximal matching if there does not exist $S \in C \setminus C'$ such that $S$ is disjoint from all sets in $C'$.

Lemma 8. For any input $C$, let $\mathcal{O} \subset C$ be an optimal solution for either the Max Coverage or Max Unique Coverage problem. Let $M_i$ be a maximal matching amongst the input sets of size $i$. Then every set of size $i$ in $\mathcal{O}$ intersects with some set in $M_i$.

Proof. Let $S \in \mathcal{O}$ have size $i$. If it was disjoint from all sets in $M_i$ then it could be added to $M_i$ and the resulting collection would still be a matching. This violates the assumption that $M_i$ is maximal.

The next lemma extends the above result to show that we can potentially remove many sets from each $M_i$ and still argue that there is an optimal solution for the original instance amongst the sets that intersect a set in some $M_i$.

Lemma 9. Consider an input of sets of size at most $d$. For $i \in [d]$, let $M_i$ be a maximal matching amongst the input sets of size $i$ and let $M'_i$ be an arbitrary subset of $M_i$ of size $\min(k + dk, |M_i|)$. Let $D_i$ be the collection of all sets that intersect a set in $M'_i$. Then $\bigcup_i (D_i \cup M'_i)$ contains an optimal solution to both the Max Unique Coverage and Max Coverage problem.

Proof. If $|M_i| = |M'_i|$ for all $1 \le i \le d$ then the result follows from Lemma 8. If not, let $j = \max\{i \in [d] : |M_i| > |M'_i|\}$. Let $\mathcal{O}$ be an optimal solution and let $\mathcal{O}_i$ be all the sets in $\mathcal{O}$ of size $i$. We know that every set in $\mathcal{O}_d \cup \mathcal{O}_{d-1} \cup \ldots \cup \mathcal{O}_{j+1}$ is in

$$\bigcup_{i \ge j+1} (D_i \cup M_i) = \bigcup_{i \ge j+1} (D_i \cup M'_i).$$

Hence, the number of elements (uniquely) covered by $\mathcal{O}$ is at most the number of elements (uniquely) covered by $\mathcal{O}_d \cup \mathcal{O}_{d-1} \cup \ldots \cup \mathcal{O}_{j+1}$ plus $kj$ since every set in $\mathcal{O}_j \cup \ldots \cup \mathcal{O}_1$ (uniquely) covers at most $j$ additional elements. But we can (uniquely) cover at least the number of elements (uniquely) covered by $\mathcal{O}_d \cup \mathcal{O}_{d-1} \cup \ldots \cup \mathcal{O}_{j+1}$ plus $kj$. This is because $M'_j$ contains $k + dk$ disjoint sets of size $j$ and at least $k + dk - kd = k$ of these are disjoint from all sets in $\mathcal{O}_d \cup \mathcal{O}_{d-1} \cup \ldots \cup \mathcal{O}_{j+1}$. Hence, there is a solution amongst $\bigcup_{i \ge j} (D_i \cup M'_i)$ that is at least as good as $\mathcal{O}$ and hence is also optimal.

The above lemma suggests an exact algorithm that stores the sets in $\bigcup_i (D_i \cup M'_i)$ and finds the optimum solution among these sets. In particular, we construct matchings of each size greedily up to the appropriate size and store all intersecting sets. Note that since each element belongs to at most $r$ sets, the total space is $\tilde{O}(d^2 k r)$. Applying the subsampling framework, we have $d \le \mathrm{OPT} = O(\epsilon^{-2} k \log m)$ and the approximation factor becomes $1+\epsilon$.

Theorem 10. There exists a randomized one-pass algorithm using $\tilde{O}(\epsilon^{-4} k^3 r)$ space that finds a $1+\epsilon$ approximation to Max Unique Coverage and Max Coverage.

$1+\epsilon$ Approximation for Maximum Coverage

In this section, we generalize the approach of Manurangsi [54] and combine that with the $F_0$-sketching technique to obtain a $1+\epsilon$ approximation using $\tilde{O}(\epsilon^{-3} k^2 r)$ space for maximum coverage. This saves a factor $k/\epsilon$ and the generalized analysis might be of independent interest. Let $\mathrm{OPT} = |\psi(\mathcal{O})|$ denote the optimal coverage of the input stream.

Manurangsi [54] showed that for the maximum $k$-vertex cover problem, the $\Theta(k/\epsilon)$ vertices with highest degrees form a $1+\epsilon$ approximation kernel. That is, there exist $k$ vertices among those that cover $(1-\epsilon)\mathrm{OPT}$ edges. We now consider a set system in which an element belongs to at most $r$ sets (this can also be viewed as a hypergraph where each set corresponds to a vertex and each element corresponds to a hyperedge; we then want to find $k$ vertices that touch as many hyperedges as possible).

We begin with the following lemma that generalizes the aforementioned result in [54]. We may assume that $m > rk/\epsilon$ since otherwise we can store all the sets.

Lemma 11. Suppose $m > \lceil rk/\epsilon \rceil$. Let $K$ be the collection of $\lceil rk/\epsilon \rceil$ sets with largest sizes (ties broken arbitrarily). There exist $k$ sets in $K$ that cover $(1-\epsilon)\mathrm{OPT}$ elements.

Proof. Let $\mathcal{O}$ denote the collection of $k$ sets in some optimal solution. Let $\mathcal{O}_{in} = \mathcal{O} \cap K$ and $\mathcal{O}_{out} = \mathcal{O} \setminus K$. We consider a random subset $Z \subset K$ of size $|\mathcal{O}_{out}|$. We will show that the sets in $Z \cup \mathcal{O}_{in}$ cover $(1-\epsilon)\mathrm{OPT}$ elements in expectation; this implies the claim.

Let $[E]$ denote the indicator variable for event $E$. We rewrite

$$|\psi(Z \cup \mathcal{O}_{in})| = |\psi(\mathcal{O}_{in})| + |\psi(Z)| - |\psi(\mathcal{O}_{in}) \cap \psi(Z)|.$$

Furthermore, the probability that we pick a set $S$ in $K$ to add to $Z$ is

$$p := \frac{|\mathcal{O}_{out}|}{|K|} \le \frac{k}{kr/\epsilon} = \frac{\epsilon}{r}.$$

Next, we upper bound $E\big[|\psi(\mathcal{O}_{in}) \cap \psi(Z)|\big]$. We have

$$E\big[|\psi(\mathcal{O}_{in}) \cap \psi(Z)|\big] \le \sum_{u \in \psi(\mathcal{O}_{in})} \sum_{S \in K : u \in S} \Pr[S \in Z] \le \sum_{u \in \psi(\mathcal{O}_{in})} rp \le |\psi(\mathcal{O}_{in})| \cdot \epsilon.$$

We lower bound $E[|\psi(Z)|]$ as follows.

$$
\begin{aligned}
E[|\psi(Z)|] &\ge E\Big[\sum_{S \in K} \Big(|S|\,[S \in Z] - \sum_{S' \in K \setminus \{S\}} |S \cap S'|\,[S \in Z \wedge S' \in Z]\Big)\Big] \\
&\ge \sum_{S \in K} \Big(|S|\,p - \sum_{S' \in K \setminus \{S\}} |S \cap S'|\,p^2\Big) \\
&\ge \sum_{S \in K} \big(|S|\,p - (r-1)|S|\,p^2\big) \\
&\ge p(1 - pr) \sum_{S \in K} |S| \ge p(1-\epsilon) \sum_{S \in K} |S|. \qquad (1)
\end{aligned}
$$

In the above derivation, the second inequality follows from the observation that $\Pr[S \in Z \wedge S' \in Z] \le p^2$. The third inequality is because $\sum_{S' \in K \setminus \{S\}} |S \cap S'| \le (r-1)|S|$ since each element belongs to at most $r$ sets.

For all $S \in K$, we must have

$$|S| \ge \frac{\sum_{Y \in \mathcal{O}_{out}} |Y|}{|\mathcal{O}_{out}|} \ge \frac{|\psi(\mathcal{O}_{out})|}{|\mathcal{O}_{out}|}.$$

Thus,

$$E[|\psi(Z)|] \ge p(1-\epsilon)\,|K|\,\frac{|\psi(\mathcal{O}_{out})|}{|\mathcal{O}_{out}|} = p(1-\epsilon)\,\frac{|\psi(\mathcal{O}_{out})|}{p} = (1-\epsilon)\,|\psi(\mathcal{O}_{out})|.$$

Putting it together,

$$E\big[|\psi(Z \cup \mathcal{O}_{in})|\big] \ge |\psi(\mathcal{O}_{in})| + (1-\epsilon)|\psi(\mathcal{O}_{out})| - |\psi(\mathcal{O}_{in})| \cdot \epsilon \ge (1-\epsilon)\mathrm{OPT}.$$

With the above lemma in mind, the following algorithm's correctness is immediate.

1. Store $F_0$-sketches of the $\lceil kr/\epsilon \rceil$ largest sets, where the failure probability of the sketches is set to $1/(\mathrm{poly}(n)\binom{m}{k})$.
2. At the end of the stream, return the $k$ sets with the largest coverage based on the estimates given by the $F_0$-sketches.

We restate our result as a theorem.

Theorem 12.

There exists a randomized one-pass, $\tilde{O}(k^2 r/\epsilon^3)$-space, algorithm that with high probability finds a $1+\epsilon$ approximation to Max Coverage.

Obtaining a $2+\epsilon$ approximation to Max Unique Coverage. We note that finding the best solution to Max Unique Coverage in $K$ will yield a $2+\epsilon$ approximation. This is a worse approximation than that of the previous subsection. However, we save a factor of $k/\epsilon$ in memory. Furthermore, this approach also allows us to handle streams with deletions.

To see that we get a $2+\epsilon$ approximation to Max Unique Coverage, note that $g(Z \cup \mathcal{O}_{in}) \ge \frac{1}{2}\big(g(\mathcal{O}_{in}) + g(Z)\big)$. Furthermore, a similar derivation shows $\mathbb{E}\big[|\tilde{\psi}(Z)|\big] \ge (1-\epsilon)\,|\tilde{\psi}(\mathcal{O}_{out})|$. Specifically, in the derivation in Eq. 1, we can simply replace $\psi$ with $\tilde{\psi}$. This gives us $g(K) \ge (1/2 - \epsilon)\, g(\mathcal{O})$.

Extension to Insert/Delete Streams.

The result can be extended to the case where sets are inserted and deleted. For the full details, see Section 6.2.

$O(\log \min(k, r))$ Approximation for Unique Coverage

We now present an algorithm whose space does not depend on $r$, at the cost of increasing the approximation factor to $O(\log \min(k, r))$. It also has the feature that the running time is polynomial in $k$ in addition to being polynomial in $m$ and $n$.

The basic idea is as follows: We consider an existing algorithm that first finds a 2.01 approximation $C$ to Max Coverage. It then finds the best solution of Max Unique Coverage among the sets in $C$.

▶ Theorem 13.

There exists a randomized one-pass, $\tilde{O}(k^2)$-space, algorithm that with high probability finds an $O(\log \min(k, r))$ approximation to Max Unique Coverage.

Proof.

From previous work [8, 61], we can find a 2.01 approximation $C$ to Max Coverage using $\tilde{O}(k)$ memory. Note that their algorithm maintains a collection $C$ of $k$ sets during the stream. Demaine et al. [25] proved that if $Q$ is the best solution to Max Unique Coverage among the sets in $C$, then $Q$ is an $O(\log \min(k, r))$ approximation to Max Unique Coverage. In fact, they presented a polynomial time algorithm to find $Q$ from $C$ such that the number of uniquely covered elements is at least

$$\Omega(1/\log k) \cdot |\psi(C)| \ \ge\ \Omega(1/\log k) \cdot \frac{1}{2.01} \cdot f(M) \ \ge\ \Omega(1/\log k) \cdot g(M) .$$

Note that storing each set in $C$ requires $\tilde{O}(d)$ memory. Hence, the total memory is $\tilde{O}(kd)$. Applying the sub-sampling framework, we obtain an $\tilde{O}(k^2)$ memory algorithm. ◀

We parameterize the set cover problem as follows. Given a set system, either A) output a set cover of size $\alpha k$ if $\mathrm{OPT} \le k$, where $\alpha$ is the approximation factor, or B) correctly declare that a set cover of size $k$ does not exist.

▶ Theorem 14.

For $0 < \delta < 1$, there exists a randomized, $O(1/\delta)$-pass, $\tilde{O}(rk^2 n^{\delta} + n)$-space, algorithm that with high probability finds an $O(1/\delta)$ approximation to the parameterized Set Cover problem.
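The scheme behind this theorem is to repeatedly run a Max Coverage routine on the elements that are still uncovered, adding the returned sets to the cover after each pass. The following is a minimal offline sketch of that loop in Python, not the streaming algorithm itself. It assumes the whole input fits in memory as a dictionary from set ids to Python sets, and it replaces the sketch-based Max Coverage subroutine by an exhaustive search over the $\lceil rk/\epsilon \rceil$ largest residual sets. The names `best_k_among_largest`, `parameterized_set_cover`, and the pass budget `max_passes` are hypothetical and introduced only for illustration.

```python
from itertools import combinations
from math import ceil


def best_k_among_largest(sets, uncovered, k, r, eps):
    """Exhaustively pick up to k sets among the ceil(r*k/eps) largest ones.

    Sizes are measured on the still-uncovered elements only.
    Returns (chosen_ids, newly_covered_elements).
    Assumption: exhaustive search stands in for the sketch-based routine.
    """
    # Restrict every set to the uncovered elements and drop empty ones.
    residual = {i: s & uncovered for i, s in sets.items() if s & uncovered}
    limit = ceil(r * k / eps)
    largest = sorted(residual, key=lambda i: len(residual[i]), reverse=True)[:limit]
    best_ids, best_cov = (), set()
    for combo in combinations(largest, min(k, len(largest))):
        cov = set().union(*(residual[i] for i in combo))
        if len(cov) > len(best_cov):
            best_ids, best_cov = combo, cov
    return list(best_ids), best_cov


def parameterized_set_cover(sets, universe, k, r, eps, max_passes):
    """Return a list of set ids covering `universe`, or None if none is found
    within `max_passes` passes (hypothetical pass budget)."""
    uncovered = set(universe)
    chosen = []
    for _ in range(max_passes):
        if not uncovered:
            break
        ids, covered = best_k_among_largest(sets, uncovered, k, r, eps)
        if not covered:  # no remaining set covers anything new
            return None
        chosen.extend(ids)
        uncovered -= covered
    return chosen if not uncovered else None
```

Each pass adds at most `k` sets, so the size of the returned cover is at most `k` times the number of passes used. Returning `None` plays the role of declaring that no small cover was found within the pass budget.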

Proof.

In each pass, we run the algorithm in Theorem 12 with parameters $k$ and $\epsilon = 1/n^{\delta/3}$ on the remaining uncovered elements. The space use is $\tilde{O}(rk^2 n^{\delta} + n)$. Here, we need additional $\tilde{O}(n)$ space to keep track of the remaining uncovered elements.

Note that if $\mathrm{OPT} \le k$, after each pass, the number of uncovered elements is reduced by a factor $1/n^{\delta/3}$. This is because if $n'$ is the number of uncovered elements at the beginning of a pass, then after that pass, we cover all but at most $n'/n^{\delta/3}$ of those elements. After $i$ passes, the number of remaining uncovered elements is $O(n^{1 - i\delta/3})$; we therefore use at most $O(1/\delta)$ passes until we are done. At the end, we have a set cover of size $O(k/\delta)$.

If after $\omega(1/\delta)$ passes, there are still remaining uncovered elements, we declare that such a solution does not exist. ◀

Our algorithm improves upon the algorithm by Har-Peled et al. [36] that uses $\tilde{O}(mn^{\delta} + n)$ space when $rk^2 \ll m$. Both algorithms yield an $O(1/\delta)$ approximation and use $O(1/\delta)$ passes.

As observed earlier, any exact algorithm for either the Max Coverage or Max Unique Coverage problem on an input where all sets have size $d$ will return a matching of size $k$ if one exists. However, by a lower bound due to Chitnis et al. [18] we know that determining if there exists a matching of size $k$ in a single pass requires $\Omega(k^d)$ space. This immediately implies the following theorem.

▶ Theorem 15.

Any single-pass algorithm that solves Max Coverage or Max Unique Coverage exactly with sufficiently large constant probability requires $\Omega(k^d)$ space.

$e^{1-1/k}$ approximation

The strategy is similar to previous work on Max Coverage [60, 61]. However, we need to argue that the relevant probabilistic construction works for all collections of fewer than $k$ sets since the unique coverage function is not monotone.

We make a reduction from the communication problem $k$-player set disjointness, denoted by $\mathrm{DISJ}(m, k)$. In this problem, there are $k$ players where the $i$th player has a set $S_i \subseteq [m]$. It is promised that exactly one of the following two cases happens: a) NO instance: all the sets are pairwise disjoint, and b) YES instance: there is a unique element $v \in [m]$ such that $v \in S_i$ for all $i \in [k]$ and all other elements belong to at most one set. The (randomized) communication complexity (in the one-way model or the blackboard model), for some large enough constant success probability, of the above problem is $\Omega(m/k)$ even if the players may use public randomness [14]. We can assume that $|S_1 \cup S_2 \cup \dots \cup S_k| = \Omega(m)$.

▶ Theorem 16.

Any constant-pass randomized algorithm with an approximation better than $e^{1-1/k}$ to Max Unique Coverage requires $\Omega(m/k^2)$ space.

Proof.

For each $i \in [m]$, let $P_i$ be a random partition of $[n]$ into $k$ sets $V_{i1}, \dots, V_{ik}$ such that an element in the universe $U = [n]$ belongs to exactly one of these sets uniformly at random. In particular, for all $i \in [m]$, $j \in [k]$, and $v \in U$,

$$\Pr\big[ v \in V_{ij} \wedge (\forall j' \ne j,\ v \notin V_{ij'}) \big] = 1/k .$$

The partitions are chosen independently using public randomness before receiving the input. For each player $j$, if $i \in S_j$, then they put $V_{ij}$ in the stream. Note that the stream consists of $\Theta(m)$ sets.

If the input is a NO instance, then for each $i \in [m]$, there is at most one set $V_{ij}$ in the stream. Therefore, for each element $v \in [n]$ and any collection of $\ell \le k$ sets $V_{i_1 j_1}, \dots, V_{i_\ell j_\ell}$ in the stream,

$$\Pr\big[ v \text{ is uniquely covered by } V_{i_1 j_1}, \dots, V_{i_\ell j_\ell} \big] = \frac{\ell}{k} \cdot (1 - 1/k)^{\ell - 1} \le \frac{\ell}{k} \cdot e^{-(\ell-1)/k} .$$

Therefore, in expectation,

$$\mu_\ell := \mathbb{E}\big[ g(\{V_{i_1 j_1}, \dots, V_{i_\ell j_\ell}\}) \big] \le \frac{\ell}{k} \cdot e^{-(\ell-1)/k}\, n .$$

By an application of Hoeffding's inequality,

$$\Pr\big[ g(\{V_{i_1 j_1}, \dots, V_{i_\ell j_\ell}\}) > \mu_\ell + \epsilon e^{-(k-1)/k} \cdot n \big] \le \exp\big( -2\epsilon^2 e^{-2(k-1)/k}\, n \big) \le \exp\big( -\Omega(\epsilon^2 n) \big) \le \frac{1}{m^{\Omega(k)}} .$$

The last inequality follows by letting $n = \Omega(\epsilon^{-2} k \log m)$. The following claim shows that for large $k$, in expectation, picking $k$ sets is optimal in terms of unique coverage.

▶ Lemma 17.

The function $g(\ell) = \frac{\ell}{k} \cdot e^{-(\ell-1)/k}\, n$ is increasing in the interval $(-\infty, k]$ and decreasing in the interval $[k, +\infty)$.

Proof.

We take the partial derivative of $g$ with respect to $\ell$,

$$\frac{\partial g}{\partial \ell} = \frac{e^{(1-\ell)/k}\,(k - \ell)}{k^2} \cdot n ,$$

and observe that it is non-negative if and only if $\ell \le k$. ◀

By appealing to the union bound over all $\binom{m}{1} + \dots + \binom{m}{k-1} + \binom{m}{k} \le O(m^{k+1})$ possible collections of $\ell \le k$ sets, we deduce that with high probability, for all collections of $\ell \le k$ sets $S_1, \dots, S_\ell$,

$$g(\{S_1, \dots, S_\ell\}) \le \mu_\ell + \epsilon e^{-(k-1)/k} \cdot n \le \frac{\ell}{k} \cdot e^{-(\ell-1)/k}\, n + \epsilon e^{-(k-1)/k} \cdot n \le (1+\epsilon)\, e^{-1+1/k}\, n .$$

If the input is a YES instance, then clearly, the maximum $k$-unique coverage is $n$. This is because there exists $i$ such that $i \in S_1 \cap \dots \cap S_k$ and therefore $V_{i1}, \dots, V_{ik}$ are in the stream and these sets uniquely cover all elements.

Therefore, any constant-pass algorithm that returns better than an $e^{1-1/k}/(1+\epsilon)$ approximation to Max Unique Coverage for some large enough constant success probability implies a protocol to solve $\mathrm{DISJ}(m, k)$. Thus, $\Omega(m/k^2)$ space is required. ◀

$1+\epsilon$ approximation

Assadi [6] presents an $\Omega(m/\epsilon^2)$ lower bound for the space required to compute a $1+\epsilon$ approximation for Max Coverage when $k = 2$, even when the stream is in a random order and the algorithm is permitted constant passes. This is proved via a reduction to multiple instances of the Gap-Hamming Distance (GHD) problem on a hard input distribution, where an input with high maximum coverage corresponds to a YES answer for some GHD instance, and a low maximum coverage corresponds to a NO answer for all GHD instances. This hard distribution has the additional property that high maximum coverage inputs also have high maximum unique coverage, and low maximum coverage inputs have low maximum unique coverage. Therefore, the following corollary holds:

▶ Corollary 18.

Any constant-pass randomized algorithm with an approximation factor $1+\epsilon$ for Max Unique Coverage requires $\Omega(m/\epsilon^2)$ space.

Consider coloring the elements of a universe with a 2-wise hash-function such that each element is equally likely to get one of $c = 10 d^2 k$ colors. We say a set has color $P$ if the colors of its elements are all different and form the set $P$. Then, via $\ell_0$ sampling [41], use $\tilde{O}(c^d)$ space to sample a set (if one exists) that is colored $P$ (i.e., for each color in $P$ there is exactly one element in the sampled set with this color) for each subset $P \subseteq \{1, 2, \dots, c\}$ of size at most $d$.

▶ Definition 19.

Let $C$ be a collection of at most $k$ sets where each set has size at most $d$. Say a set $S$ in $C$ is good with respect to $C$ if the elements of $S$ receive different colors and they are all different from the colors received by elements in $(\cup_{S' \in C} S') \setminus S$.

For any good set $S$ in the collection, let $r(S)$ be the set found by the sampling algorithm that is colored the same as set $S$. We call $r(S)$ the replacement for $S$.

▶ Lemma 20.

Removing sets $S_1, S_2, \dots, S_g$ that are good with respect to (w.r.t.) $C$ from $C$ and replacing them by $r(S_1), r(S_2), \dots, r(S_g)$ yields a new collection that (uniquely) covers at least the same number of elements as $C$.

Proof.

Let $R$ be the set of colors used to color elements in $\cup_{i=1}^{g} S_i$ and let $R'$ be the set of colors used to color elements in $(\cup_{S \in C} S) \setminus (\cup_{i=1}^{g} S_i)$. Because $S_1, S_2, \dots, S_g$ are good sets, $|R| = |\cup_{i=1}^{g} S_i|$ and $R \cap R' = \emptyset$. After replacing $S_1, S_2, \dots, S_g$ by $r(S_1), r(S_2), \dots, r(S_g)$, the multiplicity of an element with a color in $R'$ is unchanged. For any color in $R$, let $e$ be the element in $\cup_{i=1}^{g} S_i$ with this color. There will be at least one element with the same color as $e$ after the collection is transformed. It follows that the coverage of the collection does not decrease: the removal of $S_1, S_2, \dots, S_g$ reduces the coverage by at most $|\cup_{i=1}^{g} S_i|$ but adding $r(S_1), r(S_2), \dots, r(S_g)$ increases the coverage by at least $|R|$. To argue that the unique coverage of the collection does not decrease, note that if $e$ had multiplicity 1 then the element with the same color as $e$ after the transformation also has multiplicity 1. ◀

▶ Lemma 21.

For any $C' \subseteq C$, $\Pr[\text{number of good sets in } C' \text{ is} \ge |C'|/2] \ge 4/5$.

Proof.

First note that a set is not good if one of its elements shares a color with another element in that set or with an element in another set in the collection. By the union bound,

$$\Pr[\text{set is not good}] \le d(dk)/c = 1/10 .$$

Hence, for any subset $C'$ of $C$, $\mathbb{E}[\text{number of bad sets in } C'] \le |C'|/10$ and the lemma follows via Markov's inequality. ◀
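The good-set condition and the bound on the expected number of bad sets can be checked empirically. The sketch below, in Python, colors the elements with $c = 10 d^2 k$ colors and counts how often at least half of the sets in a collection are good. It is an illustration only: it uses fully independent uniform colors from Python's `random` module as a stand-in for the 2-wise independent hash function, and the function names `good_sets` and `estimate_half_good`, as well as the "at least half" threshold used in the experiment, are assumptions made for this example.

```python
import random


def good_sets(collection, coloring):
    """Indices of sets that are good w.r.t. `collection` under `coloring`.

    A set is good if its elements get pairwise distinct colors and none of
    those colors is used by an element of the collection outside the set.
    """
    good = []
    for i, s in enumerate(collection):
        own = [coloring[e] for e in s]
        others = {
            coloring[e]
            for j, t in enumerate(collection) if j != i
            for e in t if e not in s
        }
        if len(set(own)) == len(own) and not (set(own) & others):
            good.append(i)
    return good


def estimate_half_good(collection, d, k, trials, seed=0):
    """Fraction of random colorings in which at least half of the sets are good.

    Assumption: fully random colors replace the 2-wise independent hash.
    """
    rng = random.Random(seed)
    c = 10 * d * d * k  # number of colors
    elements = sorted(set().union(*collection))
    hits = 0
    for _ in range(trials):
        coloring = {e: rng.randrange(c) for e in elements}
        if 2 * len(good_sets(collection, coloring)) >= len(collection):
            hits += 1
    return hits / trials
```

For example, `estimate_half_good([{1, 2, 3}, {4, 5, 6}, {7, 8, 9}, {10, 11, 12}], d=3, k=4, trials=2000)` estimates the probability for four disjoint sets of size three; the Markov argument above only guarantees a constant lower bound, and the empirical value is typically much higher.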

▶ Theorem 22.

After repeating the random coloring and sampling $O(\log k)$ times, we have a collection of sets that includes the collection of size at most $k$ that (uniquely) covers the maximum number of elements.

Proof.

For the sake of analysis, let $C$ be a collection of at most $k$ sets with optimum (unique) coverage. Let $C_0 = C$.

1. Randomly color elements. Let $C_1$ be the collection formed from $C_0$ by replacing all sets in $C$ that are good sets w.r.t. $C_0$ by their replacements. Remove all good sets (w.r.t. $C_0$) from $C$.
2. Randomly color elements. Let $C_2$ be the collection formed from $C_1$ by replacing all sets in $C$ that are good sets w.r.t. $C_1$ by their replacements. Remove all good sets (w.r.t. $C_1$) from $C$.
3. Continue in this way for $O(\log k)$ steps.

In each step, the size of $C$ decreases by a constant factor with constant probability by appealing to Lemma 21. Hence after $O(\log k)$ steps $|C| = 0$. Note that the (unique) coverage of $C_{O(\log k)}$ is at least the (unique) coverage of $C_0$ by Lemma 20. ◀

Noting that the $O(\log k)$ colorings/samplings can be performed in parallel, we have a single-pass algorithm.

We now explain how the approach used in Theorem 12 can be extended to the case where sets may be inserted and deleted. In this setting, it is not immediately obvious how to select the largest $\lceil rk/\epsilon \rceil$ sets; the approach used when sets are only inserted does not extend. Note that in this model we can set $m$ to be the maximum number of sets that have been inserted and not deleted at any prefix of the stream rather than the total number of sets inserted/deleted.

However, we can extend the result as follows. Suppose the sketch of a set for approximating maximum (unique) coverage requires $B$ bits; recall from Section 2.2 that $B = k\epsilon^{-2}\,\mathrm{polylog}(n, m)$ suffices. We can encode such a sketch of a set $S$ as an integer $i(S) \in [2^B]$. Suppose we know that exactly $\lceil rk/\epsilon \rceil$ sets have size at least some threshold $t$. We will remove this assumption shortly. Consider the vector $x$ with entries indexed by $[N]$, where $N = 2^B$, that is initially 0 and then is updated by a stream of set insertions/deletions as follows:

1. When $S$ is inserted, if $|S| \ge t$, then $x_{i(S)} \leftarrow x_{i(S)} + 1$.
2. When $S$ is deleted, if $|S| \ge t$, then $x_{i(S)} \leftarrow x_{i(S)} - 1$.

At the end of this process $x \in \{0, 1, \dots, m\}^{2^B}$, $\ell_0(x) = \lceil rk/\epsilon \rceil$, and we can reconstruct the sketches of the largest $\eta k$ sets given $x$. Unfortunately, storing $x$ explicitly in small space is not possible since, while we are promised that at the end of the stream $\ell_0(x) = \lceil rk/\epsilon \rceil$, during the stream it could be that $x$ is an arbitrary binary string with $m$ ones and this requires $\Omega(m)$ memory to store. To get around this, it is sufficient to maintain a linear sketch of $x$ itself that supports sparse recovery. For our purposes, the CountMin Sketch [22] is sufficient although other approaches are possible. The CountMin Sketch allows $x$ to be reconstructed with probability $1 - \delta$ using a sketch of size

$$O\big(\log N + \lceil rk/\epsilon \rceil \log(\lceil rk/\epsilon \rceil/\delta) \log m\big) = O\big(\lceil rk/\epsilon \rceil\, \epsilon^{-2}\, \mathrm{polylog}(n, m)\big) .$$

To remove the assumption that we know $t$ in advance, we consider values $t_0, t_1, \dots, t_{\lceil \log_{1+\epsilon} m \rceil}$ where $t_i = (1+\epsilon)^i$. We define vectors $x^1, x^2, \dots \in \{0, 1, \dots, m\}^{2^B}$ where $x^i$ is only updated when a set of size $\le t_i$ but $> t_{i-1}$ is inserted/deleted. Then there exists $i$ such that $\le \lceil rk/\epsilon \rceil$ sets have size $\le t_{i-1}$ and the sketches of these sets can be reconstructed from $x^1, \dots, x^{t_{i-1}}$. To ensure we have $\lceil rk/\epsilon \rceil$ sets, we may need some additional sketches corresponding to sets of size $> t_{i-1}$ and $\le t_i$ but unfortunately there could be $m$ such sets and we are only guaranteed recovery of $x^{t_i}$ when it is sparse. However, if this is indeed the case we can still recover enough entries of $x^{t_i}$ by first subsampling the entries at the appropriate rate (we can guess sampling rate $1, 1/2, 1/4, \dots, 1/m$) in the standard way. Note that we can keep track of $\ell_0(x^i)$ exactly for each $i$ using $O(\log m)$ space.

Assume we have $v$ such that $\mathrm{OPT}/2 \le v \le \mathrm{OPT}$. Let $h : [n] \to \{0, 1\}$ be a hash function that is $\Omega(\epsilon^{-2} k \log m)$-wise independent. We run our algorithm on the subsampled universe $U' = \{u \in U : h(u) = 1\}$. Furthermore, let

$$\Pr[h(u) = 1] = p = \frac{ck \log m}{\epsilon^2 v}$$

where $c$ is some sufficiently large constant. Let $S' = S \cap U'$ and let $\mathrm{OPT}'$ be the optimal unique coverage value in the subsampled set system. The following result is from McGregor and Vu [61]. We note that the proof is the same except that the indicator variables now correspond to the events that an element is uniquely covered (instead of being covered).

▶ Lemma 23.

With probability at least $1 - 1/\mathrm{poly}(m)$, we have that

$$p\,\mathrm{OPT}(1+\epsilon) \ge \mathrm{OPT}' \ge p\,\mathrm{OPT}(1-\epsilon) .$$

Furthermore, if $S_1, \dots, S_k$ satisfies $g(\{S_1', \dots, S_k'\}) \ge p\,\mathrm{OPT}(1-\epsilon)/t$ then

$$g(\{S_1, \dots, S_k\}) \ge \mathrm{OPT}(1/t - 2\epsilon) .$$

We could guess $v = 1, 2, 4, \dots, n$. One of the guesses must be between $\mathrm{OPT}/2$ and $\mathrm{OPT}$, in which case $\mathrm{OPT}' = O(\epsilon^{-2} k \log m)$. Furthermore, if we find a $1/t$ approximation on the subsampled universe, then that corresponds to a $1/t - 2\epsilon$ approximation in the original universe. We note that as long as $v \le \mathrm{OPT}$ and $h$ is $\Omega(\epsilon^{-2} k \log m)$-wise independent, we have (see [65], Theorem 5):

$$\Pr\big[ g(\{S_1', \dots, S_\ell'\}) = p \cdot g(\{S_1, \dots, S_\ell\}) \pm \epsilon p\,\mathrm{OPT} \big] \ge 1 - \exp(-\Omega(k \log m)) \ge 1 - 1/m^{\Omega(k)} .$$

This gives us Lemma 23 even when $v < \mathrm{OPT}/2$. However, if $v \le \mathrm{OPT}/2$, then $\mathrm{OPT}'$ may be larger than $O(\epsilon^{-2} k \log m)$, and we may use too much memory. To this end, we simply terminate those instantiations. Among the instantiations that are not terminated, we return the solution given by the smallest guess.

References

[1] Alexander A. Ageev and Maxim Sviridenko. Pipage rounding: A new method of constructing algorithms with proven performance guarantee. J. Comb. Optim., 8(3):307–328, 2004.

[2] Shipra Agrawal, Mohammad Shadravan, and Cliff Stein. Submodular secretary problem with shortlists. CoRR, abs/1809.05082, 2018. URL: http://arxiv.org/abs/1809.05082, arXiv:1809.05082.

[3] Kook Jin Ahn and Sudipto Guha. Linear programming in the semi-streaming model with application to the maximum matching problem. Inf. Comput., 222:59–79, 2013. URL: http://dx.doi.org/10.1016/j.ic.2012.10.006, doi:10.1016/j.ic.2012.10.006.

[4] Naor Alaluf, Alina Ene, Moran Feldman, Huy L. Nguyen, and Andrew Suh. Optimal streaming algorithms for submodular maximization with cardinality constraints. In ICALP, volume 168 of LIPIcs, pages 6:1–6:19. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020.

[5] Aris Anagnostopoulos, Luca Becchetti, Ilaria Bordino, Stefano Leonardi, Ida Mele, and Piotr Sankowski. Stochastic query covering for fast approximate document retrieval. ACM Trans. Inf. Syst., 33(3):11:1–11:35, 2015.

[6] Sepehr Assadi. Tight space-approximation tradeoff for the multi-pass streaming set cover problem. In PODS, pages 321–335. ACM, 2017.

[7] Sepehr Assadi, Sanjeev Khanna, and Yang Li. Tight bounds for single-pass streaming complexity of the set cover problem. In STOC, pages 698–711. ACM, 2016.

[8] Ashwinkumar Badanidiyuru, Baharan Mirzasoleiman, Amin Karbasi, and Andreas Krause. Streaming submodular maximization: massive data summarization on the fly. In KDD, pages 671–680. ACM, 2014.

[9] Ziv Bar-Yossef, T. S. Jayram, Ravi Kumar, D. Sivakumar, and Luca Trevisan. Counting distinct elements in a data stream. In RANDOM, volume 2483 of Lecture Notes in Computer Science, pages 1–10. Springer, 2002.

[10] Édouard Bonnet, Vangelis Th. Paschos, and Florian Sikora. Parameterized exact and approximation algorithms for maximum k-set cover and related satisfiability problems. RAIRO Theor. Informatics Appl., 50(3):227–240, 2016.

[11] Vladimir Braverman, Rafail Ostrovsky, and Dan Vilenchik. How hard is counting triangles in the streaming model? In ICALP (1), volume 7965 of Lecture Notes in Computer Science, pages 244–254. Springer, 2013.

[12] Marc Bury and Chris Schwiegelshohn. Sublinear estimation of weighted matchings in dynamic data streams. In Algorithms - ESA 2015 - 23rd Annual European Symposium, Patras, Greece, September 14-16, 2015, Proceedings, pages 263–274, 2015. URL: http://dx.doi.org/10.1007/978-3-662-48350-3_23, doi:10.1007/978-3-662-48350-3_23.

[13] Amit Chakrabarti and Sagar Kale. Submodular maximization meets streaming: matchings, matroids, and more. Math. Program., 154(1-2):225–247, 2015.

[14] Amit Chakrabarti, Subhash Khot, and Xiaodong Sun. Near-optimal lower bounds on the multi-party communication complexity of set disjointness. In IEEE Conference on Computational Complexity, pages 107–117. IEEE Computer Society, 2003.

[15] Amit Chakrabarti and Anthony Wirth. Incidence geometries and the pass complexity of semi-streaming set cover. In SODA, pages 1365–1373. SIAM, 2016.

[16] Chandra Chekuri, Shalmoli Gupta, and Kent Quanrud. Streaming algorithms for submodular function maximization. In ICALP (1), volume 9134 of Lecture Notes in Computer Science, pages 318–330. Springer, 2015.

[17] Rajesh Chitnis and Graham Cormode. Towards a theory of parameterized streaming algorithms. Pages 7:1–7:15, 2019. URL: https://doi.org/10.4230/LIPIcs.IPEC.2019.7, doi:10.4230/LIPIcs.IPEC.2019.7.

[18] Rajesh Chitnis, Graham Cormode, Hossein Esfandiari, MohammadTaghi Hajiaghayi, Andrew McGregor, Morteza Monemizadeh, and Sofya Vorotnikova. Kernelization via sampling with applications to finding matchings and related problems in dynamic graph streams. In SODA, pages 1326–1344. SIAM, 2016.

[19] Rajesh Hemant Chitnis, Graham Cormode, Hossein Esfandiari, MohammadTaghi Hajiaghayi, and Morteza Monemizadeh. Brief announcement: New streaming algorithms for parameterized maximal matching & beyond. In Proceedings of the 27th ACM on Symposium on Parallelism in Algorithms and Architectures, SPAA 2015, Portland, OR, USA, June 13-15, 2015, pages 56–58, 2015. URL: https://doi.org/10.1145/2755573.2755618, doi:10.1145/2755573.2755618.

[20] Rajesh Hemant Chitnis, Graham Cormode, Mohammad Taghi Hajiaghayi, and Morteza Monemizadeh. Parameterized streaming: Maximal matching and vertex cover. In SODA, pages 1234–1251. SIAM, 2015.

[21] Graham Cormode, Mayur Datar, Piotr Indyk, and S. Muthukrishnan. Comparing data streams using hamming norms (how to zero in). IEEE Trans. Knowl. Data Eng., 15(3):529–540, 2003.

[22] Graham Cormode and S. Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. J. Algorithms, 55(1):58–75, 2005. URL: https://doi.org/10.1016/j.jalgor.2003.12.001, doi:10.1016/j.jalgor.2003.12.001.

[23] Michael Crouch and Daniel S. Stubbs. Improved streaming algorithms for weighted matching, via unweighted matching. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2014, September 4-6, 2014, Barcelona, Spain, pages 96–104, 2014. URL: http://dx.doi.org/10.4230/LIPIcs.APPROX-RANDOM.2014.96, doi:10.4230/LIPIcs.APPROX-RANDOM.2014.96.

[24] Michael S. Crouch, Andrew McGregor, and Daniel Stubbs. Dynamic graphs in the sliding-window model. In Algorithms - ESA 2013 - 21st Annual European Symposium, Sophia Antipolis, France, September 2-4, 2013. Proceedings, pages 337–348, 2013. URL: http://dx.doi.org/10.1007/978-3-642-40450-4_29, doi:10.1007/978-3-642-40450-4_29.

[25] Erik D. Demaine, Uriel Feige, MohammadTaghi Hajiaghayi, and Mohammad R. Salavatipour. Combination can be hard: Approximability of the unique coverage problem. SIAM J. Comput., 38(4):1464–1483, 2008.

[26] Michael Dom, Jiong Guo, Rolf Niedermeier, and Sebastian Wernicke. Minimum membership set covering and the consecutive ones property. In SWAT, volume 4059 of Lecture Notes in Computer Science, pages 339–350. Springer, 2006.

[27] Yuval Emek and Adi Rosén. Semi-streaming set cover. ACM Trans. Algorithms, 13(1):6:1–6:22, 2016.

[28] Leah Epstein, Asaf Levin, Julián Mestre, and Danny Segev. Improved approximation guarantees for weighted matching in the semi-streaming model. SIAM J. Discrete Math., 25(3):1251–1265, 2011. URL: http://dx.doi.org/10.1137/100801901, doi:10.1137/100801901.

[29] Thomas Erlebach and Erik Jan van Leeuwen. Approximating geometric coverage problems. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2008, San Francisco, California, USA, January 20-22, 2008, pages 1267–1276, 2008. URL: http://dl.acm.org/citation.cfm?id=1347082.1347220.

[30] Uriel Feige. A threshold of ln n for approximating set cover. J. ACM, 45(4):634–652, 1998.

[31] Joan Feigenbaum, Sampath Kannan, Andrew McGregor, Siddharth Suri, and Jian Zhang. On graph problems in a semi-streaming model. Theor. Comput. Sci., 348(2):207–216, 2005. doi:http://dx.doi.org/10.1016/j.tcs.2005.09.013.

[32] Moran Feldman, Ashkan Norouzi-Fard, Ola Svensson, and Rico Zenklusen. The one-way communication complexity of submodular maximization with applications to streaming and robustness. In STOC, pages 1363–1374. ACM, 2020.

[33] Daya Ram Gaur, Ramesh Krishnamurti, and Rajeev Kohli. Erratum to: The capacitated max k-cut problem. Math. Program., 126(1):191, 2011.

[34] Ashish Goel, Michael Kapralov, and Sanjeev Khanna. On the communication and streaming complexity of maximum bipartite matching. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, Kyoto, Japan, January 17-19, 2012, pages 468–485, 2012. URL: http://portal.acm.org/citation.cfm?id=2095157&CFID=63838676&CFTOKEN=79617016.

[35] Venkatesan Guruswami and Krzysztof Onak. Superlinear lower bounds for multipass graph processing. In Proceedings of the 28th Conference on Computational Complexity, CCC 2013, Palo Alto, California, USA, 5-7 June, 2013, pages 287–298, 2013. URL: http://dx.doi.org/10.1109/CCC.2013.37, doi:10.1109/CCC.2013.37.

[36] Sariel Har-Peled, Piotr Indyk, Sepideh Mahabadi, and Ali Vakilian. Towards tight bounds for the streaming set cover problem. In PODS, pages 371–383. ACM, 2016.

[37] Chien-Chung Huang, Naonori Kakimura, and Yuichi Yoshida. Streaming algorithms for maximizing monotone submodular functions under a knapsack constraint. In APPROX-RANDOM, volume 81 of LIPIcs, pages 11:1–11:14. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2017.

[38] Piotr Indyk, Sepideh Mahabadi, Ronitt Rubinfeld, Jonathan Ullman, Ali Vakilian, and Anak Yodpinyanee. Fractional set cover in the streaming model. In APPROX-RANDOM, volume 81 of LIPIcs, pages 12:1–12:20. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2017.

[39] Piotr Indyk and Ali Vakilian. Tight trade-offs for the maximum k-coverage problem in the general streaming model. In Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019, pages 200–217, 2019. URL: https://doi.org/10.1145/3294052.3319691, doi:10.1145/3294052.3319691.

[40] Takehiro Ito, Shin-Ichi Nakano, Yoshio Okamoto, Yota Otachi, Ryuhei Uehara, Takeaki Uno, and Yushi Uno. A 4.31-approximation for the geometric unique coverage problem on unit disks. Theor. Comput. Sci., 544:14–31, 2014.

[41] Hossein Jowhari, Mert Saglam, and Gábor Tardos. Tight bounds for lp samplers, finding duplicates in streams, and related problems. In PODS, pages 49–58. ACM, 2011.

[42] John Kallaugher, Andrew McGregor, Eric Price, and Sofya Vorotnikova. The complexity of counting cycles in the adjacency list streaming model. In Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019, pages 119–133, 2019. URL: https://doi.org/10.1145/3294052.3319706, doi:10.1145/3294052.3319706.

[43] Michael Kapralov. Better bounds for matchings in the streaming model. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013, New Orleans, Louisiana, USA, January 6-8, 2013, pages 1679–1697, 2013. URL: http://dx.doi.org/10.1137/1.9781611973105.121, doi:10.1137/1.9781611973105.121.

[44] Michael Kapralov, Sanjeev Khanna, and Madhu Sudan. Approximating matching size from random streams. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, Oregon, USA, January 5-7, 2014, pages 734–751, 2014. URL: http://dx.doi.org/10.1137/1.9781611973402.55, doi:10.1137/1.9781611973402.55.

[45] Michael Kapralov, Sanjeev Khanna, and Madhu Sudan. Streaming lower bounds for approximating MAX-CUT. In SODA, pages 1263–1282. SIAM, 2015.

[46] Michael Kapralov, Sanjeev Khanna, Madhu Sudan, and Ameya Velingker. (1 + ω(1))-approximation to MAX-CUT requires linear space. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2017, Barcelona, Spain, Hotel Porta Fira, January 16-19, pages 1703–1722, 2017. URL: https://doi.org/10.1137/1.9781611974782.112, doi:10.1137/1.9781611974782.112.

[47] Michael Kapralov and Dmitry Krachun. An optimal space lower bound for approximating MAX-CUT. CoRR, abs/1811.10879, 2018. URL: http://arxiv.org/abs/1811.10879, arXiv:1811.10879.

[48] David Kempe, Jon M. Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. Theory of Computing, 11:105–147, 2015.

[49] Christian Konrad. Maximum matching in turnstile streams. In Algorithms - ESA 2015 - 23rd Annual European Symposium, Patras, Greece, September 14-16, 2015, Proceedings, pages 840–852, 2015. URL: http://dx.doi.org/10.1007/978-3-662-48350-3_70, doi:10.1007/978-3-662-48350-3_70.

[50] Christian Konrad, Frédéric Magniez, and Claire Mathieu. Maximum matching in semi-streaming with few passes. In APPROX-RANDOM, volume 7408 of Lecture Notes in Computer Science, pages 231–242. Springer, 2012.

[51] Christian Konrad and Adi Rosén. Approximating semi-matchings in streaming and in two-party communication. In Automata, Languages, and Programming - 40th International Colloquium, ICALP 2013, Riga, Latvia, July 8-12, 2013, Proceedings, Part I, pages 637–649, 2013. URL: http://dx.doi.org/10.1007/978-3-642-39206-1_54, doi:10.1007/978-3-642-39206-1_54.

[52] Andreas Krause and Carlos Guestrin. Near-optimal observation selection using submodular functions. In AAAI, pages 1650–1654. AAAI Press, 2007.

[53] Fabian Kuhn, Pascal von Rickenbach, Roger Wattenhofer, Emo Welzl, and Aaron Zollinger. Interference in cellular networks: The minimum membership set cover problem. In COCOON, volume 3595 of Lecture Notes in Computer Science, pages 188–198. Springer, 2005.

[54] Pasin Manurangsi. A note on max k-vertex cover: Faster fpt-as, smaller approximate kernel and improved approximation. Pages 15:1–15:21, 2019. URL: https://doi.org/10.4230/OASIcs.SOSA.2019.15, doi:10.4230/OASIcs.SOSA.2019.15.

[55] Andrew McGregor. Finding graph matchings in data streams. APPROX-RANDOM, pages 170–181, 2005.

[56] Andrew McGregor. Graph stream algorithms: a survey. SIGMOD Record, 43(1):9–20, 2014.

[57] Andrew McGregor and Sofya Vorotnikova. Planar matching in streams revisited. In APPROX-RANDOM, volume 60 of LIPIcs, pages 17:1–17:12. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2016.

[58] Andrew McGregor and Sofya Vorotnikova. Triangle and four cycle counting in the data stream model. In Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2020, Portland, OR, USA, June 14-19, 2020, pages 445–456, 2020. URL: https://doi.org/10.1145/3375395.3387652, doi:10.1145/3375395.3387652.

[59] Andrew McGregor, Sofya Vorotnikova, and Hoa T. Vu. Better algorithms for counting triangles in data streams. In PODS, pages 401–411. ACM, 2016.

[60] Andrew McGregor and Hoa T. Vu. Better streaming algorithms for the maximum coverage problem. In ICDT, volume 68 of LIPIcs, pages 22:1–22:18. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2017.

[61] Andrew McGregor and Hoa T. Vu. Better streaming algorithms for the maximum coverage problem. Theory of Computing Systems, pages 1–25, 2018.

[62] Neeldhara Misra, Hannes Moser, Venkatesh Raman, Saket Saurabh, and Somnath Sikdar. The parameterized complexity of unique coverage and its variants. Algorithmica, 65(3):517–544, 2013. URL: https://doi.org/10.1007/s00453-011-9608-0, doi:10.1007/s00453-011-9608-0.

[63] Ashkan Norouzi-Fard, Jakub Tarnawski, Slobodan Mitrovic, Amir Zandieh, Aidasadat Mousavifar, and Ola Svensson. Beyond 1/2-approximation for submodular maximization on massive data streams. In ICML, volume 80 of Proceedings of Machine Learning Research, pages 3826–3835. PMLR, 2018.

[64] Barna Saha and Lise Getoor. On maximum coverage in the streaming model & application to multi-topic blog-watch. In SDM, pages 697–708. SIAM, 2009.

[65] Jeanette P. Schmidt, Alan Siegel, and Aravind Srinivasan. Chernoff-hoeffding bounds for applications with limited independence. SIAM J. Discrete Math., 8(2):223–250, 1995.

[66] Mariano Zelke. Weighted matching in the semi-streaming model. Algorithmica, 62(1-2):1–20, 2012. URL: http://dx.doi.org/10.1007/s00453-010-9438-5, doi:10.1007/s00453-010-9438-5.