Optimal Streaming Algorithms for Graph Matching
Jianer Chen ∗, Qin Huang †, Iyad Kanj ‡, Ge Xia §

Abstract
We present parameterized streaming algorithms for the graph matching problem in both the dynamic and the insert-only models. For the dynamic streaming model, we present a one-pass algorithm that, w.h.p. (with high probability), computes a maximum-weight $k$-matching of a weighted graph in $\tilde{O}(Wk)$ space and that has $\tilde{O}(1)$ update time, where $W$ is the number of distinct edge weights. For the insert-only streaming model, we present a one-pass algorithm that runs in $O(k)$ space and has $O(1)$ update time, and that, w.h.p., computes a maximum-weight $k$-matching of a weighted graph. The space complexity and the update-time complexity achieved by our algorithms for unweighted $k$-matching in the dynamic model and for weighted $k$-matching in the insert-only model are optimal.

A notable contribution of this paper is that the presented algorithms do not rely on the a priori knowledge/promise that the cardinality of every maximum-weight matching of the input graph is upper bounded by the parameter $k$. This promise has been a critical condition in previous works, and lifting it required the development of new tools and techniques.

Emerging applications in big-data involve processing graphs of tremendous size [33]. For such applications, it is infeasible to store the graph when processing it. This issue has given rise to a new computational model, referred to as the graph streaming model. A graph stream $S$ for an underlying graph $G$ is a sequence of elements of the form $(e, op)$, where $op$ is an operation performed on edge $e$. In the insert-only streaming model, each operation is an edge-insertion, while in the dynamic streaming model each operation is either an edge-insertion or an edge-deletion (with a specified weight if $G$ is weighted). The graph streaming model demands performing the computation within limited space and time resources.

The graph matching problem, both in unweighted and weighted graphs, is one of the most extensively-studied problems in the streaming model.
There has been a vast amount of work on its approximation and parameterized complexity, as will be discussed shortly.

A matching $M$ in a graph $G$ is a $k$-matching if $|M| = k$. A maximum-weight $k$-matching in a weighted graph $G$ is a $k$-matching whose weight is maximum over all $k$-matchings in $G$. In this paper, we study parameterized streaming algorithms for the weighted and unweighted $k$-matching problem in both the dynamic and the insert-only streaming models. In these problems, we are given a graph stream and a parameter $k \in \mathbb{N}$, and the goal is to compute a $k$-matching or a maximum-weight $k$-matching. We present results that improve several results in various aspects and that achieve optimal complexity upper bounds.

Most of the previous works on the graph matching problem in the streaming model have focused on approximating a maximum matching [1, 5, 6, 13, 15, 17, 21, 25, 26, 28, 29, 31, 32, 37, 42], with the majority of these works pertaining to the (simpler) insert-only model. The most relevant to ours are the works of [8, 9, 10, 16], which studied parameterized streaming algorithms for the maximum matching problem. We survey these works next.

∗ Department of Computer Science and Engineering, Texas A&M University, College Station, TX 77843, USA.
† Department of Computer Science and Engineering, Texas A&M University, College Station, TX 77843, USA.
‡ School of Computing, DePaul University, Chicago, IL 60604, USA.
§ Department of Computer Science, Lafayette College, Easton, PA 18042, USA.
The notation $\tilde{O}()$ hides a poly-logarithmic factor in the input size.

Under the promise that the cardinality of every maximal matching at every instant of the stream is at most $k$, the authors in [9, 10] presented a one-pass dynamic streaming algorithm that w.h.p. computes a maximal matching in an unweighted graph stream.
Their algorithm runs in ˜ O ( k ) space and has ˜ O ( k )update time.The authors in [8] considered the problem of computing a maximum matching in the dynamic streamingmodel. For an unweighted graph G , under the promise that the cardinality of a maximum matching at everyinstant of the stream is at most k , they presented a sketch-based algorithm that w.h.p. computes a maximummatching of G , and that runs in ˜ O ( k ) space and has ˜ O (1) update time. They proved an Ω( k ) lower boundon the space complexity of any randomized algorithm for the parameterized maximum matching problem,even in the insert-only model, thus showing that the space complexity of their algorithm is optimal (moduloa poly-logarithmic factor); their lower bound result carries over to the k -matching problem. They extendedtheir algorithm to weighted graphs, and presented under the same promise an algorithm for computing amaximum-weight matching that runs in space ˜ O ( k W ) and has ˜ O (1) update time, where W is the numberof distinct edge weights.We remark that the previous work on the weighted matching problem in the streaming model [8], as wellas our current work, assumes that the weight of each edge remains the same during the stream. Other workson weighted graph streams make the same assumption [2, 3, 22, 27]. The reason behind this assumptionis that—as shown in this paper, if this assumption is lifted, we can derive a lower bound on the spacecomplexity of the k -matching problem that is at least linear in the size of the graph, and hence, can be muchlarger than the desirable space complexity.Fafianie and Kratsch [16] studied kernelization streaming algorithms in the insert-only model for the NP -hard d - Set Matching problem (among others), which for d = 2, is equivalent to the k -matching problemin unweighted graphs. 
Their result implies a one-pass kernelization streaming algorithm for k -matching inunweighted graphs that computes a kernel of size O ( k log k ), and that runs in O ( k ) space and has O (log k )update time.We mention that Chen et al. [7] studied algorithms for k -matching in unweighted and weighted graphsin the RAM model; their algorithms use limited computational resources and w.h.p. compute a k -matching.Clearly, the RAM model is very different from the streaming model. In order to translate their algorithm tothe streaming model, it would require Ω(n k ) space and multiple passes, where n is the number of vertices.However, we mention that (for the insert-only model), in one of the steps of our algorithm, we were inspiredby a graph operation for constructing a reduced graph, which was introduced in their paper.Finally, there has been some work on computing matchings in special graph classes, and with respect toparameters other than the cardinality of the matching (e.g., see [30, 34, 35, 36]).As is commonly the case in the relevant literature, we work under the assumption that each basic operationon words takes constant time and space. Results and Techniques for the Dynamic Model.
We give a one-pass sketch-based one-sided errorstreaming algorithm that, for a weighted graph G , if G contains a k -matching then, with probability at least1 − k ln(2 k ) , the algorithm computes a maximum-weight k -matching of G and if G does not contain a k -matching then the algorithm reports that correctly. The algorithm runs in ˜ O ( W k ) space and has ˜ O (1)update time, where W is the number of distinct weights in the graph. A byproduct of this result is a one-pass one-sided error streaming algorithm for unweighted k -matching running in ˜ O ( k ) space and having ˜ O (1)update time. For k -matching in unweighted graphs, the space and update-time complexity of our algorithmare optimal (modulo a poly-logarithmic factor of k ).The above results achieve the same space and update-time complexity as the results in [8], but generalizethem in the following ways. First, our algorithm can be used to solve the unweighted/weighted k -matchingproblem for any k ∈ N , whereas the algorithm in [8] can only be used to solve the maximum (resp. maximum-weight) matching problem and under the promise that the given parameter k is at least as large as thecardinality of every maximum (resp. maximum-weight) matching. In particular, if one wishes to compute amaximum-weight k -matching where k is smaller than the cardinality of a maximum-weight matching, then2he algorithm in [8] cannot be used or, in the unweighted case, incurs a complexity that depends on thecardinality of the maximum matching. Second, the correctness of the algorithm in [8] relies heavily on thepromise that the cardinality of every maximum (resp. maximum-weight) matching is at most k , in the sensethat, if this promise is not kept then, for certain instances of the problem, w.h.p. the subgraph returned bythe algorithm in [8] does not contain a maximum (resp. maximum-weight) matching. Third, if we are to workunder the same promise as in [8], which is that the cardinality of every maximum (resp. 
maximum-weight)matching is at most k , then w.h.p. , our algorithm computes a maximum (resp. maximum-weight) matching.Therefore, in these respects, our results generalize those in [8].Another byproduct of our result for weighted k -matching is a one-pass streaming approximation algorithmthat, for any ǫ > w.h.p. computes a k -matching that is within a factor of 1 + ǫ from a maximum-weight k -matching in G ; the algorithm runs in ˜ O ( k ǫ − log W ′ ) space and has ˜ O (1) update time, where W ′ is theratio of the maximum edge-weight to the minimum edge-weight in G . This result matches the approximationresult in [8], which achieves the same bounds, albeit under the aforementioned promise.We complement the above with a space lower-bound result showing that, if the restriction that theweight of each edge remains the same during the stream is lifted, which—as mentioned before–is a standardassumption, then even computing a 1-matching whose weight is within a (6 / k . This assumption is essential for their techniques to worksince it is used to upper bound the number of “large” vertices of degree at least 10 k by O ( k ), and the numberof “small” edges whose both endpoints have degree at most 10 k by O ( k ). These bounds allow the samplingof a set of edges that w.h.p. contains a maximum (resp. maximum-weight) matching.To remove the reliance on the promise, we prove a structural result that can be useful in its own right for k -subset problems (in which the goal is to compute a k -subset with certain prescribed properties from someuniverse U ). Intuitively, the result states that, for any k -subset S ⊆ U , w.h.p. we can compute k subsets T , . . . , T k of U that interact “nicely” with S . More specifically, (1) the sets T i , for i ∈ [ k ], are pairwisedisjoint, (2) S is contained in their union S i ∈ [ d ] T i , and (3) each T i contains exactly one element of S . 
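The three conditions on the sets $T_i$ can be read operationally. The following is a minimal sketch of a checker for them; the function name `interacts_nicely` and the toy sets in the usage note are hypothetical and only serve as an illustration.

```python
from typing import List, Set


def interacts_nicely(S: Set[int], Ts: List[Set[int]]) -> bool:
    """Check conditions (1)-(3) for a k-subset S and k candidate sets Ts."""
    union = set().union(*Ts)
    # (1) pairwise disjoint: no element is counted twice across the sets
    pairwise_disjoint = sum(len(T) for T in Ts) == len(union)
    # (2) S is contained in the union of the sets
    covers = S <= union
    # (3) each set contains exactly one element of S
    one_each = all(len(T & S) == 1 for T in Ts)
    return len(Ts) == len(S) and pairwise_disjoint and covers and one_each
```

For instance, with `S = {1, 5, 9}`, the sets `[{1, 2}, {5, 6, 7}, {9}]` pass the check, while `[{1, 5}, {9}, {3}]` fail condition (3).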
To prove the theorem, we show that we can randomly choose an $O(\log k)$-wise independent hash function that partitions $S$ “evenly”. We then show that we can randomly choose $O(k/\log k)$-many hash functions, from a set of universal hash functions, such that there exist $k$ integers $p_1, \ldots, p_k$, satisfying that $T_i$ is the pre-image of $p_i$ under one of the chosen hash functions.

We then apply the above result to obtain the sets $T_i$ of vertices that w.h.p. induce the edges of the desired $k$-matching. Afterwards, we use $\ell_0$-sampling to select a smaller subset of edges induced by the vertices of the $T_i$'s that w.h.p. contains the desired $k$-matching. From this smaller subset of edges, a maximum-weight $k$-matching can be extracted.

Results and Techniques for the Insert-Only Model.
We present a one-pass one-sided error algorithm for computing a maximum-weight $k$-matching that runs in $O(k)$ space and has $O(1)$ update time. The space and update-time complexity of our algorithm are optimal.

Our techniques rely on partitioning the graph (using hashing), and defining an auxiliary graph whose vertices are the different parts of the partition; the auxiliary graph is updated during the stream. By querying this auxiliary graph, the algorithm can compute a “compact” subgraph of size $O(k)$ that, w.h.p., contains the edges of the desired $k$-matching. A maximum-weight $k$-matching can then be extracted from this compact subgraph.

Fafianie and Kratsch [16] studied kernelization streaming algorithms in the insert-only model for the NP-hard $d$-Set Matching problem, which for $d = 2$, is equivalent to $k$-matching in unweighted graphs. Their result implies a one-pass kernelization streaming algorithm for $k$-matching that computes a kernel of size $O(k \log k)$ bits using $O(k)$ space and $O(\log k)$ update time. In comparison, our algorithm computes a compact subgraph, which is a kernel of the same size as in [16] (and from which a $k$-matching can be extracted); moreover, our algorithm treats the more general weighted case, and achieves a better update time of $O(1)$ than that of [16], while matching their upper bound on the space complexity.

Preliminaries
For a positive integer $i$, let $[i]^-$ denote the set of numbers $\{0, 1, \ldots, i-1\}$, $[i]$ denote the set of numbers $\{1, \ldots, i\}$, and $\llcorner i \lrcorner$ denote the binary representation of $i$. We write “u.a.r.” as an abbreviation for “uniformly at random”.

A parameterized problem $Q$ is a subset of $\Sigma^* \times \mathbb{N}$, where $\Sigma$ is a fixed, finite alphabet. Each instance is a pair $(I, k)$, where $k \in \mathbb{N}$ is called the parameter. A parameterized problem $Q$ is kernelizable if there exists a polynomial-time reduction that maps an instance $(I, k)$ of $Q$ to another instance $(I', k')$ such that (1) $k' \le g(k)$ and $|I'| \le g(k)$, where $g$ is a computable function and $|I'|$ is the length of the instance, and (2) $(I, k)$ is a yes-instance of $Q$ if and only if $(I', k')$ is a yes-instance of $Q$. The polynomial-time reduction is called the kernelization algorithm and the instance $(I', k')$ is called the kernel of $(I, k)$. We refer to [14] for more information.
All graphs discussed in this paper are undirected and simple. Let $G$ be a graph. We write $V(G)$ and $E(G)$ for the vertex-set and edge-set of $G$, respectively, and write $uv$ for the edge whose endpoints are $u$ and $v$. A matching $M \subseteq E(G)$ is a set of edges such that no two distinct edges in $M$ share an endpoint. A matching $M$ is a $k$-matching if $|M| = k$. A weighted graph $G$ is a graph associated with a weight function $wt : E(G) \to \mathbb{R}$; we denote the weight of an edge $e$ by $wt(e)$. Let $M$ be a matching in a weighted graph $G$. The weight of $M$, $wt(M)$, is the sum of the weights of the edges in $M$, that is, $wt(M) = \sum_{e \in M} wt(e)$. A maximum-weight $k$-matching in a weighted graph $G$ is a $k$-matching whose weight is maximum over all $k$-matchings in $G$.

A graph stream $S$ for an underlying graph $G = (V, E)$ is a sequence of elements, each of the form $(e, op)$, where $op$ is an update to edge $e \in E(G)$. Each update could be an insertion of an edge, a deletion of an edge, or in the case of a weighted graph an update to the weight of an edge in $G$ (and would include the weight of the edge in that case). In the insert-only graph streaming model, a graph $G = (V, E)$ is given as a stream $S$ of elements in which each operation is an edge-insertion, while in the dynamic graph streaming model a graph $G = (V, E)$ is given as a stream $S$ of elements in which the operations could be either edge-insertions or edge-deletions (with specified weights in case $G$ is weighted).

We assume that the vertex set $V(G)$ contains $n$ vertices, identified with the integers $\{0, \ldots, n-1\}$ for convenience, and that the length of the stream $S$ is polynomial in $n$. Therefore, we will treat $v \in V$ as a unique number $v \in [n]^-$. Without loss of generality, since the graph $G$ is undirected, we will assume that the edges of the graph are of the form $uv$, where $u < v$. Since $G$ can have at most $\binom{n}{2} = n(n-1)/2$ edges, each edge can be identified with a unique number in $[n(n-1)/2]^-$.
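One concrete way to number the edges $uv$ with $u < v$ by integers in $[n(n-1)/2]^-$ is sketched below. This is only an illustrative assumption: the text does not fix a particular numbering, and the names `edge_to_index` and `index_to_edge` are hypothetical. The sketch orders edges by their larger endpoint, which gives a bijection that does not depend on $n$.

```python
import math
from typing import Tuple


def edge_to_index(u: int, v: int) -> int:
    """Map an undirected edge {u, v} with u != v to a non-negative integer.

    The edge is normalized so that u < v and receives index v(v-1)/2 + u.
    For vertices in {0, ..., n-1} the indices fill {0, ..., n(n-1)/2 - 1}.
    """
    if u == v:
        raise ValueError("simple graphs have no self-loops")
    if u > v:
        u, v = v, u
    return v * (v - 1) // 2 + u


def index_to_edge(idx: int) -> Tuple[int, int]:
    """Inverse of edge_to_index: recover the pair (u, v) with u < v."""
    if idx < 0:
        raise ValueError("index must be non-negative")
    # v is the largest integer with v(v-1)/2 <= idx
    v = (1 + math.isqrt(1 + 8 * idx)) // 2
    u = idx - v * (v - 1) // 2
    return u, v
```

With such a numbering, an edge update in the stream can be treated as an update to one coordinate of a vector of length $n(n-1)/2$.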
At the beginning of the stream $S$, the stream corresponds to a graph with an empty edge-set. For weighted graphs, we assume that the weight of an edge is specified when the edge is inserted or deleted.

In a parameterized graph streaming problem $Q$, we are given an instance of the form $(S, k)$, where $S$ is a graph stream of some underlying graph $G$ and $k \in \mathbb{N}$, and we are queried for a solution for $(S, k)$ either at the end of $S$ or after some arbitrary element/operation in $S$ [10].

A parameterized streaming algorithm $A$ for $Q$ generally uses a sketch, which is a data structure that supports a set of update operations [4, 10, 18, 23]. The algorithm $A$ can update the sketch after reading each element of $S$; the time taken to update the sketch after reading an element is referred to as the update time of the algorithm. The space used by $A$ is the space needed to compute and store the sketch, and that needed to answer a query on the instance (based on the sketch). The time complexity of the algorithm is the time taken to extract a solution from the sketch when answering a query.

We will consider parameterized problems in which we are given a graph stream $S$ and $k \in \mathbb{N}$, and the goal is to compute a $k$-matching or a maximum-weight $k$-matching in $G$ (if one exists). We will consider both the unweighted and weighted $k$-matching problems, referred to as p-Matching and p-WT-Matching, respectively, and in both the insert-only and the dynamic streaming models. For p-WT-Matching, we will follow the standard literature assumption [8], which is that the weight of every edge remains the same throughout $S$; we will consider in Subsection 4.1 a generalized version of this problem that allows the weight of an edge to change during the course of the stream, and prove lower bound results for this generalization. We formally define the problems under consideration:

p-Matching
Given: A graph stream $S$ of an unweighted graph $G$
Parameter: $k$
Goal: Compute a $k$-matching in $G$ or report that no $k$-matching exists

The parameterized Weighted Graph Matching (p-WT-Matching) problem is defined as follows:

p-WT-Matching
Given: A graph stream $S$ of a weighted graph $G$
Parameter: $k$
Goal: Compute a maximum-weight $k$-matching in $G$ or report that no $k$-matching exists

Clearly, if $k > n/2$ then $G$ does not contain a $k$-matching. Therefore, we may assume henceforth that $k \le n/2$. The streaming algorithms in this paper compute a subgraph $G'$ of the graph stream $G$ such that w.h.p. $G'$ contains a $k$-matching or a maximum-weight $k$-matching of $G$ if and only if $G$ contains one. In the case where the size of $G'$ is a function of $k$, such algorithms are referred to as kernelization streaming algorithms [8]. We note that the result in [8] also computes a subgraph containing the edges of the maximum (resp. maximum-weight) matching, without computing the matching itself, as there are efficient algorithms for extracting a maximum (resp. maximum-weight) matching or a $k$-matching (resp. maximum-weight $k$-matching) from that subgraph [19, 20, 40].

For any probabilistic events $\mathcal{E}_1, \mathcal{E}_2, \ldots, \mathcal{E}_r$, the union bound states that $\Pr(\bigcup_{i=1}^{r} \mathcal{E}_i) \le \sum_{i=1}^{r} \Pr(\mathcal{E}_i)$. For any random variables $X_1, \ldots, X_r$ whose expectations are well-defined, the linearity of expectation states that $E[\sum_{i=1}^{r} X_i] = \sum_{i=1}^{r} E[X_i]$, where $E[X_i]$ is the expectation of $X_i$. A set of discrete random variables $\{X_1, \ldots, X_j\}$ is called $\lambda$-wise independent if for any subset $J \subseteq \{1, \ldots, j\}$ with $|J| \le \lambda$ and for any values $x_i$, we have $\Pr(\wedge_{i \in J} X_i = x_i) = \prod_{i \in J} \Pr(X_i = x_i)$. A random variable is called a 0-1 random variable if it only takes one of the two values 0, 1. The following theorem bounds the tail probability of the sum of 0-1 random variables with limited independence (see Theorem 2 in [39]):

Theorem 2.1.
Given any 0-1 random variables $X_1, \ldots, X_j$, let $X = \sum_{i=1}^{j} X_i$ and $\mu = E[X]$. For any $\delta > 0$, if the $X_i$'s are $\lceil \mu\delta \rceil$-wise independent, then

$$\Pr(X \ge \mu(1+\delta)) \le \begin{cases} e^{-\mu\delta^2/3} & \text{if } \delta < 1 \\ e^{-\mu\delta/3} & \text{if } \delta \ge 1. \end{cases}$$

$\ell_0$-sampler. Let $0 < \delta < 1$ be a parameter. Let $S = (i_1, \Delta_1), \ldots, (i_p, \Delta_p), \ldots$ be a stream of updates of an underlying vector $x \in \mathbb{R}^n$, where $i_j \in [n]$ and $\Delta_j \in \mathbb{R}$. The $j$-th update $(i_j, \Delta_j)$ updates the $i_j$-th coordinate of $x$ by setting $x_{i_j} = x_{i_j} + \Delta_j$. An $\ell_0$-sampler for $x \ne 0$ either fails with probability at most $\delta$, or conditioned on not failing, for any non-zero coordinate $x_j$ of $x$, returns the pair $(j, x_j)$ with probability $1/\|x\|_0$, where $\|x\|_0$ is the $\ell_0$-norm of $x$, which is the same as the number of non-zero coordinates of $x$. For more details, we refer to [12].

Based on the results in [12, 24], and as shown in [8], we can develop a sketch-based $\ell_0$-sampler algorithm for a dynamic graph stream that samples an edge from the stream. More specifically, the following result was shown in [8]:

Lemma 2.2 (Proof of Theorem 2.1 in [8]). Let $0 < \delta < 1$ be a parameter. There exists a linear sketch-based $\ell_0$-sampler algorithm that, given a dynamic graph stream, either returns FAIL with probability at most $\delta$, or returns an edge chosen u.a.r. amongst the edges of the stream that have been inserted and not deleted. This $\ell_0$-sampler algorithm can be implemented using $O(\log n \cdot \log(\delta^{-1}))$ bits of space and $\tilde{O}(1)$ update time, where $n$ is the number of vertices of the graph stream.

Let $U$ be a universe of elements that we will refer to as keys. We can always identify the elements of $U$ with the numbers $0, \ldots, |U|-1$; therefore, without loss of generality, we will assume henceforth that $U = \{0, 1, \ldots, |U|-1\}$. For a set $H$ of hash functions and a hash function $h \in H$, we write $h \in_{u.a.r.} H$ to denote that $h$ is chosen u.a.r. from $H$. Let $S \subseteq U$ and $r$ be a positive integer. A hash function $h : U \to [r]^-$ is perfect w.r.t. $S$ if it is injective on $S$ (i.e., no two distinct elements $x, y \in S$ cause a collision).

A set $H$ of hash functions, each mapping $U$ to $[r]^-$, is called universal if for each pair of distinct keys $x, y \in U$, the number of hash functions $h \in H$ for which $h(x) = h(y)$ is at most $|H|/r$, or equivalently:

$$\Pr_{h \in_{u.a.r.} H}[h(x) = h(y)] \le \frac{1}{r}.$$

Let $p \ge |U|$ be a prime number. A universal set of hash functions $H$ from $U$ to $[r]^-$ can be constructed as follows (see Chapter 11 in [11]):

$$H = \{h_{a,b,r} \mid 1 \le a \le p-1,\ 0 \le b \le p-1\},$$

where $h_{a,b,r}$ is defined as $h_{a,b,r}(x) = ((ax + b) \bmod p) \bmod r$.

Theorem 2.3 (Theorem 11.9 in [11]). Let $U$ be a universe and $H$ be a universal set of hash functions, each mapping $U$ to $[r^2]^-$. For any set $S$ of $r$ elements in $U$ and any hash function $h \in_{u.a.r.} H$, the probability that $h$ is perfect w.r.t. $S$ is larger than $1/2$.

A set $H$ of hash functions, each mapping $U$ to $[r]^-$, is called $\kappa$-wise independent if for any $\kappa$ distinct keys $x_1, x_2, \ldots, x_\kappa \in U$, and any $\kappa$ (not necessarily distinct) values $a_1, a_2, \ldots, a_\kappa \in [r]^-$, we have

$$\Pr_{h \in_{u.a.r.} H}[h(x_1) = a_1 \wedge h(x_2) = a_2 \wedge \cdots \wedge h(x_\kappa) = a_\kappa] = \frac{1}{r^\kappa}.$$

Let $F$ be a finite field. A $\kappa$-wise independent set $H$ of hash functions can be constructed as follows (see Construction 3.32 in [41]):

$$H = \{h_{a_0, a_1, \ldots, a_{\kappa-1}} : F \to F\},$$

where $h_{a_0, a_1, \ldots, a_{\kappa-1}}(x) = a_0 + a_1 x + \cdots + a_{\kappa-1} x^{\kappa-1}$ for $a_0, \ldots, a_{\kappa-1} \in F$.

Theorem 2.4 (Corollary 3.34 in [41]). For every $u, d, \kappa \in \mathbb{N}$, there is a family of $\kappa$-wise independent functions $H = \{h : \{0,1\}^u \to \{0,1\}^d\}$ such that choosing a random function from $H$ takes space $O(\kappa \cdot (u + d))$.
Moreover, evaluating a function from $H$ takes time polynomial in $u, d, \kappa$.

In this section, we prove a theorem that can be useful in its own right for subset problems, that is, problems in which the goal is to compute a $k$-subset $S$ ($k \in \mathbb{N}$) of some universe $U$ such that $S$ satisfies certain prescribed properties. Intuitively, the theorem states that, for any $k$-subset $S \subseteq U$, w.h.p. we can compute $k$ subsets $T_1, \ldots, T_k$ of $U$ that interact “nicely” with $S$. More specifically, (1) the sets $T_i$, for $i \in [k]$, are pairwise disjoint, (2) $S$ is contained in their union $\bigcup_{i \in [k]} T_i$, and (3) each $T_i$ contains exactly one element of $S$.

The above theorem will be used in Section 4 to design algorithms for p-Matching and p-WT-Matching in the dynamic streaming model. Intuitively speaking, the theorem will be invoked to obtain the sets $T_i$ of vertices that w.h.p. induce the edges of the desired $k$-matching; however, these sets may not necessarily constitute the desired subgraph as they may not have “small” cardinalities. Sampling techniques will be used to select a smaller set of edges induced by the vertices of the $T_i$'s that w.h.p. contains the edges of the $k$-matching.

To prove this theorem, we proceed in two phases. We give an intuitive description of these two phases next. We refer to Figure 1 for illustration.

In the first phase, we choose a hashing function $f$ u.a.r. from an $O(\ln k)$-wise independent set of hash functions, which hashes $U$ to a set of $d_1 = O(k/\ln k)$ integers. We use $f$ to partition the universe $U$ into $d_1$-many subsets $U_i$, each consisting of all elements of $U$ that hash to the same value under $f$. Afterwards, we choose $d_1$ families $F_0, \ldots, F_{d_1-1}$ of hash functions, each containing $d_2 = O(\ln k)$ functions, chosen independently and u.a.r. from a universal set of hash functions. The family $F_i$, $i \in [d_1]^-$, will be used restrictively to map the elements of $U_i$. Since each family $F_i$ is chosen from a universal set of hash functions, for the subset $S_i = S \cap U_i$, w.h.p. $F_i$ contains a hash function $f_i$ that is perfect w.r.t. $S_i$; that is, under the function $f_i$ the elements of $S_i$ are distinguished. This concludes the first phase of the process, which is described in Algorithm 1.

Figure 1: Illustration for Algorithms 1 and 2.
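The first phase can be sketched in Python as follows. This is a hedged illustration rather than the paper's implementation: the prime `P`, the helper names, and the return format are assumptions. The number `d2` of functions per family is left to the caller (the text only states that it is $O(\ln k)$), and the range of the universal hash functions is assumed to be the square of the bucket-size bound $\lceil 13 \ln k \rceil$, so that the standard perfect-hashing argument applies.

```python
import math
import random
from typing import Callable, List, Tuple

Hash = Callable[[int], int]

P = 2_147_483_647  # the prime 2^31 - 1; assumed to be at least |U|


def random_universal_hash(r: int, rng: random.Random) -> Hash:
    """Draw h(x) = ((a*x + b) mod P) mod r with 1 <= a <= P-1, 0 <= b <= P-1."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % r


def random_polynomial_hash(kappa: int, r: int, rng: random.Random) -> Hash:
    """Random polynomial of degree < kappa over GF(P), reduced mod r.

    Over GF(P) this family is kappa-wise independent; the final "mod r"
    makes the output only approximately uniform on {0, ..., r-1}.
    """
    coeffs = [rng.randrange(P) for _ in range(kappa)]

    def h(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % P
        return acc % r

    return h


def phase_one(k: int, d2: int, rng: random.Random) -> Tuple[Hash, List[List[Hash]], int, int]:
    """Return (f, families, d1, r): the partitioning hash f, one family of
    d2 universal hash functions per part, the number of parts d1, and the
    range size r of the universal hash functions."""
    if k < 2:
        raise ValueError("k must be at least 2 so that ln k > 0")
    log_k = math.log(k)
    d1 = 1
    while d1 < k / log_k:  # smallest power of two that is >= k / ln k
        d1 *= 2
    r = math.ceil(13 * log_k) ** 2  # assumed range: (bucket-size bound)^2
    f = random_polynomial_hash(math.ceil(12 * log_k), d1, rng)
    families = [[random_universal_hash(r, rng) for _ in range(d2)]
                for _ in range(d1)]
    return f, families, d1, r
```

A call such as `phase_one(100, 5, random.Random(0))` produces 32 parts, since $100/\ln 100 \approx 21.7$ rounds up to the next power of two.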
Algorithm 1: An algorithm for partitioning $U$ and constructing families of hash functions
Input: $|U|$, $k \in \mathbb{N}$ where $|U| > 1$
Output: A family of sets of hash functions
1: let $u$ and $d$ be the unique positive integers satisfying $2^{u-1} < |U| \le 2^u$ and $2^{d-1} < k/\ln k \le 2^d$
2: choose $f \in_{u.a.r.} H$, where $H = \{h : \{0,1\}^u \to \{0,1\}^d\}$ is a $\lceil 12 \ln k \rceil$-wise independent set of hash functions
3: let $H'$ be a set of universal hash functions from $U$ to $[\lceil 13 \ln k \rceil^2]^-$
4: let $F_i$, for $i \in [2^d]^-$, be a set of $d_2 = O(\ln k)$ hash functions chosen independently and u.a.r. from $H'$
5: return $\{f, F_0, \ldots, F_{2^d-1}\}$

In the second phase, we define a relation $G$ (from $U$) that, for each $x \in U$, associates a set $G(x)$ of integers. This relation extends the hash functions in the $F_j$'s above by (1) ensuring that elements in different parts of $U$ (w.r.t. the partitioning) are distinguished, in the sense that they are associated with subsets of integers that are contained in disjoint intervals of integers; and (2) maintaining the property that elements of the same part $U_j$ that are distinguished under some function in $F_j$ remain so under the extended relation. To do so, for each part $U_j$, we associate an “offset” and create a large gap between any two (consecutive) offsets; we will ensure that all the elements in the same $U_j$ fall within the same interval determined by two consecutive offsets. To compute the set $G(x)$, for an element $x \in U_j$, we start with an offset $o_j$ that depends solely on $U_j$ ($o_j = j \cdot d_2 \cdot d_3$ in Algorithm 2), and consider every function in the family $F_j$ corresponding to $U_j$. For each such function $h_i$, we associate an offset $o'_i$ ($o'_i = (i-1) \cdot d_3$ in Algorithm 2), and for $x$ and that particular function $h_i$, we add to $G(x)$ the value $g(j, i, x) = o_j + o'_i + h_i(x)$. The above phase is described in Algorithm 2.

Algorithm 2: An algorithm that defines the relation $G$ from $U$ to $[d_1 \cdot d_2 \cdot d_3]^-$
Input: $x \in U$, $k \in \mathbb{N}$, $\{f, F_0, \ldots, F_{d_1-1}\}$ is computed by Algorithm 1, where $|F_0| = \cdots = |F_{d_1-1}|$
Output: a set $G(x)$
1: let $d_2 = |F_0| = \cdots = |F_{d_1-1}|$ and $d_3 = \lceil 13 \ln k \rceil^2$
2: $G(x) = \emptyset$
3: compute $f(\llcorner x \lrcorner)$ and let $j$ be the integer such that $\llcorner j \lrcorner = f(\llcorner x \lrcorner)$
4: for $i = 1$ to $d_2$ do
5:   let $h_i$ be the $i$-th function in $F_j$ (assuming an arbitrary ordering on $F_j$)
6:   let $g(j, i, x) = j \cdot d_2 \cdot d_3 + (i-1) \cdot d_3 + h_i(x)$ and let $G(x) = G(x) \cup \{g(j, i, x)\}$
7: return $G(x)$

Now that the relations $G(x)$, for $x \in U$, have been defined, we will show in the following theorem that, for any $k$-subset $S$ of $U$, w.h.p. there exist $k$ distinct elements $i_0, \ldots, i_{k-1}$, such that their pre-images $G^{-1}(i_0), \ldots, G^{-1}(i_{k-1})$ are pairwise disjoint, contain all elements of $S$, and each pre-image contains exactly one element of $S$; those pre-images serve as the desired sets $T_i$, for $i \in [k]$.

Consider Algorithm 1 and
Algorithm 2, and refer to them for the terminologies used in the subsequent discussions. Let $d_1 = 2^d$, $d_2 = |F_0| = \cdots = |F_{d_1-1}|$, and $d_3$ be as defined in Algorithms 1 and 2. For $i \in [d_1 \cdot d_2 \cdot d_3]^-$, define $T_i = \{x \in U \mid i \in G(x)\}$. We define next two sequences of intervals, and prove certain properties about them, that will be used in the proof of Theorem 3.2. For $q \in [d_1 \cdot d_2]^-$, let $I_q = \{r \mid q \cdot d_3 \le r < (q+1) \cdot d_3\}$. For $t \in [d_1]^-$, let $I'_t = \{r \mid t \cdot d_2 \cdot d_3 \le r < t \cdot d_2 \cdot d_3 + d_2 \cdot d_3\}$. Note that each interval $I'_t$ is partitioned into the $d_2$-many intervals $I_q$, for $q = t \cdot d_2, \ldots, t \cdot d_2 + d_2 - 1$.

Lemma 3.1. The following statements hold:
(A) For any two distinct integers $a, b \in I_q$, where $q \in [d_1 \cdot d_2]^-$, we have $T_a \cap T_b = \emptyset$.
(B) For $t \in [d_1]^-$, we have $G(U_t) \subseteq I'_t$. Moreover, for any $a \in I'_t$, $b \in I'_s$, where $s \ne t$, we have $T_a \cap T_b = \emptyset$.

Proof. To prove (A), we proceed by contradiction. Assume that there exists $x \in U$ such that $x \in T_a \cap T_b$. This implies that both $a$ and $b$ are in $G(x)$. Without loss of generality, assume that Algorithm 2 adds $a$ to $G(x)$ in iteration $i_a$ of Steps 5–6 and adds $b$ in iteration $i_b$, where $i_a < i_b$. This implies that $a = j \cdot d_2 \cdot d_3 + (i_a - 1) \cdot d_3 + h_{i_a}(x)$ and $b = j \cdot d_2 \cdot d_3 + (i_b - 1) \cdot d_3 + h_{i_b}(x)$, where $j$ is the integer such that $\llcorner j \lrcorner = f(\llcorner x \lrcorner)$. Since $h_{i_a}(x) < d_3$ and $h_{i_b}(x) < d_3$, it follows that $a$ belongs to the interval $I_{j \cdot d_2 + i_a - 1}$ and $b$ belongs to the interval $I_{j \cdot d_2 + i_b - 1}$, which are two distinct intervals (since $i_a < i_b$), contradicting the assumption that both $a, b \in I_q$.

To prove (B), let $t \in [d_1]^-$, and let $x \in U_t$. The set $G(x)$ consists of the values $g(t, i, x) = t \cdot d_2 \cdot d_3 + (i-1) \cdot d_3 + h_i(x)$, for $i = 1, \ldots, d_2$. Since $0 \le h_i(x) < d_3$ for any $i \in [d_2]$ and any $x \in U$, it follows that $t \cdot d_2 \cdot d_3 \le g(t, i, x) \le t \cdot d_2 \cdot d_3 + (d_2 - 1) \cdot d_3 + d_3 - 1 < t \cdot d_2 \cdot d_3 + d_2 \cdot d_3$, and hence, $g(t, i, x) \in I'_t$. This proves that $G(U_t) \subseteq I'_t$.

To prove that, for any $a \in I'_t$, $b \in I'_s$, where $s \ne t$, we have $T_a \cap T_b = \emptyset$, suppose not and let $x \in T_a \cap T_b$. By the first part of the claim, we have $a \in G(x) \subseteq I'_t$ and $b \in G(x) \subseteq I'_s$, which is a contradiction since $I'_t \cap I'_s = \emptyset$.

Theorem 3.2.
For any subset $S \subseteq U$ of cardinality $k \ge 2$, with probability at least $1 - \frac{4}{k^3 \ln k}$, there exist $k$ sets $T_{i_0}, \ldots, T_{i_{k-1}}$ such that: (1) $|T_{i_j} \cap S| = 1$ for $j \in [k]^-$, (2) $S \subseteq \bigcup_{j \in [k]^-} T_{i_j}$, and (3) $T_{i_j} \cap T_{i_l} = \emptyset$ for $j \ne l \in [k]^-$.

Proof. For $j \in [d_1]^-$, let $U_j$ be the set of elements in $U$ whose image is $\llcorner j \lrcorner$ under $f$ (defined in Step 2 of Algorithm 1), that is $U_j = \{y \in U \mid f(\llcorner y \lrcorner) = \llcorner j \lrcorner\}$. Clearly, the sets $U_j$, for $j \in [d_1]^-$, partition the universe $U$. We will show that, with probability at least $1 - \frac{4}{k^3 \ln k}$, there exist $k$ sets $T_{i_0}, \ldots, T_{i_{k-1}}$ that satisfy conditions (1)–(3) in the statement of the theorem.

Let $S \subseteq U$ be any subset such that $|S| = k$. For $j \in [d_1]^-$ and $y \in S$, let $X_{y,j}$ be the random variable defined as $X_{y,j} = 1$ if $f(\llcorner y \lrcorner) = \llcorner j \lrcorner$ and 0 otherwise. Let $X_j = \sum_{y \in S} X_{y,j}$, and $S_j = \{y \in S \mid f(\llcorner y \lrcorner) = \llcorner j \lrcorner\}$. Thus, $|S_j| = X_j$. Since $f$ is $\lceil 12 \ln k \rceil$-wise independent, the random variables $X_{y,j}$, for $y \in S$, are $\lceil 12 \ln k \rceil$-wise independent and $\Pr(X_{y,j} = 1) = \frac{1}{d_1}$. Thus, $E[X_j] = \frac{|S|}{d_1} = \frac{k}{d_1}$. Since $d_1 = 2^d$ and $2^{d-1} < \frac{k}{\ln k} \le 2^d$ by definition, we have $\frac{k}{\ln k} \le d_1 < \frac{2k}{\ln k}$ and $\frac{\ln k}{2} < E[X_j] \le \ln k$. Applying Theorem 2.1 with $\mu = E[X_j]$ and $\delta = \frac{12 \ln k}{E[X_j]} > 1$, we get $\Pr(X_j \ge (1+\delta) E[X_j]) \le e^{-E[X_j]\delta/3} = \frac{1}{k^4}$. Since $E[X_j] \le \ln k$ and $\delta = \frac{12 \ln k}{E[X_j]}$, we have $(1+\delta) E[X_j] \le 13 \ln k$. Hence, $\Pr(X_j \ge 13 \ln k) \le \Pr(X_j \ge (1+\delta) E[X_j]) \le \frac{1}{k^4}$. Let $\mathcal{E}$ denote the event that $\bigwedge_{i \in [d_1]^-} (X_i \le 13 \ln k)$. By the union bound, we have $\Pr(\mathcal{E}) \ge 1 - \frac{d_1}{k^4} \ge 1 - \frac{2}{k^3 \ln k}$, where the last inequality holds since $d_1 < 2k/\ln k$.

Assume that event $\mathcal{E}$ occurs, i.e., that $|S_j| \le 13 \ln k$ holds for $j \in [d_1]^-$. Consider Step 4 in Algorithm 1. Fix $j \in [d_1]^-$, and let $\mathcal{E}_j$ be the event that $F_j$ does not contain any perfect hash function w.r.t. $S_j$. Let $h$ be a hash function picked from $H'$ u.a.r. Since $|S_j| \le 13 \ln k$ (by assumption), by Theorem 2.3, with probability at least $1/2$, $h$ is perfect w.r.t. $S_j$. Since $F_j$ consists of $d_2$ hash functions chosen independently and u.a.r. from $H'$, we have $\Pr(\mathcal{E}_j) \le (1/2)^{d_2} < \frac{1}{k^4}$. Applying the union bound, we have $\Pr(\bigcup_{j \in [d_1]^-} \mathcal{E}_j) \le \frac{d_1}{k^4} < \frac{2}{k^3 \ln k}$. Let $\mathcal{E}'$ be the event that there exist $d_1$ functions $f_0, f_1, \ldots, f_{d_1-1}$ such that $f_j \in F_j$ and $f_j$ is perfect w.r.t. $S_j$, $j \in [d_1]^-$. Therefore, $\Pr(\mathcal{E}') \ge \Pr(\mathcal{E})(1 - \Pr(\bigcup_{j \in [d_1]^-} \mathcal{E}_j)) \ge 1 - \frac{4}{k^3 \ln k} + \frac{4}{k^6 \ln^2 k} \ge 1 - \frac{4}{k^3 \ln k}$.

Suppose that such a set $\{f_0, \ldots, f_{d_1-1}\}$ of functions exists. Let $\eta(q)$ be the iteration number $i$ in Step 5 of Algorithm 2 during which $f_q \in F_q$ is chosen, for $q \in [d_1]^-$. We define the (multi-)set $B$ as follows. For each $q \in [d_1]^-$, and for each element $x \in S_q$, add to $B$ the element $g(q, \eta(q), x)$ defined in Steps 5–6 of Algorithm 2 (by $\{f, f_0, \ldots, f_{d_1-1}\}$). Observe that, by the definition of $B$, for every $x \in S$, there exists $a \in B$ such that $x \in T_a$. We will show next that $B$ contains exactly $k$ distinct elements, and that, for any $a \ne b \in B$, it holds that $T_a \cap T_b = \emptyset$. The above will show that the sets $\{T_a \mid a \in B\}$ satisfy conditions (1)–(3) of the theorem, thus proving the theorem.

It suffices to show that for any two distinct elements of $S$, the corresponding elements added to $B$ are distinct. Let $x_1$ and $x_2$ be two distinct elements of $S$. Assume that $x_1 \in S_j$ and $x_2 \in S_l$, where $j, l \in [d_1]^-$. We distinguish two cases based on whether or not $j = l$.

If $j = l$, we have $g(j, \eta(j), x_1) = j \cdot d_2 \cdot d_3 + (\eta(j) - 1) \cdot d_3 + f_j(x_1)$ and $g(j, \eta(j), x_2) = j \cdot d_2 \cdot d_3 + (\eta(j) - 1) \cdot d_3 + f_j(x_2)$. Since $f_j$ is perfect w.r.t. $S_j$, we have $g(j, \eta(j), x_1) \ne g(j, \eta(j), x_2)$. Moreover, both $g(j, \eta(j), x_1)$ and $g(j, \eta(j), x_2)$ are in $I_{j \cdot d_2 + (\eta(j) - 1)}$ (since $0 \le f_j(x_1), f_j(x_2) < d_3$), where $j \cdot d_2 + (\eta(j) - 1) \le (d_1 - 1) \cdot d_2 + (d_2 - 1) \in [d_1 \cdot d_2]^-$.
By part (A) of Lemma 3.1, it holds that $T_{g(j,\eta(j),x_1)} \cap T_{g(j,\eta(j),x_2)} = \emptyset$.

Suppose now that $j \neq l$. By definition of $S_j$, $S_l$, $U_j$, $U_l$, we have $S_j \subseteq U_j$ and $S_l \subseteq U_l$. Consequently, $g(j,\eta(j),x_1) \in \mathcal{G}(U_j)$ and $g(l,\eta(l),x_2) \in \mathcal{G}(U_l)$ hold. By part (B) of Lemma 3.1, we have $\mathcal{G}(U_j) \subseteq I'_j$ and $\mathcal{G}(U_l) \subseteq I'_l$. Since $I'_j \cap I'_l = \emptyset$, it follows that $g(j,\eta(j),x_1) \neq g(l,\eta(l),x_2)$. Moreover, $T_{g(j,\eta(j),x_1)} \cap T_{g(l,\eta(l),x_2)} = \emptyset$ holds by part (B) of Lemma 3.1 as well.

Theorem 3.3.
Algorithm 1 runs in space O ( k + (log k )(log | U | )) , and Algorithm 2 runs in space O (log k ) and in time polynomial in log | U | .Proof. In Algorithm 1 , since f is ⌈
12 ln k ⌉ -wise independent, storing f uses space O (ln k · max { u, d } ) = O ((log k )(log | U | )) (since k ≤ | U | ) by Theorem 2.4. Storing a universal hash function uses O (1) space, andthus storing { F , . . . , F d − } uses O ( d · d ) = O ( k ) space. Therefore, Algorithm 1 can be implemented inspace O ( k + (log k )(log | U | )).For Algorithm 2 , since G ( x ) contains exactly d elements, storing G ( x ) takes O ( d ) = O (ln k ) space. InStep 3, computing f ( x x y ) takes time polynomial in log | U | and log k by Theorem 2.4, since f is a ⌈
12 ln k ⌉ -wise independent hash function from { , } u to { , } d . Computing j in Step 3 takes time polynomial in d = O (log k ) since f ( x x y ) ∈ { , } d . Therefore, Step 3 can be performed in time polynomial in log | U | andlog k , and hence polynomial in log | U | (since k ≤ | U | ). Step 6 can be implemented in time polynomial inlog k , since | F j | = ⌈ k ⌉ . Altogether, Algorithm 2 takes time polynomial in log | U | . This completes theproof. 9 Dynamic Streaming Model
In this section, we present results on p-Matching and p-WT-Matching in the dynamic streaming model. The algorithm uses the toolkit developed in the previous section, together with the $\ell_0$-sampling technique discussed in Section 2. We first give a high-level description of how the algorithm works.

We will hash the vertices of the graph to a range $R$ of size $O(k \log^2 k)$. For each element $(e = uv, wt(e), op) \in \mathcal{S}$, we use the relation $\mathcal{G}$, discussed in Section 3, and compute the two sets $\mathcal{G}(u)$ and $\mathcal{G}(v)$. For each $i \in \mathcal{G}(u)$ and each $j \in \mathcal{G}(v)$, we associate an instance of an $\ell_0$-sampler primitive, call it $C_{i,j,wt(uv)}$, and update it according to the operation $op$. Recall that it is assumed that the weight of every edge does not change throughout the stream.

The solution computed by the algorithm consists of a set of edges created by invoking each of the $\tilde{O}(Wk^2)$ $\ell_0$-sampler algorithms to sample at most one edge from each $C_{i,j,w}$, for each pair $i, j$ in the range $R$ and each edge weight $w$ of the graph stream.

The intuition behind the above algorithm (i.e., why it achieves the desired goal) is the following. Suppose that there exists a maximum-weight $k$-matching $M$ in $G$, and let $M = \{u_0u_1, \ldots, u_{2k-2}u_{2k-1}\}$. By Theorem 3.2, w.h.p. there exist $i_0, \ldots, i_{2k-1}$ in the range $R$ such that $u_j \in T_{i_j}$, for $j \in [2k]^-$, and such that the $T_{i_j}$'s are pairwise disjoint. Consider the $k$ $\ell_0$-samplers $C_{i_{2j},i_{2j+1},wt(u_{2j}u_{2j+1})}$, where $j \in [k]^-$. Then, w.h.p., the $k$ edges sampled from these $k$ $\ell_0$-samplers are the edges of a maximum-weight $k$-matching (since the $T_{i_j}$'s are pairwise disjoint) whose weight equals that of $M$.

Algorithm 3
The streaming algorithm A dynamic in the dynamic streaming model A dynamic -Preprocess: The preprocessing algorithmInput: n = | V ( G ) | and a parameter k ∈ N let C be a set of ℓ -sampling primitive instances and C = ∅ let { f, F , F , . . . , F d − } be the output of Algorithm 1 on input ( n, k ) A dynamic -Update: The update algorithmInput: The i -th update ( e i = uv, wt ( e ) , op ) ∈ S let G ( u ) be the output of Algorithm 2 on input ( u, k, { f, F , F , . . . , F d − } ) let G ( v ) be the output of Algorithm 2 on input ( v, k, { f, F , F , . . . , F d − } ) for i ∈ G ( u ) and j ∈ G ( v ) do if C i,j,wt ( uv ) / ∈ C then create the ℓ -sampler C i,j,wt ( uv ) feed h uv, op i to the ℓ -sampling algorithm C i,j,wt ( uv ) with parameter δ A dynamic -Query: The query algorithm after the i -th update let E ′ = ∅ for each C i,j,w ∈ C do apply the ℓ -sampler C i,j,w with parameter δ to sample an edge e if C i,j,w does not FAIL then set E ′ = E ′ ∪ { e } return a maximum-weight k -matching in G ′ = ( V ( E ′ ) , E ′ ) if any; otherwise, return ∅ Let S be a graph stream of a weighted graph G = ( V, E ) with W distinct weights, where W ∈ N , and let n = | V | and k ∈ N . Choose δ = k ln(2 k ) . Let A dynamic be the algorithm consisting of the sequence of threesubroutines/algorithms A dynamic -Preprocess , A dynamic -Update , and A dynamic -Query , where A dynamic -Preprocess is applied at the beginning of the stream, A dynamic -Update is applied after each operation,and A dynamic -Query is applied whenever the algorithm is queried for a solution after some update operation.Without loss of generality, and for convenience, we will assume that the algorithm is queried at the end ofthe stream S , even though the query could take place after any arbitrary operation. Lemma 4.1.
Let M ′ be the matching obtained by applying the algorithm A dynamic with A dynamic -Query invoked at the end of S . If G contains a k -matching then, with probability at least − k ln(2 k ) , M ′ is a aximum-weight k -matching of G .Proof. Suppose that G has a k -matching, and let M = { u u , . . . , u k − u k − } be a maximum-weight k -matching in G . (Note that we can assume that u j < u j +1 for every j ∈ [ k ] − ; see Section 2.) From Algorithm 1 and
Algorithm 2 , it follows that d = O ( k ln k ), d = O (ln k ), and d = O (ln k ). For i ∈ [ d · d · d ] − , let T i = { u ∈ V | i ∈ G ( u ) } . By Theorem 3.2, with probability at least 1 − k ) ln(2 k ) =1 − k ln(2 k ) , there exist i , i , . . . , i k − such that (1) u j ∈ T i j , j ∈ [2 k ] − , and (2) T i j ∩ T i l = ∅ for j = l ∈ [2 k ] − .Let E ′ be the above event. Then Pr( E ′ ) ≥ − k ln(2 k ) . By Step 6 of A dynamic -Update , u j u j +1 willbe fed into C i j ,i j +1 ,wt ( u j u j +1 ) for j ∈ [ k ] − . Hence, C i j ,i j +1 ,wt ( u j u j +1 ) is fed at least one edge for every j ∈ [ k ] − .Now, let us compute the sampling success probability in Step 4 of A dynamic -Query . Note that thisprobability involves both Step 6 of A dynamic -Update and Step 4 of A dynamic -Query . In Step 6 of A dynamic -Update and Step 4 of A dynamic -Query , we employ the ℓ -sampling primitive in Lemma 2.2. Let E bethe event that one edge is sampled successfully for each ℓ -sampler in { C i j ,i j +1 ,wt ( u j u j +1 ) | j ∈ [ k ] − } .By Lemma 2.2, one ℓ -sampler fails with probability at most δ . Hence, Pr( E ) ≥ − k · δ by the unionbound. Since δ = k ln(2 k ) , we get Pr( E ) ≥ − k ln(2 k ) . Hence, with probability at least 1 − k ln(2 k ) , E ′ will contain one edge e j sampled from C i j ,i j +1 ,wt ( u j u j +1 ) for each j ∈ [ k ] − . Note that e j may be u j u j +1 or any other edge with the same weight as u j u j +1 . Since T i j ∩ T i l = ∅ for every j = l ∈ [2 k ] − ,we have that the edges fed to C i a ,i a +1 ,wt ( u a u a +1 ) and C i b ,i b +1 ,wt ( u b u b +1 ) are vertex disjoint, for all a = b ∈ [ k ] − . Thus, { e , . . . , e k − } forms a maximum-weight k -matching of G . Applying the union bound,the probability that the graph G ′ contains a maximum-weight k -matching of G is at least 1 − Pr( ¯
E ∪ ¯ E ′ ) ≥ − Pr( ¯ E ′ ) − Pr( ¯ E ) ≥ − k ln(2 k ) − k ln(2 k ) = 1 − k ln(2 k ) . Since M ′ is a maximum-weight k -matchingof G ′ , M ′ is a maximum-weight k -matching of G as well. Theorem 4.2.
The algorithm A dynamic outputs a matching M ′ such that (1) if G contains a k -matchingthen, with probability at least − k ln(2 k ) , M ′ is a maximum-weight k -matching of G ; and (2) if G doesnot contain a k -matching then M ′ = ∅ . Moreover, the algorithm A dynamic runs in ˜ O ( W k ) space and has ˜ O (1) update time.Proof. First, observe that G ′ is a subgraph of G , since it consists of edges sampled from subsets of edges in G . Therefore, statement (2) in the theorem clearly holds true. Statement (1) follows from Lemma 4.1. Next,we analyze the update time of algorithm A dynamic .From Algorithm 1 and
Algorithm 2 , we have d = O ( k ln k ), d = O (ln k ), d = O (ln k ) and | F i | = O (ln k ) for i ∈ [ d ] − . Thus, |G ( u ) | = O (ln k ) holds for all u ∈ V . For the update time, it suffices toexamine Steps 1–6 of A dynamic -Update . By Theorem 3.3, Steps 1–2 take time polynomial in log n , whichis ˜ O (1). For Step 4, we can index C using a sorted sequence of triplets ( i, j, w ), where i, j ∈ [ d · d · d ] − and w ranges over all possible weights. Since d = O ( k ln k ), d = O (ln k ) and d = O (ln k ), we have | C | = O (( d · d · d ) · W ) = O ( W k ln k ). Using binary search on C , one execution of Step 4 takes time O (log W + log k ). Since |G ( u ) | = O (ln k ) for every u ∈ V , and since by Lemma 2.2 updating the sketch foran ℓ -sampler takes ˜ O (1) time, Steps 3–6 take time O (ln k ) · ( O (log W + log k ) + ˜ O (1)) = ˜ O (1). Therefore,the overall update time is ˜ O (1).Now, we analyze the space complexity of the algorithm. First, consider A dynamic -Preprocess . Obvi-ously, Step 1 uses O (1) space. Steps 1–2 use space O ( k + (log k )(log n )) (including the space used to store { f, F , . . . , F d − , C } ) by Theorem 3.3. Altogether, A dynamic -Preprocess runs in space O ( k +(log k )(log n )).Next, we discuss A dynamic -Update . Steps 1–2 take space O (ln k ) by Theorem 3.3. Observe that the spaceused in Steps 3–6 is dominated by the space used by the set C of ℓ -sampling primitive instances. ByLemma 2.2, one instance of an ℓ -sampling primitive uses space O (log n · log( δ − )). Since δ = k ln(2 k ) ,we have log( δ − ) = O (log k ). It follows that a single instance of an ℓ -sampler uses space O (log n · log k ).Since | C | = O ( W k ln k ), Steps 3–6 use space O ( W k log n log k ) = ˜ O ( W k ). Finally, consider A dynamic -Query . The space in Steps 1 – 4 is dominated by the space used by C and the space needed to storethe graph G ′ , and hence E ′ . By the above discussion, C takes space ˜ O ( W k ). 
Since at most one edge issampled from each ℓ -sampler instance and | C | = O ( W k ln k ), we have | E ′ | = | C | = O ( W k ln k ). Step 5utilizes space O ( | E ′ | ) [19, 20]. Therefore, A dynamic -Query runs in space ˜ O ( W k ). It follows that the spacecomplexity of A dynamic is ˜ O ( W k ). 11he space complexity of the above algorithm is large if the number of distinct weights W is large. Underthe same promise that the parameter k is at least as large as the size of any maximum matching in G , anapproximation scheme for p- WT-Matching that is more space efficient was presented in [8]. This schemeapproximates p-
WT-Matching to within ratio $1+\epsilon$, for any $\epsilon > 0$, and has space complexity $\tilde{O}(k^2 \epsilon^{-1} \log W')$ and update time $\tilde{O}(1)$. The main idea behind this approximation scheme is to reduce the number of distinct weights in $G$ by rounding each weight to the nearest power of $1+\epsilon$.

Using Theorem 4.2, and following the same approach as in [8], we can obtain the same approximation result as in [8], albeit without the reliance on such a strong promise:

Theorem 4.3.
Let S be a graph stream of a graph G , and let W ′ = wt ( e ) /wt ( e ′ ) , where e ∈ E ( G ) is anedge with the maximum weight and e ′ ∈ E ( G ) is an edge with the minimum weight. Let < ǫ < . Inthe dynamic streaming model, there exists an algorithm for p- WT-Matching that computes a matching M ′ such that (1) if G contains a maximum-weight k -matching M , then with probability at least − k ln(2 k ) , wt ( M ′ ) > (1 − ǫ ) wt ( M ) ; and (2) if G does not contain a k -matching then M ′ = ∅ . Moreover, the algorithmruns in ˜ O ( k ǫ − log W ′ ) space and has ˜ O (1) update time.Proof. For each edge e ∈ E , round wt ( e ) and assign it a new weight of (1 + ǫ ) i such that (1 + ǫ ) i − < wt ( e ) ≤ (1 + ǫ ) i . Thus, there are O ( ǫ − log W ′ ) distinct weights after rounding. By Theorem 4.2, the space andupdate time are ˜ O ( k ǫ − log W ′ ) and ˜ O (1) respectively, and the success probability is at least 1 − k ln(2 k ) .Now we prove that wt ( M ′ ) > (1 − ǫ ) wt ( M ).Let e ∈ M and let e ′ be the edge sampled from the ℓ -sampler that e is fed to. It suffices to prove that wt ( e ′ ) > (1 − ǫ ) wt ( e ). Assume that wt ( e ) is rounded to (1 + ǫ ) i . Then, wt ( e ′ ) is rounded to (1 + ǫ ) i aswell. If wt ( e ′ ) ≥ wt ( e ), we are done; otherwise, (1 + ǫ ) i − < wt ( e ′ ) < wt ( e ) ≤ (1 + ǫ ) i . It follows that wt ( e ′ ) > (1 + ǫ ) i − ≥ wt ( e ) / (1 + ǫ ) > (1 − ǫ ) wt ( e ).The following theorem is a consequence of Theorem 4.2 (applied with W = 1): Theorem 4.4.
In the dynamic streaming model, there is an algorithm for p-
Matching such that, oninput ( S , k ) , the algorithm outputs a matching M ′ satisfying that (1) if G contains a k -matching then, withprobability at least − k ln(2 k ) , M ′ is a k -matching of G ; and (2) if G does not contain a k -matching then M ′ = ∅ . Moreover, the algorithm runs in ˜ O ( k ) space and has ˜ O (1) update time. Consider an undirected graphs G = ( V, E ) with a weight function wt : E ( G ) −→ R ≥ . We define amore general dynamic graph streaming model for an undirected graph G : G is given as a stream S =( e i , ∆ ( e i )) , . . . , ( e i j , ∆ j ( e i j )) , . . . of updates of the weights of the edges, where e i j is an edge and ∆ j ( e i j ) ∈ R , and a parameter k ∈ N . The j -th update ( e i j , ∆ j ( e i j )) updates the weight of e i j by setting wt ( e i j ) = wt ( e i j ) + ∆ j ( e i j ). We assume that wt ( · ) ≥ j . Initially, wt ( · ) = . This models allows theweight of an edge to dynamically change, and generalizes the dynamic graph streaming model in [8], where wt ( e ) is either 0 or a fixed value associated with e . In particular, each element ( e i j , ∆ j ( e i j )) in S is either( e i j , wt ( e i j )) or ( e i j , − wt ( e i j )), and ( e i j , wt ( e i j )) means to insert the edge e i j while ( e i j , − wt ( e i j )) representsthe deletion of the edge e i j .In this subsection, we prove a lower bound for the weighted k -matching problem in the more generaldynamic streaming model. This lower bound result holds even for parameter value k = 1. We prove thislower bound via a reduction from the problem of computing the function F ∞ of data streams defined asfollows:Given a data stream S ′ = x , x , . . . , x m , where each x i ∈ { , . . . , n ′ } , let c i = |{ j | x j = i }| denote thenumber of occurrences of i in the stream S ′ . Define F ∞ = max ≤ i ≤ n ′ c i . The following theorem appearsin [38]: Theorem 4.5 ([38]) . 
For every data stream of length m , any randomized streaming algorithm that computes F ∞ to within a (1 ± . factor with probability at least / requires space Ω(min { m, n ′ } ) . We remark that, approximating F ∞ to within a (1 ± .
2) factor means computing a number that iswithin (1 ± .
2) factor from F ∞ ; however, that approximate number may not correspond to the number ofoccurrences of a value in the stream. 12 heorem 4.6. For every dynamic graph streaming of length m for weighted graphs, any randomized stream-ing algorithm that, with probability at least / , approximates the maximum-weight -matching of the graphto a factor uses space Ω( { m, ( n − n − } ) .Proof. Given a data stream S ′ = x , x , . . . , x m , where each x i ∈ { , . . . , n ′ } , we define a graph stream S fora weighted graph G on n vertices, where n satisfies ( n − n − / < n ′ ≤ n ( n − /
2. Let V = { . . . , n − } be the vertex-set of G . We first define a bijective function χ : { ( i, j ) | i < j ∈ [ n ] − } −→ [ n ( n − ]. Let χ − bethe inverse function of χ . Then, we can translate S ′ to a general dynamic graph streaming S of underlyingweighted graph G by corresponding with x i the i -th element ( χ − ( x i ) ,
1) of S , for i ∈ [ m ]. Observe thatcomputing F ∞ of S ′ is equivalent to computing a maximum-weight 1-matching for the graph stream S of G . Let uv be a maximum-weight 1-matching of S , then χ ( uv ) is F ∞ of S ′ . By Theorem 4.5, it follows thatany randomized approximation streaming algorithm that approximates the maximum-weight 1-matching of G to a -factor with probability at least 2 / { m, ( n − n − } ), thus completing the proof. In this section, we give a streaming algorithm for p-WT-Matching , and hence for p-
Matching as a special case, in the insert-only model. We start by defining some notation.

Given a weighted graph $G = (V = \{0, \ldots, n-1\}, E)$ along with the weight function $wt : E(G) \to \mathbb{R}_{\geq 0}$, and a parameter $k$, we define a new function $\beta : E(G) \to \mathbb{R}_{\geq 0} \times [n]^- \times [n]^-$ as follows: for $e = uv \in E$, where $u < v$, let $\beta(e) = (wt(e), u, v)$. Observe that $\beta$ is injective.

Define a partial order relation $\prec$ on $E(G)$ as follows: for any two distinct edges $e, e' \in E(G)$, $e \prec e'$ if $\beta(e)$ is lexicographically smaller than $\beta(e')$. For a vertex $v \in V$ and an edge $e$ incident to $v$, define $\Gamma_v$ to be the sequence of edges incident to $v$, sorted in decreasing order w.r.t. $\prec$. We say that $e$ is the $i$-heaviest edge w.r.t. $v$ if $e$ is the $i$-th element in $\Gamma_v$.

Let $f : V \to [4k^2]^-$ be a hash function. Let $H$ be a subgraph of $G$ (possibly $G$ itself). The function $f$ partitions $V(H)$ into the set of subsets $\mathcal{V} = \{V_1, \ldots, V_r\}$, where each $V_i$, $i \in [r]$, consists of the vertices in $V(H)$ that have the same image under $f$. A matching $M$ in $H$ is said to be nice w.r.t. $f$ if no two vertices of $M$ belong to the same part $V_i$, where $i \in [r]$, in $\mathcal{V}$. If $f$ is clear from the context, we will simply write that $M$ is nice.

We define the compact subgraph of $H$ under $f$, denoted Compact$(H, f)$, as the subgraph of $H$ consisting of each edge $uv$ in $H$ whose endpoints belong to different parts, say $u \in V_i$, $v \in V_j$, $i \neq j \in [r]$, and such that $\beta(uv)$ is maximum over all edges between $V_i$ and $V_j$; that is, $\beta(uv) = \max\{\beta(u'v') \mid u'v' \in E(H) \wedge u' \in V_i \wedge v' \in V_j\}$.

Finally, we define the reduced compact subgraph of $H$ under $f$, denoted Red-Com$(H, f)$, by (1) selecting each edge $uv \in$ Compact$(H, f)$, where $u \in V_i$ and $v \in V_j$, such that $uv$ is among the $8k$ heaviest edges (or all edges if there are not that many edges) incident to vertices in $V_i$ and among the $8k$ heaviest edges incident to vertices in $V_j$; and then (2) letting $q = k(16k - 1)$ and retaining from the edges selected in (1) the $q$ heaviest edges (or all edges if there are not that many edges). We have the following:

Lemma 5.1.
The subgraph Compact$(H, f)$ has a nice $k$-matching if and only if Red-Com$(H, f)$ has a nice $k$-matching. Moreover, if Compact$(H, f)$ (and hence Red-Com$(H, f)$) has a nice $k$-matching, then the weight of a maximum-weight nice $k$-matching in Compact$(H, f)$ is equal to that in Red-Com$(H, f)$.

Proof. As discussed before, the function $f$ partitions $V(H)$ into the set $\mathcal{V} = \{V_1, \ldots, V_r\}$, where each $V_i$, $i \in [r]$, consists of the vertices in $V(H)$ that have the same image under $f$. Define the auxiliary weighted graph $\Phi$ whose vertices are the parts $V_1, \ldots, V_r$, and such that there is an edge $V_iV_j$ in $\Phi$, $i \neq j \in [r]$, if some vertex $u \in V_i$ is adjacent to some vertex $v \in V_j$ in Compact$(H, f)$; we associate with edge $V_iV_j$ the value $\beta(uv)$ and associate the edge $uv$ with $V_iV_j$. Obviously, there is a one-to-one correspondence between the nice $k$-matchings in Compact$(H, f)$ and the $k$-matchings of $\Phi$.

Let $\mathcal{H}$ be the subgraph of $\Phi$ formed by selecting each edge $V_iV_j$, $i \neq j \in [r]$, such that $V_iV_j$ is among the $8k$ heaviest edges (or all edges if there are not that many edges) incident to $V_i$ and among the $8k$ heaviest edges (or all edges if there are not that many edges) incident to $V_j$. Let $\mathcal{H}'$ consist of the $q$ heaviest edges in $\mathcal{H}$; if $\mathcal{H}$ has at most $q$ edges, we let $\mathcal{H}' = \mathcal{H}$. Since there is a one-to-one correspondence between the nice $k$-matchings in Compact$(H, f)$ and the $k$-matchings of $\Phi$, it suffices to prove the statement of the lemma with respect to matchings in $\Phi$ and $\mathcal{H}'$; namely, since $\mathcal{H}'$ is a subgraph of $\Phi$, it suffices to show that if $\Phi$ has a maximum-weight $k$-matching $M$, then $\mathcal{H}'$ has a maximum-weight $k$-matching of the same weight as $M$.

Suppose that $\Phi$ has a maximum-weight $k$-matching $M$. Choose $M$ such that the number of edges in $M$ that remain in $\mathcal{H}$ is maximized. We will show first that all the edges in $M$ remain in $\mathcal{H}$.
Suppose not,then there is an edge V i V i ∈ M such that V i V i is not among the 8 k heaviest edges incident to one ofits endpoints, say V i . Since | V ( M ) | = 2 k < k , it follows that there is a heaviest edge V i V i incidentto V i such that β ( V i V i ) > β ( V i V i ) and V i / ∈ V M . If V i V i ∈ H , then ( M − V i V i ) + V i V i is amaximum-weight k -matching of Φ that contains more edges of H than M , contradicting our choice of M .It follows that V i V i / ∈ H . Then, V i V i is not among the 8 k heaviest edges incident to V i . Now applythe above argument to V i to select the heaviest edge V i V i such that β ( V i V i ) > β ( V i V i ) > β ( V i V i )and V i / ∈ V M . By applying the above argument j times, we obtain a sequence of j vertices V i , V i , . . . , V i j ,such that (1) { V i , . . . , V i j } ∩ V M = ∅ ; and (2) V i a = V i b for every a = b ∈ [ j ], which is guaranteed by β ( V i a V i a +1 ) < β ( V i a +1 V i a +2 ) < · · · < β ( V i b − V i b ) and V i a V i a +1 is the heaviest edge incident to V i a such that V i a +1 / ∈ V M . Since Φ is finite, the above process must end at an edge e not in M and such that β ( e ) exceeds β ( V i V i ), contradicting our choice of M . Therefore, M ⊆ E ( H ).Now, choose a maximum-weight k -matching of H that maximizes the number of edges retained in H ′ .Without loss of generality, call it M . We prove that the edges of M are retained in H ′ , thus proving the lemma.Suppose that this is not the case. Since each vertex in V ( M ) has degree at most 8 k and one of its edges mustbe in M , the number of edges in H incident to the vertices in M is at most 2 k (8 k −
1) + k = k (16 k −
1) = q .It follows that there is an edge e in H ′ whose endpoints are not in M and such that β ( e ) is larger than the β () value of some edge in M , contradicting our choice of M . Lemma 5.2.
Let $f : V \to [4k^2]^-$ be a hash function, and let $H$ be a subgraph of $G$. There is an algorithm Alg-Reduce$(H, f)$ that computes Red-Com$(H, f)$ and whose time and space complexity is $O(|H| + k^2)$.

Proof. The algorithm Alg-Reduce$(H, f)$ works as follows. First, for each $v \in V(H)$, it computes $f(v)$ and uses it to partition $V(H)$ into $V_1, \ldots, V_r$, where each $V_i$, $i \in [r]$, consists of the vertices in $V(H)$ that have the same image under $f$. Clearly, the above can be done in time $O(|H| + k^2)$ (e.g., using Radix sort). Then, it partitions the edges of $H$ into groups $E_{i,j}$, $i \neq j \in [r]$, where each $E_{i,j}$ consists of all the edges in $H$ that go between $V_i$ and $V_j$. Clearly, this can be done in $O(|H|)$ time (e.g., using Radix sort on the labels of the pairs of parts containing the edges). From each group $E_{i,j}$, the algorithm retains the edge $uv$ with the maximum value $\beta(uv)$ among all edges in $E_{i,j}$, which clearly can be done in $O(|H|)$ time.

Next, the algorithm groups the remaining edges into (overlapping) groups, where each group $E_i$ consists of all the edges (among the remaining edges) that are incident to the same part $V_i$, for $i \in [r]$; note that each edge appears in exactly two such groups. The algorithm now discards every edge $uv$, where $u \in V_i$, $v \in V_j$, if either $uv$ is not among the $8k$ heaviest edges in $E_i$ or is not among the $8k$ heaviest edges in $E_j$. The above can be implemented in $O(|H|)$ time by applying a linear-time ($(8k)$-th order) selection algorithm [11] to each $E_i$ to select the $(8k)$-th heaviest edge in $E_i$, and then discarding all edges of lighter weight from $E_i$; an edge is kept if it is kept in both groups that contain its endpoints (which can be easily done, e.g., using a Radix sort). Finally, invoking a $q$-th order selection algorithm [11], where $q = k(16k - 1)$, the algorithm retains the $q$ heaviest edges among the remaining edges; those edges form Red-Com$(H, f)$. The above algorithm runs in time $O(|H| + k^2)$, and its space complexity is dominated by $O(|H| + k^2)$ as well. This completes the proof.

We now present the streaming algorithm $\mathcal{A}_{Insert}$ for p-WT-Matching. Let $(\mathcal{S}, k)$ be an instance of p-WT-Matching, where $\mathcal{S} = (e_1, wt(e_1)), \ldots, (e_i, wt(e_i)), \ldots$ is a stream of edge-insertions for a graph $G$. For $i \in \mathbb{N}$, let $G_i$ be the subgraph of $G$ consisting of the first $i$ edges $e_1, \ldots, e_i$ of $\mathcal{S}$, and for $j \leq i$, let $G_{j,i}$ be the subgraph of $G$ whose edges are $\{e_j, \ldots, e_i\}$; if $j > i$, we let $G_{j,i} = \emptyset$. Let $f$ be a hash function chosen u.a.r. from a universal set $\mathcal{H}$ of hash functions mapping $V$ to $[4k^2]^-$.

The algorithm $\mathcal{A}_{Insert}$, after processing the $i$-th element $(e_i, wt(e_i))$, computes two subgraphs $G^f_i$, $G^s_i$ defined as follows. For $i = 0$, define $G^f_i = G^s_i = \emptyset$. Suppose now that $i > 0$. Define $\hat{i}$ to be the largest multiple of $q$ that is smaller than $i$, that is, $i = \hat{i} + p$, where $0 < p \leq q$; and define $i^*$ as the largest multiple of $q$ that is smaller than $\hat{i}$ if $\hat{i} > 0$, and $0$ otherwise (i.e., $i^* = 0$ if $\hat{i} = 0$).

The subgraph $G^f_i$ is defined only when $i$ is a multiple of $q$; for such $i > 0$, $G^f_i = $ Red-Com$(G^f_{\hat{i}} \cup G_{i^*+1,\hat{i}})$. That is, $G^f_i$ is the reduced compact subgraph of the graph consisting of $G^f_{\hat{i}}$ plus the subgraph consisting of the edges encountered after $e_{i^*}$, starting from $e_{i^*+1}$ up to $e_{\hat{i}}$. In particular, $G^f_q = \emptyset$, since $\hat{i} = i^* = 0$ when $i = q$.

The subgraph $G^s_i$ is defined as $G^s_i = G^f_{\hat{i}} \cup G_{i^*+1,i}$; that is, $G^s_i$ consists of the previous (before $i$) reduced compact subgraph plus the subgraph consisting of the edges starting after $i^*$ up to $i$. We refer to Figure 2 for an illustration of the definitions of $G^f_i$ and $G^s_i$.

Observation 5.3.
For each $i$ that is a multiple of $q$, $G^f_i$ contains at most $q$ edges (by the definition of a reduced compact subgraph).

Observation 5.4.
For each i , G si contains at most q edges. Lemma 5.5.
For each i ≥ , if G i contains a maximum-weight k -matching, then with probability at least / , G si contains a maximum-weight k -matching of G i .Proof. Let M = { u u , . . . , u k − u k − } be a maximum-weight k -matching in G i , and let V M = { u , . . . , u k − } .Since f is a hash function chosen u.a.r. from a universal set H of hash functions mapping V to [4 k ] − , byTheorem 2.3, with probability at least 1 / f is perfect w.r.t. V M . Now, suppose that f is perfect w.r.t. V M ,and hence, we have f ( u j ) = f ( u l ) for every j = l ∈ [2 k ] − . Thus, M is a nice matching (w.r.t. f ) in G i . Bythe definition of C ompact ( G i , f ), there is a set M ′ of k edges M ′ = { u ′ u ′ , . . . , u ′ k − u ′ k − } in C ompact ( G i , f )such that { f ( u ′ i ) , f ( u ′ i +1 ) } = { f ( u i ) , f ( u i +1 ) } and β ( u ′ i u ′ i +1 ) ≥ β ( u i u i +1 ) for i ∈ [ k ] − . It follows that wt ( u ′ i u ′ i +1 ) ≥ wt ( u i u i +1 ) for i ∈ [ k ] − . Therefore, C ompact ( G i , f ) contains a maximum-weight k -matchingof G i , namely { u ′ u ′ , . . . , u ′ k − u ′ k − } ; moreover, this matching is nice. By Lemma 5.1, R ed - C om ( G i , f ) con-tains a maximum-weight k -matching of C ompact ( G i , f ).Next, we prove that G si contains a maximum-weight k -matching of G i . If i ≤ q , then G si = G ,i = G i by definition, and hence G si contains a maximum-weight k -matching of G i . Suppose now that i > q . Bydefinition, G si = G f ˆ i ∪ G i ∗ +1 ,i . (Recall that, by definition, G fq = ∅ , G f q = R ed - C om ( G fq ∪ G ,q ), G f q = R ed - C om ( G f q ∪ G q +1 , q ) , . . . , G f ˆ i = R ed - C om ( G fi ∗ ∪ G i ∗ − q +1 ,i ∗ ).) For each j ≥ q , let G j be the graph consisting of the edges that are in G f ˆ j ∪ G j ∗ +1 , ˆ j but are not kept in G fj . Consequently,( S q ≤ j< ˆ i,j is a multiple of q G j ) S G f ˆ i = G i ∗ . 
By the definition of Red-Com$(G_i, f)$, it is easy to verify that Red-Com$(G_i, f)$ does not contain the edges of the discarded-edge graphs $G_j$ defined above, for each $j \geq 1$. It follows that Red-Com$(G_i, f)$ is a subgraph of $G^s_i$, and hence, $G^s_i$ contains a maximum-weight $k$-matching of Red-Com$(G_i, f)$, and hence of Compact$(G_i, f)$ by the above discussion. Since Compact$(G_i, f)$ contains a maximum-weight $k$-matching of $G_i$, $G^s_i$ contains a maximum-weight $k$-matching of $G_i$. It follows that, with probability at least $1/2$, $G^s_i$ contains a maximum-weight $k$-matching of $G_i$.

[Figure 2: Illustration of the definitions of $G^f_i$ and $G^s_i$ on the stream $e_1, \ldots, e_q, \ldots, e_{i^*}, e_{i^*+1}, \ldots, e_{\hat{i}}, \ldots, e_i$. Panel (a), the definition of $G^f_i$ for $i = jq$: Red-Com$(G_{i^*+1,\hat{i}} \cup G^f_{\hat{i}}) = G^f_i$. Panel (b), the definition of $G^s_i$: $G_{i^*+1,i} \cup G^f_{\hat{i}} = G^s_i$.]

The algorithm $\mathcal{A}_{Insert}$, when queried at the end of the stream, either returns a maximum-weight $k$-matching of $G$ or the empty set. To do so, at every instant $i$, it will maintain a subgraph $G^s_i$ that will contain the edges of the desired matching, from which this matching can be extracted. To maintain $G^s_i$, the algorithm keeps track of the subgraphs $G^s_{i-1}$, $G^f_{\hat{i}}$, and the edges $e_{i^*+1}, \ldots, e_i$, and uses them in the computation of the subgraph $G^s_i$ as follows.

If $i$ is not a multiple of $q$, then $G^s_i = G^s_{i-1} + e_i$, and the algorithm simply computes $G^s_i$ as such. Otherwise (i.e., $i$ is a multiple of $q$), $G^s_i = G^f_{\hat{i}} \cup G_{i^*+1,i}$, and the algorithm uses $G^f_{\hat{i}}$ and $G_{i^*+1,i} = \{e_{i^*+1}, \ldots, e_i\}$ to compute and return $G^s_i$; however, in this case, the algorithm will additionally need to have $G^f_i$ already computed, in preparation for the potential computations of subsequent $G^s_j$, for $j \geq i$. By Lemma 5.2, the subgraph $G^f_i$ can be computed by invoking Alg-Reduce on $G^f_{\hat{i}} \cup G_{i^*+1,\hat{i}}$, which runs in time $O(q)$. Note that both $G^f_{\hat{i}}$ and $G_{i^*+1,\hat{i}}$ are available to the algorithm at each of the steps $\hat{i}+1, \ldots, i$.
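As a rough illustration of the bookkeeping just described, the following Python sketch maintains $G^s_i$ over an insert-only stream. It is a simplified stand-in rather than the algorithm itself: the names `red_com` and `InsertOnlySketch` and the `(u, v, w)` edge tuples are hypothetical, the reduced compact subgraph is computed by sorting instead of linear-time selection, and $G^f_i$ is recomputed in one shot at each multiple of $q$ rather than being spread over $q$ steps.

```python
from collections import defaultdict


def beta(edge):
    """Tie-breaking key (weight, u, v) for an edge tuple (u, v, w)."""
    u, v, w = edge
    return (w, min(u, v), max(u, v))


def red_com(edges, f, k):
    """Reduced compact subgraph of an edge list (sorting-based, for illustration)."""
    best = {}  # pair of parts -> heaviest edge between them (the compact subgraph)
    for e in edges:
        pu, pv = f(e[0]), f(e[1])
        if pu == pv:
            continue  # endpoints collide under f, so e is in no nice matching
        key = (min(pu, pv), max(pu, pv))
        if key not in best or beta(e) > beta(best[key]):
            best[key] = e
    incident = defaultdict(list)  # part -> compact edges touching that part
    for (pi, pj), e in best.items():
        incident[pi].append(e)
        incident[pj].append(e)
    # For every part, the 8k heaviest compact edges incident to it.
    top = {p: set(sorted(es, key=beta, reverse=True)[:8 * k])
           for p, es in incident.items()}
    kept = [e for (pi, pj), e in best.items() if e in top[pi] and e in top[pj]]
    q = k * (16 * k - 1)
    return sorted(kept, key=beta, reverse=True)[:q]


class InsertOnlySketch:
    """Maintains G^s_i = G^f_{i-hat} united with the edges e_{i*+1}, ..., e_i."""

    def __init__(self, k, f):
        self.k, self.f = k, f
        self.q = k * (16 * k - 1)
        self.i = 0
        self.gf_hat = []   # G^f at index i-hat
        self.gf_next = []  # G^f at the most recent multiple of q
        self.window = []   # edges e_{i*+1}, ..., e_i

    def update(self, edge):
        self.i += 1
        q = self.q
        if self.i > 1 and (self.i - 1) % q == 0:
            # i-hat has just advanced to i - 1, so promote the pending G^f.
            self.gf_hat = self.gf_next
            if self.i - 1 >= 2 * q:
                # i* advanced by q as well: forget the q oldest window edges.
                self.window = self.window[q:]
        self.window.append(edge)
        if self.i % q == 0 and self.i > q:
            # G^f_i = Red-Com(G^f_{i-hat} united with e_{i*+1}, ..., e_{i-hat}).
            self.gf_next = red_com(self.gf_hat + self.window[:q], self.f, self.k)

    def gs(self):
        """Edge list of G^s_i after the i-th update."""
        return self.gf_hat + self.window
```

Calling `update` once per inserted edge and then `gs()` yields the edge list of $G^s_i$; extracting a maximum-weight $k$-matching from that list is deliberately left out of the sketch.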
Therefore, the algorithm staggers the $O(q)$ operations needed for the computation of $G^f_i$ uniformly (roughly equally) over the steps $\hat{i}+1, \ldots, i$, yielding $O(1)$ operations per step. Note that all the operations in Alg-Reduce can be explicitly listed, and hence, splitting them over an interval of $q$ steps is easily achievable. Suppose that Alg-Reduce is split into $q$ operations $\Lambda_1, \ldots, \Lambda_q$ such that each operation takes time $O(1)$. The algorithm $\mathcal{A}'$ is given in Figure 4:

Algorithm 4 The streaming algorithm $\mathcal{A}_{Insert}$ in the insert-only streaming model

$\mathcal{A}_{Insert}$-Preprocessing: The preprocessing algorithm
Input: $n = |V(G)|$ and a parameter $k \in \mathbb{N}$
    let $f \in_{u.a.r.} \mathcal{H}$, where $\mathcal{H}$ is a set of universal hash functions from $V$ to $[4k]^-$

$\mathcal{A}_{Insert}$-Update: The update algorithm
Input: The $i$-th element $(e_i = uv, wt(e_i)) \in \mathcal{S}$
    let $j = i \bmod q$
    if $j$ is 0 then
        execute $\Lambda_q$
        $G^s_i = G^f_{\hat{i}} \cup G_{i^*+1,i}$
    else
        execute $\Lambda_j$
        $G^s_i = G^s_{i-1} \cup e_i$

$\mathcal{A}_{Insert}$-Query: An algorithm to answer a query after the $i$-th update
    return a maximum-weight $k$-matching in $G^s_i$ if any; otherwise, return $\emptyset$

Lemma 5.6.
The algorithm $\mathcal{A}_{Insert}$ runs in space $O(k)$ and has update time $O(1)$.

Proof. $\mathcal{A}_{Insert}$-Preprocessing takes $O(1)$ space to store $f$. The space needed for $\mathcal{A}_{Insert}$-Update is dominated by that needed for storing $G^s_i$, $G^f_i$, $G_{i^*+1,i}$, $G_{i^*+1,\hat{i}}$, and the space needed to execute Alg-Reduce. By the definition of the subgraphs $G^s_i$, $G^f_i$, $G_{i^*+1,i}$, $G_{i^*+1,\hat{i}}$, each has size $O(q)$, and hence can be stored using $O(q) = O(k)$ space. By Lemma 5.2, Alg-Reduce runs in space $O(k)$. Hence, the overall space complexity of $\mathcal{A}'$ is $O(k)$. $\mathcal{A}_{Insert}$-Query takes space $O(q) = O(k)$ [19, 20], since $G^s_i$ contains at most $O(q)$ edges. Altogether, $\mathcal{A}_{Insert}$ runs in space $O(k)$.

For the update time, as discussed above, we can take the operations performed by Alg-Reduce to compute $G^f_i$, during any interval of $q$ steps, and stagger them uniformly over the $q$ steps of the interval. Since Alg-Reduce performs $O(q)$ operations to compute $G^f_i$ by Lemma 5.2, this yields $O(1)$ operations per step. It follows that the update time of $\mathcal{A}_{Insert}$ is $O(1)$, thus completing the proof.

Without loss of generality, and for convenience, we will assume that the algorithm is queried at the end of the stream $\mathcal{S}$, even though the query could take place after any arbitrary operation $i$.

Theorem 5.7. Let $0 < \delta < 1$ be a parameter. In the insert-only streaming model, there is an algorithm for p-WT-Matching such that, on input $(\mathcal{S}, k)$, the algorithm outputs a matching $M'$ satisfying that (1) if $G$ contains a $k$-matching then, with probability at least $1 - \delta$, $M'$ is a maximum-weight $k$-matching of $G$; and (2) if $G$ does not contain a $k$-matching then $M' = \emptyset$. The algorithm runs in $O(k \log(1/\delta))$ space and has $O(\log(1/\delta))$ update time. In particular, for any constant $\delta$, the algorithm runs in space $O(k)$ and has $O(1)$ update time.

Proof. Run $\lceil \log(1/\delta) \rceil$ copies of algorithm $\mathcal{A}_{Insert}$ in parallel (i.e., using dovetailing).
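For a rough sense of this repetition count, the following Python sketch computes the number of copies and the resulting bound on the failure probability. It is an illustration only, under assumptions labeled in the code: the logarithm is taken base 2, the copies draw their hash functions independently, and each copy succeeds with probability at least 1/2 (as given by Lemma 5.5). The function names are hypothetical.

```python
import math


def copies_needed(delta: float) -> int:
    """Return ceil(log2(1/delta)), the number of parallel copies (assumes base-2 logarithm)."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.ceil(math.log2(1.0 / delta))


def failure_bound(copies: int, p_success: float = 0.5) -> float:
    """Bound on the probability that every copy fails (assumes independent copies)."""
    return (1.0 - p_success) ** copies


if __name__ == "__main__":
    for delta in (0.25, 0.125, 0.01):
        c = copies_needed(delta)
        print(delta, c, failure_bound(c))
```

The number of copies is also the factor by which the space and update time of a single copy are multiplied.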
Then, by the end of the stream, there are $\lceil \log(1/\delta) \rceil$ copies of $G^s_m$, where $m$ is the length of the stream. Let $G'$ be the union of all the $G^s_m$'s produced by the runs of $\mathcal{A}_{Insert}$. If $G'$ has a $k$-matching, let $M'$ be a maximum-weight $k$-matching of $G'$; otherwise, let $M' = \emptyset$.

By Lemma 5.5, if $G_m$, i.e., $G$, contains a maximum-weight $k$-matching, then with probability at least $1/2$, one copy of $G^s_m$ contains a maximum-weight $k$-matching of $G$. Hence, with probability at least $1 - (1/2)^{\lceil \log(1/\delta) \rceil} \geq 1 - \delta$, $G'$ contains a maximum-weight $k$-matching of $G$. It follows that if $G$ contains a maximum-weight $k$-matching $M$ then, with probability at least $1 - \delta$, $G'$ contains a maximum-weight $k$-matching of the same weight as $M$, and hence $M'$ is a maximum-weight $k$-matching of $G$.

Observe that the graph $G'$ is a subgraph of $G$. Therefore, statement (2) in the theorem clearly holds true.

By Lemma 5.6, the above algorithm runs in space $O(k \log(1/\delta))$ and has update time $O(\log(1/\delta))$, thus completing the proof.

In this paper, we presented streaming algorithms for the $k$-matching problem in both the dynamic and insert-only streaming models. Our results improve previous works and achieve optimal space and update-time complexity. Our result for the weighted $k$-matching problem was achieved using a newly-developed structural result that is of independent interest.

An obvious open question that ensues from our work is whether the dependency on the number of distinct weights $W$ in our result, and the result in [8] as well, for weighted $k$-matching in the dynamic streaming model can be lifted. More specifically, does there exist a dynamic streaming algorithm for p-WT-Matching whose space complexity is $\tilde{O}(k)$ and update time is $\tilde{O}(1)$? We leave this as an (important) open question for future research.

References

[1] Kook Jin Ahn and Sudipto Guha. Linear programming in the semi-streaming model with application to the maximum matching problem. Information and Computation, 222:59–79, 2013.
[2] Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. Graph sketches: sparsification, spanners, and subgraphs. In
Proceedings of the 31st ACM Symposium on Principles of Database Systems (PODS ’12), pages 5–14, 2012.
[3] Kook Jin Ahn, Graham Cormode, Sudipto Guha, Andrew McGregor, and Anthony Wirth. Correlation clustering in data streams. In Proceedings of the 32nd International Conference on Machine Learning (ICML ’15), pages 2237–2246, 2015.
[4] Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. Journal of Computer and System Sciences, 58(1):137–147, 1999.
[5] Sepehr Assadi, Sanjeev Khanna, and Yang Li. On estimating maximum matching size in graph streams. In Proceedings of the 2017 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’17), pages 1723–1742, 2017.
[6] Sepehr Assadi, Sanjeev Khanna, Yang Li, and Grigory Yaroslavtsev. Maximum matchings in dynamic graph streams and the simultaneous communication model. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’16), pages 1345–1364, 2016.
[7] Jianer Chen, Ying Guo, and Qin Huang. Linear-time parameterized algorithms with limited local resources. arXiv preprint arXiv:2003.02866, 2020.
[8] Rajesh Chitnis, Graham Cormode, Hossein Esfandiari, MohammadTaghi Hajiaghayi, Andrew McGregor, Morteza Monemizadeh, and Sofya Vorotnikova. Kernelization via sampling with applications to finding matchings and related problems in dynamic graph streams. In Proceedings of the 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’16), pages 1326–1344, 2016.
[9] Rajesh Chitnis, Graham Cormode, Hossein Esfandiari, MohammadTaghi Hajiaghayi, and Morteza Monemizadeh. New streaming algorithms for parameterized maximal matching and beyond. In Proceedings of the 27th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA ’15), pages 56–58, 2015.
[10] Rajesh Chitnis, Graham Cormode, Mohammad Taghi Hajiaghayi, and Morteza Monemizadeh. Parameterized streaming: Maximal matching and vertex cover. In Proceedings of the 2015 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’15), pages 1234–1251, 2015.
[11] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein.
Introduction to Algorithms. MIT Press, 2009.
[12] Graham Cormode and Donatella Firmani. A unifying framework for $\ell_0$-sampling algorithms. Distributed and Parallel Databases, 32(3):315–335, 2014.
[13] Michael Crouch and Daniel S. Stubbs. Improved streaming algorithms for weighted matching, via unweighted matching. In Proceedings of APPROX/RANDOM ’14, volume 28, pages 96–104, 2014.
[14] Marek Cygan, Fedor V. Fomin, Lukasz Kowalik, Daniel Lokshtanov, Daniel Marx, Marcin Pilipczuk, Michal Pilipczuk, and Saket Saurabh. Parameterized Algorithms. Springer, 2015.
[15] Leah Epstein, Asaf Levin, Julián Mestre, and Danny Segev. Improved approximation guarantees for weighted matching in the semi-streaming model. SIAM Journal on Discrete Mathematics, 25(3):1251–1265, 2011.
[16] Stefan Fafianie and Stefan Kratsch. Streaming kernelization. In Proceedings of the 39th International Symposium on Mathematical Foundations of Computer Science 2014 (MFCS ’14), pages 275–286, 2014.
[17] Joan Feigenbaum, Sampath Kannan, Andrew McGregor, Siddharth Suri, and Jian Zhang. On graph problems in a semi-streaming model. Theoretical Computer Science, 348(2-3):207–216, 2005.
[18] Joan Feigenbaum, Sampath Kannan, Martin J. Strauss, and Mahesh Viswanathan. An approximate L1-difference algorithm for massive data streams. SIAM Journal on Computing, 32(1):131–151, January 2003.
[19] Harold N. Gabow. Data structures for weighted matching and nearest common ancestors with linking. In Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’90), pages 434–443. SIAM, 1990.
[20] Harold N. Gabow. Data structures for weighted matching and extensions to b-matching and f-factors. ACM Transactions on Algorithms, 14(3), 2018.
[21] Ashish Goel, Michael Kapralov, and Sanjeev Khanna. On the communication and streaming complexity of maximum bipartite matching. In
Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’12), pages 468–485, 2012.
[22] Ashish Goel, Michael Kapralov, and Ian Post. Single pass sparsification in the streaming model with edge deletions. arXiv preprint arXiv:1203.4900, 2012.
[23] Piotr Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. Journal of the ACM, 53(3):307–323, May 2006.
[24] Hossein Jowhari, Mert Sağlam, and Gábor Tardos. Tight bounds for Lp samplers, finding duplicates in streams, and related problems. In Proceedings of the Thirtieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS ’11), pages 49–58, 2011.
[25] Michael Kapralov. Better bounds for matchings in the streaming model. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’13), pages 1679–1697, 2013.
[26] Michael Kapralov, Sanjeev Khanna, and Madhu Sudan. Approximating matching size from random streams. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’14), pages 734–751, 2014.
[27] Michael Kapralov and David Woodruff. Spanners and sparsifiers in dynamic streams. In Proceedings of the 2014 ACM Symposium on Principles of Distributed Computing (PODC ’14), pages 272–281, 2014.
[28] Christian Konrad, Frédéric Magniez, and Claire Mathieu. Maximum matching in semi-streaming with few passes. In Proceedings of APPROX/RANDOM ’12, pages 231–242, 2012.
[29] Christian Konrad and Adi Rosén. Approximating semi-matchings in streaming and in two-party communication. In Proceedings of the 40th International Colloquium on Automata, Languages, and Programming (ICALP ’13), pages 637–649, 2013.
[30] Viatcheslav Korenwein, André Nichterlein, Rolf Niedermeier, and Philipp Zschoche. Data reduction for maximum matching on real-world graphs: Theory and experiments. In , volume 112 of LIPIcs, pages 53:1–53:13, 2018.
[31] Roie Levin and David Wajc. Streaming submodular matching meets the primal-dual method. arXiv preprint arXiv:2008.10062, to appear in
SODA ’21, 2020.
[32] Andrew McGregor. Finding graph matchings in data streams. In Proceedings of APPROX/RANDOM ’05, pages 170–181, 2005.
[33] Andrew McGregor. Graph stream algorithms: A survey. SIGMOD Record, 43(1):9–20, May 2014.
[34] George B. Mertzios, Hendrik Molter, Rolf Niedermeier, Viktor Zamaraev, and Philipp Zschoche. Computing maximum matchings in temporal graphs. In , volume 154 of LIPIcs, pages 27:1–27:14, 2020.
[35] George B. Mertzios, André Nichterlein, and Rolf Niedermeier. A linear-time algorithm for maximum-cardinality matching on cocomparability graphs. SIAM Journal on Discrete Mathematics, 32(4):2820–2835, 2018.
[36] George B. Mertzios, André Nichterlein, and Rolf Niedermeier. The power of linear-time data reduction for maximum matching. Algorithmica, 82(12):3521–3565, 2020.
[37] Ami Paz and Gregory Schwartzman. A $(2+\epsilon)$-approximation for maximum weight matching in the semi-streaming model. ACM Transactions on Algorithms, 15(2):18:1–18:15, 2019.
[38] Tim Roughgarden. Communication complexity (for algorithm designers). arXiv preprint arXiv:1509.06257, 2015.
[39] Jeanette P. Schmidt, Alan Siegel, and Aravind Srinivasan. Chernoff-Hoeffding bounds for applications with limited independence. SIAM Journal on Discrete Mathematics, 8(2):223–250, 1995.
[40] Robert E. Tarjan. Data Structures and Network Algorithms. SIAM, 1983.
[41] Salil P. Vadhan. Pseudorandomness. Foundations and Trends in Theoretical Computer Science, 7(1-3):1–336, 2012.
[42] Mariano Zelke. Weighted matching in the semi-streaming model.