Beating Two-Thirds For Random-Order Streaming Matching
Sepehr Assadi∗    Soheil Behnezhad†

Abstract
We study the maximum matching problem in the random-order semi-streaming setting. In this problem, the edges of an arbitrary n-vertex graph G = (V, E) arrive in a stream one by one and in a random order. The goal is to have a single pass over the stream, use O(n · polylog(n)) space, and output a large matching of G.

We prove that for an absolute constant ε > 0, one can find a (2/3 + ε)-approximate maximum matching of G using O(n log n) space with high probability. This breaks the natural boundary of 2/3 for this problem.

∗ Department of Computer Science, Rutgers University. Research supported in part by the NSF CAREER award CCF-2047061.
† Department of Computer Science, University of Maryland. Research supported by Google Ph.D. Fellowship.

arXiv [cs.DS], Feb

1 Introduction
A matching in a graph G = (V, E) is any collection of vertex-disjoint edges, and in the maximum matching problem, we are interested in finding a matching of largest size in G. This problem has been a cornerstone of algorithmic research and its study has led to numerous breakthrough results in theoretical computer science. In this paper, we study the maximum matching problem in the semi-streaming model of computation [FKM+05] defined as follows.
Definition 1.1. Given a graph G = (V, E) with n vertices V = {1, . . . , n} and m edges in E presented in a stream S = ⟨e_1, . . . , e_m⟩, a semi-streaming algorithm makes a single pass over the stream of edges S and uses O(n · polylog(n)) space, measured in words of size Θ(log n) bits, and at the end outputs an approximate maximum matching of G.

The greedy algorithm for maximal matching gives a simple 1/2-approximation algorithm to this problem in O(n) space. When the stream of edges is adversarially ordered, this is simply the best result known for this problem, while it is also known that a better than ∼ […] random order streams. This line of work was pioneered in [KMM12] who showed that the 1/2-approximation of greedy can be broken in this case and obtained an algorithm with approximation ratio (1/2 + 0.[…]) […] [FHM+20] followed up on the approach of [KMM12] and improved the approximation ratio all the way to 6/11 [FHM+20]. […] [ABB+19] built on the sparsification approach of [BS15, BS16] in dynamic graphs to achieve an (almost) 2/3-approximation but at the cost of Õ(n^{1.5}) space, which is no longer semi-streaming. A beautiful work of [Ber20] then obtained a semi-streaming (almost) 2/3-approximation by showing how a generalization of the sparsification approach in [ABB+19] can be found in Õ(n) space.

The 2/3-approximation ratio of the algorithm of [Ber20] is the best possible among all prior techniques for this problem: the first line of attack in [KMM12, Kon18, GKMS19, FHM+20] is based on finding length-3 augmenting paths and even finding all these paths does not lead to a better-than-2/3-approximation.¹ The second line in [ABB+19, Ber20] is based on finding an edge-degree constrained subgraph (EDCS) which hits the same exact barrier, as there are graphs whose EDCS does not provide a better than 2/3-approximation (see [BS15]). Finally, even for an algorithmically easier variant of this problem, the one-way communication problem, which roughly corresponds to only measuring the space of the algorithm when crossing the midpoint of the stream, the best known approximation ratio is still 2/3, which is known to be tight for adversarial orders/partitions [GKK12].

¹ The work of [FHM+20] also considers length-5 augmenting paths. However, these paths are used instead of length-3 paths “missed” by the algorithm, not in addition to length-3 paths, and thus the same shortcoming persists.

Given this state of affairs, the 2/3-approximation ratio for random-order streaming matching has emerged as a natural barrier [Kon18, Ber20]. In particular, [Ber20] posed obtaining a (2/3 + Ω(1))-approximation to this problem as an important open question. We resolve this question in the affirmative in our work.

Our main result is a semi-streaming algorithm for maximum matching in random-order streams with approximation ratio strictly better than 2/3.

Theorem 1 (Main Result). Let G be an n-vertex graph whose edges arrive in a random-order stream. For an absolute constant ε > 0, there is a single-pass streaming algorithm that obtains a (2/3 + ε)-approximate maximum matching of G using O(n log n) space with high probability.

Theorem 1 breaks the 2/3-barrier of all prior work in [KMM12, Kon18, GKMS19, ABB+19 […] 2/3 is minuscule in this theorem (while we did not optimize for constants, the bound on ε is only ∼ […] at this point), it still proves that (2/3)-approximation is not the “right” answer to this problem. This is in contrast to some other problems of similar flavor such as one-way communication complexity of matching (on adversarial partitions) [GKK12, AB19] or the fault-tolerant matching problem [AB19], which are both solved using similar techniques (see the unifying framework of [AB19] based on EDCS) and for both of which 2/3-approximation is provably best possible.

Beyond (2/3)-approximation. Breaking this 2/3-barrier naturally raises the question of what the right bound on the approximation ratio of random-order streaming matching is. In particular, is (1 − ε)-approximation possible? We make progress toward settling this question by showing that no “truly” space-efficient algorithm exists for this latter problem: there is provably no semi-streaming matching algorithm, even on bipartite graphs, that can achieve a (1 − ε)-approximation in O(exp((1/ε)^{[…]}) · n · polylog(n)) space; in other words, if one hopes for achieving a (1 − ε)-approximation, an exponential dependence on (1/ε) in the space is unavoidable (see Corollary 5.1). As the main focus of our work is on the algorithm in Theorem 1, we postpone the details and the ideas behind this result to Section 5.

Prior work.
As stated earlier, there have been two main lines of attack on the streaming matching problem in random-order streams. The first approach aims to find a large matching of the graph G early on in the stream, and then spends the rest of the stream augmenting this matching. For instance, [KMM12] showed that in order for the greedy algorithm to fail to find a better-than-1/2-approximation, the algorithm should necessarily pick many “wrong” edges early on in the stream. As such, in instances where greedy is not beating the 1/2-approximation itself, we already have an almost 1/2-approximation by the middle of the stream, and we can thus focus on augmenting this matching in the remaining half to beat 1/2-approximation. The work of [Kon18] then improved this result further by showing that a modified greedy algorithm, when unsuccessful in obtaining a large matching itself, finds an almost 1/2-approximation when only an o(1)-fraction of the stream has passed (as opposed to the middle), which gives us more room for augmentation. Finally, [FHM+20] built on this approach and further improved the augmentation phase.

The second approach to this problem was based on obtaining an EDCS, a subgraph defined by [BS15, BS16] and studied further in [AB19], that acts as a “matching sparsifier”. On a high level, an EDCS is a sparse subgraph satisfying the following two constraints: (i) edge-degree of edges in the EDCS cannot be “high”, while (ii) edge-degree of missing edges cannot be “low”. These constraints ensure that an EDCS always contains an almost 2/3-approximate matching of the graph and has additional robustness properties [BS15, BS16, ABB+19, AB19, Ber20]. For instance, [ABB+19] proved that the union of several EDCS computed on different parts of a random stream is itself an EDCS for the entire stream. This allowed them to compute an EDCS of the input in Õ(n^{1.5}) space and directly obtain their almost 2/3-approximation. Finally, [Ber20] gave an elegant proof that weakening the requirement of EDCS allows one to still preserve the almost 2/3-approximation but now recover this subgraph in only O(n log n) space. More specifically, the algorithm of [Ber20] first finds a subgraph only satisfying property (i) of the EDCS in the first o(1) fraction of the stream, and then picks all (potentially) necessary edges for satisfying property (ii) in the remainder; the proof then shows that this set of potentially necessary edges is of size only O(n log n).

Our work.
Our approach can be seen as a natural combination of these two mostly disjoint lines of work. The first part comes from a better understanding of EDCS. We present a rough characterization of when an EDCS cannot beat the 2/3-approximation, which shows that in these instances, we can effectively ignore the second constraint of EDCS. As a result, we obtain that the only way for the algorithm of [Ber20] to fail to achieve a better-than-2/3-approximation is if it already picks an almost 2/3-approximation in the first o(1) fraction of the stream. Note that this is conceptually similar to the first line of work on random-order streaming matching, but the techniques are entirely disjoint. In particular, our proof is a deterministic property of EDCS, not a randomized property of a greedy algorithm on a particular ordering.

We are now in the familiar territory of having a large matching very early on in the stream, and we can spend the remainder of the stream augmenting it. The main difference however is that starting from an almost 2/3-approximate matching, there are essentially no length-3 paths for us to augment and we instead need to handle length-5 augmenting paths. The key challenge is to find the middle edge of these length-5 augmenting paths. Indeed, we note that the 2/3-approximation lower bound of [GKK12] for adversarial order streams gives away a 2/3-approximate matching early on for free, yet it is provably impossible to augment it in the remainder of the stream using a semi-streaming algorithm. To get around this, we crucially use the random arrival assumption again. Particularly, we regard any length-5 augmenting path whose middle edge arrives after its two endpoint edges as a “discoverable” path and then find a constant fraction of such paths. Since the edges arrive in a random order, a constant fraction of length-5 augmenting paths will be discoverable and thus we are able to beat 2/3-approximation in our setting.

General notation.
For a graph G = (V, E) and v ∈ V, we use deg_G(v) to denote the degree of v in G and N_G(v) to denote the neighbor set of v (when clear from the context, we may drop the subscript G). For any edge e = (u, v) ∈ E, we define the edge-degree of e in G as deg(u) + deg(v). We use µ(G) to denote the size (i.e., the number of edges) of the maximum matching in G.

For integer k ≥ 1 and p ∈ [0, 1], we use B(k, p) to denote the binomial distribution with parameters k and p. That is, B(k, p) is the discrete probability distribution of the number of successful experiments out of k experiments, each with an independent probability p of success.

Random-order streams.