Breaking the n-Pass Barrier: A Streaming Algorithm for Maximum Weight Bipartite Matching
S. Cliff Liu∗    Zhao Song†    Hengjie Zhang‡

Abstract
Given a weighted bipartite graph with n vertices and m edges, the maximum weight bipartite matching problem is to find a set of vertex-disjoint edges with the maximum weight. This classic problem has been extensively studied for over a century.

In this paper, we present a new streaming algorithm for the maximum weight bipartite matching problem that uses Õ(n) space and Õ(√m) passes, which breaks the n-pass barrier. All previous algorithms either require Ω(n log n) passes or only find an approximate solution. To achieve this pass bound, our algorithm combines a number of techniques from different fields, such as the interior point method (IPM), symmetric diagonally dominant (SDD) system solving, the isolation lemma, and LP duality. To the best of our knowledge, this is the first work that implements an SDD solver and an IPM in the streaming model in Õ(n) space for graph matrices. All previous IPMs focus only on optimizing the running time, regardless of the space usage; an LP solver for a general constraint matrix cannot be implemented in Õ(n) space.

∗ Princeton University. [email protected].
† Columbia University, Princeton University, and Institute for Advanced Study. [email protected].
‡ Columbia University. [email protected].

Contents

1.3.3 A folklore algorithm in O(n log n) passes
1.3.4 Other algorithms with more space
1.4 An overview of our techniques
    1.4.1 Interior point method
    1.4.2 Streaming implementation
    1.4.3 The isolation lemma
1.5 Roadmap
A Notations
B Isolation lemma in the streaming model
    B.1 Isolation lemma
    B.2 Algorithms
    B.3 Proof of uniqueness: step 1
    B.4 Proof of uniqueness: step 2
    B.5 Proof of uniqueness: step 3
    B.6 Proof of uniqueness: step 4
C Preliminary for IPM
    C.1 Definitions
    C.2 Approximation tools from previous work
D Algorithm
E Error analysis of IPM
    E.1 Assumptions on parameters
    E.2 Bounding potential function Φ
    E.3 Bounding the movement of t
    E.4 Upper bounding the potential function
    E.5 Move both: final
    E.6 Move both: part 1
    E.7 Move both: part 2
    E.8 Move both: part 3
F Running time of IPM
G SDD solver in the streaming model
    G.1 SDD and Laplacian systems
    G.2 The preconditioner
    G.3 An iterative solver
    G.4 An iterative solver: the space
    G.5 An iterative solver: the accuracy
    G.6 Main result
H Minimum vertex cover
    H.1 Algorithm
    H.2 Correctness of Algorithm 6
    H.3 Running time of Algorithm 6
    H.4 Building blocks
    H.5 A minimum vertex cover solver
I Combine
    I.1 Algorithms
    I.2 Running time
    I.3 Correctness
    I.4 Primal to dual
    I.5 Properties of primal and dual LP solutions
J Solver reductions
    J.1 From SDDM solver to SDD solver
    J.2 From SDDM solver to SDDM solver

1 Introduction
Motivated by the need to process massive graphs, the streaming model has become a desirable testbed for designing algorithms for fundamental graph problems. The beautiful work of Ahn, Guha and McGregor [AGM12a] gave a streaming algorithm for dynamic graph connectivity and spanning forest with O(n log n) space complexity, which has been proved optimal by Nelson and Yu [NY19, Yu20]. Since then, there has been a growing line of work studying max-cut [KKS14b, KK15, KKSV17], cut and spectral sparsifiers [KL11, AGM12b, KLM+17, KMM+19, KNST19], spanner construction [AGM12b, KW14], and so on.

Matching is a fundamental problem in theoretical computer science, and the study of classical graph matching algorithms has spanned over a century. The design of matching algorithms falls into two categories: one focuses on leveraging the combinatorial properties of the graph [Tar83]; the other reformulates the graph problem as a specific linear program [DS08] and uses the interior point method [Kar84] and SDD system solvers [ST04]. In the streaming setting, the bipartite matching problem has been extensively studied [FKM+04, AG11, EKMS12, GKK12, Kap13, DNO14, KKS14a, AG18, ABB+19, AKSY20, AR20, FHM+20, Ber20].

Many of the works mentioned above consider the semi-streaming model, where each edge along with its weight is revealed one by one in the stream in an adversarial order, and the algorithms are allowed to make one or more passes over the stream using Õ(n) space. (In fact, even some basic graph algorithms require Ω(n) space [FKM+04].) The edge orders in different passes need not be the same. Despite the intimate relationship between this model and the matching problem, almost all existing streaming algorithms for maximum weight bipartite matching are only able to find an approximate solution. The only exception is a folklore algorithm that repeatedly finds an augmenting path by a breadth-first search, which takes O(n log n) passes and Õ(n) space to find an exact maximum cardinality bipartite matching (see Section 1.2 for details). Obtaining even an O(n)-pass semi-streaming algorithm for this problem seems to be a major challenge, which raises the following open problem:

Is there a semi-streaming algorithm for maximum weight bipartite matching that uses o(n) passes?

In this work, we answer this question affirmatively. We give a semi-streaming algorithm that takes Õ(√m) IPM iterations to compute the maximum weight bipartite matching, where each iteration takes O(log n) passes to implement a streaming version of an SDD solver for computing the Newton step of the path-following IPM. The IPM was proposed by Karmarkar [Kar84] in 1984, and it has proved to be a powerful tool for improving the running time of linear programming solvers and of several specific graph-type linear programs. A classical IPM solver has no space restriction, but in the streaming setting the algorithm is not allowed to store all m constraints, where the LP has m constraints and n variables.
To the best of our knowledge, our algorithm is the first to implement an IPM in space-limited models such as the streaming model.

More specifically, to solve the maximum weight bipartite matching problem in Õ(n) space, we consider its dual, the generalized minimum vertex cover problem in bipartite graphs, and solve this problem using a streaming version of the IPM. Therefore, this work also yields an Õ(√m)-pass semi-streaming algorithm for the minimum vertex cover problem in bipartite graphs. To transform the dual solution into a primal solution in Õ(n) space, we use the isolation lemma, whose original construction uses Θ̃(m) random bits [MVV87]. We bypass this issue by implementing an alternative construction that uses Õ(n) random bits in the semi-streaming model.

Some authors define the space in the streaming model to be the number of cells, where each cell can hold O(log n) bits or even a number with infinite precision. Our bounds remain unchanged even if each cell only holds a constant number of bits.

We assume all edge weights are integers between 1 and poly(n). For any function f : R → R, we use Õ(f) to denote f · poly log(f). For any positive semi-definite matrix A ∈ R^{n×n}, let ‖·‖_A denote the norm where ‖x‖_A = √(x⊤Ax) for all x ∈ R^n.

The given graph G has n vertices and m edges. In the semi-streaming model, the algorithm is permitted Õ(n) space. In each pass, the algorithm reads the stream of all edges of G, where each edge and its weight appear together. The edge orders in different passes need not be the same. The task is to minimize the number of passes.

We provide a semi-streaming algorithm for bipartite matching running in Õ(√m) passes. This solves the long-standing open problem of whether maximum matching can be solved in o(n) passes, in all regimes where m = n^{2−ε} for any constant ε > 0.
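For concreteness, the access pattern assumed here — repeated passes over a fixed edge stream with only Õ(n) working memory between edges — can be mimicked by a small harness. The sketch below is our own illustration (the names `MultiPassStream` and `degrees_in_one_pass` are invented), not code from the paper:

```python
# Toy harness for the semi-streaming access model: the algorithm may re-read
# the edge stream pass by pass, but keeps only O(n)-size state in between.

class MultiPassStream:
    """Replays a fixed edge sequence; each full replay counts as one pass."""
    def __init__(self, edges):
        self._edges = list(edges)   # conceptually external, read-only storage
        self.passes = 0

    def scan(self):
        self.passes += 1
        yield from self._edges

def degrees_in_one_pass(stream, n):
    """An O(n)-space computation: all vertex degrees in a single pass."""
    deg = [0] * n
    for u, v in stream.scan():
        deg[u] += 1
        deg[v] += 1
    return deg

stream = MultiPassStream([(0, 1), (0, 2), (1, 2)])
deg = degrees_in_one_pass(stream, 3)
# deg == [2, 2, 2] after a single pass (stream.passes == 1)
```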
Theorem 1.1 (Main theorem, informal version of Theorem I.1). Given a bipartite graph G with n vertices and m edges, there exists a streaming algorithm that computes a maximum weight matching of G in Õ(√m) passes and Õ(n) space with probability 1 − 1/poly(n).

As a byproduct, we also show that minimum vertex cover in bipartite graphs, which is the dual problem of maximum matching, can be solved within the same number of passes and space. A vertex cover of a graph is a set of vertices that includes at least one endpoint of every edge of the graph, and a minimum vertex cover is a vertex cover of minimum size.
Theorem 1.2 (Informal version of Theorem H.8). Given a bipartite graph G with n vertices and m edges, there exists a streaming algorithm that computes a minimum vertex cover of G in Õ(√m) passes and Õ(n) space with probability 1 − 1/poly(n).

Our result is obtained by a novel application of the interior point method together with an SDD system solver in the streaming model. To the best of our knowledge, this is the first work that implements those techniques in the streaming model, in which only Õ(n) space is allowed. We state our result on the SDD solver here, as it may be of independent interest.

Theorem 1.3 (Informal version of Theorem G.9). There is a streaming algorithm which takes as input an SDD matrix A, a vector b ∈ R^n, and a parameter ε ∈ (0, 1); if there exists x* ∈ R^n such that Ax* = b, then with probability 1 − 1/poly(n), the algorithm returns an x ∈ R^n such that ‖x − x*‖_A ≤ ε · ‖x*‖_A in O(1 + log(1/ε)/log log n) passes and Õ(n) space.

We hope the tools developed in this paper will find further applications in the study of streaming algorithms and beyond.

We can actually solve a generalized version of the minimum vertex cover problem in bipartite graphs: each edge e needs to be covered at least b_e ∈ Z_+ times, where the case of b = 1_m is the classic minimum vertex cover.

Our algorithm can actually solve any SDD system, which is more general than a Laplacian system. See the formal definition of an SDD matrix in Section A.

1.2 Related work

Streaming algorithms for matching.
Maximum matching has been extensively studied in the streaming model for decades, and almost all of this work falls into the category of approximation algorithms. For algorithms that make only one pass over the edge stream, researchers have made continuous progress on pushing the constant approximation ratio above 1/2, under the assumption that the edges arrive in a uniformly random order [FKM+04, KKS14a, ABB+19, FHM+20, Ber20]. The random-order assumption makes the problem substantially easier and is usually unrealistic. A more natural setting is multi-pass streaming with adversarial edge arrival. There is a long line of research proving upper and lower bounds on the number of passes needed to compute a maximum matching in the streaming model [AG11, EKMS12, GKK12, Kap13, DNO14, AG18, AKSY20, AR20]. Notably, [AG11, AG18] use linear programming and duality theory (see the next subsection for more details).

However, all the algorithms above can only compute an approximate maximum matching: to compute a matching whose size is at least (1 − ε) times the optimal, one needs to spend poly(1/ε) passes (see [DNO14, AG18] and the references therein).

Despite the intimate relationship between the matching problem and the streaming model, no breakthrough has been made towards solving exact bipartite matching in the streaming model. We remark that a simple folklore algorithm inspired by the classic algorithm of Hopcroft and Karp [HK73] can actually find the exact maximum cardinality bipartite matching in Õ(n) passes using Õ(n) space; its details can be found in the next subsection.

Streaming spectral sparsifiers.
Initiated by the study of cut sparsifiers in the streaming model [AGM12a], a simple one-pass semi-streaming algorithm for computing a spectral sparsifier of any weighted graph was given in [KL11], which suffices for our use in this paper. The problem becomes more challenging in the dynamic setting, i.e., when both insertions and deletions of edges are allowed. Using the idea of linear sketching, [KLM+17] gives a single-pass semi-streaming algorithm for computing a spectral sparsifier in the dynamic setting. However, their brute-force approach to recovering the sparsifier from the sketch uses Ω(n²) time. An improved recovery time is given in [KMM+19], but it requires more space, e.g., ε^{−2} n^{1.2} log^{O(1)} n. Finally, [KNST19] proposes a single-pass semi-streaming algorithm that uses ε^{−2} n log^{O(1)} n space and ε^{−2} n log^{O(1)} n recovery time to compute an ε-spectral sparsifier with O(ε^{−2} n log n) edges. Note that Ω(ε^{−2} n log n) space is necessary for this problem [CKST19].

SDD solvers and IPM.
There is a long line of work focusing on fast SDD solvers [ST04, KMP10, KMP11, KOSZ13, CKM+14, PS14, KS16]. Spielman and Teng gave the first nearly-linear time SDD solver, which was simplified, with better running times, in later works. The current fastest SDD solver runs in O(m log^{1/2} n · poly(log log n) · log(1/ε)) time [CKM+14] and requires Θ̃(m) space. Peng and Spielman gave a nearly-linear work parallel SDD solver which, if aiming for Õ(1) depth, requires m processors (space) [PS14].

The interior point method was originally proposed by Karmarkar [Kar84] for solving linear programs. Since then, there has been a long line of work on speeding up the interior point method for classical optimization problems, e.g., linear programming [Vai87, Ren88, Vai89, NN89, LS13, LS14, LS15, CLS19, LSZ19, Son19, BLSS20, SY20, JSWZ20] and semi-definite programming [Ans00, NN92, NN94, JKL+20].

1.3 Previous techniques

In this section, we summarize the previous techniques for computing a maximum matching in the streaming model.

• In Section 1.3.1, we introduce some representative approximation algorithms for bipartite matching.
• In Section 1.3.2, we present a potential method to compute an exact bipartite matching, showcasing the current bottleneck in the field.
• In Section 1.3.3, we discuss a simple folklore semi-streaming algorithm that uses O(n log n) passes, which is by far the best.
• In Section 1.3.4, we introduce some related bipartite matching algorithms which use more space than is allowed in the semi-streaming model.
1.3.1 Approximation algorithms

Given a parameter ε ∈ (0, 1), many streaming algorithms aim to find a matching of size (1 − ε) times the size of the maximum matching. The space and pass usage of these approximation algorithms are increasing functions of 1/ε.

A natural idea for finding an approximate matching is to iteratively sample a small subset of edges and use these edges to refine the current matching. These algorithms are called sampling-based algorithms. In [AG18], Ahn and Guha show that by adaptively sampling Õ(n) edges in each iteration, one can either obtain a certificate that the sampled edges admit a desirable matching, or use these edges to refine the solution of a specific LP. The LP is a nonstandard relaxation of the matching problem, and is eventually used to produce a good approximate matching. The algorithm of Ahn and Guha can compute a (1 − ε)-approximate matching for weighted nonbipartite graphs in Õ(1/ε) passes and Õ(n poly(1/ε)) space. However, the degree of poly(1/ε) in the space usage can be very large, making their algorithm inapplicable for small (non-constant) ε < 1/log n in the semi-streaming model.

Finding a (1 − ε)-approximate maximum matching for arbitrarily small ε (with dependence on n) requires different methods. Inspired by the well-studied water-filling process in online algorithms (see [DJK13] and the references therein), Kapralov proposes an algorithm that generalizes the water-filling process to multiple passes [Kap13]. This algorithm works in the vertex-arrival semi-streaming model, where a vertex and all of its incident edges arrive in the stream together. The observation is that the water filling from pass (k − 1) to pass k follows the same manner as in the first pass (with a more careful double-counting trick); the basic differential equations then give a (1 − 1/√(2πk))-approximate matching in k passes.

Kapralov's algorithm removes the poly(log n) factor in the number of passes compared with [AG11], giving a (1 − ε)-approximate maximum matching in O(1/ε²) passes, albeit in the stronger vertex-arrival model. Very recently, Assadi, Liu, and Tarjan gave a simple semi-streaming algorithm that computes a (1 − ε)-approximate maximum matching in O(1/ε²) passes, removing the vertex-arrival condition [ALT20]. Their algorithm views the left vertices of a bipartite graph as bidders and the right vertices as items; the maximum cardinality bipartite matching problem is then to find an auction maximizing the social welfare. The algorithm only needs to maintain a price for each item, and runs in O(1/ε²) rounds (passes). The space usage is O(n log(1/ε)) for storing all the prices, which is Õ(n) since ε ≥ 1/(n + 1) (otherwise we could obtain an exact matching).

We focus on approximation algorithms that find a matching close to (or that can potentially be used to find) an exact maximum matching, so constant-approximation algorithms are not introduced here. We refer the interested reader to [AB19] and the references therein.

1.3.2 Augmenting an approximate matching

A potential method to compute an exact maximum matching is to augment an approximate matching by repeatedly finding augmenting paths. The state-of-the-art semi-streaming algorithm based on auctions [ALT20] takes O(1/ε²) passes to find a (1 − ε)-approximate maximum cardinality bipartite matching, where ε ≥ 1/(n + 1) is a parameter. (For ε = Ω(1/log n), there could be algorithms that take fewer passes. Recall that the algorithm of [AG18] does not work in semi-streaming when ε is too small.)
So given o(n) passes, no existing algorithm can find a matching of size at least OPT − O(√n), assuming the maximum matching has size OPT = Θ(n). Therefore, we are only able to obtain a matching of size OPT − Ω(√n) from previous approximation algorithms. However, there is currently no semi-streaming algorithm that solves directed graph reachability, a problem that is no harder than finding one augmenting path, in o(√n) passes [LJS19]. (The linear-work parallel algorithm of [LJS19] can be translated into a semi-streaming algorithm that finds an augmenting path in n^{1/2+o(1)} passes.) So augmenting a matching of size OPT − Ω(√n) to the maximum matching of size OPT also takes Ω(n) passes. In conclusion, based on existing algorithms, the method of augmenting an approximate matching does not yield an o(n)-pass semi-streaming algorithm. To break the n-pass barrier under this framework, we would need either an o(n)-pass approximation algorithm that finds a matching of size OPT − n^{1/2−Ω(1)}, or an n^{1/2−Ω(1)}-pass algorithm that finds at least one augmenting path.

1.3.3 A folklore algorithm in O(n log n) passes

A simple folklore algorithm inspired by the classic algorithm of Hopcroft and Karp [HK73] can actually find the exact maximum cardinality bipartite matching in Õ(n) passes using Õ(n) space. The main idea is the following. Let OPT be the size of the maximum matching in the given n-vertex bipartite graph. If the current matching has size i, then there must exist (OPT − i) disjoint augmenting paths, so the shortest augmenting path has length at most n/(OPT − i). Using a breadth-first search (simply ignore the edges in the stream that are not incident to the frontier of the search), one can find this path in n/(OPT − i) passes and augment the current matching. Therefore, the total number of passes to compute the maximum matching is at most

∑_{i=0}^{OPT−1} n/(OPT − i) = n · ∑_{i=1}^{OPT} 1/i = O(n log n).
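The folklore algorithm above can be simulated in memory to make the pass accounting concrete. The following sketch is ours (not the paper's code): each BFS level costs one scan of the edge list, the matching and the BFS labels are the only O(n)-size state kept between scans, and one shortest augmenting path is applied per phase.

```python
def max_matching_folklore(edges, nL, nR):
    """Exact maximum bipartite matching; `passes` counts edge-stream scans."""
    matchL, matchR = [-1] * nL, [-1] * nR
    passes = 0
    while True:
        # BFS from all unmatched left vertices, one stream pass per level.
        frontier = {u for u in range(nL) if matchL[u] == -1}
        parentR, seenL = {}, set(frontier)
        end = -1                              # unmatched right endpoint, if found
        while frontier and end == -1:
            passes += 1
            for u, v in edges:                # edges not at the frontier are ignored
                if u in frontier and v not in parentR:
                    parentR[v] = u
                    if matchR[v] == -1 and end == -1:
                        end = v
            if end != -1:
                break
            # step back along matched edges (stored in memory, no extra pass)
            frontier = {matchR[v] for v in parentR
                        if matchR[v] != -1 and matchR[v] not in seenL}
            seenL |= frontier
        if end == -1:
            return matchL, passes             # no augmenting path: matching is maximum
        v, u = end, parentR[end]              # flip the augmenting path
        while True:
            prev = matchL[u]
            matchL[u], matchR[v] = v, u
            if prev == -1:
                break
            v, u = prev, parentR[prev]

matchL, passes = max_matching_folklore([(0, 0), (0, 1), (1, 0)], 2, 2)
# matchL == [1, 0]: a perfect matching, found with a few simulated passes
```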
This simple algorithm is by far the state of the art: to the best of our knowledge, before this work, no streaming algorithm computed an exact matching in o(n log n) passes.

1.3.4 Other algorithms with more space

Since the breakthrough result of [Mad13], new improvements have been made in solving the maximum weight bipartite matching problem and, more generally, the maximum flow problem (see [LS20] and the references therein). The maximum cardinality bipartite matching algorithm of [Mad13] runs in m^{10/7+o(1)} time and takes Ω(m) space, which is currently the fastest when the graph is sufficiently sparse. For moderately dense graphs, a better algorithm was given very recently in [BLN+20], which runs in Õ(m + n^{1.5}) time and takes Ω(m) space. For the maximum weight bipartite matching problem, the O(n^ω)-time algorithm using matrix inversion, where ω < 2.38 (see [MS04]), requires Ω(n²) space, and the Õ(m√n)-time algorithm of [LS14] requires Ω(m) space.

Given a matching in a graph, an augmenting path is a path that starts and ends at unmatched vertices and alternately contains edges that are outside and inside the matching.

1.4 An overview of our techniques

Our techniques consist of an involved combination of several components, such as the interior point method, a streaming SDD solver, and the isolation lemma.

• In Section 1.4.1, we provide a high-level picture of how the interior point method works.
• In Section 1.4.2, we give our streaming SDD solver, and briefly show how to implement the isolation lemma in the streaming model.
• In Section 1.4.3, we show a two-fold usage of the isolation lemma: the first is to convert a nearly-optimal solution to an exact optimal solution; the second is to convert an optimal solution of the dual to an optimal solution of the primal.

The major obstacle to overcome in this paper is the space limitation. Given a graph G = (V, E) with n = |V| and m = |E|, consider the LP formulation of the maximum bipartite matching problem:

max_{y ∈ R^E} ∑_{e ∈ E} y_e
s.t. ∑_{e ∈ E : e ∋ u} y_e ≤ 1, ∀u ∈ V,
     y ≥ 0_m.

Let A ∈ {0, 1}^{m×n} be the (unsigned) edge-vertex incidence matrix; the above LP can be written as

(Primal)  max_{y ∈ R^m} 1_m^⊤ y  s.t.  A^⊤ y ≤ 1_n and y ≥ 0_m.  (1)

Solving this linear program with an IPM requires storing y ∈ R^m in memory, but the space limit is only Õ(n). So we instead solve its dual, whose solution can be stored in Õ(n) space, and then recover an optimal primal solution from an optimal dual solution using the standard complementary slackness theorem:

(Dual)  min_{x ∈ R^n} 1_n^⊤ x  s.t.  Ax ≥ 1_m and x ≥ 0_n.
(2)

All previous IPMs for the maximum matching problem need to maintain a fractional matching y ∈ R^m, which cannot be adapted to the semi-streaming model with its Õ(n) space restriction. The fastest classical LP solvers are due to [LS14], [JSWZ20], and [BLSS20]. The space requirements of those algorithms are O(nm) ([LS14], [BLSS20]) and O((m + n)²) ([JSWZ20]).

The LP solvers in [LS14, JSWZ20, BLSS20] for graph problems require Ω(m) space, because they need to maintain some length-m vectors in each iteration even when solving an LP with n variables (e.g., to maintain the dual variables or the slack variables). Therefore, there is no direct way to use their solvers to solve our primal or dual LP in the semi-streaming model.

1.4.1 Interior point method

Given a matrix A ∈ R^{m×n} and two vectors b ∈ R^m and c ∈ R^n, consider the following linear program:

min_{x ∈ R^n} c^⊤ x  s.t.  Ax ≥ b.

We briefly review several basic concepts of Renegar's algorithm [Ren88] (see Sections C and D). Renegar's IPM solves the above LP by introducing two functions: the barrier function F : R^n → R and the (parametrized) perturbed objective function f_t : R^n → R. The standard log-barrier function can be written as

F(x) := −∑_{i ∈ [m]} log(a_i^⊤ x − b_i),

where a_1^⊤, ..., a_m^⊤ are the rows of the matrix A. It is easy to see that the domain of F(x) is exactly the region {x ∈ R^n : Ax > b}. For any parameter t > 0, we define the perturbed function f_t : R^n → R as

f_t(x) := t · c^⊤ x + F(x).

Renegar's method is based on a type of interior point method known as the path-following method, which incrementally minimizes f_t(x).
Once we obtain the minimizer x*_t of f_t(x) for some large t, it is not far from optimal, e.g., c^⊤ x*_t ≤ OPT + m/t.

In Renegar's framework, the potential function Φ_t(x) := ‖∇f_t(x)‖_{H(x)^{−1}} is used to measure how close x is to the minimizer x*_t, where H(x) := ∇²F(x) is the Hessian matrix. Φ_t(x) is always greater than or equal to 0, and Φ_t(x) = 0 means x = x*_t. In each iteration, our goal is to increase t to t_new and move x to x_new, while keeping Φ_{t_new}(x_new) small:

Φ_{t_new}(x_new) = Φ_t(x) + [Φ_t(x_new) − Φ_t(x)] + [Φ_{t_new}(x_new) − Φ_t(x_new)],

where the first bracketed term is the "x move" and the second is the "t move". We are able to prove the following (see Section E):

1. In the x-moving part, let x_new be the result of applying one Newton step to f_t(x) from x. Implementing the Newton step requires an SDD solver. We can then guarantee the standard quadratic convergence rate of Newton's method: Φ_t(x_new) ≤ Φ_t(x)².
2. In the t-moving part, we multiplicatively increase t to t_new. We can guarantee that

Φ_{t_new}(x_new) ≤ Φ_t(x_new) + |t_new/t − 1| · √m.

Combining the two parts, we can set t_new = (1 + O(1/√m)) · t, so that an Õ(√m)-iteration IPM follows.

1.4.2 Streaming implementation

We show that the iterating procedure of the IPM can be implemented in the streaming model and that each iteration can be done in Õ(1) passes (see Section F). Recall that each iteration performs a Newton step ∇²F(x)^{−1} ∇f_t(x) ∈ R^n. Note that ∇²F(x) ∈ R^{n×n} is an SDD matrix in our setting, and both ∇²F(x) and ∇f_t(x) rely on the slack variable s(x) := Ax − b ∈ R^m, but we cannot even store the length-m vector b in memory. We address this problem by

• implementing an SDD solver in the streaming model;
• implementing the algorithm in the isolation lemma in the streaming model.

To the best of our knowledge, this paper is the first to implement the IPM and an SDD solver in the streaming model.

Implementing the Laplacian solver in the streaming model. For simplicity, we outline a streaming Laplacian solver here, which gives an SDD solver by standard reductions (see Section G). Given a graph G, let L_G be the Laplacian matrix of G. Let A ∈ R^{m×n} be the signed edge-vertex incidence matrix, let a_1^⊤, ..., a_m^⊤ be its rows, and let s ∈ R^m be the slack variable. Note that L_G has nnz(L_G) = O(m), so it cannot be stored in memory, but it can be written as

L_G = ∑_{i ∈ [m]} a_i a_i^⊤ / s_i².

Suppose we have an oracle that outputs s_i for any given i ∈ [m]; then one can do simple things like multiplying L_G with a vector y, i.e., computing L_G · y. This can be done in one pass over all edges: for edge e_i = (u, v) we have a_i = 1_u − 1_v, so each term (a_i a_i^⊤ / s_i²) · y ∈ R^n can be computed and accumulated in Õ(n) space. In fact, we show that this simple operation suffices: by [KLM+17] we can maintain a spectral sparsifier H of G. Using L_H, the Laplacian of H, as a preconditioner for L_G, we can do iterative refinement on the solution:

x^{(t+1)} := x^{(t)} + L_H^{−1} · (b − L_G · x^{(t)}).

After Õ(1) iterations, x converges to the solution x* = L_G^{−1} b with the desired precision.

Implementing the isolation lemma in the streaming model.
We start with the definition of the isolation lemma (see Section B).
Definition 1.4 (Isolation lemma). Given a set system (S, F) where F ⊆ {0, 1}^S, and a weight w_i for each element i in S, the weight of a set F in F is defined as ∑_{i ∈ F} w_i. The goal of the isolation lemma is to design a scheme that assigns weights obliviously to F, such that there is a unique set in F that has the minimum (maximum) weight under this assignment.

The isolation lemma always involves randomness, since its output is oblivious to F. The isolation lemma says that if we choose the weights randomly, then with good probability the uniqueness is ensured. However, this does not directly apply to the streaming setting, since the weight vector has length m and cannot be stored in memory; the difficulty is to consistently output the same weight vector in different passes.

[CRS95] reduces the number of random bits from |S| to log(|F|) ≤ |S|. In the case of maximum weight bipartite matching, we have |F| ≤ (n + 1)^n, since each of the n vertices can choose none or one of the vertices as its partner in the matching. Our streaming implementation stores these log((n + 1)^n) = Õ(n) random bits, so that we can consistently output the same weight vector in different passes.

We use the isolation lemma in two ways. The first is to convert an optimal solution of the dual to an optimal solution of the primal. The second is to convert the nearly-optimal solution of the dual LP that we compute with the IPM to an exact optimal solution.
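As a sanity check of Definition 1.4 (our toy experiment, not the paper's construction): drawing each w_i uniformly from {1, ..., 2|S|} isolates a unique minimum-weight set in F with probability at least 1/2, and empirically much more often.

```python
import random

def unique_min(F, w):
    """True iff exactly one set in F attains the minimum weight under w."""
    weights = [sum(w[i] for i in f) for f in F]
    return weights.count(min(weights)) == 1

random.seed(0)
S = range(6)
F = [frozenset(f) for f in [{0, 1}, {2, 3}, {0, 3}, {1, 2}, {4, 5}, {0, 5}]]
isolated = sum(
    unique_min(F, [random.randint(1, 2 * len(S)) for _ in S])
    for _ in range(1000)
)
# `isolated` counts the trials (out of 1000) with a unique minimizer;
# the isolation lemma guarantees its expectation is at least 500
```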
Algorithm 1 The algorithmic framework at a high level

procedure Main(G = (V_L, V_R, E))
    n ← |V_L| + |V_R|, m ← |E|
    Let A ∈ R^{m×n} be the signed edge-vertex incidence matrix of G
    Let b ∈ R^m be the perturbed objective vector produced by the generalized isolation lemma
    Step 1:  ▷ Õ(√m) passes
        Use IPM to solve min_{x∈R^n} 1_n^⊤ x  s.t.  Ax ≥ b
    Step 2:  ▷ One pass
        s ← Ax − b
        S ← {i ∈ [m] | s_i = 0}  ▷ S is a subset of edges of size at most n
    Step 3:  ▷ Do not read the stream
        Find a maximum matching using edges in S in Õ(n) space
end procedure
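On a toy instance, Steps 2 and 3 above can be sketched as follows. This is a hedged illustration with the unperturbed objective b = 1_m (the actual algorithm uses the perturbed b); the 4-vertex graph, the cover {0, 2}, and all helper names are our own.

```python
def max_bipartite_matching(left, adj):
    """Kuhn's augmenting-path algorithm; adj maps a left vertex to its right neighbors."""
    match_r = {}  # right vertex -> matched left vertex

    def try_augment(u, seen):
        for v in adj.get(u, []):
            if v in seen:
                continue
            seen.add(v)
            if v not in match_r or try_augment(match_r[v], seen):
                match_r[v] = u
                return True
        return False

    for u in left:
        try_augment(u, set())
    return {(u, v) for v, u in match_r.items()}

# Toy instance: V_L = {0, 1}, V_R = {2, 3}.
edges = [(0, 2), (0, 3), (1, 2)]
x = {0: 1, 1: 0, 2: 1, 3: 0}  # indicator of the vertex cover {0, 2} (a dual solution)

# "Step 2": one pass over the edge stream keeps only tight constraints,
# i.e. edges (u, v) with slack x_u + x_v - 1 = 0.
tight = [(u, v) for (u, v) in edges if x[u] + x[v] - 1 == 0]

# "Step 3": compute a maximum matching on the tight edges only, no further passes.
adj = {}
for u, v in tight:
    adj.setdefault(u, []).append(v)
matching = max_bipartite_matching([0, 1], adj)
assert len(matching) == sum(x.values())  # matching size equals cover size (LP duality)
```

The final assertion is the complementary-slackness picture of Figure 1: every matched edge is tight, and every covered vertex is saturated.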
Figure 1: The red circle is a minimum vertex cover, which is an optimal dual solution. The blue edge is a maximum matching, which is an optimal primal solution. In both examples, the primal and dual satisfy complementary slackness, Eq. (3).

For the first part, to see how the isolation lemma helps, let us consider a simple example. Suppose the graph has a (maximum weight) perfect matching. Then the following trivial solution is optimal for the dual LP: choose all vertices in V_L. Let us see what happens when applying the complementary slackness theorem. The complementary slackness theorem says that when y is a feasible primal solution and x is a feasible dual solution, then y is optimal for the primal and x is optimal for the dual if and only if

⟨y, Ax − 1_m⟩ = 0 and ⟨x, 1_n − A^⊤y⟩ = 0.  (3)

In the above case, we have Ax − 1_m = 0_m, so the first equality ⟨y, Ax − 1_m⟩ = 0 puts no constraint on y. Therefore, any solution y ≥ 0_m to the linear system a_i^⊤ y = 1, ∀i ∈ V_L, is an optimal solution, where a_i is the i-th column of A. Note that this linear system has m variables and |V_L| equations, so it is still hard to find a solution in Õ(n) space. Roughly speaking, the complementary slackness theorem does not help in this case, since choosing all vertices in V_L is always a feasible dual solution, and is actually optimal whenever a perfect matching exists. (In general, the inverse of a sparse matrix can be dense, which means the standard Gaussian elimination method for linear system solving does not apply in the semi-streaming model.)

The isolation lemma lets us perturb the objective into a vector b ∈ R^m such that the optimal solution to the following primal LP is unique:

(Primal)  max_{y∈R^m} b^⊤ y,  s.t.  A^⊤ y ≤ 1_n and y ≥ 0_m.

Suppose we find the optimal solution x of the dual LP and we want to recover the optimal solution y of the primal LP.
Again plugging in the complementary slackness theorem, we get at most n equations from the second part ⟨x, 1_n − A^⊤y⟩ = 0. Since the optimal y is unique and y has dimension m, the first part ⟨y, Ax − b⟩ = 0 must contribute at least m − n equations. Note that these equations have the form

y_i = 0, ∀i ∈ [m] s.t. (Ax)_i − b_i > 0.

This means that the corresponding edges are unnecessary for obtaining one maximum matching. As a result, we can reduce the number of edges from m to n, and then compute a maximum matching in Õ(n) space without reading the stream, which can be done by simply checking all the at most 2^n subsets of the remaining edges.

However, the classic isolation lemma needs Õ(m) random bits to generate such a perturbed vector b ∈ R^m, and storing b in Õ(n) space is impossible. We bypass this barrier by using the generalized isolation lemma of [CRS95], which only uses Õ(n) random bits in the maximum weight bipartite matching problem. We further implement their algorithm in the streaming model to meet our needs.

For the second part, there are related works that consider converting a nearly-optimal solution to an optimal solution of an LP where all coefficients of A, b, c are integers, e.g., [DS08] and [LS14]. The basic idea is to randomly perturb the objective vector on a small scale, such that there is a unique minimizer of the LP. (For our application of solving the dual LP, the objective function has only n variables, so applying the classic isolation lemma [MVV87] with Õ(n) random bits suffices for our use.) Note that the feasible region {x | Ax ≥ b, x ≥ 0} is a polytope, and all extreme points of this polytope are integral by total unimodularity. Therefore, if we get close enough to that minimizer, our solution can be rounded to the minimizer.

The appendix is organized as follows. In Section A, we define the basic notations used in this paper.
In Section B, we present the generalized isolation lemma in the semi-streaming model, which will be used to recover the maximum matching from a minimum vertex cover.

In Section C, we give some preliminaries for the interior point method. In Section D, we present our streaming implementation of the interior point method. In Section E, we present the error analysis of the interior point method. In Section F, we bound the running time of our interior point method.

In Section G, we give an SDD solver in the streaming model, which is a necessary component of our interior point method. In Section H, we discuss the streaming algorithm for minimum vertex cover. In Section I, we combine all the pieces to get the final algorithm for maximum weight bipartite matching. In Section J, we complement Section G by providing two reductions from weaker solvers to our final SDD solver.

Acknowledgements
The authors would like to thank Sepehr Assadi for introducing this problem, and thank Sepehr Assadi, Robert E. Tarjan, Huacheng Yu, and Peilin Zhong for helpful discussions. S. Cliff Liu is partially supported by an innovation research grant from Princeton University and a gift from Microsoft (to Robert E. Tarjan). Zhao Song is partially supported by the Schmidt Foundation and the Simons Foundation. Hengjie Zhang is partially supported by NSF CAREER award CCF-1844887 and a Columbia Presidential Fellowship.

References

[AB19] Sepehr Assadi and Aaron Bernstein. Towards a unified theory of sparsification for matching problems. In Jeremy T. Fineman and Michael Mitzenmacher, editors, 2nd Symposium on Simplicity in Algorithms (SOSA), volume 69, pages 11:1–11:20, 2019.
[ABB+
19] Sepehr Assadi, MohammadHossein Bateni, Aaron Bernstein, Vahab Mirrokni, and CliffStein. Coresets meet EDCS: algorithms for matching and vertex cover on massivegraphs. In
Proceedings of the Thirtieth Annual ACM-SIAM Symposium on DiscreteAlgorithms , pages 1616–1635. SIAM, 2019.[AG11] Kook Jin Ahn and Sudipto Guha. Linear programming in the semi-streaming modelwith application to the maximum matching problem. In
ICALP , pages 526–538.Springer, 2011.[AG18] Kook Jin Ahn and Sudipto Guha. Access to data and number of iterations: Dual primalalgorithms for maximum matching under resource constraints.
ACM Transactions onParallel Computing (TOPC) , 4(4):1–40, 2018.[AGM12a] Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. Analyzing graph structure vialinear measurements. In
Proceedings of the twenty-third annual ACM-SIAM symposiumon Discrete Algorithms (SODA) , pages 459–467. SIAM, 2012.[AGM12b] Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. Graph sketches: sparsification,spanners, and subgraphs. In
Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAIsymposium on Principles of Database Systems (PODS) , pages 5–14, 2012.[AKSY20] Sepehr Assadi, Gillat Kol, Raghuvansh R. Saxena, and Huacheng Yu. Multi-pass graphstreaming lower bounds for cycle counting, max-cut, matching size, and other problems.In
FOCS , 2020.[ALT20] Sepehr Assadi, S. Cliff Liu, and Robert E. Tarjan. An auction algorithm for bipartitematching in streaming and massively parallel computation models. In
Manuscript , 2020.[Ans00] Kurt M Anstreicher. The volumetric barrier for semidefinite programming.
Mathematicsof Operations Research , 25(3):365–380, 2000.[AR20] Sepehr Assadi and Ran Raz. Near-quadratic lower bounds for two-pass graph streamingalgorithms. In
FOCS . https://arxiv.org/pdf/2009.01161.pdf , 2020.[Ber20] Aaron Bernstein. Improved bounds for matching in random-order streams. In ICALP ,2020.[BLN +
20] Jan van den Brand, Yin Tat Lee, Danupon Nanongkai, Richard Peng, ThatchapholSaranurak, Aaron Sidford, Zhao Song, and Di Wang. Bipartite matching in nearly-linear time on moderately dense graphs. In
FOCS , 2020.[BLSS20] Jan van den Brand, Yin Tat Lee, Aaron Sidford, and Zhao Song. Solving tall denselinear programs in nearly linear time. In
STOC, 2020.
[CKM+14] Michael B Cohen, Rasmus Kyng, Gary L Miller, Jakub W Pachocki, Richard Peng, Anup B Rao, and Shen Chen Xu. Solving SDD linear systems in nearly m log^{1/2} n time. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing (STOC), pages 343–352, 2014.
[CKST19] Charles Carlson, Alexandra Kolla, Nikhil Srivastava, and Luca Trevisan. Optimal lower bounds for sketching graph cuts. In
Proceedings of the Thirtieth Annual ACM-SIAMSymposium on Discrete Algorithms , pages 2565–2569. SIAM, 2019.[CLS19] Michael B Cohen, Yin Tat Lee, and Zhao Song. Solving linear programs in the currentmatrix multiplication time. In
Proceedings of the 51st Annual ACM Symposium onTheory of Computing (STOC) , 2019.[CRS95] Suresh Chari, Pankaj Rohatgi, and Aravind Srinivasan. Randomness-optimal uniqueelement isolation with applications to perfect matching and related problems.
SIAMJournal on Computing , 24(5):1036–1050, 1995.[DJK13] Nikhil R Devanur, Kamal Jain, and Robert D Kleinberg. Randomized primal-dualanalysis of ranking for online bipartite matching. In
Proceedings of the twenty-fourthannual ACM-SIAM symposium on Discrete algorithms , pages 101–107. SIAM, 2013.[DNO14] Shahar Dobzinski, Noam Nisan, and Sigal Oren. Economic efficiency requires interac-tion. In
Proceedings of the forty-sixth annual ACM symposium on Theory of computing(STOC) , pages 233–242, 2014.[DS08] Samuel I Daitch and Daniel A Spielman. Faster approximate lossy generalized flow viainterior point algorithms. In
Proceedings of the fortieth annual ACM symposium onTheory of computing (STOC) , pages 451–460, 2008.[EKMS12] Sebastian Eggert, Lasse Kliemann, Peter Munstermann, and Anand Srivastav. Bipartitematching in the semi-streaming model.
Algorithmica , 63(1-2):490–508, 2012.[FHM +
20] Alireza Farhadi, Mohammad Taghi Hajiaghayi, Tung Mai, Anup Rao, and Ryan A Rossi. Approximate maximum matching in random streams. In
Proceedings of the Four-teenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) , pages 1773–1785. SIAM, 2020.[FKM +
04] Joan Feigenbaum, Sampath Kannan, Andrew McGregor, Siddharth Suri, and JianZhang. On graph problems in a semi-streaming model. In
ICALP , pages 531–543.Springer, 2004.[FKM +
05] Joan Feigenbaum, Sampath Kannan, Andrew McGregor, Siddharth Suri, and JianZhang. On graph problems in a semi-streaming model.
Departmental Papers (CIS) ,page 236, 2005.[GKK12] Ashish Goel, Michael Kapralov, and Sanjeev Khanna. On the communication andstreaming complexity of maximum bipartite matching. In
Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms (SODA), pages 468–485. SIAM, 2012.
[HK73] John E Hopcroft and Richard M Karp. An n^{5/2} algorithm for maximum matchings in bipartite graphs. SIAM Journal on Computing, 2(4):225–231, 1973.
[HT56] Isidore Heller and CB Tompkins. An extension of a theorem of Dantzig's. Linear Inequalities and Related Systems, 38:247–254, 1956.
[JKL+
20] Haotian Jiang, Tarun Kathuria, Yin Tat Lee, Swati Padmanabhan, and Zhao Song. Afaster interior point method for semidefinite programming. In
FOCS , 2020.[JSWZ20] Shunhua Jiang, Zhao Song, Omri Weinstein, and Hengjie Zhang. Faster dynamic matrixinverse for faster lps. arXiv preprint arXiv:2004.07470 , 2020.[Kap13] Michael Kapralov. Better bounds for matchings in the streaming model. In
Proceedingsof the twenty-fourth annual ACM-SIAM symposium on Discrete algorithms , pages 1679–1697. SIAM, 2013.[Kar84] Narendra Karmarkar. A new polynomial-time algorithm for linear programming. In
Proceedings of the sixteenth annual ACM symposium on Theory of computing(STOC) ,pages 302–311. ACM, 1984.[KK15] Dmitry Kogan and Robert Krauthgamer. Sketching cuts in graphs and hypergraphs.In
Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science ,pages 367–376, 2015.[KKS14a] Michael Kapralov, Sanjeev Khanna, and Madhu Sudan. Approximating matching sizefrom random streams. In
Proceedings of the twenty-fifth annual ACM-SIAM symposiumon Discrete algorithms (SODA) , pages 734–751. SIAM, 2014.[KKS14b] Michael Kapralov, Sanjeev Khanna, and Madhu Sudan. Streaming lower bounds for ap-proximating max-cut. In
Proceedings of the twenty-sixth annual ACM-SIAM symposiumon Discrete algorithms , pages 1263–1282. SIAM, 2014.[KKSV17] Michael Kapralov, Sanjeev Khanna, Madhu Sudan, and Ameya Velingker. (1+ Ω (1))-approximation to max-cut requires linear space. In Proceedings of the Twenty-EighthAnnual ACM-SIAM Symposium on Discrete Algorithms , pages 1703–1722. SIAM, 2017.[KL11] Jonathan A Kelner and Alex Levin. Spectral sparsification in the semi-streaming set-ting. In
STACS , 2011.[KLM +
17] Michael Kapralov, Yin Tat Lee, Cameron Musco, Christopher Musco, and AaronSidford. Single pass spectral sparsification in dynamic streams.
SIAM J. Comput. ,46(1):456–477, 2017.[KMM +
19] Michael Kapralov, Aida Mousavifar, Cameron Musco, Christopher Musco, and Navid Nouri. Faster spectral sparsification in dynamic streams. In arXiv preprint. https://arxiv.org/pdf/1903.12165.pdf, 2019.
[KMP10] Ioannis Koutis, Gary L. Miller, and Richard Peng. Approaching optimality for solving SDD linear systems. In FOCS, pages 235–244, 2010.
[KMP11] Ioannis Koutis, Gary L Miller, and Richard Peng. A nearly-m log n time solver for SDD linear systems. In FOCS, pages 590–598, 2011.
[KNST19] Michael Kapralov, Navid Nouri, Aaron Sidford, and Jakab Tardos. Dynamic streaming spectral sparsification in nearly linear time and space. In arXiv preprint. https://arxiv.org/pdf/1903.12150.pdf, 2019.
[KOSZ13] Jonathan A Kelner, Lorenzo Orecchia, Aaron Sidford, and Zeyuan Allen Zhu. A simple, combinatorial algorithm for solving SDD systems in nearly-linear time. In Proceedings of the forty-fifth annual ACM symposium on Theory of computing (STOC), pages 911–920, 2013.
[KS16] Rasmus Kyng and Sushant Sachdeva. Approximate Gaussian elimination for Laplacians: fast, sparse, and simple. In FOCS, pages 573–582. IEEE, 2016.
[KW14] Michael Kapralov and David Woodruff. Spanners and sparsifiers in dynamic streams. In
Proceedings of the 2014 ACM symposium on Principles of distributed computing (PODC), pages 272–281, 2014.
[LJS19] Yang P Liu, Arun Jambulapati, and Aaron Sidford. Parallel reachability in almost linear work and square root depth. In FOCS, pages 1664–1686. IEEE, 2019.
[LS13] Yin Tat Lee and Aaron Sidford. Path finding II: An Õ(m√n) algorithm for the minimum cost flow problem. arXiv preprint arXiv:1312.6713, 2013.
[LS14] Yin Tat Lee and Aaron Sidford. Path finding methods for linear programming: Solving linear programs in O(√rank) iterations and faster algorithms for maximum flow. In FOCS, pages 424–433. IEEE, 2014.
[LS15] Yin Tat Lee and Aaron Sidford. Efficient inverse maintenance and faster algorithms for linear programming. In FOCS, pages 230–249. IEEE, 2015.
[LS20] Yang P Liu and Aaron Sidford. Faster divergence maximization for faster maximum flow. In FOCS, 2020.
[LSZ19] Yin Tat Lee, Zhao Song, and Qiuyi Zhang. Solving empirical risk minimization in the current matrix multiplication time. In
COLT, 2019.
[Mad13] Aleksander Madry. Navigating central path with electrical flows: From flows to matchings, and back. In FOCS, pages 253–262. IEEE, 2013.
[MS04] Marcin Mucha and Piotr Sankowski. Maximum matchings via Gaussian elimination. In FOCS, pages 248–255. IEEE, 2004.
[MVV87] Ketan Mulmuley, Umesh V Vazirani, and Vijay V Vazirani. Matching is as easy as matrix inversion. In
Proceedings of the nineteenth annual ACM symposium on Theoryof computing , pages 345–354, 1987.[NN89] Yu Nesterov and Arkadi Nemirovsky. Self-concordant functions and polynomial-timemethods in convex programming.
Report, Central Economic and Mathematic Institute,USSR Acad. Sci , 1989.[NN92] Yurii Nesterov and Arkadi Nemirovski. Conic formulation of a convex programmingproblem and duality.
Optimization Methods and Software, 1(2):95–115, 1992.
[NN94] Yurii Nesterov and Arkadi Nemirovski.
Interior-point polynomial algorithms in convexprogramming , volume 13. Siam, 1994.[NY19] Jelani Nelson and Huacheng Yu. Optimal lower bounds for distributed and stream-ing spanning forest computation. In
Proceedings of the Thirtieth Annual ACM-SIAMSymposium on Discrete Algorithms , pages 1844–1860. SIAM, 2019.[PS98] Christos H Papadimitriou and Kenneth Steiglitz.
Combinatorial optimization: algo-rithms and complexity . Courier Corporation, 1998.[PS14] Richard Peng and Daniel A Spielman. An efficient parallel solver for sdd linear sys-tems. In
Proceedings of the forty-sixth annual ACM symposium on Theory of computing(STOC) , pages 333–342, 2014.[Ren88] James Renegar. A polynomial-time algorithm, based on newton’s method, for linearprogramming.
Mathematical Programming , 40(1-3):59–93, 1988.[Ren01] James Renegar.
A mathematical view of interior-point methods in convex optimization .SIAM, 2001.[Son19] Zhao Song.
Matrix Theory : Optimization, Concentration and Algorithms . PhD thesis,The University of Texas at Austin, 2019.[ST04] Daniel A. Spielman and Shang-Hua Teng. Nearly-linear time algorithms forgraph partitioning, graph sparsification, and solving linear systems. In
Pro-ceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC) ,pages 81–90. https://arxiv.org/abs/cs/0310051 ,divided into https://arxiv.org/abs/0809.3232 , https://arxiv.org/abs/0808.4134 , https://arxiv.org/abs/cs/0607105 , 2004.[ST14] Daniel A. Spielman and Shang-Hua Teng. Nearly linear time algorithms for precondi-tioning and solving symmetric, diagonally dominant linear systems. SIAM J. MatrixAnalysis Applications , 35(3):835–885, 2014.[SY20] Zhao Song and Zheng Yu. Sketching as a tool for solving linear programs. In
Manuscript ,2020.[Tar83] Robert Endre Tarjan.
Data structures and network algorithms. SIAM, 1983.
[Thr93] W. Thrash. A note on the least common multiples of dense sets of integers. 1993.
[Vai87] Pravin M Vaidya. An algorithm for linear programming which requires O(((m+n)n^2 + (m+n)^{1.5}n)L) arithmetic operations. In FOCS. IEEE, 1987.
[Vai89] Pravin M Vaidya. Speeding-up linear programming using fast matrix multiplication. In
FOCS . IEEE, 1989.[Yu20] Huacheng Yu. Tight distributed sketching lower bound for connectivity. In arXivpreprint . https://arxiv.org/pdf/2007.12323.pdf , 2020.15 Notations
Standard notations
For a positive integer n, we denote [n] = {1, 2, ..., n}. We use E[·] for expectation and Pr[·] for probability. For a positive integer n, we use I_n to denote the identity matrix of size n × n. For a vector v ∈ R^n, we use the standard definition of ℓ_p norms: for all p ≥ 1, ‖v‖_p = (Σ_{i=1}^n |v_i|^p)^{1/p}. Specially, ‖v‖_∞ = max_{i∈[n]} |v_i|. We use ‖v‖_0 to denote the number of nonzero entries in vector v. We use supp(v) to denote the support of vector v. We use 1_n to denote the length-n vector where every entry is 1, and 1_{m×n} to denote the m × n matrix where each entry is 1. Similarly, we use the notations 0_n and 0_{m×n}.

Matrix operators
For a square matrix A, we use tr[A] to denote its trace. For a square full-rank matrix A, we use A^{-1} to denote the true inverse of A. For a matrix A, we use A^† to denote its pseudoinverse. We say a square matrix A is positive definite if x^⊤Ax > 0 for all x ≠ 0, and positive semi-definite if x^⊤Ax ≥ 0 for all x. We use ⪰ and ⪯ to denote the p.s.d. ordering; for example, we say A ⪰ B if x^⊤Ax ≥ x^⊤Bx for all x.

Matrix norms
For a matrix A, we use ‖A‖_1 to denote its entry-wise ℓ_1 norm, i.e., ‖A‖_1 = Σ_{i,j} |A_{i,j}|. We use ‖A‖_F to denote its Frobenius norm, ‖A‖_F = (Σ_{i,j} A_{i,j}^2)^{1/2}. We use ‖A‖ to denote its spectral/operator norm.

Matrix approximation
Let A, B ∈ R^{n×n} be psd matrices. Let ε ∈ (0, 1). We say A ≈_ε B if

(1 − ε) · x^⊤Ax ≤ x^⊤Bx ≤ (1 + ε) · x^⊤Ax, ∀x ∈ R^n.

Note that if we have A ≈_ε B, then (1 − ε)‖x‖_A ≤ ‖x‖_B ≤ (1 + ε)‖x‖_A for all x ∈ R^n.

The SDD matrix-related definitions

Definition A.1 (Edge-vertex incidence matrix). Let G = (V_L, V_R, E) be a connected undirected bipartite graph. The (unsigned) edge-vertex incidence matrix is defined as follows:

B(e, v) = 1 if v is incident to e; 0 otherwise.

Definition A.2 (Signed edge-vertex incidence matrix). Let G = (V_L, V_R, E) be a connected directed bipartite graph where all edges are oriented from V_L to V_R. The signed edge-vertex incidence matrix is defined as follows:

B(e, v) = +1 if v is incident to e and v ∈ V_L; −1 if v is incident to e and v ∈ V_R; 0 otherwise.

Definition A.3 (SDDM, SDD matrix). A square matrix A is weakly diagonally dominant if A_{i,i} ≥ Σ_{j≠i} |A_{i,j}| for all i, and is strictly diagonally dominant if A_{i,i} > Σ_{j≠i} |A_{i,j}| for all i. A matrix A is SDD if it is symmetric and weakly diagonally dominant, and is strictly SDD if it is symmetric and strictly diagonally dominant. A matrix A is SDDM if it is SDD and A_{i,j} ≤ 0 for all i ≠ j; an SDDM matrix is strictly SDDM if it is strictly diagonally dominant.

Fact A.4.
An SDDM matrix must be positive semi-definite. If an SDDM matrix has zero row-sums, then it is a Laplacian matrix. If an SDDM matrix has at least one positive row-sum, then it is a strictly SDDM matrix and positive definite.

Bit complexity  Given a linear program

min_{x∈R^n} c^⊤x  s.t.  Ax ≥ b,  (4)

where A ∈ Z^{m×n}, b ∈ Z^m, c ∈ Z^n all have integer coefficients, the bit complexity L is defined as

L := log(m) + log(1 + d_max(A)) + log(1 + max{‖c‖_∞, ‖b‖_∞}),

where d_max(A) denotes the largest absolute value of the determinant of a square sub-matrix of A. It is well known that poly(L)-bit precision is sufficient to implement an IPM (e.g., see [DS08] and the references therein). This is because the absolute values of all intermediate arithmetic results are within [2^{−poly(L)}, 2^{poly(L)}], and the errors in all the approximations are at least 1/poly(n). Therefore, truncating all the arithmetic results to poly(L) bits for some sufficiently large polynomial preserves all the error parameters, and thus the same analysis holds.

We will need the following tools in our later analysis.

Lemma A.5 ([HT56]). Let A ∈ R^{m×n} be the unsigned edge-vertex incidence matrix of a bipartite graph G = (V_L, V_R, E). Let S := {x | Ax ≤ b, x ≥ 0_n} and S' := {y | A^⊤y ≤ b', y ≥ 0_m}, where b ∈ Z^m, b' ∈ Z^n. Then both A and A^⊤ are totally unimodular, i.e., all square submatrices of them have determinants in {0, +1, −1}, and all extreme points of S and S' are integral.

B Isolation lemma in the streaming model
The goal of this section is to prove Lemma B.1. Before the proof, we introduce a simple notation. For a vector w ∈ Z^n and a set S ⊆ [n], we denote w_S := Σ_{i∈S} w_i. We define w_S := Σ_{i∈S} w_i similarly when each w_i : Z^t → Z is a function.

Lemma B.1 (Streaming implementation of the isolation lemma). Let n, F, Z, w be defined as in Lemma B.2. Algorithm 2 in Lemma B.2 can be implemented as an oracle I such that I can output w_i given any i ∈ [n]. Furthermore, I runs in O(log(Z) + log(n)) space.

Proof. Our oracle I is the streaming implementation given in Algorithm 3. It is easy to see that after running the procedure Initialize, the procedure Query(i) will output w_i for any given i ∈ [n]. This oracle stores m, Z ∈ N and r ∈ N^t in memory. Note that m and Z can be stored in O(log(Z) + log(n)) bits, and r can be stored in O(t · log n) = O(⌈log(m)/log(n)⌉ · log(n)) = O(log(Z) + log(n)) bits. It is also easy to see that in Query, all computation can be done within O(log(Z) + log(n)) space.

B.1 Isolation lemma
We state the generalized isolation lemma from previous work [CRS95].
Lemma B.2 (Generalized isolation lemma [CRS95]). Fix n ∈ N and fix an unknown family F ⊆ 2^{[n]}. Let Z be a positive integer such that Z ≥ |F|. There exists an algorithm (Algorithm 2) that uses O(log(Z) + log(n)) random bits to output a vector w ∈ N^n whose entries are bounded by poly(n, log Z), such that with probability at least 1/4, there is a unique set S* ∈ F that has the minimum weight w_{S*}.

Proof. The proof is already given in [CRS95]; for completeness, we reproduce it here. By Lemma B.4, with probability at least 1/2, all sets S ∈ F have distinct weights w^(2)_S. Conditioned on this event of success, by Lemma B.5, all sets S ∈ F have distinct linear forms w^(3)_S. Then by Lemma B.6, with probability at least 1/2, the output w gives a unique minimum-weight set in F. Since the two events of success are independent, the final success probability is at least 1/4.

B.2 Algorithms
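Before the pseudocode, here is a compact Python sketch of the weight oracle that Algorithm 3 below builds out of Algorithm 2. The class name, seed handling, and the exclusion of m = 1 are our own choices for this sketch; the conceptual weights w^(1)_i = 2^i and the base-n digit construction follow the scheme. Only m, t, and r_0, ..., r_{t-1} are stored, so repeated queries across different passes return consistent weights.

```python
import math
import random

class IsolationOracle:
    """Space-efficient weight oracle: each w_i is recomputed on demand."""
    def __init__(self, n, Z, seed=0):
        rng = random.Random(seed)
        self.n = n
        # Random modulus from {2, ..., (2nZ)^2} (m = 1 excluded for convenience).
        self.m = rng.randint(2, (2 * n * Z) ** 2)
        # t = ceil(log m / log n): number of base-n digits of any residue mod m.
        self.t = max(1, math.ceil(math.log(self.m, n)))
        # O(t log n) random bits: one coefficient per digit position.
        self.r = [rng.randint(0, n * n * self.t) for _ in range(self.t)]

    def query(self, i):
        w2 = pow(2, i, self.m)      # w(1)_i = 2^i reduced mod m, computed lazily
        total = 0
        for j in range(self.t):     # base-n digits of w(2), dotted with r
            total += (w2 % self.n) * self.r[j]
            w2 //= self.n
        return total

oracle = IsolationOracle(n=16, Z=1 << 16)
w = [oracle.query(i) for i in range(16)]
assert w == [oracle.query(i) for i in range(16)]  # consistent across "passes"
```

The point of `pow(2, i, self.m)` is that w^(1)_i = 2^i is never materialized: only its residue mod m is needed, which fits in O(log m) bits.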
Algorithm 2 Conceptual implementation of the isolation lemma. Algorithm 3 is the implementation of this algorithm in the streaming model.

procedure Isolation(n, Z ∈ N)  ▷ Z ≥ |F|, Lemma B.2
    w^(1)_i ← 2^i, ∀i ∈ [n]  ▷ w^(1) ∈ N^n
    Choose m uniformly at random from {1, 2, ..., (2nZ)^2}  ▷ m ≤ 4n^2·Z^2
    For each i, define w^(2)_i ← w^(1)_i mod m  ▷ w^(2) ∈ [m]^n
    t ← ⌈log(m)/log(n)⌉
    for i = 1 → n do
        b_{i,t−1}, ..., b_{i,1}, b_{i,0} ← w^(2)_i  ▷ Write w^(2)_i in base n; the digits b_{i,j} ∈ {0, ..., n−1}
          ▷ Note that t is an upper bound on the length
        w^(3)_i(y_0, ..., y_{t−1}) ← Σ_{j=0}^{t−1} b_{i,j} · y_j  ▷ w^(3)_i : Z^t → Z is a linear form
    end for
    Choose r_0, ..., r_{t−1} uniformly and independently at random from {0, 1, ..., n^2·t}
    w^(4)_i ← w^(3)_i(r_0, ..., r_{t−1}), ∀i ∈ [n]
    return w^(4) ∈ N^n
end procedure

B.3 Proof of uniqueness: step 1
The following lemma is from [Thr93].
Lemma B.3 (Step 1, [Thr93]). Let L be a sufficiently large integer and let S be any subset of {1, ..., L} such that |S| ≥ L/2. Then the least common multiple of the elements in S exceeds 2^{L/2}.

B.4 Proof of uniqueness: step 2
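The proof below applies Lemma B.3. As a quick numerical sanity check of the lcm growth, here is a sketch on one small range (L = 40 is our own illustrative choice; the lemma itself is an asymptotic statement):

```python
from math import lcm

# The lcm of the top half of {1, ..., L} already exceeds 2^(L/2) for L = 40:
# it is divisible by the primes 23, 29, 31, 37 alone, whose product is ~7.6e5,
# and by 32 = 2^5, pushing it far past 2^20.
L = 40
top_half = range(L // 2 + 1, L + 1)   # a subset of {1, ..., L} of size L/2
value = lcm(*top_half)
assert value > 2 ** (L // 2)
```

(`math.lcm` with multiple arguments requires Python 3.9 or later.)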
Lemma B.4 (Step 2). With probability at least 1/2, all sets S ∈ F have distinct weights w^(2)_S.

Proof. We write F = {S_1, ..., S_k}, where k ≤ Z. Define

I := Π_{1≤i<j≤k} (w^(1)_{S_i} − w^(1)_{S_j}).

Since the weights w^(1)_i = 2^i are distinct powers of two, the subset sums w^(1)_{S_1}, ..., w^(1)_{S_k} are pairwise distinct, so I ≠ 0; moreover, each factor has absolute value less than 2^{n+1} and there are fewer than Z^2/2 factors, so |I| < 2^{nZ^2}.

Let L = (2nZ)^2 and let S = {1, ..., L} be all the possible choices of m. At least half of the choices of m satisfy I mod m ≠ 0, since otherwise, by applying Lemma B.3, |I| is at least the least common multiple of half of the numbers in S, i.e., |I| ≥ 2^{L/2} = 2^{2n^2·Z^2}, contradicting the upper bound |I| < 2^{nZ^2}.

Therefore, with probability at least 1/2 (the randomness is over the choice of m), I mod m ≠ 0, which means that all sets S ∈ F have distinct weights w^(2)_S by our definition of w^(2)_i = w^(1)_i mod m (Line 4, Algorithm 2).

Algorithm 3 Streaming implementation of Isolation(). This algorithm is the streaming implementation of Algorithm 2.

data structure  ▷ Lemma B.1
    members
        n, t ∈ N  ▷ t = O(log(Z)/log(n))
        m, Z ∈ N  ▷ m ≤ 4n^2·Z^2, Z ≥ |F|
        r_0, ..., r_{t−1} ∈ N  ▷ r_i ≤ n^2·t
    end members
    procedure Initialize(n, Z ∈ N)  ▷ Initialization
        Choose m ∈ N uniformly at random from {1, 2, ..., (2nZ)^2}
        t ← ⌈log m / log n⌉  ▷ t ∈ N
        Choose r_0, ..., r_{t−1} uniformly and independently at random from {0, 1, ..., n^2·t}  ▷ r ∈ N^t
    end procedure
    procedure Query(i ∈ [n])
        w^(1) ← 2^i  ▷ w^(1) ∈ N, w^(1) ≤ 2^n
        w^(2) ← w^(1) mod m  ▷ w^(2) ∈ N, w^(2) < m
        b_{t−1}, ..., b_1, b_0 ← w^(2)  ▷ b_j ∈ {0, ..., n−1}, ∀j
          ▷ Write w^(2) in base n. Note that t is an upper bound on the length
        w^(4) ← Σ_{j=0}^{t−1} b_j · r_j
        return w^(4)  ▷ w^(4) ≤ poly(n, log Z)
    end procedure
end data structure

B.5 Proof of uniqueness: step 3

Lemma B.5 (Step 3). If the weights w^(2)_S are distinct for all S ∈ F, then the linear forms w^(3)_S are all distinct for all S ∈ F.

Proof. We prove by contradiction. Suppose there exist two distinct sets S_1, S_2 ∈ F such that w^(3)_{S_1} = w^(3)_{S_2}. Let y_i = n^i for all i ∈ {0, 1, ..., t−1}.
We then have w^(2)_{S_1} = w^(3)_{S_1}(y_0, ..., y_{t−1}) = w^(3)_{S_2}(y_0, ..., y_{t−1}) = w^(2)_{S_2} by the definition of w^(3)_i (Line 9, Algorithm 2), which is a contradiction to our assumption that all the w^(2)_S are distinct.

B.6 Proof of uniqueness: step 4

Lemma B.6 (Step 4). Let C be any collection of distinct linear forms over at most t variables y = y_0, ..., y_{t−1} with coefficients in {0, 1, ..., n−1}. Choose a random r = r_0, ..., r_{t−1} by assigning each r_i uniformly and independently from {0, 1, ..., n^2·t}. Then in the assignment y = r there is a unique linear form with minimum value, with probability at least 1/2.

Proof. We call a variable y_i singular under an assignment r ∈ Z^t if there exist two minimum linear forms in C under the assignment r. An assignment r gives a unique minimum linear form if and only if no variable y_i is singular under this assignment. We bound the probability of y_i being singular under a random assignment r and then take a union bound over every y_i.

For each y_i, fix all of r_0, ..., r_{i−1}, r_{i+1}, ..., r_{t−1} = a_0, ..., a_{i−1}, a_{i+1}, ..., a_{t−1} other than r_i. Now, every linear form f under this partial assignment can be written as a_f + b_f · y_i with b_f < n. We split C into n classes C_0, ..., C_{n−1}, where C_j contains all linear forms with b_f = j. Let p_j be the minimum a_f among all linear forms in C_j. According to the definition of singular, y_i is singular on assignment r_i if and only if the minimum value in the list {p_0, p_1 + r_i, p_2 + 2r_i, ..., p_{n−1} + (n−1)·r_i} is not unique, whose probability is upper bounded by the probability that the elements in the list have a collision. Since every pair of elements in the list can be equal for at most one choice of r_i, we have

Pr_{r_i}[y_i is singular | r_j = a_j, ∀j ≠ i] ≤ Pr_{r_i}[∃ l, m s.t. p_l + l·r_i = p_m + m·r_i]
≤ Σ_{l≠m} Pr_{r_i}[p_l + l·r_i = p_m + m·r_i]
≤ (n choose 2) · 1/(n^2·t) ≤ 1/(2t).
Finally, by a union bound on the events that every y_i is not singular, the conclusion follows with probability at least 1 − t · 1/(2t) = 1/2.

C Preliminary for IPM

Let us consider the linear program

min_{x∈R^n} c^⊤x,  s.t.  Ax ≥ b,

where A ∈ R^{m×n}, b ∈ R^m, c ∈ R^n. We denote by a_1^⊤, ..., a_m^⊤ the row vectors of the matrix A ∈ R^{m×n}.

C.1 Definitions

In this section, we give definitions.

Definition C.1 (Barrier function). Define the logarithmic barrier function F(x) : R^n → R as follows:

F(x) := − Σ_{i∈[m]} ln(a_i^⊤x − b_i).  (5)

Definition C.2 (Feasible solution). For any x ∈ R^n, we say x is feasible if for all i ∈ [m], a_i^⊤x > b_i.

Statement | Comment
Def. C.1 | Barrier function
Def. C.2 | Feasible solution
Def. C.3 | Slack variables
Def. C.4 | Gradient and Hessian with respect to the barrier function
Def. C.5 | Perturbed objective function
Def. C.6 | Newton step
Def. C.7 | Φ_t, depends on t
Def. C.8 | Ψ, doesn't depend on t

Table 1: IPM-related definitions

Definition C.3 (Slack). For simplicity, we also define the slack s_i(x) := a_i^⊤x − b_i ∈ R for all x ∈ R^n and i ∈ [m].

Definition C.4 (Gradient and Hessian of the barrier function). We define the gradient g(x) : R^n → R^n and the Hessian H(x) : R^n → R^{n×n} of the barrier function F(x) as follows:

g(x) := ∇F(x) = − Σ_{i∈[m]} a_i / s_i(x) ∈ R^n,  (6)
H(x) := ∇^2 F(x) = Σ_{i∈[m]} a_i a_i^⊤ / s_i(x)^2 ∈ R^{n×n}.  (7)

Definition C.5 (Perturbed objective function). Define the perturbed objective function f_t : R^n → R, where t > 0 is a parameter:

f_t(x) := t · c^⊤x + F(x).  (8)

Definition C.6 (Newton step). Given a scalar t > 0 and a vector c ∈ R^n, define the Newton step at a feasible point x ∈ R^n to be

δ_x := −(H(x))^† · ∇f_t(x) = −(H(x))^† · (t·c + ∇F(x)) = −(H(x))^† · (t·c + g(x)).  (9)

Definition C.7 (Definition of the potential Φ).
Given t > 0 and feasible x, y ∈ R^n, we define the function Φ_t: R^n × R^n → R as

Φ_t(x, y) := ‖∇f_t(x)‖_{H(y)^{−1}}.

Definition C.8 (Helper of potential function Ψ). Given feasible x, y ∈ R^n, we define the function Ψ: R^n × R^n → R as

Ψ(x, y) := ‖g(x)‖_{H(y)^{−1}}.

C.2 Approximation tools from previous work

Lemma C.9 (Hessian approximation, [Ren01]). Let f be a self-concordant function with domain D_f. Define H(x) := ∇²f(x). For any two feasible points x, z ∈ D_f, if ‖x − z‖_{H(x)} ≤ 1/2, then we have

(1 − ‖x − z‖_{H(x)})² · H(z) ⪯ H(x) ⪯ (1 − ‖x − z‖_{H(x)})^{−2} · H(z).

Lemma C.10 (Perturbed linear system, [BLSS20]). Let H ∈ R^{d×d} be a symmetric positive definite matrix. Let ε ∈ (0, 1/2]. Let b, v ∈ R^d be such that ‖v‖_{H^{−1}} ≤ ε·‖b‖_{H^{−1}}. Let y = H^{−1}(b + v) ∈ R^d. Then there is a symmetric matrix ∆ ∈ R^{d×d} such that y = (H + ∆)^{−1}b ∈ R^d and

(1 + 20ε)^{−1} · H ⪯ H + ∆ ⪯ (1 + 20ε) · H.

Lemma C.11 ([Ren01]). Let f be a self-concordant function with domain D_f. For x ∈ D_f, define H(x) := ∇²f(x) and g(x) := ∇f(x). Define n(x) := −H(x)^{−1}g(x) as the Newton step at x. If ‖n(x)‖_{H(x)} ≤ 1/4, then f has a minimizer z such that

‖z − x‖_{H(x)} ≤ ‖n(x)‖_{H(x)} + 3·‖n(x)‖²_{H(x)} / (1 − ‖n(x)‖_{H(x)}).

Lemma C.12 ([Ren01]). Let f be a self-concordant function with domain D_f. Let c ∈ R^n be the objective vector. Define OPT := min_{x∈D_f} ⟨c, x⟩. Denote H(x) := ∇²f(x). For any x, y ∈ D_f, we have

(⟨c, y⟩ − OPT) / (⟨c, x⟩ − OPT) ≤ 1 + ‖y − x‖_{H(x)}.

Proof. Define c_x := H(x)^{−1}c.
Define B(x) ⊆ R^n to be the ellipsoid B(x) := {y ∈ R^n | ‖y − x‖_{H(x)} < 1}. Since f is self-concordant, we know B(x) ⊆ D_f. Therefore, for all 0 ≤ t < 1/‖c_x‖_{H(x)}, we have x − t·c_x ∈ D_f. This means

⟨c, x⟩ − t·‖c_x‖²_{H(x)} ≥ OPT.

By letting t → 1/‖c_x‖_{H(x)}, we get ‖c_x‖_{H(x)} ≤ ⟨c, x⟩ − OPT. Hence

(⟨c, y⟩ − OPT)/(⟨c, x⟩ − OPT) = 1 + ⟨c_x, y − x⟩_{H(x)}/(⟨c, x⟩ − OPT) ≤ 1 + ‖c_x‖_{H(x)}·‖y − x‖_{H(x)}/(⟨c, x⟩ − OPT) ≤ 1 + ‖y − x‖_{H(x)},

where the first step is by the definition of the inner product ⟨x, y⟩_S := ⟨Sx, y⟩ for a PSD matrix S ∈ R^{n×n}, the second step is by the Cauchy-Schwarz inequality, and the last step is by ‖c_x‖_{H(x)} ≤ ⟨c, x⟩ − OPT.

Lemma C.13 (x*_t is nearly optimal for x*, [Ren01]). Given a linear program min_{x∈R^n, Ax≥b} c^⊤x, where A ∈ R^{m×n}, b ∈ R^m, c ∈ R^n, let x* ∈ R^n be its optimal solution. For every t > 0, let x*_t ∈ R^n be the x that minimizes f_t(x) (Def. C.5). Then it must be that

c^⊤x*_t − c^⊤x* < m/t.

Proof. Since x*_t ∈ R^n is the minimizer of f_t(x), we have ∇f_t(x*_t) = t·c + g(x*_t) = 0_n. By g(x*_t) = −t·c ∈ R^n, we get

c^⊤x*_t − c^⊤x* = (g(x*_t))^⊤ · (x* − x*_t)/t. (10)

Note that x*, x*_t ∈ R^n are both feasible, so we have

(g(x*_t))^⊤ · (x* − x*_t) = −Σ_{i∈[m]} a_i^⊤(x* − x*_t)/s_i(x*_t) = Σ_{i∈[m]} (s_i(x*_t) − s_i(x*))/s_i(x*_t) = m − Σ_{i∈[m]} s_i(x*)/s_i(x*_t) < m,

where the first step is by the definition of g(x) (Def. C.4), the second step is by the definition of s_i(x) (Def. C.3), namely s_i(x) = a_i^⊤x − b_i, and the last step holds since s_i(x*) ≥ 0 and s_i(x*_t) > 0.

Combining with Eq. (10), the lemma follows.
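The m/t duality-gap bound of Lemma C.13 can be checked numerically on a one-variable toy LP. This is our hypothetical instance, not one from the paper: minimize x subject to x ≥ 0 and −x ≥ −1, so m = 2, F(x) = −ln(x) − ln(1 − x), and x* = 0.

```python
# The central-path point x*_t minimizes f_t(x) = t*x - ln(x) - ln(1 - x);
# we locate it by bisection on f_t'(x) = t - 1/x + 1/(1 - x), which is
# strictly increasing on (0, 1).
def f_t_prime(x, t):
    return t - 1.0 / x + 1.0 / (1.0 - x)

def central_path_point(t, iters=200):
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f_t_prime(mid, t) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for t in [1.0, 10.0, 1000.0]:
    x_t = central_path_point(t)
    assert x_t < 2.0 / t   # Lemma C.13: c^T x*_t - c^T x* < m/t with m = 2
```

As t grows, x*_t slides toward the optimum x* = 0 at the rate the lemma predicts.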
Lemma C.14 (Nearly-optimal output, value version [Ren01]). Given a linear program min_{x∈R^n, Ax≥b} c^⊤x, where A ∈ R^{m×n}, b ∈ R^m, c ∈ R^n, let x* ∈ R^n be its optimal solution. Let ε_N ∈ (0, 1/4]. If for some t > 0 and x ∈ R^n we have Φ_t(x, x) ≤ ε_N (Def. C.7), then we have

c^⊤x − c^⊤x* ≤ (m/t)·(1 + 2ε_N).

Proof. Denote OPT := c^⊤x*, the best value attainable in the LP. Let z be the minimizer of f_t. From Φ_t(x, x) ≤ ε_N, we know that ‖n(x)‖_{H(x)} ≤ ε_N, where n(x) := −H(x)^{−1}∇f_t(x). By Lemma C.11, we have

‖z − x‖_{H(x)} ≤ 2ε_N. (11)

Plugging Eq. (11) into Lemma C.12 (with the roles of x and z exchanged), we get

c^⊤x − OPT ≤ (1 + 2ε_N)·(c^⊤z − OPT).

By Lemma C.13, we have c^⊤z ≤ OPT + m/t. By combining the above inequalities, we get

c^⊤x − OPT ≤ (m/t)·(1 + 2ε_N).

D Algorithm

In this section, we present our IPM algorithm.

Algorithm 4 Interior point method
1: procedure InteriorPointMethod(A, b, c, t_start, x_start, t_final, ε_Φ, o)  ▷ Lemma E.1
2:   m, n ← dimensions of A  ▷ A ∈ R^{m×n} is the input matrix, b ∈ R^m, c ∈ R^n
3:     ▷ t_start, x_start satisfy Φ_{t_start}(x_start, x_start) ≤ ε_Φ
4:     ▷ t_final ∈ R is the final goal of t
5:     ▷ o ∈ {0, 1}.
If o is 0, the algorithm decreases t; otherwise the algorithm increases t
6:   ε_x ← 1/200
7:   ε_t ← 1/(100·√m)
8:   ε_Φ ← ε_Φ  ▷ ε_Φ ≤ 1/100 is given from outside
9:   t ← t_start
10:  x ← x_start
11:  T ← O(√m)·|log(t_final/t_start)|  ▷ number of iterations
12:  for k ← 1 to T do
13:    δ_x := −H(x)^{−1}·∇f_t(x)  ▷ δ_x is only used in the analysis; it involves no computation
14:      ▷ H(x) ∈ R^{n×n}, ∇f_t(x) ∈ R^n
15:      ▷ compute any δ̃_x such that ‖δ̃_x − δ_x‖_{H(x)} ≤ ε_x·ε_Φ·‖δ_x‖_{H(x)}
16:    δ̃_x ← StreamLS(H(x), ∇f_t(x), ε_x·ε_Φ)  ▷ Algorithm 5
17:    x_new ← x + δ̃_x
18:    if o = 0 then
19:      t_new ← t·(1 − ε_t)
20:    else
21:      t_new ← t·(1 + ε_t)
22:    end if
23:    if (o = 0 and t < t_final) or (o = 1 and t > t_final) then
24:      break
25:    end if
26:    t ← t_new, x ← x_new
27:  end for
28:  return x
29: end procedure

E Error analysis of IPM

The goal of this section is to present Lemma E.1.

Parameter | Meaning | Value
ε_x | ‖δ̃_x − δ_x‖_{H(x)} ≤ ε_x | 1/200
ε_t | |t_new/t − 1| = ε_t | 1/(100·√m)
ε_Φ | Φ_t(x, x) ≤ ε_Φ | 1/100

Table 2: Parameters occurring in Algorithm 4.

Lemma E.1. Given any matrix A ∈ R^{m×n}, b ∈ R^m and c ∈ R^n, let x* ∈ R^n be the solution of the following linear program:

min_{x∈R^n, Ax≥b} c^⊤x.

If InteriorPointMethod is given A, b, c, t_start, x_start, t_final that satisfy Φ_{t_start}(x_start, x_start) ≤ ε_Φ (where ε_Φ = 1/100, see Table 2), then it outputs an answer x which is a nearly-optimal solution:

c^⊤x − c^⊤x* ≤ (m/t_final)·(1 + 2ε_Φ).

Proof. Since the initial point x_start and t_start satisfy Φ_{t_start}(x_start, x_start) ≤ ε_Φ, we can apply Lemma E.3 to get Φ_{t_final}(x, x) ≤ ε_Φ.
Applying Lemma C.14 to t_final and x with our choice of ε_Φ = 1/100 < 1/4, we get

c^⊤x − c^⊤x* ≤ (m/t_final)·(1 + 2ε_Φ).

E.1 Assumptions on parameters

Assumption E.2. Let ε_x ∈ (0, 1/200], ε_t ∈ (0, 1/(100·√m)] and ε_Φ ∈ (0, 1/100] denote three fixed parameters such that the following inequality holds:

(1 + ε_t)·(4ε_Φ² + 16ε_x) + √m·ε_t ≤ ε_Φ.

E.2 Bounding potential function Φ

The goal of this section is to prove Lemma E.3.

Lemma E.3. For each t, let Φ_t: R^n × R^n → R be defined as in Definition C.7, and assume Assumption E.2 holds. If the input x_start, t_start satisfies Φ_{t_start}(x_start, x_start) ≤ ε_Φ, then for every iteration k ∈ [T] we have Φ_{t^(k)}(x^(k), x^(k)) ≤ ε_Φ.

Proof. We use proof by induction. In the base case k = 0, we have t^(0) := t_start and x^(0) := x_start, so the condition holds by assumption. When k ≥ 1, for ease of notation we write x = x^(k−1), t = t^(k−1), x_new = x^(k), t_new = t^(k), and define δ_x = −H(x)^{−1}∇f_t(x) as in Definition C.6.

First, from the induction hypothesis, we have Φ_t(x, x) ≤ ε_Φ. By the definition of Φ (Definition C.7), this means ‖δ_x‖_{H(x)} ≤ ε_Φ.

Next, we will show that ‖δ̃_x − δ_x‖_{H(x)} ≤ ε_x. By calling StreamLS(H(x), ∇f_t(x), ε_x·ε_Φ) (Line 16, Algorithm 4) and applying Theorem G.9, we have

‖δ̃_x − δ_x‖_{H(x)} ≤ ε_x·ε_Φ·‖δ_x‖_{H(x)}.

Define b := H(x)·δ_x and v := H(x)·(δ̃_x − δ_x), and write δ̃_x = H(x)^{−1}·(b + v). Note that we have ‖v‖_{H(x)^{−1}} ≤ ε_x·ε_Φ·‖b‖_{H(x)^{−1}}. Applying Lemma C.10, we get δ̃_x = H̄^{−1}·H(x)·δ_x, where H̄ ∈ R^{n×n} is a positive definite matrix that satisfies

(1 + 20ε_x)^{−1}·H(x) ⪯ H̄ ⪯ (1 + 20ε_x)·H(x).
Therefore, we have

‖δ̃_x − δ_x‖_{H(x)} = ‖(H̄^{−1}H(x) − I)·δ_x‖_{H(x)} ≤ 20ε_x·‖δ_x‖_{H(x)} ≤ 20ε_x·ε_Φ ≤ ε_x,

where the last step holds since ε_Φ ≤ 1/20.

Finally, we have

Φ_{t_new}(x_new, x_new) ≤ (t_new/t)·Φ_t(x_new, x_new) + |t_new/t − 1|·Ψ(x_new, x_new)
≤ (t_new/t)·Φ_t(x_new, x_new) + |t_new/t − 1|·√m
≤ (1 + ε_t)·Φ_t(x_new, x_new) + √m·ε_t
≤ (1 + ε_t)·(4·Φ_t(x, x)² + 16ε_x) + √m·ε_t
≤ (1 + ε_t)·(4ε_Φ² + 16ε_x) + √m·ε_t
≤ ε_Φ,

where the first step is by Lemma E.4, the second step is by Lemma E.5, the fourth step is by Lemma E.6 (using ε_x + ε_Φ ≤ 1/2), the fifth step is by the induction hypothesis, and the last step is by Assumption E.2.

Section | Lemma | LHS | RHS
Section E.2 | Lemma E.3 | Φ_{t_new}(x + δ̃_x, x + δ̃_x) | ε_Φ
Section E.3 | Lemma E.4 | Φ_{t_new}(x, x) | (t_new/t)·Φ_t(x, x) + |t_new/t − 1|·Ψ(x, x)
Section E.4 | Lemma E.5 | Ψ(x + δ̃_x, x + δ̃_x) | √m
Section E.5 | Lemma E.6 | Φ_t(x + δ̃_x, x + δ̃_x) | 4·Φ_t(x, x)² + 16ε_x
Section E.6 | Lemma E.7 | Φ_t(x + δ̃_x, x + δ̃_x) | Φ_t(x + δ_x, x + δ̃_x) + 16ε_x
Section E.7 | Lemma E.8 | Φ_t(x + δ_x, x + δ̃_x) | 4·Φ_t(x + δ_x, x + δ_x)
Section E.8 | Lemma E.9 | Φ_t(x + δ_x, x + δ_x) | Φ_t(x, x)²

Table 3: Summary of the movement of the potential function.

E.3 Bounding the movement of t

The goal of this section is to prove Lemma E.4.

Lemma E.4. Let Φ and Ψ be defined as in Definition C.7 and Definition C.8. Then, for all t, t_new > 0 and feasible x ∈ R^n, we have

Φ_{t_new}(x, x) ≤ (t_new/t)·Φ_t(x, x) + |t_new/t − 1|·Ψ(x, x).

Proof.
We have

∇f_{t_new}(x) = t_new·c + g(x)
= (t_new/t)·(t·c + g(x)) + (1 − t_new/t)·g(x)
= (t_new/t)·∇f_t(x) + (1 − t_new/t)·g(x),

where the first step follows from the definition of f_t(x) (Definition C.5), the second step follows from rearranging terms, and the last step follows from the definition of f_t(x) (Definition C.5).

Finally, we can upper bound Φ_{t_new}(x, x) as follows:

Φ_{t_new}(x, x) = ‖∇f_{t_new}(x)‖_{H(x)^{−1}}
≤ (t_new/t)·‖∇f_t(x)‖_{H(x)^{−1}} + |1 − t_new/t|·‖g(x)‖_{H(x)^{−1}}
= (t_new/t)·Φ_t(x, x) + |t_new/t − 1|·Ψ(x, x),

where the inequality is the triangle inequality. Thus, we complete the proof.

E.4 Upper bounding the potential function

The goal of this section is to prove Lemma E.5.

Lemma E.5. Let the function Ψ: R^n × R^n → R be defined as in Definition C.8. For all feasible x ∈ R^n, we have Ψ(x, x) ≤ √m.

Proof. Let z := (H(x))^{−1}g(x) ∈ R^n, so that Ψ(x, x) = ‖z‖_{H(x)}. We can upper bound ‖z‖²_{H(x)} as follows:

‖z‖²_{H(x)} = z^⊤H(x)z = z^⊤g(x) = −Σ_{i∈[m]} z^⊤a_i/s_i(x)
≤ √m·(Σ_{i∈[m]} (z^⊤a_i)²/s_i(x)²)^{1/2}
= √m·(z^⊤(Σ_{i∈[m]} a_i·a_i^⊤/s_i(x)²)z)^{1/2}
= √m·(z^⊤H(x)z)^{1/2} = √m·‖z‖_{H(x)},

where the third step is by g(x) = −Σ_{i∈[m]} a_i/s_i(x) (Definition C.4), the fourth step is by the Cauchy-Schwarz inequality, and the sixth step is by H(x) = Σ_{i∈[m]} a_i·a_i^⊤/s_i(x)² (Definition C.4). Therefore, we obtain Ψ(x, x) = ‖z‖_{H(x)} ≤ √m.

E.5 Move both: final

The goal of this section is to prove Lemma E.6.

Lemma E.6. For any feasible x ∈ R^n, let δ_x ∈ R^n be defined as in Definition C.6, and suppose ‖δ̃_x − δ_x‖_{H(x)} ≤ ε_x and ‖δ_x‖_{H(x)} ≤ ε.
If ε_x + ε ≤ 1/2, then we have

Φ_t(x + δ̃_x, x + δ̃_x) ≤ 4·Φ_t(x, x)² + 16ε_x.

Proof. The proof is done by showing the following chain:

Φ_t(x + δ̃_x, x + δ̃_x) ≤ Φ_t(x + δ_x, x + δ̃_x) + 16ε_x ≤ 4·Φ_t(x + δ_x, x + δ_x) + 16ε_x ≤ 4·Φ_t(x, x)² + 16ε_x,

where the first step is by Lemma E.7, the second step follows from Lemma E.8, and the third step follows from Lemma E.9.

E.6 Move both: part 1

The goal of this section is to prove Lemma E.7.

Lemma E.7. Suppose ‖δ̃_x − δ_x‖_{H(x)} ≤ ε_x and ‖δ_x‖_{H(x)} ≤ ε. If ε_x + ε ≤ 1/2, then we have

Φ_t(x + δ̃_x, x + δ̃_x) ≤ Φ_t(x + δ_x, x + δ̃_x) + 16ε_x.

Proof. We have

Φ_t(x + δ̃_x, x + δ̃_x) − Φ_t(x + δ_x, x + δ̃_x)
= ‖∇f_t(x + δ̃_x)‖_{H(x+δ̃_x)^{−1}} − ‖∇f_t(x + δ_x)‖_{H(x+δ̃_x)^{−1}}
≤ ‖∇f_t(x + δ̃_x) − ∇f_t(x + δ_x)‖_{H(x+δ̃_x)^{−1}}
= ‖g(x + δ̃_x) − g(x + δ_x)‖_{H(x+δ̃_x)^{−1}}, (12)

where the first step is by the definition of Φ_t (Definition C.7), the second step is by the triangle inequality, and the last step holds since the t·c terms cancel.

Define φ(θ) := g(x + δ_x + θ·(δ̃_x − δ_x)) for θ ∈ [0, 1]. Then we have g(x + δ̃_x) − g(x + δ_x) = φ(1) − φ(0). By the mean value theorem, there exists p ∈ [0, 1] such that

φ(1) − φ(0) = φ′(p) = H(x + δ_x + p·(δ̃_x − δ_x))·(δ̃_x − δ_x).

Since ‖x + δ_x + p·(δ̃_x − δ_x) − x‖_{H(x)} ≤ ‖δ_x‖_{H(x)} + p·‖δ̃_x − δ_x‖_{H(x)} ≤ ε + ε_x ≤ 1/2, by Lemma C.9 we have

(1/4)·H(x) ⪯ H(x + δ_x + p·(δ̃_x − δ_x)) ⪯ 4·H(x).
(13)

Since ‖x + δ̃_x − x‖_{H(x)} ≤ ‖δ_x‖_{H(x)} + ‖δ̃_x − δ_x‖_{H(x)} ≤ ε + ε_x ≤ 1/2, by Lemma C.9 we have

(1/4)·H(x) ⪯ H(x + δ̃_x) ⪯ 4·H(x). (14)

Finally, we have

‖g(x + δ̃_x) − g(x + δ_x)‖_{H(x+δ̃_x)^{−1}}
= ‖H(x + δ_x + p·(δ̃_x − δ_x))·(δ̃_x − δ_x)‖_{H(x+δ̃_x)^{−1}}
≤ 2·‖H(x + δ_x + p·(δ̃_x − δ_x))·(δ̃_x − δ_x)‖_{H(x)^{−1}}
≤ 8·‖δ̃_x − δ_x‖_{H(x)}
≤ 16ε_x,

where the first step is by the mean value computation above, the second step is by Eq. (14), and the third step is by Eq. (13). Combining with Eq. (12) proves the lemma.

E.7 Move both: part 2

The goal of this section is to prove Lemma E.8.

Lemma E.8. Suppose ‖δ̃_x − δ_x‖_{H(x)} ≤ ε_x and ‖δ_x‖_{H(x)} ≤ ε. If ε_x + ε ≤ 1/2, then we have

Φ_t(x + δ_x, x + δ̃_x) ≤ 4·Φ_t(x + δ_x, x + δ_x).

Proof. Since ‖x + δ̃_x − x‖_{H(x)} ≤ ‖δ_x‖_{H(x)} + ‖δ̃_x − δ_x‖_{H(x)} ≤ ε + ε_x ≤ 1/2, by Lemma C.9 we have

(1/4)·H(x) ⪯ H(x + δ̃_x) ⪯ 4·H(x).

Since ‖x + δ_x − x‖_{H(x)} ≤ ε ≤ 1/2, by Lemma C.9 we have

(1/4)·H(x) ⪯ H(x + δ_x) ⪯ 4·H(x).
Therefore,

Φ_t(x + δ_x, x + δ̃_x) = ‖∇f_t(x + δ_x)‖_{H(x+δ̃_x)^{−1}} ≤ 2·‖∇f_t(x + δ_x)‖_{H(x)^{−1}} ≤ 4·‖∇f_t(x + δ_x)‖_{H(x+δ_x)^{−1}} = 4·Φ_t(x + δ_x, x + δ_x),

where the first step is by the definition of Φ (Definition C.7), the second step follows from (1/4)·H(x) ⪯ H(x + δ̃_x), the third step follows from H(x + δ_x) ⪯ 4·H(x), and the last step is by the definition of Φ (Definition C.7).

E.8 Move both: part 3

The goal of this section is to prove Lemma E.9.

Lemma E.9. Φ_t(x + δ_x, x + δ_x) ≤ Φ_t(x, x)².

Proof. For simplicity, we define x_tmp := x + δ_x and z := (H(x_tmp))^{−1}·∇f_t(x_tmp); below we write ‖v‖_y for ‖v‖_{H(y)}. First, we have

∇f_t(x_tmp) = t·c + g(x_tmp)
= −H(x)·δ_x − g(x) + g(x_tmp)
= −Σ_{i∈[m]} a_i·(a_i^⊤δ_x)/s_i(x)² + Σ_{i∈[m]} a_i/s_i(x) − Σ_{i∈[m]} a_i/s_i(x_tmp)
= Σ_{i∈[m]} (a_i·(a_i^⊤δ_x)/(s_i(x)·s_i(x_tmp)) − a_i·(a_i^⊤δ_x)/s_i(x)²)
= −Σ_{i∈[m]} a_i·(a_i^⊤δ_x)²/(s_i(x)²·s_i(x_tmp)), (15)

where the first step is by the definition of f_t (Definition C.5), the second step is by H(x)·δ_x = −t·c − g(x), the third step is by the definitions of g(x) and H(x) (Definition C.4), and the fourth and last steps are by

s_i(x_tmp) − s_i(x) = s_i(x + δ_x) − s_i(x) = a_i^⊤(x + δ_x − x) + (b_i − b_i) = a_i^⊤δ_x.
As a result,

‖z‖²_{x_tmp} = z^⊤H(x_tmp)z = z^⊤∇f_t(x_tmp)
= −Σ_{i∈[m]} (z^⊤a_i)·(a_i^⊤δ_x)²/(s_i(x)²·s_i(x_tmp))
≤ (Σ_{i∈[m]} (z^⊤a_i)²/s_i(x_tmp)²)^{1/2} · Σ_{i∈[m]} (a_i^⊤δ_x)²/s_i(x)²
= (z^⊤H(x_tmp)z)^{1/2} · (δ_x^⊤H(x)δ_x)
= ‖z‖_{x_tmp} · ‖δ_x‖²_x,

where the third step is by Eq. (15), the fourth step follows from the Cauchy-Schwarz inequality, and the fifth step is by the definition of H(x) (Definition C.4). Therefore, we have

Φ_t(x + δ_x, x + δ_x) = ‖z‖_{x_tmp} ≤ ‖δ_x‖²_x = Φ_t(x, x)².

F Running time of IPM

The goal of this section is to prove Lemma F.1.

Lemma F.1 (Running time of IPM). Consider Algorithm 4 with input A ∈ R^{m×n}, b ∈ R^m, c ∈ R^n, x_start ∈ R^n, t_start ∈ R, t_final ∈ R. Let x ∈ R^n be any feasible point. Define H(x) ∈ R^{n×n} and s(x) ∈ R^m as the Hessian and slack variables (Def. C.4, Def. C.3). Suppose we have the following conditions:

C.1 Σ_{i∈[m]} a_i/s_i(x) can be computed in one pass in f(n) space;
C.2 the Hessian H(x) ∈ R^{n×n} can be decomposed as H(x) = L_{G(x)} + D(x), where L_{G(x)} ∈ R^{n×n} is the Laplacian of some graph G(x) with edge weights w, and D(x) ∈ R^{n×n} is a diagonal matrix;
C.3 we can read the edge-weight pairs (e, w_e) in one pass, and we can read the diagonal of D(x) ∈ R^{n×n} in one pass.

Then Algorithm 4 can be implemented in the streaming model with Õ(n) + f(n) space and Õ(√m) passes.

Proof. Consider the iterating part (Lines 12-27).
First, we show that ∇f_t(x) ∈ R^n can be computed in one pass. By Definition C.5,

∇f_t(x) = t·c − Σ_{i∈[m]} a_i/s_i(x),

so by condition C.1, ∇f_t(x) ∈ R^n can be computed in one pass in f(n) space.

Since we have conditions C.2 and C.3, we can first read D in one pass and store its entries in O(n) space, and then apply Theorem G.9 to show that δ̃_x can be computed in Õ(1) passes and Õ(n) space. All other computation can be done in Õ(n) space. Since there are T = O(√m·log(m/ε_ipm)) iterations in total, the total number of passes used is Õ(√m). Since we can reuse the space for x and t, the total space used is Õ(n) + f(n).

G SDD solver in the streaming model

In this section, we present an SDDM solver in the streaming model. Later, in Section J, we reduce the problem of solving an SDD system to solving an SDDM system, which gives an SDD solver in the streaming model.

G.1 SDD and Laplacian systems

We need the definition of a spectral sparsifier, and a streaming algorithm for computing it.

Definition G.1 (δ-spectral sparsifier). Given a weighted undirected graph G and a parameter δ > 0, an edge-reweighted subgraph H of G is a δ-spectral sparsifier of G if

(1 − δ)·x^⊤L_G x ≤ x^⊤L_H x ≤ (1 + δ)·x^⊤L_G x, ∀x ∈ R^n,

where L_G and L_H are the Laplacians of G and H, respectively. We also say that L_H is a δ-spectral sparsifier of L_G.

Lemma G.2 ([KLM+]). Let G be a weighted graph and let δ ∈ (0, 1] be a parameter. There exists a streaming algorithm that takes G as input, uses δ^{−2}·n·poly(log n) space and one pass, and outputs a weighted graph H with δ^{−2}·n·poly(log n) edges such that, with probability at least 1 − 1/poly(n), H is a δ-spectral sparsifier of G.

We will use the classic SDD solver in the sequential model, which is formally described below.

Theorem G.3 ([ST04]).
There is an algorithm which takes as input an SDD matrix A, a vector b ∈ R^n, and a parameter ε ∈ (0, 1/2], such that if there exists x* ∈ R^n with Ax* = b, then with probability 1 − 1/poly(n) the algorithm returns an x ∈ R^n with

‖x − x*‖_A ≤ ε·‖x*‖_A

in nnz(A)·poly(log n)·log(1/ε) time. The returned x is called an ε-approximate solution to the SDD system Ax = b.

G.2 The preconditioner

To prove that our SDDM solver (Algorithm 5) gives the desired accuracy, we need the concept of a preconditioner (and how to compute a preconditioner of an SDDM matrix). We define preconditioners as follows:

Definition G.4 (Preconditioner). For any positive definite matrix A ∈ R^{n×n} and accuracy parameter δ > 0, we say P is a δ-preconditioner of A if

‖P^{−1}Ax − x‖_A ≤ δ·‖x‖_A, ∀x ∈ R^n.

Lemma G.5. Let A be an SDDM matrix (Definition A.3) and let A = L_G + D, where L_G is the Laplacian matrix of a graph G and D ≻ 0 is a diagonal matrix. (By Fact A.4, such a decomposition always exists.) For any δ ∈ (0, 1/2], if H is a δ-spectral sparsifier of G and we define P := L_H + D, then P satisfies the following two conditions:

• P is a (2δ)-preconditioner (Definition G.4) of A;
• P ≻ 0.

Proof. The lemma clearly holds for x = 0_n, so assume x ≠ 0_n in the following. By Definition G.1,

(1 + δ)·x^⊤L_G x ≥ x^⊤L_H x ≥ (1 − δ)·x^⊤L_G x ≥ 0,

so L_H ⪰ 0, as it must be symmetric. Since D ≻ 0, we have A ≻ 0, P ≻ 0, and

(1 + δ)·x^⊤(L_G + D)x ≥ x^⊤(L_H + D)x ≥ (1 − δ)·x^⊤(L_G + D)x > 0,

which is (1 + δ)·A ⪰ P ⪰ (1 − δ)·A. So we obtain

(1 − δ)^{−1}·A^{−1} ⪰ P^{−1} ⪰ (1 + δ)^{−1}·A^{−1},

which, by δ ∈ (0, 1/2], implies

(1 + 2δ)·A^{−1} ⪰ P^{−1} ⪰ (1 − δ)·A^{−1}. (16)

Left multiply by A^{1/2} and right multiply by A^{1/2} on Eq.
(16) to get

(1 + 2δ)·I ⪰ A^{1/2}P^{−1}A^{1/2} ⪰ (1 − δ)·I. (17)

Subtracting I from Eq. (17) and then left multiplying by x^⊤ and right multiplying by x, we obtain

2δ·x^⊤x ≥ x^⊤(A^{1/2}P^{−1}A^{1/2} − I)x ≥ −δ·x^⊤x, (18)

where we write M := A^{1/2}P^{−1}A^{1/2} − I. Since Eq. (18) holds for any x ∈ R^n and M is symmetric, using the spectral theorem we get 2δ ≥ λ_n(M) ≥ λ_1(M) ≥ −δ. Next, we have

‖Mx‖₂ ≤ max{|λ_n(M)|, |λ_1(M)|}·‖x‖₂ ≤ 2δ·‖x‖₂.

Let y := A^{−1/2}x. As rank(A^{−1/2}) = n, we get that for any y ∈ R^n,

2δ·‖A^{1/2}y‖₂ ≥ ‖MA^{1/2}y‖₂ = ‖A^{1/2}P^{−1}Ay − A^{1/2}y‖₂. (19)

Finally, rewriting both sides of Eq. (19) by the definition of the matrix norm, we obtain 2δ·‖y‖_A ≥ ‖P^{−1}Ay − y‖_A for any y ∈ R^n, giving the lemma.

G.3 An iterative solver

In this section, we present a streaming SDDM solver (Algorithm 5) that takes Õ(n) space and O(log(1/ε)/log log n) passes for approximately solving an SDDM system Ax = b with error parameter ε ∈ (0, 1/2].

G.4 An iterative solver: the space

The goal of this section is to prove Lemma G.6.

Lemma G.6. Let A = L_G + D ∈ R^{n×n} be an SDDM matrix, where L_G ∈ R^{n×n} is the Laplacian matrix of a graph G with weights w and D ∈ R^{n×n} is a diagonal matrix. If we can read all edge-weight pairs (e, w_e) in one pass, and if we can read the diagonal of the matrix D in one pass, then StreamLS(A, b, ε) takes O(max{1, log(1/ε)/log log n}) passes and Õ(n) space.

Proof. By Lemma G.2 and δ = 1/log n, the δ-spectral sparsifier H has Õ(n) edges and can be computed in one pass and Õ(n) space with probability 1 − 1/poly(n). Therefore, computing the (2δ)-preconditioner P also takes one pass and Õ(n) space.
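As an aside, Lemma G.5 can be sanity-checked numerically. In the sketch below (a hypothetical dense toy: per-edge (1 ± δ) reweighting stands in for a real spectral sparsifier, which it satisfies by Definition G.1 although it does not reduce the edge count), P = L_H + D behaves as a 2δ-preconditioner of A = L_G + D:

```python
import numpy as np

rng = np.random.default_rng(1)
n, delta = 6, 0.1
edges = [(i, j, 1.0 + rng.random()) for i in range(n) for j in range(i + 1, n)]

def laplacian(edge_list):
    L = np.zeros((n, n))
    for i, j, w in edge_list:
        L[i, i] += w; L[j, j] += w
        L[i, j] -= w; L[j, i] -= w
    return L

L_G = laplacian(edges)
# reweight each edge by a factor in [1 - delta, 1 + delta]: a (non-sparse)
# delta-spectral sparsifier in the sense of Definition G.1
L_H = laplacian([(i, j, w * (1 + delta * (2 * rng.random() - 1)))
                 for i, j, w in edges])
D = np.diag(1.0 + rng.random(n))
A, P = L_G + D, L_H + D

worst = 0.0
for _ in range(100):
    y = rng.standard_normal(n)
    err = np.linalg.solve(P, A @ y) - y
    worst = max(worst, np.sqrt(err @ A @ err / (y @ A @ y)))
assert worst <= 2 * delta   # the 2*delta-preconditioner bound of Lemma G.5
```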
Note that P ≻ 0, and thus the system Py = r_t always has a solution. It remains to prove that each iteration of Lines 9-14 takes one pass and Õ(n) space. Since any iteration t only needs the vectors subscripted by t and t + 1, we can reuse the space, so that the total space is Õ(n). (The algorithm can be implemented in the standard RAM model with finite precision by introducing an Õ(1) factor in the encoding, which translates to a multiplicative factor of Õ(1) in the space [ST14].)

Algorithm 5 StreamLS(A = L_G + D, b, ε). Returns an ε-approximate solution to A^{−1}b
1: procedure StreamLS(A = L_G + D ∈ R^{n×n}, b ∈ R^n, ε ∈ R)
2:   ▷ L_G comes from the input stream; D is a stored diagonal matrix
3:   δ ← 1/log n
4:   δ̃ ← δ/2
5:   T ← O(max{1, log(1/ε)/log log n})  ▷ number of iterations
6:   Compute and store a δ-spectral sparsifier H of L_G  ▷ uses one pass, Lemma G.2
7:   Compute a (2δ)-preconditioner of A as P := L_H + D
8:   r_0 ← b, x_0 ← 0_n  ▷ r_0, x_0 ∈ R^n
9:   for t ← 0 to T − 1 do
10:    Compute a δ̃-approximate solution y_t to Py = r_t by an SDD solver and store y_t in memory  ▷ Theorem G.3
11:    r_{t+1} ← r_t − A·y_t
12:    x_{t+1} ← x_t + y_t
13:  end for
14:  return x_T
15: end procedure

Since nnz(P) = Õ(n) and δ̃ = δ/2 = 1/(2·log n), in Line 10, by Theorem G.3, with probability 1 − 1/poly(n) a δ̃-approximate solution y_t can be found in Õ(n·log(1/δ̃)) = Õ(n) time, and therefore in Õ(n) space. Note that this step does not read the stream. In Line 11, computing r_{t+1} requires computing A·y_t, which is done by reading the stream of L_G and D in one pass, multiplying the corresponding entries, and adding them up to the corresponding coordinates. All vectors are in R^n, so the total space used in each iteration is Õ(n).
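The loop of Algorithm 5 is plain preconditioned iterative refinement. A dense toy version (our sketch: an explicit SPD matrix with a crude additive perturbation playing the role of the preconditioner P, rather than a sparsifier-based one) exhibits the geometric error decay that Lemma G.7 formalizes:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)               # SPD stand-in for L_G + D
P = A + 0.02 * np.diag(np.diag(A))        # crude preconditioner, P ~ A
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)

def a_norm(v):
    return np.sqrt(v @ A @ v)

# r_{t+1} = r_t - A y_t, x_{t+1} = x_t + y_t, as in Lines 9-13 of Algorithm 5
x, r = np.zeros(n), b.copy()
start = a_norm(x - x_star)
for _ in range(10):
    y = np.linalg.solve(P, r)             # inner (here: exact) solve with P
    r = r - A @ y
    x = x + y
assert a_norm(x - x_star) <= 1e-6 * start # geometric decay, cf. Lemma G.7
```

In the streaming setting, the only step that touches the graph stream is forming A·y_t; everything else operates on O(n)-size vectors.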
The lemma then follows immediately from T = O(max{1, log(1/ε)/log log n}) and a union bound.

G.5 An iterative solver: the accuracy

The goal of this section is to prove Lemma G.7, which shows that the solver iteratively improves the accuracy of the solution.

Lemma G.7. Let δ ∈ (0, 1/2]. For any t ∈ [0, T],

‖x_t − A^{−1}b‖_A ≤ (4δ)^t · ‖A^{−1}b‖_A.

Proof. The proof is by induction on t. In the base case t = 0, we have x_0 = 0_n, so the statement clearly holds. Assuming the lemma holds for t, we prove the inductive step for t + 1.

Since y_t is a δ̃-approximate solution (Line 10, Algorithm 5), by Theorem G.3 we have ‖y_t − P^{−1}r_t‖_P ≤ δ̃·‖P^{−1}r_t‖_P. By the definition of the matrix norm, this becomes

(y_t − P^{−1}r_t)^⊤P(y_t − P^{−1}r_t) ≤ δ̃²·(P^{−1}r_t)^⊤P(P^{−1}r_t). (20)

Since P is built from a δ-spectral sparsifier, assuming y_t ≠ P^{−1}r_t and applying Definition G.1 on both sides of Eq. (20), we have

(1 − δ)·(y_t − P^{−1}r_t)^⊤A(y_t − P^{−1}r_t) ≤ (1 + δ)·δ̃²·(P^{−1}r_t)^⊤A(P^{−1}r_t). (21)

If y_t = P^{−1}r_t, then Eq. (21) also holds, since the right-hand side of Eq. (21) is non-negative due to A ⪰ 0. Note that Eq. (21) implies

‖y_t − P^{−1}r_t‖_A ≤ δ̃·√((1 + δ)/(1 − δ))·‖P^{−1}r_t‖_A ≤ 2δ̃·‖P^{−1}r_t‖_A = δ·‖P^{−1}r_t‖_A, (22)

where the second step follows from (1 + δ)/(1 − δ) ≤ 4 when δ ∈ (0, 1/2], and the last step follows from δ̃ = δ/2.

Before continuing, we observe the following, which easily follows from the update rule and an induction on t:

r_t = r_0 − A·Σ_{i=0}^{t−1} y_i = r_0 − A·(x_t − x_0) = b − A·x_t. (23)

Using Eq. (23), we bound the left-hand side of Eq.
(22) from above by the following:

‖y_t − P^{−1}r_t‖_A ≤ δ·‖P^{−1}(b − A·x_t)‖_A
= δ·‖P^{−1}A(A^{−1}b − x_t) − (A^{−1}b − x_t) + (A^{−1}b − x_t)‖_A
≤ δ·(‖P^{−1}A(A^{−1}b − x_t) − (A^{−1}b − x_t)‖_A + ‖A^{−1}b − x_t‖_A)
≤ δ·(2δ·‖A^{−1}b − x_t‖_A + ‖A^{−1}b − x_t‖_A)
≤ 2δ·‖A^{−1}b − x_t‖_A, (24)

where the third step follows from the triangle inequality, the fourth step follows from Lemma G.5, and the last step follows from δ ∈ (0, 1/2].

Finally, we are ready to prove the inductive step:

‖x_{t+1} − A^{−1}b‖_A = ‖x_t − A^{−1}b + y_t‖_A
= ‖(x_t − A^{−1}b + P^{−1}r_t) + (y_t − P^{−1}r_t)‖_A
≤ ‖x_t − A^{−1}b + P^{−1}r_t‖_A + ‖y_t − P^{−1}r_t‖_A
= ‖x_t − A^{−1}b + P^{−1}b − P^{−1}A·x_t‖_A + ‖y_t − P^{−1}r_t‖_A
≤ ‖P^{−1}A(A^{−1}b − x_t) − (A^{−1}b − x_t)‖_A + 2δ·‖A^{−1}b − x_t‖_A
≤ 2δ·‖A^{−1}b − x_t‖_A + 2δ·‖A^{−1}b − x_t‖_A
≤ (4δ)^{t+1}·‖A^{−1}b‖_A,

where the first step follows from x_{t+1} = x_t + y_t, the third step follows from the triangle inequality, the fourth step follows from Eq. (23), the fifth step follows from Eq. (24), the sixth step follows from Lemma G.5, and the last step follows from the induction hypothesis, completing the proof.

G.6 Main result

The goal of this section is to prove Theorem G.9.

Lemma G.8. Let ε ∈ (0, 1] and b ∈ R^n.
Let A = L_G + D ∈ R^{n×n} be an SDDM matrix, where L_G ∈ R^{n×n} is the Laplacian matrix of a graph G with weights w and D ∈ R^{n×n} is a diagonal matrix. If we can read all edge-weight pairs (e, w_e) in one pass, and if we can read the diagonal of the matrix D in one pass, then with probability 1 − 1/poly(n), StreamLS(A, b, ε) returns an ε-approximate solution x, i.e.,

‖x − A^{−1}b‖_A ≤ ε·‖A^{−1}b‖_A,

in O(max{1, log(1/ε)/log log n}) passes and Õ(n) space.

Proof. It follows from Lemma G.6, our choices of δ and T, and Lemma G.7.

By Lemma G.8 and the reduction in Section J, we obtain our main result.

Theorem G.9. There is a streaming algorithm which takes as input an SDD matrix A, a vector b ∈ R^n, and a parameter ε ∈ (0, 1], such that if there exists x* ∈ R^n with Ax* = b, then with probability 1 − 1/poly(n) the algorithm returns an x ∈ R^n with ‖x − x*‖_A ≤ ε·‖x*‖_A in O(max{1, log(1/ε)/log log n}) passes and Õ(n) space.

H Minimum vertex cover

The goal of this section is to prove Lemma H.1 (correctness) and Lemma H.4 (running time).
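It may help to see the constraint matrix used in this section concretely. The sketch below builds it for a hypothetical 4-vertex bipartite graph; the instance and the exact way the box constraints are stacked are our illustration, chosen to match the index set [m + 2n] from which the tight constraints are drawn.

```python
import numpy as np

V_L, V_R = [0, 1], [2, 3]
edges = [(0, 2), (0, 3), (1, 2)]
n, m = len(V_L) + len(V_R), len(edges)

# signed edge-vertex incidence matrix, edges directed from V_L to V_R
A = np.zeros((m, n))
for k, (u, v) in enumerate(edges):
    A[k, u], A[k, v] = 1.0, -1.0

# stack box constraints (x >= 0 and -x >= -1) under the edge constraints;
# the right-hand side b comes from the input stream and is omitted here
A1 = np.vstack([A, np.eye(n), -np.eye(n)])
assert A1.shape == (m + 2 * n, n)
assert all(abs(A[k]).sum() == 2 for k in range(m))  # one +1 and one -1 per edge
```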
H.1 Algorithm

Algorithm 6 Minimum vertex cover
1: procedure MinimumVertexCover(G = (V_L, V_R, E), b ∈ Z^m, c ∈ Z^n, n, m)  ▷ Lemma H.1
2:   Let A be the signed edge-vertex incidence matrix of G with direction from V_L to V_R
3:   L ← the bit complexity of A, b, c
4:   Modify A, b, c (add the constraints x_{V_L} ≥ 0 and x_{V_R} ≤ 1) to get A₁, b₁, c₁
5:   Modify A₁, b₁, c₁ according to Lemma H.2 and get A₂, b₂, c₂  ▷ A₂ = A₁, b₂ = b₁
6:   x_init ← 2^L·(1_{V_L} − 1_{V_R})
7:   t_init ← 1
8:   c_init ← −g(x_init)  ▷ g is the gradient of the barrier function (Def. C.4)
9:   ε_Φ ← 1/100
10:  t₀ ← ε_Φ·(m·2^{2L+10})^{−1}
11:  x_tmp ← InteriorPointMethod(A₂, b₂, c_init, t_init, x_init, t₀, ε_Φ/2, 0)
12:  t₁ ← n²·m·2^{L+10}
13:  x ← InteriorPointMethod(A₂, b₂, c₂, t₀, x_tmp, t₁, ε_Φ, 1)
14:  Use Lemma H.2 to turn x into a set S of tight constraints of the LP given by (A₁, b₁, c₁)  ▷ S ⊆ [m + 2n]
15:  return S ∩ [m]
16: end procedure

H.2 Correctness of Algorithm 6

The goal of this section is to prove Lemma H.1.

Lemma H.1 (Correctness of Algorithm 6). In Algorithm 6, let A be the signed edge-vertex incidence matrix of G with direction from V_L to V_R. Let [n] = V_L ∪ V_R, and let b and c be the input. If the linear program

min_{x∈R^n} c^⊤x (25)
s.t. Ax ≥ b,
x_v ≥ 0, ∀v ∈ V_L,
x_v ≤ 1, ∀v ∈ V_R

is both feasible and bounded, then with probability at least 1/2, Algorithm 6 returns a set of tight constraints on some optimal solution to the LP of Eq. (25).

Proof. First, we need to show that the two calls to InteriorPointMethod satisfy the initial condition, i.e., that (t_start, x_start) is a good starting point.

The first call. In the first call (Line 11), we have

‖t_init·c_init + g(x_init)‖_{H(x_init)^{−1}} = ‖0_n‖_{H(x_init)^{−1}} = 0 ≤ ε_Φ/2,

where the first step is by the definition of c_init and t_init = 1. Since InteriorPointMethod ends up with parameters x_tmp and t₀, by Lemma E.3 we have

‖t₀·c_init + g(x_tmp)‖_{H(x_tmp)^{−1}} ≤ ε_Φ/2. (26)

Note that t₀ and x_tmp are the input to the second call.
The second call. In the second call (Line 13), the desired term can be upper bounded by

$\|t_1 \cdot c_2 + \nabla g(x_{\mathrm{tmp}})\|_{H(x_{\mathrm{tmp}})^{-1}} \le \|t_1 \cdot c_{\mathrm{init}} + \nabla g(x_{\mathrm{tmp}})\|_{H(x_{\mathrm{tmp}})^{-1}} + \|t_1 \cdot c_{\mathrm{init}} - t_1 \cdot c_2\|_{H(x_{\mathrm{tmp}})^{-1}} \le \epsilon_\Phi/2 + t_1 \cdot \|c_{\mathrm{init}} - c_2\|_{H(x_{\mathrm{tmp}})^{-1}}$,

where the first step is by the triangle inequality, and the second step is from Eq. (26).

Now let us bound the term $\|c_{\mathrm{init}} - c_2\|_{H(x_{\mathrm{tmp}})^{-1}}$:

$\|c_{\mathrm{init}} - c_2\|_{H(x_{\mathrm{tmp}})^{-1}} \le \|c_{\mathrm{init}} - c_2\|_2 \cdot \lambda_{\min}(H(x_{\mathrm{tmp}}))^{-1/2} \le \|c_{\mathrm{init}} - c_2\|_2 \cdot 2^{L+3} \le (\|c_{\mathrm{init}}\|_2 + \|c_2\|_2) \cdot 2^{L+3} \le (\sqrt{m} \cdot 2^{L+3} + \|c_2\|_2) \cdot 2^{L+3} \le (\sqrt{m} \cdot 2^{L+3} + m \cdot 2^{L+4}) \cdot 2^{L+3} \le m \cdot 2^{2L+8}$,

where the first step is by $\|x\|_H = \sqrt{x^\top H x} \le \|x\|_2 \cdot \sqrt{\lambda_{\max}(H)}$ together with $\lambda_{\max}(H^{-1}) = \lambda_{\min}(H)^{-1}$, the second step is by Lemma H.3, the fourth step is by $c_{\mathrm{init}} = -\nabla g(x_{\mathrm{init}})$, $x_{\mathrm{init}} = 2^L(\mathbf{1}_{V_L} - \mathbf{1}_{V_R})$, and the definition of $g$ (Def. C.4), and the fifth step is by the definition of $c_2$ (in Lemma H.2).

By our choice of $t_1 := \epsilon_\Phi \cdot (m \cdot 2^{2L+10})^{-1}$ (Line 10), we finally have

$\|t_1 \cdot c_2 + \nabla g(x_{\mathrm{tmp}})\|_{H(x_{\mathrm{tmp}})^{-1}} \le \epsilon_\Phi/2 + \epsilon_\Phi/4 \le \epsilon_\Phi$.

Let OPT be the optimal value of the linear program $\min_{x \in \mathbb{R}^n,\, A_2 x \ge b_2} c_2^\top x$. By $t_2 := n m \cdot 2^{2L+10}$ and Lemma E.1, we know that our solution $x$ (Line 13) is feasible and has value $c_2^\top x - \mathrm{OPT} \le \frac{m}{t_2} \cdot (1 + 2\epsilon_\Phi) \le n^{-1} \cdot 2^{-2L-9}$. Now we can apply Lemma H.2 to show that, after one matrix-vector multiplication, with probability at least $1/2$ we can output the tight constraints $S$ of a basic feasible optimal solution of

$\min_{x \in \mathbb{R}^n,\, A_1 x \ge b_1} c_1^\top x$.  (27)

Note that this LP is exactly the same as Eq. (25), by our construction of $A_1, b_1, c_1$ (Line 4).

Lemma H.2 (Lemma 43 of [LS14]). Given a feasible and bounded linear program

$\min_{x \in \mathbb{R}^n} c^\top x$  (28)
s.t.
$Ax \ge b$,

where $A \in \mathbb{Z}^{m \times n}$, $b \in \mathbb{Z}^m$, and $c \in \mathbb{Z}^n$ all have integer coefficients, let $L$ be the bit complexity of Eq. (28). Let $r \in \mathbb{Z}^n$ be chosen uniformly at random from the integers $\{-2^{L+1} n, \cdots, 2^{L+1} n\}$. Consider the perturbed linear program

$\min_{x \in \mathbb{R}^n} (2^{L+3} n \cdot c + r)^\top x$  (29)
s.t. $Ax \ge b$.

Then with probability at least $1/2$ over the randomness of $r \in \mathbb{Z}^n$, we have the following. Let OPT be defined as the optimal value of the linear program Eq. (29). If we can find one feasible solution for Eq. (29) with value less than $\mathrm{OPT} + n^{-1} \cdot 2^{-2L-9}$, then we can find the tight constraints of a basic feasible optimal solution of Eq. (28) using one matrix-vector multiplication with $A$. Moreover, the bit complexity of Eq. (29) is at most $2L + \log(8n)$.

Lemma H.3. Let $H(x)$ be defined as in Lemma H.6. Then we have $\lambda_{\min}(H(x)) \ge 2^{-2L-6}$.

Proof. We can lower bound $\lambda_{\min}(H(x))$ as follows:

$\lambda_{\min}(H(x)) \ge \lambda_{\min}(D(x)) \ge \min_{i \in [n]} D(x)_{i,i} \ge 2^{-2L-6}$,

where the first step is by Lemma H.6, i.e., $H(x) = L(x) + D(x)$ with both $L(x)$ and $D(x)$ positive semi-definite, the second step is by the fact that $D(x)$ is a diagonal matrix, and the third step is by Lemma H.6, i.e., $D(x)_{i,i} = s_{i+m}(x)^{-2}$, with $s(x) := A_2 x - b_2$ bounded by $\|s\|_\infty \le 2^{L+3}$.

H.3 Running time of Algorithm 6

Lemma H.4 (Running time of Algorithm 6). Suppose there is an oracle $\mathcal{I}$ running in $f(n)$ space that can output $b_i$ given any $i \in [m]$. If the input satisfies that $\|b\|_\infty$ and $\|c\|_\infty$ are polynomially bounded, then Algorithm 6 can be implemented in the streaming model within $\widetilde{O}(n) + f(n) + |S|$ space and $\widetilde{O}(\sqrt{m})$ passes.

Proof. In Line 2, we define $A$ but never explicitly compute and store $A$ in memory.

In Line 3, we let $L \leftarrow O(\log n)$ be the upper bound on the bit complexity. This costs one pass. Indeed, Lemma A.5 implies that the entries of the input are polynomially bounded in magnitude.
So $L$ is an upper bound on the bit complexity.

In Lines 4 and 5, we define $A_1, b_1, c_1$ and $A_2, b_2, c_2$. This does not involve any computation.

Lines 6, 7, 8, 9, 10, and 12 are done locally in memory.

For Lines 11 and 13, we show the following three things in order to invoke Lemma F.1:
1. $\sum_{i \in [m]} a_i / s_i(x)$ can be computed in one pass in $h(n)$ space;
2. the Hessian $H(x) \in \mathbb{R}^{n \times n}$ can be decomposed as $H(x) = L_G(x) + D(x)$, where $L_G(x) \in \mathbb{R}^{n \times n}$ is the Laplacian of some graph $G$ with edge weights $w$ and $D(x) \in \mathbb{R}^{n \times n}$ is a diagonal matrix;
3. we can read the edge-weight pairs $(e, w_e)$ in one pass, and we can read the diagonal of $D(x)$ in one pass.

Part 1: By Lemma H.5 and our assumption on the existence of the oracle $\mathcal{I}$, given a feasible $x \in \mathbb{R}^n$, we can output a stream of all edges $e_i = (u, v)$ together with their slacks $s_i(x) = a_i^\top x - b_i$ in one pass within $O(n) + f(n)$ space. So $\sum_{i \in [m]} a_i / s_i(x)$ can be computed in one pass in $h(n) = O(n) + f(n)$ space.

Part 2: This follows from Lemma H.6.

Part 3: By Lemma H.6, the edge weights are $w_i = s_i(x)^{-2}$ and the diagonal entries are $D(x)_{i,i} = s_{i+m}(x)^{-2}$. So Part 3 is satisfied by Lemma H.5.

By Lemma F.1, Lines 11 and 13 can be implemented in the streaming model within $\widetilde{O}(n) + f(n)$ space and $\widetilde{O}(\sqrt{m})$ passes.

In Line 14, by Lemma H.2, it suffices to perform one matrix-vector multiplication with $A_2$, which can be done in one pass and $O(n)$ space.

Outputting the set $S$ requires $|S|$ space.

Lemma H.5 (Output slack stream). Suppose there is an oracle $\mathcal{I}$ running in $f(n)$ space that can output $b_i$ given any $i \in [m]$. Given a feasible $x \in \mathbb{R}^n$, we can output a stream of all edges $e_i = (u, v)$ together with their slacks $s_i(x) = a_i^\top x - b_i$ in one pass within $O(n) + f(n)$ space.

Proof. W.l.o.g., we assume every edge has a unique id $i \in [m]$ and that we receive this id in the stream. We read all edges in one pass.
Upon receiving an edge $e_i = (u, v)$, we output $x_u - x_v - b_i$, where we use $\mathcal{I}$ to compute $b_i$. (In fact, we can remove the unique-id assumption by observing that our oracle Algorithm 3 also works for $i \in [n^2]$, where we give every edge $e = (u, v)$ the index $\mathrm{id}(e) = u \cdot n + v \in [n^2]$.)

H.4 Building blocks

Lemma H.6 (The Hessian is an SDDM matrix). Let the graph $G = (V, E)$ be the input of Algorithm 6. Let $n = |V|$ and $m = |E|$. Let $A_2 \in \mathbb{R}^{(m+n) \times n}$ and $b_2 \in \mathbb{R}^{m+n}$ be defined as in Line 5 (Algorithm 6). Let $x \in \mathbb{R}^n$ be any feasible point, i.e., $A_2 x > b_2$. Let $s(x) \in \mathbb{R}^{m+n}$ and $H(x) \in \mathbb{R}^{n \times n}$ be the slack and Hessian defined w.r.t. $A_2$ and $b_2$ (Def. C.4). Then $H(x) \in \mathbb{R}^{n \times n}$ can be written as

$H(x) = L(x) + D(x)$,

where $L(x) \in \mathbb{R}^{n \times n}$ is the Laplacian matrix of the graph $G$ with edge weights $\{s_1(x)^{-2}, \cdots, s_m(x)^{-2}\}$, and $D(x) \in \mathbb{R}^{n \times n}$ is the diagonal matrix with diagonal entries $D(x)_{i,i} = s_{i+m}(x)^{-2}$.

Proof. Let $A \in \mathbb{R}^{m \times n}$ be the signed edge-vertex incidence matrix of $G$ with direction $V_L$ to $V_R$. Then $A_2 \cdot x \ge b_2$ can be written as

$\begin{bmatrix} A \\ I_{V_L} - I_{V_R} \end{bmatrix} \cdot x \ge \begin{bmatrix} b \\ -\mathbf{1}_{V_R} \end{bmatrix}$,

where $I_{V_L} \in \mathbb{R}^{n \times n}$ denotes the diagonal matrix with $(I_{V_L})_{i,i} = 1$ if $i \in V_L$ and $0$ otherwise, and $I_{V_R} \in \mathbb{R}^{n \times n}$ denotes the diagonal matrix with $(I_{V_R})_{i,i} = 1$ if $i \in V_R$ and $0$ otherwise.

Denote by $S \in \mathbb{R}^{m \times m}$ the diagonal matrix with diagonal entries $S_{i,i} := s_i(x)$. Thus $H(x)$ can be written as

$H(x) = A^\top S^{-2} A + \sum_{i=m+1}^{m+n} e_{i-m} e_{i-m}^\top \cdot s_i(x)^{-2} = L(x) + D(x)$,

where the first step is by the definition of $H(x)$ (Def. C.4), and the second step is by the definitions of $L(x)$ and $D(x)$.

H.5 A minimum vertex cover solver

Note that Algorithm 6 is actually a high-accuracy fractional minimum vertex cover solver for general graphs (not necessarily bipartite), since we do not use the bipartiteness of the matrix $A$ in the IPM.

Theorem H.7. Let $G$ be a graph with $n$ vertices and $m$ edges.
Consider the fractional minimum vertex cover problem Eq. (31), in which every edge $e$ needs to be covered at least $b_e$ times. Let $x^* \in \mathbb{R}^n$ be the optimal solution of Eq. (31). There exists a streaming algorithm (Algorithm 6) such that for any $\delta > 0$, it can output a feasible fractional vertex cover $x \in \mathbb{R}^n$ with $\mathbf{1}_n^\top x \le \mathbf{1}_n^\top x^* + \delta$ in $\widetilde{O}(\sqrt{m}) \cdot \log(1/\delta)$ passes and $\widetilde{O}(n) \cdot \log(1/\delta)$ space with probability $1 - 1/\mathrm{poly}(n)$.

Proof. The proof follows directly from the proofs of Lemma H.1 and Lemma H.4.

As a byproduct, we obtain a fast semi-streaming algorithm for (exact, integral) minimum vertex cover in bipartite graphs.

Theorem H.8. Given a bipartite graph $G$ with $n$ vertices and $m$ edges, there exists a streaming algorithm that computes a minimum vertex cover of $G$ in $\widetilde{O}(\sqrt{m})$ passes and $\widetilde{O}(n)$ space with probability $1 - 1/\mathrm{poly}(n)$.

Proof. Given a bipartite graph $G$, we use the isolation lemma from [MVV87] to assign a weight chosen uniformly from $[2m]$ to each vertex independently, which takes $\widetilde{O}(n)$ space. The weight vector is denoted $w \in \mathbb{Z}^n$. Next, we write an LP in the form of Eq. (31) for this weighted minimum vertex cover problem, where the objective is $w^\top x$ (instead of $\mathbf{1}_n^\top x$). Since $G$ is bipartite, by Theorem A.5 all extreme points of the polytope are integral. By the isolation lemma, the optimal solution is unique, so the unique solution must be integral. Taking $\delta = 1/\mathrm{poly}(n)$ for a sufficiently large polynomial and applying Theorem H.7, the high-accuracy solution obtained from the IPM can be rounded to the optimal integral solution, which in total takes $\widetilde{O}(\sqrt{m})$ passes and $\widetilde{O}(n)$ space with probability $1 - 1/\mathrm{poly}(n)$.

I Combine

The goal of this section is to prove Theorem I.1.

Theorem I.1 (Main theorem, formal version of Theorem 1.1).
Given a bipartite graph $G$ with $n$ vertices and $m$ edges, there exists a streaming algorithm that computes a maximum weight matching of $G$ in $\widetilde{O}(\sqrt{m})$ passes and $\widetilde{O}(n)$ space with probability $1 - 1/\mathrm{poly}(n)$.

Proof. It follows by combining Lemma I.2 and Lemma I.3.

I.1 Algorithms

I.2 Running time

The goal of this section is to prove Lemma I.2.

Lemma I.2 (Running time). Given a bipartite graph with $n$ vertices and $m$ edges, Algorithm 7 can be implemented in the streaming model in $\widetilde{O}(\sqrt{m})$ passes and $\widetilde{O}(n)$ space.

Proof. In Lines 6 and 7, we do not explicitly calculate $b_0$ and $b$ and store them in memory. Instead, we use the oracle $\mathcal{I}$ stated in Lemma B.1, which outputs $b_i$ bit by bit. So this step does not cost space.

In Line 8, we calculate and store $c \in \mathbb{Z}^n$ in $\widetilde{O}(n)$ space.

In Line 9, we call MinimumVertexCover. In order to use Lemma H.4, we need to prove the following properties.
1. By Lemma B.2, $\|b_0\|_\infty \le \mathrm{poly}(n)$, so $\|b\|_\infty \le \mathrm{poly}(n)$ (recall that the integer edge weights $w$ are polynomially bounded). Moreover, $\|c\|_\infty = 1$. So both $\|b\|_\infty$ and $\|c\|_\infty$ are polynomially bounded.
2. By Lemma B.1, there is an oracle $\mathcal{I}$ that uses $O(\log(n^n) + \log(m)) = \widetilde{O}(n)$ space and can output $(b_0)_i$. Since each edge $e_i$ comes with its weight $w_i$ in the stream, we can output $b_i = (b_0)_i + n^\alpha \cdot w_i$ when given $i \in [m]$, where $n^\alpha$ is the polynomial scaling factor of Line 7.
3. We can always assume $|S| \le n$, since otherwise we never enter the branch at Line 10.

By applying Lemma H.4, this call can be done in $\widetilde{O}(n)$ space and $\widetilde{O}(\sqrt{m})$ passes.

Algorithm 7 Main($G = (V_L, V_R, E, w)$)
1: procedure Main($G = (V_L, V_R, E, w)$) $\triangleright$ Theorem I.1
2:   $\triangleright$ $w$ denotes the edge weights, which are integers.
3:   $n \leftarrow |V_L| + |V_R|$, $m \leftarrow |E|$
4:   $M \leftarrow \emptyset$ $\triangleright$ $M$ is the maximum matching
5:   for $i = 1 \to O(\log n)$ do
6:     $b_0 \leftarrow \textsc{Isolation}(m, n^n)$ $\triangleright$ $b_0 \in \mathbb{Z}^m$, Algorithm 2, Lemma B.2
7:     $b \leftarrow b_0 + n^\alpha \cdot w$ $\triangleright$ $n^\alpha$ is a sufficiently large polynomial scaling factor
8:     $c \leftarrow \mathbf{1}_{V_L} - \mathbf{1}_{V_R}$
9:     $S \leftarrow \textsc{MinimumVertexCover}(G, b, c, n, m)$ $\triangleright$ $S \subseteq [m]$, Algorithm 6
10:    if $|S| \le n$ then
11:      Let $M'$ be a maximum weight matching found in the edge set $S$
12:      if $w(M') > w(M)$ then $\triangleright$ Update the maximum weight matching
13:        $M \leftarrow M'$
14:      end if
15:    end if
16:   end for
17:   return $M$
18: end procedure

In Line 11, since we are finding a maximum matching in a graph with $n$ vertices and at most $n$ edges, we can store these edges in memory and then check all $2^n$ possible sets of edges. This costs $\widetilde{O}(n)$ space and no passes. We find the set of edges that is a matching and has the maximum weight.

With the $O(\log n)$ iterations of Line 5, the number of passes blows up by an $O(\log n)$ factor. So overall, we use $\widetilde{O}(n)$ space and $\widetilde{O}(\sqrt{m})$ passes.

I.3 Correctness

The goal of this section is to prove Lemma I.3.

Lemma I.3 (Correctness). Given a bipartite graph $G$ with $n$ vertices and $m$ edges, with probability at least $1 - 1/\mathrm{poly}(n)$, Algorithm 7 outputs a maximum matching.

Proof. Consider the linear program Eq. (30). According to part 1 of Lemma I.7, with probability at least $1/2$ there is a unique optimal solution $y^*$ to Eq. (30).

By Lemma H.1, with probability at least $1/4$ the algorithm MinimumVertexCover successfully returns a subset $S$ of tight constraints which corresponds to an optimal solution $x^*, s^*$ of the dual problem Eq. (31) (note that the linear program in Lemma H.1 and Eq. (31) differ only in signs). This means $s^*_i = 0$ if and only if $i \in S$. According to parts 1 and 2 of Lemma I.7, $|S| \le n$ and $y^*_i = 0$ for all $i \notin S$. Therefore, there exists a maximum matching using only edges in $S$, and we find it in Line 11.

Overall, in each iteration of the for loop (Line 5), with probability at least $(1/2) \cdot (1/4) = 1/8$ we find a maximum matching.
After $O(\log n)$ iterations, we find a maximum matching with probability at least $1 - 1/\mathrm{poly}(n)$.

I.4 Primal to dual

This section provides the preliminaries for the correctness proof (Lemma I.3). We give Theorem I.6, which is later used in Lemma I.7.

Definition I.4 (Maximum weight matching). Given a bipartite graph $G = (V, E)$ with $|V| = n$ and $|E| = m$, let $A \in \{0, 1\}^{m \times n}$ be the unsigned edge-vertex incidence matrix. Given weights $b \in \mathbb{Z}^m$ on the edges, the maximum weight matching problem can be written as the following linear program:

Primal $\max_{y \in \mathbb{R}^m} b^\top y$  (30)
s.t. $A^\top y \le \mathbf{1}_n$, $y \ge \mathbf{0}_m$.

Its dual form is the following.

Definition I.5 (Fractional minimum vertex cover). Let $A \in \mathbb{Z}^{m \times n}$ and $b \in \mathbb{Z}^m$ be defined as in Def. I.4. The dual form of Eq. (30) is

Dual $\min_{x \in \mathbb{R}^n} \mathbf{1}_n^\top x$  (31)
s.t. $Ax \ge b$, $x \ge \mathbf{0}_n$.

Theorem I.6 (Strong duality from complementary slackness [PS98]). Let $y \in \mathbb{R}^m$ be a feasible solution to the primal Eq. (30), and let $x \in \mathbb{R}^n$ be a feasible solution to the dual Eq. (31). Let $s := Ax - b \in \mathbb{R}^m$. Then $x \in \mathbb{R}^n$, $s \in \mathbb{R}^m$, $y \in \mathbb{R}^m$ satisfy $y^\top s = 0$ and $x^\top(\mathbf{1}_n - A^\top y) = 0$ if and only if $(x, s)$ is optimal for the dual and $y$ is optimal for the primal.

I.5 Properties of primal and dual LP solutions

The goal of this section is to prove Lemma I.7.

Lemma I.7 (Properties of LP solutions). Given a bipartite graph $G$ with $n$ vertices and $m$ edges, let $w \in \mathbb{N}^m$ be the edge weights. Let $b_0 \in \mathbb{Z}^m$ be the output of $\textsc{Isolation}(m, Z)$ where $Z := n^n$ (Algorithm 2). Let $b := b_0 + n^\alpha \cdot w \in \mathbb{Z}^m$ for a sufficiently large constant $\alpha$. Let $A \in \{0, 1\}^{m \times n}$ be the (unsigned) edge-vertex incidence matrix of $G$. Consider the linear programs Eq. (30) and Eq. (31) with parameters $A$ and $b$. With probability at least $1/2$, we have the following.
1. There is a unique optimal solution $y^* \in \mathbb{R}^m$ to the primal LP (Eq. (30)). Furthermore, $y^* \in \{0, 1\}^m$, and $y^*$ is the indicator vector of a maximum weight matching of $G$.
2. Let $x^* \in \mathbb{R}^n$, $s^* \in \mathbb{R}^m$ be an optimal solution to the dual LP (Eq. (31)).
Then we have the following properties of $s^*$:
(a) for any $i \in [m]$, if $s^*_i > 0$ then $y^*_i = 0$;
(b) $\|s^*\|_0 \ge m - n$.

Proof. Part 1. Let the feasible region of $y$ be $\mathcal{S} := \{y \in \mathbb{R}^m \mid A^\top y \le \mathbf{1}_n, y \ge \mathbf{0}_m\}$. We implicitly have $y \le \mathbf{1}_m$, so $\mathcal{S}$ is a bounded region. Let $\mathcal{S}_0$ denote the set of extreme points of $\mathcal{S}$; since the LP is feasible and bounded, some extreme point attains the optimal value. (The dual LP is a generalized version of the minimum vertex cover problem: each edge $i$ needs to be covered at least $b_i$ times, where the case $b = \mathbf{1}_m$ is the classic minimum vertex cover.)

By Lemma A.5, we know all extreme points are integral. Since $\mathbf{0}_m \le y \le \mathbf{1}_m$, all extreme points are in $\{0, 1\}^m$, and they correspond to matchings. Because we set $b := b_0 + n^\alpha \cdot w$ as our objective vector, we can write $\langle b, y \rangle = \langle b_0, y \rangle + n^\alpha \cdot w^\top y$. Since $\|b_0\|_\infty$ is polynomially bounded by Lemma B.2 and $\alpha$ is sufficiently large, the extreme point with the optimal objective value must be a maximum weight matching. Let $\mathcal{F}$ be the set of all possible matchings. We have $|\mathcal{F}| \le n^n = Z$.

By applying Lemma B.2, with probability at least $1/2$ there is a unique extreme point $y^*$ of $\mathcal{S}$ attaining the optimal value.

Because the feasible region $\mathcal{S}$ is bounded, every point $y \in \mathcal{S} \setminus \mathcal{S}_0$ can be written as a convex combination of extreme points of $\mathcal{S}$. That is, writing $\mathcal{S}_0 = \{y^{(1)}, \cdots, y^{(s)}\}$ with $s := |\mathcal{S}_0|$, every point $y \in \mathcal{S} \setminus \mathcal{S}_0$ can be written as

$y = \sum_{i \in [s]} a_i y^{(i)}$, where $0 \le a_i < 1$ for all $i \in [s]$ and $\sum_{i \in [s]} a_i = 1$.

Therefore, we have

$b^\top y = \sum_{i \in [s]} a_i \cdot (b^\top y^{(i)}) < \max_{i \in [s]} b^\top y^{(i)}$.

So $y^*$ is in fact the unique optimal solution among all points of $\mathcal{S}$.

Part 2. Part (a) follows directly from Theorem I.6.

Now we prove part (b). Assume $\|s^*\|_0 < m - n$. Let $x^* \in \mathbb{R}^n$ be any optimal dual solution that realizes $s^*$, i.e., $Ax^* - b = s^*$.
We will show that there exists a feasible solution to the primal $y' \in \mathbb{R}^m$ such that $y' \ne y^*$, $\langle y', s^* \rangle = 0$, and $\langle x^*, \mathbf{1}_n - A^\top y' \rangle = 0$. By Theorem I.6, $y'$ is then also an optimal solution to the primal LP, contradicting the uniqueness of $y^*$. (In fact, we will prove that there are infinitely many such $y'$.)

Consider the following linear system:

$y_i = 0$, for all $i \in [m]$ such that $s^*_i \ne 0$,  (32)
$A^\top y = A^\top y^*$.  (33)

Constraint Eq. (32) contains fewer than $\|s^*\|_0 < m - n$ equalities, while constraint Eq. (33) contains $n$ equalities. So if we write the above system in matrix form $\widetilde{A} y = \widetilde{c}$, $y \ge \mathbf{0}_m$, it must be that $\mathrm{rank}(\widetilde{A}) \le \|s^*\|_0 + n < m$. We obtain

$y = \widetilde{A}^\dagger \widetilde{c} + (I - \widetilde{A}^\dagger \widetilde{A}) z$,

where $z \in \mathbb{R}^m$ is a free variable. Since $\mathrm{rank}(\widetilde{A}^\top \widetilde{A}) \le \mathrm{rank}(\widetilde{A}) < m$, it must be that $I_m - \widetilde{A}^\dagger \widetilde{A} \ne 0_{m \times m}$.

Observe that $f(z) := \widetilde{A}^\dagger \widetilde{c} + (I - \widetilde{A}^\dagger \widetilde{A}) z$ is an affine function passing through the point $y^*$. Also note that $y^* \ge \mathbf{0}_m$ and $y^* \ne \mathbf{0}_m$ (otherwise, if $y^* = \mathbf{0}_m$, we could increase an arbitrary component of $y^*$ to increase $b^\top y$, contradicting optimality). As a result, $f(z)$ must pass through infinitely many points in the region $y \ge \mathbf{0}_m$. Let $y'$ be any such point other than $y^*$. By Eq. (32), we have $\langle y', s^* \rangle = 0$. By Eq. (33) and Theorem I.6, we have $\langle x^*, \mathbf{1}_n - A^\top y' \rangle = \langle x^*, \mathbf{1}_n - A^\top y^* \rangle = 0$. By Theorem I.6, $y'$ is also an optimal solution, contradicting the uniqueness of $y^*$.

J Solver reductions

J.1 From SDDM solver to SDD solver

We recall Gremban's reduction [ST04], which reduces the problem of solving an SDD system to that of solving an SDDM system.
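Gremban's reduction can be sketched in a few lines of Python. This is our own toy dense version under invented names (`gremban`, `solve_dense`); the actual streaming algorithm never materializes $\widehat{A}$ in memory but streams its entries, as described next.

```python
# Toy dense sketch of Gremban's reduction (our illustration): given an SDD
# matrix A = D + A_neg + A_pos, build the SDDM matrix
#   hatA = [[D + A_neg, -A_pos], [-A_pos, D + A_neg]],
# solve hatA [x1; x2] = [b; -b] with any SDDM solver, and return
# x = (x1 - x2) / 2.

def solve_dense(M, v):
    # Gauss-Jordan elimination with partial pivoting, a stand-in
    # for the (streaming) SDDM solver
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * t for a, t in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def gremban(A, b, solve):
    n = len(b)
    hatA = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                hatA[i][j] = hatA[n + i][n + j] = A[i][j]
            elif A[i][j] < 0:   # negative off-diagonal entries stay in place
                hatA[i][j] = hatA[n + i][n + j] = A[i][j]
            elif A[i][j] > 0:   # positive entries move to the off-block, negated
                hatA[i][n + j] = hatA[n + i][j] = -A[i][j]
    y = solve(hatA, b + [-bi for bi in b])
    return [(y[i] - y[n + i]) / 2 for i in range(n)]

# A is SDD but not SDDM (positive off-diagonal); x = [1, 1] solves Ax = b.
A = [[2.0, 1.0], [1.0, 2.0]]
b = [3.0, 3.0]
x = gremban(A, b, solve_dense)
```

Note that every off-diagonal entry of `hatA` is non-positive while row sums still dominate, which is exactly what makes the doubled system SDDM.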
Let $A$ be an SDD matrix, and decompose $A$ into $D + A_{\mathrm{neg}} + A_{\mathrm{pos}}$, where $D$ is the diagonal of $A$, $A_{\mathrm{neg}}$ contains all the negative off-diagonal entries of $A$ (with the same size), and $A_{\mathrm{pos}}$ contains all the positive off-diagonal entries of $A$ (with the same size). Consider the following linear system:

$\widehat{A} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \widehat{b}$, where $\widehat{A} = \begin{bmatrix} D + A_{\mathrm{neg}} & -A_{\mathrm{pos}} \\ -A_{\mathrm{pos}} & D + A_{\mathrm{neg}} \end{bmatrix}$ and $\widehat{b} = \begin{bmatrix} b \\ -b \end{bmatrix}$.

The matrix $\widehat{A}$ can be (implicitly) computed in the streaming model: in one pass we compute and store the diagonal matrix $D$ by adding up the edge weights incident on each vertex; then $\widehat{A}$ is given as a stream of edges (entries), since whenever an edge (an entry of $A$) arrives, we immediately know its positions in $\widehat{A}$. Note that if $Ax = b$ admits a solution, then $x = (x_1 - x_2)/2$ is exactly such a solution. Moreover, if

$\left\| \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \widehat{A}^\dagger \widehat{b} \right\|_{\widehat{A}} \le \epsilon \, \|\widehat{A}^\dagger \widehat{b}\|_{\widehat{A}}$,

then $x$ satisfies $\|x - A^\dagger b\|_A \le \epsilon \, \|A^\dagger b\|_A$. So we obtain an SDD solver with asymptotically the same number of passes and space as the SDDM solver.

J.2 From SDDM solver to Laplacian solver

In this section, we show that to approximately solve a (possibly singular) SDDM system $Ay = b$, it suffices to pre-process the input in $O(1)$ passes, approximately solve a positive definite SDDM system $\widetilde{A} y = \widetilde{b}$ of at most the same size, and possibly post-process in $O(1)$ passes.

If $A$ is positive definite, then we can solve the system by an SDDM solver directly, so assume in the following that it is not. From Fact A.4, we know that $A$ must then be a Laplacian matrix. Therefore, it remains to reduce the problem of (approximately) solving a Laplacian system $Ly = b$ to (approximately) solving a positive definite SDDM system. The following facts are well known.

Fact J.1.
Given a Laplacian matrix $L$ corresponding to a graph $G$, the following hold:
• $L \succeq 0$;
• $L \mathbf{1}_n = 0$, and thus $\lambda_1(L) = 0$;
• $G$ is connected iff $\lambda_2(L) > 0$.

Given a Laplacian matrix $L$ as a stream of entries, it is equivalent to treat it as a stream of edges of $G$. In one pass, we can identify all the connected components of $G$ using $\widetilde{O}(n)$ space (e.g., by maintaining a spanning forest of $G$). After that, any entry in the stream is identified and assigned to the subproblem corresponding to the connected component that contains it. This does not influence the worst-case pass and space complexity, because each subproblem uses space proportional to the size of its connected component, and the total number of passes is determined by the connected component that takes the most passes. Therefore, we can assume that $G$ is connected, which implies that $\mathrm{rank}(L) = n - 1$ by Fact J.1.

The goal of approximately solving $Ly = b$ is, for a given error parameter $\epsilon > 0$, to find an $\epsilon$-approximate solution $x$ satisfying $\|x - L^\dagger b\|_L \le \epsilon \, \|L^\dagger b\|_L$.

If $y$ is an (exact) solution to the system $Ly = b$, then $y' := y - y_1 \mathbf{1}_n$ is also a solution, where $y_1$ is the first entry of $y$. So we can assume that the first entry of $L^\dagger b$ is $0$. (There might be many solutions, but we fix one whose first entry is $0$.) Let $\widetilde{A}$ be the matrix $L$ with the first row and column deleted, and let $\widetilde{b}$ be the vector $b$ with the first entry deleted. Note that $\widetilde{A} \succ 0$.

Let $\widetilde{x}$ be an $\epsilon$-approximate solution to the system $\widetilde{A} x = \widetilde{b}$, and let $x$ be the vector $\widetilde{x}$ with $0$ inserted as its first entry.
It must be that $\|x - L^\dagger b\|_L = \|\widetilde{x} - \widetilde{A}^{-1} \widetilde{b}\|_{\widetilde{A}}$, because the vector $\widetilde{A}^{-1} \widetilde{b}$ is the vector $L^\dagger b$ with the first entry deleted. Finally, we have $\|x - L^\dagger b\|_L \le \epsilon \, \|L^\dagger b\|_L$, since $\|\widetilde{x} - \widetilde{A}^{-1} \widetilde{b}\|_{\widetilde{A}} \le \epsilon \, \|\widetilde{A}^{-1} \widetilde{b}\|_{\widetilde{A}}$ and $\|L^\dagger b\|_L = \|\widetilde{A}^{-1} \widetilde{b}\|_{\widetilde{A}}$. This gives an $\epsilon$-approximate solution to the original system $Ly = b$. The process above is equivalent to partitioning $L$ into block-diagonal submatrices (one per connected component), solving each linear system with respect to its submatrix and the corresponding entries of $b$, and combining the results.
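The reduction of this subsection can be sketched in a few lines of Python. This is our own dense toy version with invented names (`laplacian_solve`, `solve_dense`); the paper's version works over the edge stream instead of a stored matrix.

```python
# Toy dense sketch of the Section J.2 reduction (our illustration): delete
# the first row and column of a connected Laplacian L to obtain a positive
# definite SDDM system, solve it, and re-insert 0 as the first coordinate.

def solve_dense(M, v):
    # Gauss-Jordan elimination, a stand-in for the SDDM solver
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * t for a, t in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def laplacian_solve(L, b):
    # Solve L y = b for a connected Laplacian L (with b orthogonal to 1_n),
    # normalized so that the first coordinate of the solution is 0.
    tildeL = [row[1:] for row in L[1:]]   # first row/column deleted: PD SDDM
    tildeb = b[1:]
    return [0.0] + solve_dense(tildeL, tildeb)

# Path graph on 3 vertices; b sums to zero, so L y = b is solvable.
L = [[1.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 1.0]]
b = [1.0, 0.0, -1.0]
y = laplacian_solve(L, b)   # y = [0, -1, -2]
```

Any constant shift of `y` is also a solution; fixing the first coordinate to 0 is exactly the normalization used in the proof above.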