Breaking the n-Pass Barrier: A Streaming Algorithm for Maximum Weight Bipartite Matching
S. Cliff Liu∗    Zhao Song†    Hengjie Zhang‡

Abstract
Given a weighted bipartite graph with n vertices and m edges, the maximum weight bipartite matching problem is to find a set of vertex-disjoint edges with the maximum weight. This classic problem has been extensively studied for over a century.

In this paper, we present a new streaming algorithm for the maximum weight bipartite matching problem that uses Õ(n) space and Õ(√m) passes, which breaks the n-pass barrier. All previous algorithms either require Ω(n log n) passes or only find an approximate solution. To achieve this pass bound, our algorithm combines a number of techniques from different fields, such as the interior point method (IPM), symmetric diagonally dominant (SDD) system solving, the isolation lemma, and LP duality. To the best of our knowledge, this is the first work that implements an SDD solver and an IPM in the streaming model in Õ(n) space for graph matrices. All previous IPMs focus only on optimizing the running time, regardless of the space usage; an LP solver for a general constraint matrix cannot be implemented in Õ(n) space.

∗ Princeton University. [email protected].
† Columbia University, Princeton University, and Institute for Advanced Study. [email protected].
‡ Columbia University. [email protected].

Contents

1.3.3 A folklore algorithm in O(n log n) passes
1.3.4 Other algorithms with more space
1.4 An overview of our techniques
    1.4.1 Interior point method
    1.4.2 Streaming implementation
    1.4.3 The isolation lemma
1.5 Roadmap
A Notations
B Isolation lemma in the streaming model
    B.1 Isolation lemma
    B.2 Algorithms
    B.3 Proof of uniqueness: step 1
    B.4 Proof of uniqueness: step 2
    B.5 Proof of uniqueness: step 3
    B.6 Proof of uniqueness: step 4
C Preliminary for IPM
    C.1 Definitions
    C.2 Approximation tools from previous work
D Algorithm
E Error analysis of IPM
    E.1 Assumptions on parameters
    E.2 Bounding potential function Φ
    E.3 Bounding the movement of t
    E.4 Upper bounding the potential function
    E.5 Move both: final
    E.6 Move both: part 1
    E.7 Move both: part 2
    E.8 Move both: part 3
F Running time of IPM
G SDD solver in the streaming model
    G.1 SDD and Laplacian systems
    G.2 The preconditioner
    G.3 An iterative solver
    G.4 An iterative solver: the space
    G.5 An iterative solver: the accuracy
    G.6 Main result
H Minimum vertex cover
    H.1 Algorithm
    H.2 Correctness of Algorithm 6
    H.3 Running time of Algorithm 6
    H.4 Building blocks
    H.5 A minimum vertex cover solver
I Combine
    I.1 Algorithms
    I.2 Running time
    I.3 Correctness
    I.4 Primal to dual
    I.5 Properties of primal and dual LP solutions
J Solver reductions
    J.1 From SDDM solver to SDD solver
    J.2 From SDDM solver to SDDM solver

1 Introduction
Motivated by the need to process massive graphs, the streaming model has become a desirable testbed for designing algorithms for fundamental graph problems. The beautiful work of Ahn, Guha and McGregor [AGM12a] gave a streaming algorithm for dynamic graph connectivity and spanning forest with O(n log n) space complexity, which has been proved optimal by Nelson and Yu [NY19, Yu20]. Since then, there has been a growing line of work studying max-cut [KKS14b, KK15, KKSV17], cut and spectral sparsifiers [KL11, AGM12b, KLM+17, KMM+19, KNST19], spanner construction [AGM12b, KW14], and so on.

Matching is a fundamental problem in theoretical computer science, and the study of classical graph matching algorithms has spanned over a century. The design of matching algorithms falls into two categories: one focuses on leveraging the combinatorial properties of the graph [Tar83]; the other reformulates the graph problem as a specific linear program [DS08] and uses the interior point method [Kar84] and SDD system solvers [ST04]. In the streaming setting, the bipartite matching problem has been extensively studied [FKM+04, AG11, EKMS12, GKK12, Kap13, DNO14, KKS14a, AG18, ABB+19, AKSY20, AR20, FHM+20, Ber20].

Many of the works mentioned above consider the semi-streaming model, where each edge along with its weight is revealed one by one in the stream in an adversarial order, and the algorithms are allowed to make one or more passes over the stream using Õ(n) space. (In fact, even some basic graph algorithms require Ω(n) space [FKM+04].) The edge orders in different passes need not be the same. Despite the intimate relationship between this model and the matching problem, almost all existing streaming algorithms for maximum weight bipartite matching are only able to find an approximate solution. The only exception is a folklore algorithm that repeatedly finds an augmenting path by a breadth-first search, which takes O(n log n) passes and Õ(n) space to find an exact maximum cardinality bipartite matching (see Section 1.2 for details). Obtaining even an O(n)-pass semi-streaming algorithm for this problem seems to be a major challenge, which raises the following open problem:

Is there a semi-streaming algorithm for maximum weight bipartite matching that uses o(n) passes?

In this work, we answer this question affirmatively. We give a semi-streaming algorithm that takes Õ(√m) IPM iterations to compute the maximum weight bipartite matching, where each iteration takes O(log n) passes to implement a streaming version of an SDD solver for computing the Newton step of the path-following IPM. The IPM was proposed by Karmarkar [Kar84] in 1984, and it has proved to be a powerful tool for improving the running time of linear programming solvers and of several specific graph-type linear programs. A classical IPM solver has no space restriction, but in the streaming setting the algorithm is not allowed to store all m constraints, where the LP has m constraints and n variables.
To the best of our knowledge, our algorithm is the first to implement an IPM in space-limited models such as the streaming model.

More specifically, to solve the maximum weight bipartite matching problem in Õ(n) space, we consider its dual, the generalized minimum vertex cover problem in bipartite graphs, and solve this problem using a streaming version of the IPM. Therefore, this work also yields an Õ(√m)-pass semi-streaming algorithm for the minimum vertex cover problem in bipartite graphs. To transform the dual solution into a primal solution in Õ(n) space, we use the isolation lemma, whose original construction uses Θ̃(m) random bits [MVV87]. We bypass this issue by implementing an alternative construction that uses Õ(n) random bits in the semi-streaming model.

Some authors define the space in the streaming model to be the number of cells, where each cell can hold O(log n) bits or even a number with infinite precision. Our bounds remain unchanged even if each cell only holds a constant number of bits.

We assume all edge weights are integers between 1 and poly(n). For any function f : R → R, we use Õ(f) to denote f · poly log(f). For any positive semi-definite matrix A ∈ R^{n×n}, let ‖·‖_A denote the norm where ‖x‖_A = √(x⊤Ax) for all x ∈ R^n.

The given graph G has n vertices and m edges. In the semi-streaming model, the algorithm is permitted Õ(n) space. In each pass, the algorithm reads the stream of all edges of G, where each edge and its weight appear together. The edge orders in different passes need not be the same. The task is to minimize the number of passes.

We provide a semi-streaming algorithm for bipartite matching running in Õ(√m) passes. This solves the long-standing open problem of whether maximum matching can be solved in o(n) passes, in all regimes where m = n^{2−ε} for any constant ε > 0.
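For concreteness, the access pattern assumed here — repeated passes over a fixed edge stream with only Õ(n) working memory between edges — can be mimicked by a small harness. The sketch below is our own illustration (the names `MultiPassStream` and `degrees_in_one_pass` are invented), not code from the paper:

```python
# Toy harness for the semi-streaming access model: the algorithm may re-read
# the edge stream pass by pass, but keeps only O(n)-size state in between.

class MultiPassStream:
    """Replays a fixed edge sequence; each full replay counts as one pass."""
    def __init__(self, edges):
        self._edges = list(edges)   # conceptually external, read-only storage
        self.passes = 0

    def scan(self):
        self.passes += 1
        yield from self._edges

def degrees_in_one_pass(stream, n):
    """An O(n)-space computation: all vertex degrees in a single pass."""
    deg = [0] * n
    for u, v in stream.scan():
        deg[u] += 1
        deg[v] += 1
    return deg

stream = MultiPassStream([(0, 1), (0, 2), (1, 2)])
deg = degrees_in_one_pass(stream, 3)
# deg == [2, 2, 2] after a single pass (stream.passes == 1)
```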
Theorem 1.1 (Main theorem, informal version of Theorem I.1). Given a bipartite graph G with n vertices and m edges, there exists a streaming algorithm that computes a maximum weight matching of G in Õ(√m) passes and Õ(n) space with probability 1 − 1/poly(n).

As a byproduct, we also show that minimum vertex cover in bipartite graphs, which is the dual problem of maximum matching, can be solved within the same number of passes and space. A vertex cover of a graph is a set of vertices that includes at least one endpoint of every edge of the graph, and a minimum vertex cover is a vertex cover of minimum size.
Theorem 1.2 (Informal version of Theorem H.8). Given a bipartite graph G with n vertices and m edges, there exists a streaming algorithm that computes a minimum vertex cover of G in Õ(√m) passes and Õ(n) space with probability 1 − 1/poly(n).

Our result is obtained by a novel application of the interior point method together with an SDD system solver in the streaming model. To the best of our knowledge, this is the first work that implements those techniques in the streaming model, in which only Õ(n) space is allowed. We state our result on the SDD solver here, as it may be of independent interest.

Theorem 1.3 (Informal version of Theorem G.9). There is a streaming algorithm which takes as input an SDD matrix A, a vector b ∈ R^n, and a parameter ε ∈ (0, 1); if there exists x* ∈ R^n such that Ax* = b, then with probability 1 − 1/poly(n), the algorithm returns an x ∈ R^n such that ‖x − x*‖_A ≤ ε · ‖x*‖_A in O(1 + log(1/ε)/log log n) passes and Õ(n) space.

We hope the tools developed in this paper will find further applications in the study of streaming algorithms and beyond.

We can actually solve a generalized version of the minimum vertex cover problem in bipartite graphs: each edge e needs to be covered at least b_e ∈ Z_+ times, where the case of b = 1_m is the classic minimum vertex cover.

Our algorithm can actually solve any SDD system, which is more general than a Laplacian system. See the formal definition of an SDD matrix in Section A.

1.2 Related work

Streaming algorithms for matching.
Maximum matching has been extensively studied in the streaming model for decades, and almost all of this work falls into the category of approximation algorithms. For algorithms that make only one pass over the edge stream, researchers have made continuous progress on pushing the constant approximation ratio above 1/2, under the assumption that the edges arrive in a uniformly random order [FKM+04, KKS14a, ABB+19, FHM+20, Ber20]. The random-order assumption makes the problem substantially easier and is usually unrealistic. A more natural setting is multi-pass streaming with adversarial edge arrival. There is a long line of research proving upper and lower bounds on the number of passes needed to compute a maximum matching in the streaming model [AG11, EKMS12, GKK12, Kap13, DNO14, AG18, AKSY20, AR20]. Notably, [AG11, AG18] use linear programming and duality theory (see the next subsection for more details).

However, all the algorithms above can only compute an approximate maximum matching: to compute a matching whose size is at least (1 − ε) times the optimal, one needs to spend poly(1/ε) passes (see [DNO14, AG18] and the references therein).

Despite the intimate relationship between the matching problem and the streaming model, no breakthrough has been made towards solving exact bipartite matching in the streaming model. We remark that a simple folklore algorithm inspired by the classic algorithm of Hopcroft and Karp [HK73] can actually find the exact maximum cardinality bipartite matching in Õ(n) passes using Õ(n) space; its details can be found in the next subsection.

Streaming spectral sparsifiers.
Initiated by the study of cut sparsifiers in the streaming model [AGM12a], a simple one-pass semi-streaming algorithm for computing a spectral sparsifier of any weighted graph was given in [KL11], which suffices for our use in this paper. The problem becomes more challenging in the dynamic setting, i.e., when both insertions and deletions of edges are allowed. Using the idea of linear sketching, [KLM+17] gives a single-pass semi-streaming algorithm for computing a spectral sparsifier in the dynamic setting. However, their brute-force approach to recovering the sparsifier from the sketch uses Ω(n²) time. An improved recovery time is given in [KMM+19], but it requires more space, e.g., ε^{−2} n^{1.2} log^{O(1)} n. Finally, [KNST19] proposes a single-pass semi-streaming algorithm that uses ε^{−2} n log^{O(1)} n space and ε^{−2} n log^{O(1)} n recovery time to compute an ε-spectral sparsifier with O(ε^{−2} n log n) edges. Note that Ω(ε^{−2} n log n) space is necessary for this problem [CKST19].

SDD solvers and IPM.
There is a long line of work focusing on fast SDD solvers [ST04, KMP10, KMP11, KOSZ13, CKM+14, PS14, KS16]. Spielman and Teng gave the first nearly-linear time SDD solver, which was simplified, with better running times, in later works. The current fastest SDD solver runs in O(m log^{1/2} n · poly(log log n) · log(1/ε)) time [CKM+14] and requires Θ̃(m) space. Peng and Spielman gave a nearly-linear work parallel SDD solver which, if aiming for Õ(1) depth, requires m processors (space) [PS14].

The interior point method was originally proposed by Karmarkar [Kar84] for solving linear programs. Since then, there has been a long line of work on speeding up the interior point method for classical optimization problems, e.g., linear programming [Vai87, Ren88, Vai89, NN89, LS13, LS14, LS15, CLS19, LSZ19, Son19, BLSS20, SY20, JSWZ20] and semi-definite programming [Ans00, NN92, NN94, JKL+20].

1.3 Previous techniques

In this section, we summarize the previous techniques for computing a maximum matching in the streaming model.

• In Section 1.3.1, we introduce some representative approximation algorithms for bipartite matching.
• In Section 1.3.2, we present a potential method to compute an exact bipartite matching, showcasing the current bottleneck in the field.
• In Section 1.3.3, we discuss a simple folklore semi-streaming algorithm that uses O(n log n) passes, which is by far the best.
• In Section 1.3.4, we introduce some related bipartite matching algorithms which use more space than is allowed in the semi-streaming model.
1.3.1 Approximation algorithms

Given a parameter ε ∈ (0, 1), many streaming algorithms aim to find a matching of size (1 − ε) times the size of the maximum matching. The space and pass usage of these approximation algorithms are increasing functions of 1/ε.

A natural idea for finding an approximate matching is to iteratively sample a small subset of edges and use these edges to refine the current matching. These algorithms are called sampling-based algorithms. In [AG18], Ahn and Guha show that by adaptively sampling Õ(n) edges in each iteration, one can either obtain a certificate that the sampled edges admit a desirable matching, or use these edges to refine the solution of a specific LP. The LP is a nonstandard relaxation of the matching problem, and is eventually used to produce a good approximate matching. The algorithm of Ahn and Guha can compute a (1 − ε)-approximate matching for weighted nonbipartite graphs in Õ(1/ε) passes and Õ(n poly(1/ε)) space. However, the degree of poly(1/ε) in the space usage can be very large, making their algorithm inapplicable for small (non-constant) ε < 1/log n in the semi-streaming model.

Finding a (1 − ε)-approximate maximum matching for arbitrarily small ε (with dependence on n) requires different methods. Inspired by the well-studied water-filling process in online algorithms (see [DJK13] and the references therein), Kapralov proposes an algorithm that generalizes the water-filling process to multiple passes [Kap13]. This algorithm works in the vertex-arrival semi-streaming model, where a vertex and all of its incident edges arrive in the stream together. The observation is that the water filling from pass (k − 1) to pass k follows the same manner as in the first pass (with a more careful double-counting trick); the basic differential equations then give a (1 − 1/√(2πk))-approximate matching in k passes.

Kapralov's algorithm removes the poly(log n) factor in the number of passes compared with [AG11], giving a (1 − ε)-approximate maximum matching in O(1/ε²) passes, albeit in the stronger vertex-arrival model. Very recently, Assadi, Liu, and Tarjan gave a simple semi-streaming algorithm that computes a (1 − ε)-approximate maximum matching in O(1/ε²) passes, removing the vertex-arrival condition [ALT20]. Their algorithm views the left vertices of a bipartite graph as bidders and the right vertices as items; the maximum cardinality bipartite matching problem is then to find an auction maximizing the social welfare. The algorithm only needs to maintain a price for each item, and runs in O(1/ε²) rounds (passes). The space usage is O(n log(1/ε)) for storing all the prices, which is Õ(n) since ε ≥ 1/(n + 1) (otherwise we could obtain an exact matching).

We focus on approximation algorithms that find a matching close to (or that can potentially be used to find) an exact maximum matching, so constant-approximation algorithms are not introduced here. We refer the interested reader to [AB19] and the references therein.

1.3.2 Augmenting an approximate matching

A potential method to compute an exact maximum matching is to augment an approximate matching by repeatedly finding augmenting paths. The state-of-the-art semi-streaming algorithm based on auctions [ALT20] takes O(1/ε²) passes to find a (1 − ε)-approximate maximum cardinality bipartite matching, where ε ≥ 1/(n + 1) is a parameter. (For ε = Ω(1/log n), there could be algorithms that take fewer passes. Recall that the algorithm of [AG18] does not work in semi-streaming when ε is too small.)
So given o(n) passes, no existing algorithm can find a matching of size at least OPT − O(√n), assuming the maximum matching has size OPT = Θ(n). Therefore, we are only able to obtain a matching of size OPT − Ω(√n) from previous approximation algorithms. However, there is currently no semi-streaming algorithm that solves directed graph reachability, a problem that is no harder than finding one augmenting path, in o(√n) passes [LJS19]. (The linear-work parallel algorithm of [LJS19] can be translated into a semi-streaming algorithm that finds an augmenting path in n^{1/2+o(1)} passes.) So augmenting a matching of size OPT − Ω(√n) to the maximum matching of size OPT also takes Ω(n) passes. In conclusion, based on existing algorithms, the method of augmenting an approximate matching does not yield an o(n)-pass semi-streaming algorithm. To break the n-pass barrier under this framework, we would need either an o(n)-pass approximation algorithm that finds a matching of size OPT − n^{1/2−Ω(1)}, or an n^{1/2−Ω(1)}-pass algorithm that finds at least one augmenting path.

1.3.3 A folklore algorithm in O(n log n) passes

A simple folklore algorithm inspired by the classic algorithm of Hopcroft and Karp [HK73] can actually find the exact maximum cardinality bipartite matching in Õ(n) passes using Õ(n) space. The main idea is the following. Let OPT be the size of the maximum matching in the given n-vertex bipartite graph. If the current matching has size i, then there must exist (OPT − i) disjoint augmenting paths, so the shortest augmenting path has length at most n/(OPT − i). Using a breadth-first search (simply ignore the edges in the stream that are not incident to the frontier of the search), one can find this path in n/(OPT − i) passes and augment the current matching. Therefore, the total number of passes to compute the maximum matching is at most

∑_{i=0}^{OPT−1} n/(OPT − i) = n · ∑_{i=1}^{OPT} 1/i = O(n log n).
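The folklore algorithm above can be simulated in memory to make the pass accounting concrete. The following sketch is ours (not the paper's code): each BFS level costs one scan of the edge list, the matching and the BFS labels are the only O(n)-size state kept between scans, and one shortest augmenting path is applied per phase.

```python
def max_matching_folklore(edges, nL, nR):
    """Exact maximum bipartite matching; `passes` counts edge-stream scans."""
    matchL, matchR = [-1] * nL, [-1] * nR
    passes = 0
    while True:
        # BFS from all unmatched left vertices, one stream pass per level.
        frontier = {u for u in range(nL) if matchL[u] == -1}
        parentR, seenL = {}, set(frontier)
        end = -1                              # unmatched right endpoint, if found
        while frontier and end == -1:
            passes += 1
            for u, v in edges:                # edges not at the frontier are ignored
                if u in frontier and v not in parentR:
                    parentR[v] = u
                    if matchR[v] == -1 and end == -1:
                        end = v
            if end != -1:
                break
            # step back along matched edges (stored in memory, no extra pass)
            frontier = {matchR[v] for v in parentR
                        if matchR[v] != -1 and matchR[v] not in seenL}
            seenL |= frontier
        if end == -1:
            return matchL, passes             # no augmenting path: matching is maximum
        v, u = end, parentR[end]              # flip the augmenting path
        while True:
            prev = matchL[u]
            matchL[u], matchR[v] = v, u
            if prev == -1:
                break
            v, u = prev, parentR[prev]

matchL, passes = max_matching_folklore([(0, 0), (0, 1), (1, 0)], 2, 2)
# matchL == [1, 0]: a perfect matching, found with a few simulated passes
```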
This simple algorithm is by far the state of the art: to the best of our knowledge, before this work, no streaming algorithm computed an exact matching in o(n log n) passes.

1.3.4 Other algorithms with more space

Since the breakthrough result of [Mad13], new improvements have been made in solving the maximum weight bipartite matching problem and, more generally, the maximum flow problem (see [LS20] and the references therein). The maximum cardinality bipartite matching algorithm of [Mad13] runs in m^{10/7+o(1)} time and takes Ω(m) space, which is currently the fastest when the graph is sufficiently sparse. For moderately dense graphs, a better algorithm was given very recently in [BLN+20], which runs in Õ(m + n^{1.5}) time and takes Ω(m) space. For the maximum weight bipartite matching problem, the O(n^ω)-time algorithm using matrix inversion, where ω < 2.38 (see [MS04]), requires Ω(n²) space, and the Õ(m√n)-time algorithm of [LS14] requires Ω(m) space.

Given a matching in a graph, an augmenting path is a path that starts and ends at unmatched vertices and alternately contains edges that are outside and inside the matching.

1.4 An overview of our techniques

Our techniques consist of an involved combination of several components, such as the interior point method, a streaming SDD solver, and the isolation lemma.

• In Section 1.4.1, we provide a high-level picture of how the interior point method works.
• In Section 1.4.2, we give our streaming SDD solver, and briefly show how to implement the isolation lemma in the streaming model.
• In Section 1.4.3, we show a two-fold usage of the isolation lemma: the first is to convert a nearly-optimal solution to an exact optimal solution; the second is to convert an optimal solution of the dual to an optimal solution of the primal.

The major obstacle to overcome in this paper is the space limitation. Given a graph G = (V, E) with n = |V| and m = |E|, consider the LP formulation of the maximum bipartite matching problem:

max_{y ∈ R^E} ∑_{e ∈ E} y_e
s.t. ∑_{e ∈ E : e ∋ u} y_e ≤ 1, ∀u ∈ V,
     y ≥ 0_m.

Let A ∈ {0, 1}^{m×n} be the (unsigned) edge-vertex incidence matrix; the above LP can be written as

(Primal)  max_{y ∈ R^m} 1_m^⊤ y  s.t.  A^⊤ y ≤ 1_n and y ≥ 0_m.  (1)

Solving this linear program with an IPM requires storing y ∈ R^m in memory, but the space limit is only Õ(n). So we instead solve its dual, whose solution can be stored in Õ(n) space, and then recover an optimal primal solution from an optimal dual solution using the standard complementary slackness theorem:

(Dual)  min_{x ∈ R^n} 1_n^⊤ x  s.t.  Ax ≥ 1_m and x ≥ 0_n.
(2)

All previous IPMs for the maximum matching problem need to maintain a fractional matching y ∈ R^m, which cannot be adapted to the semi-streaming model with its Õ(n) space restriction. The fastest classical LP solvers are due to [LS14], [JSWZ20], and [BLSS20]. The space requirements of those algorithms are O(nm) ([LS14], [BLSS20]) and O((m + n)²) ([JSWZ20]).

The LP solvers in [LS14, JSWZ20, BLSS20] for graph problems require Ω(m) space, because they need to maintain some length-m vectors in each iteration even when solving an LP with n variables (e.g., to maintain the dual variables or the slack variables). Therefore, there is no direct way to use their solvers to solve our primal or dual LP in the semi-streaming model.

1.4.1 Interior point method

Given a matrix A ∈ R^{m×n} and two vectors b ∈ R^m and c ∈ R^n, consider the following linear program:

min_{x ∈ R^n} c^⊤ x  s.t.  Ax ≥ b.

We briefly review several basic concepts of Renegar's algorithm [Ren88] (see Sections C and D). Renegar's IPM solves the above LP by introducing two functions: the barrier function F : R^n → R and the (parametrized) perturbed objective function f_t : R^n → R. The standard log-barrier function can be written as

F(x) := −∑_{i ∈ [m]} log(a_i^⊤ x − b_i),

where a_1^⊤, ..., a_m^⊤ are the rows of the matrix A. It is easy to see that the domain of F(x) is exactly the region {x ∈ R^n : Ax > b}. For any parameter t > 0, we define the perturbed function f_t : R^n → R as

f_t(x) := t · c^⊤ x + F(x).

Renegar's method is based on a type of interior point method known as the path-following method, which incrementally minimizes f_t(x).
Once we obtain the minimizer x*_t of f_t(x) for some large t, it is not far from optimal, e.g., c^⊤ x*_t ≤ OPT + m/t.

In Renegar's framework, the potential function Φ_t(x) := ‖∇f_t(x)‖_{H(x)^{−1}} is used to measure how close x is to the minimizer x*_t, where H(x) := ∇²F(x) is the Hessian matrix. Φ_t(x) is always greater than or equal to 0, and Φ_t(x) = 0 means x = x*_t. In each iteration, our goal is to increase t to t_new and move x to x_new, while keeping Φ_{t_new}(x_new) small:

Φ_{t_new}(x_new) = Φ_t(x) + [Φ_t(x_new) − Φ_t(x)] + [Φ_{t_new}(x_new) − Φ_t(x_new)],

where the first bracketed term is the "x move" and the second is the "t move". We are able to prove the following (see Section E):

1. In the x-moving part, let x_new be the result of applying one Newton step to f_t(x) from x. Implementing the Newton step requires an SDD solver. We can then guarantee the standard quadratic convergence rate of Newton's method: Φ_t(x_new) ≤ Φ_t(x)².
2. In the t-moving part, we multiplicatively increase t to t_new. We can guarantee that

Φ_{t_new}(x_new) ≤ Φ_t(x_new) + |t_new/t − 1| · √m.

Combining the two parts, we can set t_new = (1 + O(1/√m)) · t, so that an Õ(√m)-iteration IPM follows.

1.4.2 Streaming implementation

We show that the iterating procedure of the IPM can be implemented in the streaming model and that each iteration can be done in Õ(1) passes (see Section F). Recall that each iteration performs a Newton step ∇²F(x)^{−1} ∇f_t(x) ∈ R^n. Note that ∇²F(x) ∈ R^{n×n} is an SDD matrix in our setting, and both ∇²F(x) and ∇f_t(x) rely on the slack variable s(x) := Ax − b ∈ R^m, but we cannot even store the length-m vector b in memory. We address this problem by

• implementing an SDD solver in the streaming model;
• implementing the algorithm in the isolation lemma in the streaming model.

To the best of our knowledge, this paper is the first to implement the IPM and an SDD solver in the streaming model.

Implementing the Laplacian solver in the streaming model. For simplicity, we outline a streaming Laplacian solver here, which gives an SDD solver by standard reductions (see Section G). Given a graph G, let L_G be the Laplacian matrix of G. Let A ∈ R^{m×n} be the signed edge-vertex incidence matrix, let a_1^⊤, ..., a_m^⊤ be its rows, and let s ∈ R^m be the slack variable. Note that L_G has nnz(L_G) = O(m), so it cannot be stored in memory, but it can be written as

L_G = ∑_{i ∈ [m]} a_i a_i^⊤ / s_i².

Suppose we have an oracle that outputs s_i for any given i ∈ [m]; then one can do simple things like multiplying L_G with a vector y, i.e., computing L_G · y. This can be done in one pass over all edges: for edge e_i = (u, v) we have a_i = 1_u − 1_v, so each term (a_i a_i^⊤ / s_i²) · y ∈ R^n can be computed and accumulated in Õ(n) space. In fact, we show that this simple operation suffices: by [KLM+17] we can maintain a spectral sparsifier H of G. Using L_H, the Laplacian of H, as a preconditioner for L_G, we can do iterative refinement on the solution:

x^{(t+1)} := x^{(t)} + L_H^{−1} · (b − L_G · x^{(t)}).

After Õ(1) iterations, x converges to the solution x* = L_G^{−1} b with the desired precision.

Implementing the isolation lemma in the streaming model.
We start with the definition of the isolation lemma (see Section B).
Definition 1.4 (Isolation lemma). Given a set system (S, F) where F ⊆ {0, 1}^S, and a weight w_i for each element i in S, the weight of a set F in F is defined as ∑_{i ∈ F} w_i. The goal of the isolation lemma is to design a scheme that assigns weights obliviously to F, such that there is a unique set in F that has the minimum (maximum) weight under this assignment.

The isolation lemma always involves randomness, since its output is oblivious to F. The isolation lemma says that if we choose the weights randomly, then with good probability the uniqueness is ensured. However, this does not directly apply to the streaming setting, since the weight vector has length m and cannot be stored in memory; the difficulty is to consistently output the same weight vector in different passes.

[CRS95] reduces the number of random bits from |S| to log(|F|) ≤ |S|. In the case of maximum weight bipartite matching, we have |F| ≤ (n + 1)^n, since each of the n vertices can choose none or one of the vertices as its partner in the matching. Our streaming implementation stores these log((n + 1)^n) = Õ(n) random bits, so that we can consistently output the same weight vector in different passes.

We use the isolation lemma in two ways. The first is to convert an optimal solution of the dual to an optimal solution of the primal. The second is to convert the nearly-optimal solution of the dual LP that we compute with the IPM to an exact optimal solution.
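As a sanity check of Definition 1.4 (our toy experiment, not the paper's construction): drawing each w_i uniformly from {1, ..., 2|S|} isolates a unique minimum-weight set in F with probability at least 1/2, and empirically much more often.

```python
import random

def unique_min(F, w):
    """True iff exactly one set in F attains the minimum weight under w."""
    weights = [sum(w[i] for i in f) for f in F]
    return weights.count(min(weights)) == 1

random.seed(0)
S = range(6)
F = [frozenset(f) for f in [{0, 1}, {2, 3}, {0, 3}, {1, 2}, {4, 5}, {0, 5}]]
isolated = sum(
    unique_min(F, [random.randint(1, 2 * len(S)) for _ in S])
    for _ in range(1000)
)
# `isolated` counts the trials (out of 1000) with a unique minimizer;
# the isolation lemma guarantees its expectation is at least 500
```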
Algorithm 1 The algorithmic framework at a high level

procedure Main(G = (V_L, V_R, E))
    n ← |V_L| + |V_R|, m ← |E|
    Let A ∈ R^{m×n} be the signed edge-vertex incidence matrix of G
    Let b ∈ R^m be the perturbed objective vector produced by the generalized isolation lemma
    Step 1:  ▷ Õ(√m) passes
        Use IPM to solve min_{x∈R^n} 1_n^⊤ x  s.t.  Ax ≥ b
    Step 2:  ▷ One pass
        s ← Ax − b
        S ← {i ∈ [m] | s_i = 0}  ▷ S is a subset of edges of size at most n
    Step 3:  ▷ Do not read the stream
        Find a maximum matching using edges in S in Õ(n) space
end procedure
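On a toy instance, Steps 2 and 3 above can be sketched as follows. This is a hedged illustration with the unperturbed objective b = 1_m (the actual algorithm uses the perturbed b); the 4-vertex graph, the cover {0, 2}, and all helper names are our own.

```python
def max_bipartite_matching(left, adj):
    """Kuhn's augmenting-path algorithm; adj maps a left vertex to its right neighbors."""
    match_r = {}  # right vertex -> matched left vertex

    def try_augment(u, seen):
        for v in adj.get(u, []):
            if v in seen:
                continue
            seen.add(v)
            if v not in match_r or try_augment(match_r[v], seen):
                match_r[v] = u
                return True
        return False

    for u in left:
        try_augment(u, set())
    return {(u, v) for v, u in match_r.items()}

# Toy instance: V_L = {0, 1}, V_R = {2, 3}.
edges = [(0, 2), (0, 3), (1, 2)]
x = {0: 1, 1: 0, 2: 1, 3: 0}  # indicator of the vertex cover {0, 2} (a dual solution)

# "Step 2": one pass over the edge stream keeps only tight constraints,
# i.e. edges (u, v) with slack x_u + x_v - 1 = 0.
tight = [(u, v) for (u, v) in edges if x[u] + x[v] - 1 == 0]

# "Step 3": compute a maximum matching on the tight edges only, no further passes.
adj = {}
for u, v in tight:
    adj.setdefault(u, []).append(v)
matching = max_bipartite_matching([0, 1], adj)
assert len(matching) == sum(x.values())  # matching size equals cover size (LP duality)
```

The final assertion is the complementary-slackness picture of Figure 1: every matched edge is tight, and every covered vertex is saturated.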
Figure 1: The red circle is a minimum vertex cover, which is an optimal dual solution. The blue edge is a maximum matching, which is an optimal primal solution. In both examples, the primal and dual satisfy complementary slackness, Eq. (3).

For the first part, to see how the isolation lemma helps, let us consider a simple example. Suppose the graph has a (maximum weight) perfect matching. Then the following trivial solution is optimal for the dual LP: choose all vertices in V_L. Let us see what happens when applying the complementary slackness theorem. The complementary slackness theorem says that when y is a feasible primal solution and x is a feasible dual solution, then y is optimal for the primal and x is optimal for the dual if and only if

⟨y, Ax − 1_m⟩ = 0 and ⟨x, 1_n − A^⊤y⟩ = 0.  (3)

In the above case, we have Ax − 1_m = 0_m, so the first equality ⟨y, Ax − 1_m⟩ = 0 puts no constraint on y. Therefore, any solution y ≥ 0_m to the linear system a_i^⊤ y = 1, ∀i ∈ V_L, is an optimal solution, where a_i is the i-th column of A. Note that this linear system has m variables and |V_L| equations, so it is still hard to find a solution in Õ(n) space. Roughly speaking, the complementary slackness theorem does not help in this case, since choosing all vertices in V_L is always a feasible dual solution, and is actually optimal whenever a perfect matching exists. (In general, the inverse of a sparse matrix can be dense, which means the standard Gaussian elimination method for linear system solving does not apply in the semi-streaming model.)

The isolation lemma lets us perturb the objective into a vector b ∈ R^m such that the optimal solution to the following primal LP is unique:

(Primal)  max_{y∈R^m} b^⊤ y,  s.t.  A^⊤ y ≤ 1_n and y ≥ 0_m.

Suppose we find the optimal solution x of the dual LP and we want to recover the optimal solution y of the primal LP.
Again plugging in the complementary slackness theorem, we get at most n equations from the second part ⟨x, 1_n − A^⊤y⟩ = 0. Since the optimal y is unique and y has dimension m, the first part ⟨y, Ax − b⟩ = 0 must contribute at least m − n equations. Note that these equations have the form

y_i = 0, ∀i ∈ [m] s.t. (Ax)_i − b_i > 0.

This means that the corresponding edges are unnecessary for obtaining one maximum matching. As a result, we can reduce the number of edges from m to n, and then compute a maximum matching in Õ(n) space without reading the stream, which can be done by simply checking all the at most 2^n subsets of the remaining edges.

However, the classic isolation lemma needs Õ(m) random bits to generate such a perturbed vector b ∈ R^m, and storing b in Õ(n) space is impossible. We bypass this barrier by using the generalized isolation lemma of [CRS95], which only uses Õ(n) random bits in the maximum weight bipartite matching problem. We further implement their algorithm in the streaming model to meet our needs.

For the second part, there are related works that consider converting a nearly-optimal solution to an optimal solution of an LP where all coefficients of A, b, c are integers, e.g., [DS08] and [LS14]. The basic idea is to randomly perturb the objective vector on a small scale, such that there is a unique minimizer of the LP. (For our application of solving the dual LP, the objective function has only n variables, so applying the classic isolation lemma [MVV87] with Õ(n) random bits suffices for our use.) Note that the feasible region {x | Ax ≥ b, x ≥ 0} is a polytope, and all extreme points of this polytope are integral by total unimodularity. Therefore, if we get close enough to that minimizer, our solution can be rounded to the minimizer.

The appendix is organized as follows. In Section A, we define the basic notations used in this paper.
In Section B, we present the generalized isolation lemma in the semi-streaming model, which will be used to recover the maximum matching from a minimum vertex cover.

In Section C, we give some preliminaries for the interior point method. In Section D, we present our streaming implementation of the interior point method. In Section E, we present the error analysis of the interior point method. In Section F, we bound the running time of our interior point method.

In Section G, we give an SDD solver in the streaming model, which is a necessary component of our interior point method. In Section H, we discuss the streaming algorithm for minimum vertex cover. In Section I, we combine all the pieces to get the final algorithm for maximum weight bipartite matching. In Section J, we complement Section G by providing two reductions from weaker solvers to our final SDD solver.

Acknowledgements
The authors would like to thank Sepehr Assadi for introducing this problem, and thank Sepehr Assadi, Robert E. Tarjan, Huacheng Yu, and Peilin Zhong for helpful discussions. S. Cliff Liu is partially supported by an innovation research grant from Princeton University and a gift from Microsoft (to Robert E. Tarjan). Zhao Song is partially supported by the Schmidt Foundation and the Simons Foundation. Hengjie Zhang is partially supported by NSF CAREER award CCF-1844887 and a Columbia Presidential Fellowship.

References

[AB19] Sepehr Assadi and Aaron Bernstein. Towards a unified theory of sparsification for matching problems. In Jeremy T. Fineman and Michael Mitzenmacher, editors, 2nd Symposium on Simplicity in Algorithms (SOSA), volume 69, pages 11:1–11:20, 2019.
[ABB+
19] Sepehr Assadi, MohammadHossein Bateni, Aaron Bernstein, Vahab Mirrokni, and CliffStein. Coresets meet EDCS: algorithms for matching and vertex cover on massivegraphs. In
Proceedings of the Thirtieth Annual ACM-SIAM Symposium on DiscreteAlgorithms , pages 1616–1635. SIAM, 2019.[AG11] Kook Jin Ahn and Sudipto Guha. Linear programming in the semi-streaming modelwith application to the maximum matching problem. In
ICALP , pages 526–538.Springer, 2011.[AG18] Kook Jin Ahn and Sudipto Guha. Access to data and number of iterations: Dual primalalgorithms for maximum matching under resource constraints.
ACM Transactions onParallel Computing (TOPC) , 4(4):1–40, 2018.[AGM12a] Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. Analyzing graph structure vialinear measurements. In
Proceedings of the twenty-third annual ACM-SIAM symposiumon Discrete Algorithms (SODA) , pages 459–467. SIAM, 2012.[AGM12b] Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. Graph sketches: sparsification,spanners, and subgraphs. In
Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAIsymposium on Principles of Database Systems (PODS) , pages 5–14, 2012.[AKSY20] Sepehr Assadi, Gillat Kol, Raghuvansh R. Saxena, and Huacheng Yu. Multi-pass graphstreaming lower bounds for cycle counting, max-cut, matching size, and other problems.In
FOCS , 2020.[ALT20] Sepehr Assadi, S. Cliff Liu, and Robert E. Tarjan. An auction algorithm for bipartitematching in streaming and massively parallel computation models. In
Manuscript , 2020.[Ans00] Kurt M Anstreicher. The volumetric barrier for semidefinite programming.
Mathematicsof Operations Research , 25(3):365–380, 2000.[AR20] Sepehr Assadi and Ran Raz. Near-quadratic lower bounds for two-pass graph streamingalgorithms. In
FOCS . https://arxiv.org/pdf/2009.01161.pdf , 2020.[Ber20] Aaron Bernstein. Improved bounds for matching in random-order streams. In ICALP ,2020.[BLN +
20] Jan van den Brand, Yin Tat Lee, Danupon Nanongkai, Richard Peng, ThatchapholSaranurak, Aaron Sidford, Zhao Song, and Di Wang. Bipartite matching in nearly-linear time on moderately dense graphs. In
FOCS , 2020.[BLSS20] Jan van den Brand, Yin Tat Lee, Aaron Sidford, and Zhao Song. Solving tall denselinear programs in nearly linear time. In
STOC, 2020.
[CKM+14] Michael B Cohen, Rasmus Kyng, Gary L Miller, Jakub W Pachocki, Richard Peng, Anup B Rao, and Shen Chen Xu. Solving SDD linear systems in nearly m log^{1/2} n time. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing (STOC), pages 343–352, 2014.
[CKST19] Charles Carlson, Alexandra Kolla, Nikhil Srivastava, and Luca Trevisan. Optimal lower bounds for sketching graph cuts. In
Proceedings of the Thirtieth Annual ACM-SIAMSymposium on Discrete Algorithms , pages 2565–2569. SIAM, 2019.[CLS19] Michael B Cohen, Yin Tat Lee, and Zhao Song. Solving linear programs in the currentmatrix multiplication time. In
Proceedings of the 51st Annual ACM Symposium onTheory of Computing (STOC) , 2019.[CRS95] Suresh Chari, Pankaj Rohatgi, and Aravind Srinivasan. Randomness-optimal uniqueelement isolation with applications to perfect matching and related problems.
SIAMJournal on Computing , 24(5):1036–1050, 1995.[DJK13] Nikhil R Devanur, Kamal Jain, and Robert D Kleinberg. Randomized primal-dualanalysis of ranking for online bipartite matching. In
Proceedings of the twenty-fourthannual ACM-SIAM symposium on Discrete algorithms , pages 101–107. SIAM, 2013.[DNO14] Shahar Dobzinski, Noam Nisan, and Sigal Oren. Economic efficiency requires interac-tion. In
Proceedings of the forty-sixth annual ACM symposium on Theory of computing(STOC) , pages 233–242, 2014.[DS08] Samuel I Daitch and Daniel A Spielman. Faster approximate lossy generalized flow viainterior point algorithms. In
Proceedings of the fortieth annual ACM symposium onTheory of computing (STOC) , pages 451–460, 2008.[EKMS12] Sebastian Eggert, Lasse Kliemann, Peter Munstermann, and Anand Srivastav. Bipartitematching in the semi-streaming model.
Algorithmica , 63(1-2):490–508, 2012.[FHM +
20] Alireza Farhadi, Mohammad Taghi Hajiaghayi, Tung Mai, Anup Rao, and Ryan A Rossi. Approximate maximum matching in random streams. In
Proceedings of the Four-teenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) , pages 1773–1785. SIAM, 2020.[FKM +
04] Joan Feigenbaum, Sampath Kannan, Andrew McGregor, Siddharth Suri, and JianZhang. On graph problems in a semi-streaming model. In
ICALP , pages 531–543.Springer, 2004.[FKM +
05] Joan Feigenbaum, Sampath Kannan, Andrew McGregor, Siddharth Suri, and JianZhang. On graph problems in a semi-streaming model.
Departmental Papers (CIS) ,page 236, 2005.[GKK12] Ashish Goel, Michael Kapralov, and Sanjeev Khanna. On the communication andstreaming complexity of maximum bipartite matching. In
Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms (SODA), pages 468–485. SIAM, 2012.
[HK73] John E Hopcroft and Richard M Karp. An n^{5/2} algorithm for maximum matchings in bipartite graphs. SIAM Journal on Computing, 2(4):225–231, 1973.
[HT56] Isidore Heller and CB Tompkins. An extension of a theorem of Dantzig's. Linear Inequalities and Related Systems, 38:247–254, 1956.
[JKL+
20] Haotian Jiang, Tarun Kathuria, Yin Tat Lee, Swati Padmanabhan, and Zhao Song. Afaster interior point method for semidefinite programming. In
FOCS , 2020.[JSWZ20] Shunhua Jiang, Zhao Song, Omri Weinstein, and Hengjie Zhang. Faster dynamic matrixinverse for faster lps. arXiv preprint arXiv:2004.07470 , 2020.[Kap13] Michael Kapralov. Better bounds for matchings in the streaming model. In
Proceedingsof the twenty-fourth annual ACM-SIAM symposium on Discrete algorithms , pages 1679–1697. SIAM, 2013.[Kar84] Narendra Karmarkar. A new polynomial-time algorithm for linear programming. In
Proceedings of the sixteenth annual ACM symposium on Theory of computing(STOC) ,pages 302–311. ACM, 1984.[KK15] Dmitry Kogan and Robert Krauthgamer. Sketching cuts in graphs and hypergraphs.In
Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science ,pages 367–376, 2015.[KKS14a] Michael Kapralov, Sanjeev Khanna, and Madhu Sudan. Approximating matching sizefrom random streams. In
Proceedings of the twenty-fifth annual ACM-SIAM symposiumon Discrete algorithms (SODA) , pages 734–751. SIAM, 2014.[KKS14b] Michael Kapralov, Sanjeev Khanna, and Madhu Sudan. Streaming lower bounds for ap-proximating max-cut. In
Proceedings of the twenty-sixth annual ACM-SIAM symposiumon Discrete algorithms , pages 1263–1282. SIAM, 2014.[KKSV17] Michael Kapralov, Sanjeev Khanna, Madhu Sudan, and Ameya Velingker. (1+ Ω (1))-approximation to max-cut requires linear space. In Proceedings of the Twenty-EighthAnnual ACM-SIAM Symposium on Discrete Algorithms , pages 1703–1722. SIAM, 2017.[KL11] Jonathan A Kelner and Alex Levin. Spectral sparsification in the semi-streaming set-ting. In
STACS , 2011.[KLM +
17] Michael Kapralov, Yin Tat Lee, Cameron Musco, Christopher Musco, and AaronSidford. Single pass spectral sparsification in dynamic streams.
SIAM J. Comput. ,46(1):456–477, 2017.[KMM +
19] Michael Kapralov, Aida Mousavifar, Cameron Musco, Christopher Musco, and Navid Nouri. Faster spectral sparsification in dynamic streams. In arXiv preprint. https://arxiv.org/pdf/1903.12165.pdf, 2019.
[KMP10] Ioannis Koutis, Gary L. Miller, and Richard Peng. Approaching optimality for solving SDD linear systems. In FOCS, pages 235–244, 2010.
[KMP11] Ioannis Koutis, Gary L Miller, and Richard Peng. A nearly-m log n time solver for SDD linear systems. In FOCS, pages 590–598, 2011.
[KNST19] Michael Kapralov, Navid Nouri, Aaron Sidford, and Jakab Tardos. Dynamic streaming spectral sparsification in nearly linear time and space. In arXiv preprint. https://arxiv.org/pdf/1903.12150.pdf, 2019.
[KOSZ13] Jonathan A Kelner, Lorenzo Orecchia, Aaron Sidford, and Zeyuan Allen Zhu. A simple, combinatorial algorithm for solving SDD systems in nearly-linear time. In Proceedings of the forty-fifth annual ACM symposium on Theory of computing (STOC), pages 911–920, 2013.
[KS16] Rasmus Kyng and Sushant Sachdeva. Approximate Gaussian elimination for Laplacians: fast, sparse, and simple. In FOCS, pages 573–582. IEEE, 2016.
[KW14] Michael Kapralov and David Woodruff. Spanners and sparsifiers in dynamic streams. In
Proceedings of the 2014 ACM symposium on Principles of distributed computing (PODC), pages 272–281, 2014.
[LJS19] Yang P Liu, Arun Jambulapati, and Aaron Sidford. Parallel reachability in almost linear work and square root depth. In FOCS, pages 1664–1686. IEEE, 2019.
[LS13] Yin Tat Lee and Aaron Sidford. Path finding II: An Õ(m√n) algorithm for the minimum cost flow problem. arXiv preprint arXiv:1312.6713, 2013.
[LS14] Yin Tat Lee and Aaron Sidford. Path finding methods for linear programming: Solving linear programs in O(√rank) iterations and faster algorithms for maximum flow. In FOCS, pages 424–433. IEEE, 2014.
[LS15] Yin Tat Lee and Aaron Sidford. Efficient inverse maintenance and faster algorithms for linear programming. In FOCS, pages 230–249. IEEE, 2015.
[LS20] Yang P Liu and Aaron Sidford. Faster divergence maximization for faster maximum flow. In FOCS, 2020.
[LSZ19] Yin Tat Lee, Zhao Song, and Qiuyi Zhang. Solving empirical risk minimization in the current matrix multiplication time. In
COLT, 2019.
[Mad13] Aleksander Madry. Navigating central path with electrical flows: From flows to matchings, and back. In FOCS, pages 253–262. IEEE, 2013.
[MS04] Marcin Mucha and Piotr Sankowski. Maximum matchings via Gaussian elimination. In FOCS, pages 248–255. IEEE, 2004.
[MVV87] Ketan Mulmuley, Umesh V Vazirani, and Vijay V Vazirani. Matching is as easy as matrix inversion. In
Proceedings of the nineteenth annual ACM symposium on Theoryof computing , pages 345–354, 1987.[NN89] Yu Nesterov and Arkadi Nemirovsky. Self-concordant functions and polynomial-timemethods in convex programming.
Report, Central Economic and Mathematic Institute,USSR Acad. Sci , 1989.[NN92] Yurii Nesterov and Arkadi Nemirovski. Conic formulation of a convex programmingproblem and duality.
Optimization Methods and Software, 1(2):95–115, 1992.
[NN94] Yurii Nesterov and Arkadi Nemirovski.
Interior-point polynomial algorithms in convexprogramming , volume 13. Siam, 1994.[NY19] Jelani Nelson and Huacheng Yu. Optimal lower bounds for distributed and stream-ing spanning forest computation. In
Proceedings of the Thirtieth Annual ACM-SIAMSymposium on Discrete Algorithms , pages 1844–1860. SIAM, 2019.[PS98] Christos H Papadimitriou and Kenneth Steiglitz.
Combinatorial optimization: algo-rithms and complexity . Courier Corporation, 1998.[PS14] Richard Peng and Daniel A Spielman. An efficient parallel solver for sdd linear sys-tems. In
Proceedings of the forty-sixth annual ACM symposium on Theory of computing(STOC) , pages 333–342, 2014.[Ren88] James Renegar. A polynomial-time algorithm, based on newton’s method, for linearprogramming.
Mathematical Programming , 40(1-3):59–93, 1988.[Ren01] James Renegar.
A mathematical view of interior-point methods in convex optimization .SIAM, 2001.[Son19] Zhao Song.
Matrix Theory : Optimization, Concentration and Algorithms . PhD thesis,The University of Texas at Austin, 2019.[ST04] Daniel A. Spielman and Shang-Hua Teng. Nearly-linear time algorithms forgraph partitioning, graph sparsification, and solving linear systems. In
Pro-ceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC) ,pages 81–90. https://arxiv.org/abs/cs/0310051 ,divided into https://arxiv.org/abs/0809.3232 , https://arxiv.org/abs/0808.4134 , https://arxiv.org/abs/cs/0607105 , 2004.[ST14] Daniel A. Spielman and Shang-Hua Teng. Nearly linear time algorithms for precondi-tioning and solving symmetric, diagonally dominant linear systems. SIAM J. MatrixAnalysis Applications , 35(3):835–885, 2014.[SY20] Zhao Song and Zheng Yu. Sketching as a tool for solving linear programs. In
Manuscript ,2020.[Tar83] Robert Endre Tarjan.
Data structures and network algorithms. SIAM, 1983.
[Thr93] W. Thrash. A note on the least common multiples of dense sets of integers. 1993.
[Vai87] Pravin M Vaidya. An algorithm for linear programming which requires O(((m+n)n^2 + (m+n)^{1.5}n)L) arithmetic operations. In FOCS. IEEE, 1987.
[Vai89] Pravin M Vaidya. Speeding-up linear programming using fast matrix multiplication. In
FOCS . IEEE, 1989.[Yu20] Huacheng Yu. Tight distributed sketching lower bound for connectivity. In arXivpreprint . https://arxiv.org/pdf/2007.12323.pdf , 2020.15 Notations
Standard notations
For a positive integer n, we denote [n] = {1, 2, ..., n}. We use E[·] for expectation and Pr[·] for probability. For a positive integer n, we use I_n to denote the identity matrix of size n × n. For a vector v ∈ R^n, we use the standard definition of ℓ_p norms: for all p ≥ 1, ‖v‖_p = (Σ_{i=1}^n |v_i|^p)^{1/p}. Specially, ‖v‖_∞ = max_{i∈[n]} |v_i|. We use ‖v‖_0 to denote the number of nonzero entries in vector v. We use supp(v) to denote the support of vector v. We use 1_n to denote the length-n vector where every entry is 1, and 1_{m×n} to denote the m × n matrix where each entry is 1. Similarly, we use the notations 0_n and 0_{m×n}.

Matrix operators
For a square matrix A, we use tr[A] to denote its trace. For a square full-rank matrix A, we use A^{-1} to denote the true inverse of A. For a matrix A, we use A^† to denote its pseudoinverse. We say a square matrix A is positive definite if x^⊤Ax > 0 for all x ≠ 0, and positive semi-definite if x^⊤Ax ≥ 0 for all x. We use ⪰ and ⪯ to denote the p.s.d. ordering; for example, we say A ⪰ B if x^⊤Ax ≥ x^⊤Bx for all x.

Matrix norms
For a matrix A, we use ‖A‖_1 to denote its entry-wise ℓ_1 norm, i.e., ‖A‖_1 = Σ_{i,j} |A_{i,j}|. We use ‖A‖_F to denote its Frobenius norm, ‖A‖_F = (Σ_{i,j} A_{i,j}^2)^{1/2}. We use ‖A‖ to denote its spectral/operator norm.

Matrix approximation
Let A, B ∈ R^{n×n} be psd matrices. Let ε ∈ (0, 1). We say A ≈_ε B if

(1 − ε) · x^⊤Ax ≤ x^⊤Bx ≤ (1 + ε) · x^⊤Ax, ∀x ∈ R^n.

Note that if we have A ≈_ε B, then (1 − ε)‖x‖_A ≤ ‖x‖_B ≤ (1 + ε)‖x‖_A for all x ∈ R^n.

The SDD matrix-related definitions

Definition A.1 (Edge-vertex incidence matrix). Let G = (V_L, V_R, E) be a connected undirected bipartite graph. The (unsigned) edge-vertex incidence matrix is defined as follows:

B(e, v) = 1 if v is incident to e; 0 otherwise.

Definition A.2 (Signed edge-vertex incidence matrix). Let G = (V_L, V_R, E) be a connected directed bipartite graph where all edges are oriented from V_L to V_R. The signed edge-vertex incidence matrix is defined as follows:

B(e, v) = +1 if v is incident to e and v ∈ V_L; −1 if v is incident to e and v ∈ V_R; 0 otherwise.

Definition A.3 (SDDM, SDD matrix). A square matrix A is weakly diagonally dominant if A_{i,i} ≥ Σ_{j≠i} |A_{i,j}| for all i, and is strictly diagonally dominant if A_{i,i} > Σ_{j≠i} |A_{i,j}| for all i. A matrix A is SDD if it is symmetric and weakly diagonally dominant, and is strictly SDD if it is symmetric and strictly diagonally dominant. A matrix A is SDDM if it is SDD and A_{i,j} ≤ 0 for all i ≠ j; an SDDM matrix is strictly SDDM if it is strictly diagonally dominant.

Fact A.4.
An SDDM matrix must be positive semi-definite. If an SDDM matrix has zero row-sums, then it is a Laplacian matrix. If an SDDM matrix has at least one positive row-sum, then it is a strictly SDDM matrix and positive definite.

Bit complexity  Given a linear program

min_{x∈R^n} c^⊤x  s.t.  Ax ≥ b,  (4)

where A ∈ Z^{m×n}, b ∈ Z^m, c ∈ Z^n all have integer coefficients, the bit complexity L is defined as

L := log(m) + log(1 + d_max(A)) + log(1 + max{‖c‖_∞, ‖b‖_∞}),

where d_max(A) denotes the largest absolute value of the determinant of a square sub-matrix of A. It is well known that poly(L)-bit precision is sufficient to implement an IPM (e.g., see [DS08] and the references therein). This is because the absolute values of all intermediate arithmetic results are within [2^{−poly(L)}, 2^{poly(L)}], and the errors in all the approximations are at least 1/poly(n). Therefore, truncating all the arithmetic results to poly(L) bits for some sufficiently large polynomial preserves all the error parameters, and thus the same analysis holds.

We will need the following tools in our later analysis.

Lemma A.5 ([HT56]). Let A ∈ R^{m×n} be the unsigned edge-vertex incidence matrix of a bipartite graph G = (V_L, V_R, E). Let S := {x | Ax ≤ b, x ≥ 0_n} and S' := {y | A^⊤y ≤ b', y ≥ 0_m}, where b ∈ Z^m, b' ∈ Z^n. Then both A and A^⊤ are totally unimodular, i.e., all square submatrices of them have determinants in {0, +1, −1}, and all extreme points of S and S' are integral.

B Isolation lemma in the streaming model
The goal of this section is to prove Lemma B.1. Before the proof, we introduce a simple notation. For a vector w ∈ Z^n and a set S ⊆ [n], we denote w_S := Σ_{i∈S} w_i. We define w_S := Σ_{i∈S} w_i similarly when each w_i : Z^t → Z is a function.

Lemma B.1 (Streaming implementation of the isolation lemma). Let n, F, Z, w be defined as in Lemma B.2. Algorithm 2 in Lemma B.2 can be implemented as an oracle I such that I can output w_i given any i ∈ [n]. Furthermore, I runs in O(log(Z) + log(n)) space.

Proof. Our oracle I is the streaming implementation given in Algorithm 3. It is easy to see that after running the procedure Initialize, the procedure Query(i) will output w_i for any given i ∈ [n]. This oracle stores m, Z ∈ N and r ∈ N^t in memory. Note that m and Z can be stored in O(log(Z) + log(n)) bits, and r can be stored in O(t · log n) = O(⌈log(m)/log(n)⌉ · log(n)) = O(log(Z) + log(n)) bits. It is also easy to see that in Query, all computation can be done within O(log(Z) + log(n)) space.

B.1 Isolation lemma
We state the generalized isolation lemma from previous work [CRS95].
Lemma B.2 (Generalized isolation lemma [CRS95]). Fix n ∈ N and fix an unknown family F ⊆ 2^{[n]}. Let Z be a positive integer such that Z ≥ |F|. There exists an algorithm (Algorithm 2) that uses O(log(Z) + log(n)) random bits to output a vector w ∈ N^n whose entries are bounded by poly(n, log Z), such that with probability at least 1/4, there is a unique set S* ∈ F that has the minimum weight w_{S*}.

Proof. The proof is already given in [CRS95]; for completeness, we reproduce it here. By Lemma B.4, with probability at least 1/2, all sets S ∈ F have distinct weights w^(2)_S. Conditioned on this event of success, by Lemma B.5, all sets S ∈ F have distinct linear forms w^(3)_S. Then by Lemma B.6, with probability at least 1/2, the output w gives a unique minimum-weight set in F. Since the two events of success are independent, the final success probability is at least 1/4.

B.2 Algorithms
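Before the pseudocode, here is a compact Python sketch of the weight oracle that Algorithm 3 below builds out of Algorithm 2. The class name, seed handling, and the exclusion of m = 1 are our own choices for this sketch; the conceptual weights w^(1)_i = 2^i and the base-n digit construction follow the scheme. Only m, t, and r_0, ..., r_{t-1} are stored, so repeated queries across different passes return consistent weights.

```python
import math
import random

class IsolationOracle:
    """Space-efficient weight oracle: each w_i is recomputed on demand."""
    def __init__(self, n, Z, seed=0):
        rng = random.Random(seed)
        self.n = n
        # Random modulus from {2, ..., (2nZ)^2} (m = 1 excluded for convenience).
        self.m = rng.randint(2, (2 * n * Z) ** 2)
        # t = ceil(log m / log n): number of base-n digits of any residue mod m.
        self.t = max(1, math.ceil(math.log(self.m, n)))
        # O(t log n) random bits: one coefficient per digit position.
        self.r = [rng.randint(0, n * n * self.t) for _ in range(self.t)]

    def query(self, i):
        w2 = pow(2, i, self.m)      # w(1)_i = 2^i reduced mod m, computed lazily
        total = 0
        for j in range(self.t):     # base-n digits of w(2), dotted with r
            total += (w2 % self.n) * self.r[j]
            w2 //= self.n
        return total

oracle = IsolationOracle(n=16, Z=1 << 16)
w = [oracle.query(i) for i in range(16)]
assert w == [oracle.query(i) for i in range(16)]  # consistent across "passes"
```

The point of `pow(2, i, self.m)` is that w^(1)_i = 2^i is never materialized: only its residue mod m is needed, which fits in O(log m) bits.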
Algorithm 2 Conceptual implementation of the isolation lemma. Algorithm 3 is the implementation of this algorithm in the streaming model.

procedure Isolation(n, Z ∈ N)  ▷ Z ≥ |F|, Lemma B.2
    w^(1)_i ← 2^i, ∀i ∈ [n]  ▷ w^(1) ∈ N^n
    Choose m uniformly at random from {1, 2, ..., (2nZ)^2}  ▷ m ≤ 4n^2·Z^2
    For each i, define w^(2)_i ← w^(1)_i mod m  ▷ w^(2) ∈ [m]^n
    t ← ⌈log(m)/log(n)⌉
    for i = 1 → n do
        b_{i,t−1}, ..., b_{i,1}, b_{i,0} ← w^(2)_i  ▷ Write w^(2)_i in base n; the digits b_{i,j} ∈ {0, ..., n−1}
          ▷ Note that t is an upper bound on the length
        w^(3)_i(y_0, ..., y_{t−1}) ← Σ_{j=0}^{t−1} b_{i,j} · y_j  ▷ w^(3)_i : Z^t → Z is a linear form
    end for
    Choose r_0, ..., r_{t−1} uniformly and independently at random from {0, 1, ..., n^2·t}
    w^(4)_i ← w^(3)_i(r_0, ..., r_{t−1}), ∀i ∈ [n]
    return w^(4) ∈ N^n
end procedure

B.3 Proof of uniqueness: step 1
The following lemma is from [Thr93].
Lemma B.3 (Step 1, [Thr93]). Let L be a sufficiently large integer and let S be any subset of {1, ..., L} such that |S| ≥ L/2. Then the least common multiple of the elements in S exceeds 2^{L/2}.

B.4 Proof of uniqueness: step 2
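The proof below applies Lemma B.3. As a quick numerical sanity check of the lcm growth, here is a sketch on one small range (L = 40 is our own illustrative choice; the lemma itself is an asymptotic statement):

```python
from math import lcm

# The lcm of the top half of {1, ..., L} already exceeds 2^(L/2) for L = 40:
# it is divisible by the primes 23, 29, 31, 37 alone, whose product is ~7.6e5,
# and by 32 = 2^5, pushing it far past 2^20.
L = 40
top_half = range(L // 2 + 1, L + 1)   # a subset of {1, ..., L} of size L/2
value = lcm(*top_half)
assert value > 2 ** (L // 2)
```

(`math.lcm` with multiple arguments requires Python 3.9 or later.)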
Lemma B.4 (Step 2). With probability at least 1/2, all sets S ∈ F have distinct weights w^(2)_S.

Proof. We write F = {S_1, ..., S_k}, where k ≤ Z. Define

I := Π_{1≤i<j≤k} (w^(1)_{S_i} − w^(1)_{S_j}).

Since the weights w^(1)_i = 2^i are distinct powers of two, the subset sums w^(1)_{S_1}, ..., w^(1)_{S_k} are pairwise distinct, so I ≠ 0; moreover, each factor has absolute value less than 2^{n+1} and there are fewer than Z^2/2 factors, so |I| < 2^{nZ^2}.

Let L = (2nZ)^2 and let S = {1, ..., L} be all the possible choices of m. At least half of the choices of m satisfy I mod m ≠ 0, since otherwise, by applying Lemma B.3, |I| is at least the least common multiple of half of the numbers in S, i.e., |I| ≥ 2^{L/2} = 2^{2n^2·Z^2}, contradicting the upper bound |I| < 2^{nZ^2}.

Therefore, with probability at least 1/2 (the randomness is over the choice of m), I mod m ≠ 0, which means that all sets S ∈ F have distinct weights w^(2)_S by our definition of w^(2)_i = w^(1)_i mod m (Line 4, Algorithm 2).

Algorithm 3 Streaming implementation of Isolation(). This algorithm is the streaming implementation of Algorithm 2.

data structure  ▷ Lemma B.1
    members
        n, t ∈ N  ▷ t = O(log(Z)/log(n))
        m, Z ∈ N  ▷ m ≤ 4n^2·Z^2, Z ≥ |F|
        r_0, ..., r_{t−1} ∈ N  ▷ r_i ≤ n^2·t
    end members
    procedure Initialize(n, Z ∈ N)  ▷ Initialization
        Choose m ∈ N uniformly at random from {1, 2, ..., (2nZ)^2}
        t ← ⌈log m / log n⌉  ▷ t ∈ N
        Choose r_0, ..., r_{t−1} uniformly and independently at random from {0, 1, ..., n^2·t}  ▷ r ∈ N^t
    end procedure
    procedure Query(i ∈ [n])
        w^(1) ← 2^i  ▷ w^(1) ∈ N, w^(1) ≤ 2^n
        w^(2) ← w^(1) mod m  ▷ w^(2) ∈ N, w^(2) < m
        b_{t−1}, ..., b_1, b_0 ← w^(2)  ▷ b_j ∈ {0, ..., n−1}, ∀j
          ▷ Write w^(2) in base n. Note that t is an upper bound on the length
        w^(4) ← Σ_{j=0}^{t−1} b_j · r_j
        return w^(4)  ▷ w^(4) ≤ poly(n, log Z)
    end procedure
end data structure

B.5 Proof of uniqueness: step 3

Lemma B.5 (Step 3). If the weights w^(2)_S are distinct for all S ∈ F, then the linear forms w^(3)_S are all distinct for all S ∈ F.

Proof. We prove by contradiction. Suppose there exist two distinct sets S_1, S_2 ∈ F such that w^(3)_{S_1} = w^(3)_{S_2}. Let y_i = n^i for all i ∈ {0, 1, ..., t−1}.
We then have w^(2)_{S_1} = w^(3)_{S_1}(y_0, ..., y_{t−1}) = w^(3)_{S_2}(y_0, ..., y_{t−1}) = w^(2)_{S_2} by the definition of w^(3)_i (Line 9, Algorithm 2), which is a contradiction to our assumption that all the w^(2)_S are distinct.

B.6 Proof of uniqueness: step 4

Lemma B.6 (Step 4). Let C be any collection of distinct linear forms over at most t variables y = y_0, ..., y_{t−1} with coefficients in {0, 1, ..., n−1}. Choose a random r = r_0, ..., r_{t−1} by assigning each r_i uniformly and independently from {0, 1, ..., n^2·t}. Then in the assignment y = r there is a unique linear form with minimum value, with probability at least 1/2.

Proof. We call a variable y_i singular under an assignment r ∈ Z^t if there exist two minimum linear forms in C under the assignment r. An assignment r gives a unique minimum linear form if and only if no variable y_i is singular under this assignment. We bound the probability of y_i being singular under a random assignment r and then take a union bound over every y_i.

For each y_i, fix all of r_0, ..., r_{i−1}, r_{i+1}, ..., r_{t−1} = a_0, ..., a_{i−1}, a_{i+1}, ..., a_{t−1} other than r_i. Now, every linear form f under this partial assignment can be written as a_f + b_f · y_i with b_f < n. We split C into n classes C_0, ..., C_{n−1}, where C_j contains all linear forms with b_f = j. Let p_j be the minimum a_f among all linear forms in C_j. According to the definition of singular, y_i is singular on assignment r_i if and only if the minimum value in the list {p_0, p_1 + r_i, p_2 + 2r_i, ..., p_{n−1} + (n−1)·r_i} is not unique, whose probability is upper bounded by the probability that the elements in the list have a collision. Since every pair of elements in the list can be equal for at most one choice of r_i, we have

Pr_{r_i}[y_i is singular | r_j = a_j, ∀j ≠ i] ≤ Pr_{r_i}[∃ l, m s.t. p_l + l·r_i = p_m + m·r_i]
≤ Σ_{l≠m} Pr_{r_i}[p_l + l·r_i = p_m + m·r_i]
≤ (n choose 2) · 1/(n^2·t) ≤ 1/(2t).
Finally, by a union bound on the events that every y_i is not singular, the conclusion follows with probability at least 1 − t · 1/(2t) = 1/2.

C Preliminary for IPM

Let us consider the linear program

min_{x∈R^n} c^⊤x,  s.t.  Ax ≥ b,

where A ∈ R^{m×n}, b ∈ R^m, c ∈ R^n. We denote by a_1^⊤, ..., a_m^⊤ the row vectors of the matrix A ∈ R^{m×n}.

C.1 Definitions

In this section, we give definitions.

Definition C.1 (Barrier function). Define the logarithmic barrier function F(x) : R^n → R as follows:

F(x) := − Σ_{i∈[m]} ln(a_i^⊤x − b_i).  (5)

Definition C.2 (Feasible solution). For any x ∈ R^n, we say x is feasible if for all i ∈ [m], a_i^⊤x > b_i.

Statement | Comment
Def. C.1 | Barrier function
Def. C.2 | Feasible solution
Def. C.3 | Slack variables
Def. C.4 | Gradient and Hessian with respect to the barrier function
Def. C.5 | Perturbed objective function
Def. C.6 | Newton step
Def. C.7 | Φ_t, depends on t
Def. C.8 | Ψ, doesn't depend on t

Table 1: IPM-related definitions

Definition C.3 (Slack). For simplicity, we also define the slack s_i(x) := a_i^⊤x − b_i ∈ R for all x ∈ R^n and i ∈ [m].

Definition C.4 (Gradient and Hessian of the barrier function). We define the gradient g(x) : R^n → R^n and the Hessian H(x) : R^n → R^{n×n} of the barrier function F(x) as follows:

g(x) := ∇F(x) = − Σ_{i∈[m]} a_i / s_i(x) ∈ R^n,  (6)
H(x) := ∇^2 F(x) = Σ_{i∈[m]} a_i a_i^⊤ / s_i(x)^2 ∈ R^{n×n}.  (7)

Definition C.5 (Perturbed objective function). Define the perturbed objective function f_t : R^n → R, where t > 0 is a parameter:

f_t(x) := t · c^⊤x + F(x).  (8)

Definition C.6 (Newton step). Given a scalar t > 0 and a vector c ∈ R^n, define the Newton step at a feasible point x ∈ R^n to be

δ_x := −(H(x))^† · ∇f_t(x) = −(H(x))^† · (t·c + ∇F(x)) = −(H(x))^† · (t·c + g(x)).  (9)

Definition C.7 (Definition of the potential Φ).
Given t > 0 and feasible x, y ∈ R^n, we define the function Φ_t: R^n × R^n → R as

Φ_t(x, y) := ‖∇f_t(x)‖_{H(y)^{−1}}.

Definition C.8 (Helper of potential function Ψ). Given feasible x, y ∈ R^n, we define the function Ψ: R^n × R^n → R as

Ψ(x, y) := ‖g(x)‖_{H(y)^{−1}}.

C.2 Approximation tools from previous work

Lemma C.9 (Hessian approximation, [Ren01]). Let f be a self-concordant function with domain D_f. Define H(x) := ∇²f(x). For any two feasible points x, z ∈ D_f, if ‖x − z‖_{H(x)} ≤ 1/2, then we have

(1 − ‖x − z‖_{H(x)})² · H(z) ⪯ H(x) ⪯ (1 − ‖x − z‖_{H(x)})^{−2} · H(z).

Lemma C.10 (Perturbed linear system, [BLSS20]). Let H ∈ R^{d×d} be a symmetric positive definite matrix. Let ε ∈ (0, 1/2]. Let b, v ∈ R^d be such that ‖v‖_{H^{−1}} ≤ ε·‖b‖_{H^{−1}}. Let y = H^{−1}(b + v) ∈ R^d. Then there is a symmetric matrix ∆ ∈ R^{d×d} such that y = (H + ∆)^{−1}b ∈ R^d and

(1 + 20ε)^{−1} · H ⪯ H + ∆ ⪯ (1 + 20ε) · H.

Lemma C.11 ([Ren01]). Let f be a self-concordant function with domain D_f. For x ∈ D_f, define H(x) := ∇²f(x) and g(x) := ∇f(x). Define n(x) := −H(x)^{−1}g(x) as the Newton step at x. If ‖n(x)‖_{H(x)} ≤ 1/4, then f has a minimizer z such that

‖z − x‖_{H(x)} ≤ ‖n(x)‖_{H(x)} + 3·‖n(x)‖²_{H(x)} / (1 − ‖n(x)‖_{H(x)}).

Lemma C.12 ([Ren01]). Let f be a self-concordant function with domain D_f. Let c ∈ R^n be the objective vector. Define OPT := min_{x∈D_f} ⟨c, x⟩. Denote H(x) := ∇²f(x). For any x, y ∈ D_f, we have

(⟨c, y⟩ − OPT) / (⟨c, x⟩ − OPT) ≤ 1 + ‖y − x‖_{H(x)}.

Proof. Define c_x := H(x)^{−1}c.
Define B(x) ⊆ R^n to be the ellipsoid B(x) := {y ∈ R^n | ‖y − x‖_{H(x)} < 1}. Since f is self-concordant, we know B(x) ⊆ D_f. Therefore, for all 0 ≤ t < 1/‖c_x‖_{H(x)}, we have x − t·c_x ∈ D_f. This means

⟨c, x⟩ − t·‖c_x‖²_{H(x)} ≥ OPT.

By letting t → 1/‖c_x‖_{H(x)}, we get ‖c_x‖_{H(x)} ≤ ⟨c, x⟩ − OPT. Hence

(⟨c, y⟩ − OPT)/(⟨c, x⟩ − OPT) = 1 + ⟨c_x, y − x⟩_{H(x)}/(⟨c, x⟩ − OPT) ≤ 1 + ‖c_x‖_{H(x)}·‖y − x‖_{H(x)}/(⟨c, x⟩ − OPT) ≤ 1 + ‖y − x‖_{H(x)},

where the first step is by the definition of the inner product ⟨x, y⟩_S := ⟨Sx, y⟩ for a PSD matrix S ∈ R^{n×n}, the second step is by the Cauchy-Schwarz inequality, and the last step is by ‖c_x‖_{H(x)} ≤ ⟨c, x⟩ − OPT.

Lemma C.13 (x*_t is nearly optimal for x*, [Ren01]). Given a linear program min_{x∈R^n, Ax≥b} c^⊤x, where A ∈ R^{m×n}, b ∈ R^m, c ∈ R^n, let x* ∈ R^n be its optimal solution. For every t > 0, let x*_t ∈ R^n be the x that minimizes f_t(x) (Def. C.5). Then it must be that

c^⊤x*_t − c^⊤x* < m/t.

Proof. Since x*_t ∈ R^n is the minimizer of f_t(x), we have ∇f_t(x*_t) = t·c + g(x*_t) = 0_n. By g(x*_t) = −t·c ∈ R^n, we get

c^⊤x*_t − c^⊤x* = (g(x*_t))^⊤ · (x* − x*_t)/t. (10)

Note that x*, x*_t ∈ R^n are both feasible, so we have

(g(x*_t))^⊤ · (x* − x*_t) = −Σ_{i∈[m]} a_i^⊤(x* − x*_t)/s_i(x*_t) = Σ_{i∈[m]} (s_i(x*_t) − s_i(x*))/s_i(x*_t) = m − Σ_{i∈[m]} s_i(x*)/s_i(x*_t) < m,

where the first step is by the definition of g(x) (Def. C.4), the second step is by the definition of s_i(x) (Def. C.3), namely s_i(x) = a_i^⊤x − b_i, and the last step holds since s_i(x*) ≥ 0 and s_i(x*_t) > 0.

Combining with Eq. (10), the lemma follows.
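The m/t duality-gap bound of Lemma C.13 can be checked numerically on a one-variable toy LP. This is our hypothetical instance, not one from the paper: minimize x subject to x ≥ 0 and −x ≥ −1, so m = 2, F(x) = −ln(x) − ln(1 − x), and x* = 0.

```python
# The central-path point x*_t minimizes f_t(x) = t*x - ln(x) - ln(1 - x);
# we locate it by bisection on f_t'(x) = t - 1/x + 1/(1 - x), which is
# strictly increasing on (0, 1).
def f_t_prime(x, t):
    return t - 1.0 / x + 1.0 / (1.0 - x)

def central_path_point(t, iters=200):
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f_t_prime(mid, t) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for t in [1.0, 10.0, 1000.0]:
    x_t = central_path_point(t)
    assert x_t < 2.0 / t   # Lemma C.13: c^T x*_t - c^T x* < m/t with m = 2
```

As t grows, x*_t slides toward the optimum x* = 0 at the rate the lemma predicts.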
Lemma C.14 (Nearly-optimal output, value version [Ren01]). Given a linear program min_{x∈R^n, Ax≥b} c^⊤x, where A ∈ R^{m×n}, b ∈ R^m, c ∈ R^n, let x* ∈ R^n be its optimal solution. Let ε_N ∈ (0, 1/4]. If for some t > 0 and x ∈ R^n we have Φ_t(x, x) ≤ ε_N (Def. C.7), then we have

c^⊤x − c^⊤x* ≤ (m/t)·(1 + 2ε_N).

Proof. Denote OPT := c^⊤x*, the best value attainable in the LP. Let z be the minimizer of f_t. From Φ_t(x, x) ≤ ε_N, we know that ‖n(x)‖_{H(x)} ≤ ε_N, where n(x) := −H(x)^{−1}∇f_t(x). By Lemma C.11, we have

‖z − x‖_{H(x)} ≤ 2ε_N. (11)

Plugging Eq. (11) into Lemma C.12 (with the roles of x and z exchanged), we get

c^⊤x − OPT ≤ (1 + 2ε_N)·(c^⊤z − OPT).

By Lemma C.13, we have c^⊤z ≤ OPT + m/t. By combining the above inequalities, we get

c^⊤x − OPT ≤ (m/t)·(1 + 2ε_N).

D Algorithm

In this section, we present our IPM algorithm.

Algorithm 4 Interior point method
1: procedure InteriorPointMethod(A, b, c, t_start, x_start, t_final, ε_Φ, o)  ▷ Lemma E.1
2:   m, n ← dimensions of A  ▷ A ∈ R^{m×n} is the input matrix, b ∈ R^m, c ∈ R^n
3:     ▷ t_start, x_start satisfy Φ_{t_start}(x_start, x_start) ≤ ε_Φ
4:     ▷ t_final ∈ R is the final goal of t
5:     ▷ o ∈ {0, 1}.
If o is 0, the algorithm decreases t; otherwise the algorithm increases t
6:   ε_x ← 1/200
7:   ε_t ← 1/(100·√m)
8:   ε_Φ ← ε_Φ  ▷ ε_Φ ≤ 1/100 is given from outside
9:   t ← t_start
10:  x ← x_start
11:  T ← O(√m)·|log(t_final/t_start)|  ▷ number of iterations
12:  for k ← 1 to T do
13:    δ_x := −H(x)^{−1}·∇f_t(x)  ▷ δ_x is only used in the analysis; it involves no computation
14:      ▷ H(x) ∈ R^{n×n}, ∇f_t(x) ∈ R^n
15:      ▷ compute any δ̃_x such that ‖δ̃_x − δ_x‖_{H(x)} ≤ ε_x·ε_Φ·‖δ_x‖_{H(x)}
16:    δ̃_x ← StreamLS(H(x), ∇f_t(x), ε_x·ε_Φ)  ▷ Algorithm 5
17:    x_new ← x + δ̃_x
18:    if o = 0 then
19:      t_new ← t·(1 − ε_t)
20:    else
21:      t_new ← t·(1 + ε_t)
22:    end if
23:    if (o = 0 and t < t_final) or (o = 1 and t > t_final) then
24:      break
25:    end if
26:    t ← t_new, x ← x_new
27:  end for
28:  return x
29: end procedure

E Error analysis of IPM

The goal of this section is to present Lemma E.1.

Parameter | Meaning | Value
ε_x | ‖δ̃_x − δ_x‖_{H(x)} ≤ ε_x | 1/200
ε_t | |t_new/t − 1| = ε_t | 1/(100·√m)
ε_Φ | Φ_t(x, x) ≤ ε_Φ | 1/100

Table 2: Parameters occurring in Algorithm 4.

Lemma E.1. Given any matrix A ∈ R^{m×n}, b ∈ R^m and c ∈ R^n, let x* ∈ R^n be the solution of the following linear program:

min_{x∈R^n, Ax≥b} c^⊤x.

If InteriorPointMethod is given A, b, c, t_start, x_start, t_final that satisfy Φ_{t_start}(x_start, x_start) ≤ ε_Φ (where ε_Φ = 1/100, see Table 2), then it outputs an answer x which is a nearly-optimal solution:

c^⊤x − c^⊤x* ≤ (m/t_final)·(1 + 2ε_Φ).

Proof. Since the initial point x_start and t_start satisfy Φ_{t_start}(x_start, x_start) ≤ ε_Φ, we can apply Lemma E.3 to get Φ_{t_final}(x, x) ≤ ε_Φ.
Applying Lemma C.14 to t_final and x with our choice of ε_Φ = 1/100 < 1/4, we get

c^⊤x − c^⊤x* ≤ (m/t_final)·(1 + 2ε_Φ).

E.1 Assumptions on parameters

Assumption E.2. Let ε_x ∈ (0, 1/200], ε_t ∈ (0, 1/(100·√m)] and ε_Φ ∈ (0, 1/100] denote three fixed parameters such that the following inequality holds:

(1 + ε_t)·(4ε_Φ² + 16ε_x) + √m·ε_t ≤ ε_Φ.

E.2 Bounding potential function Φ

The goal of this section is to prove Lemma E.3.

Lemma E.3. For each t, let Φ_t: R^n × R^n → R be defined as in Definition C.7, and assume Assumption E.2 holds. If the input x_start, t_start satisfies Φ_{t_start}(x_start, x_start) ≤ ε_Φ, then for every iteration k ∈ [T] we have Φ_{t^(k)}(x^(k), x^(k)) ≤ ε_Φ.

Proof. We use proof by induction. In the base case k = 0, we have t^(0) := t_start and x^(0) := x_start, so the condition holds by assumption. When k ≥ 1, for ease of notation we write x = x^(k−1), t = t^(k−1), x_new = x^(k), t_new = t^(k), and define δ_x = −H(x)^{−1}∇f_t(x) as in Definition C.6.

First, from the induction hypothesis, we have Φ_t(x, x) ≤ ε_Φ. By the definition of Φ (Definition C.7), this means ‖δ_x‖_{H(x)} ≤ ε_Φ.

Next, we will show that ‖δ̃_x − δ_x‖_{H(x)} ≤ ε_x. By calling StreamLS(H(x), ∇f_t(x), ε_x·ε_Φ) (Line 16, Algorithm 4) and applying Theorem G.9, we have

‖δ̃_x − δ_x‖_{H(x)} ≤ ε_x·ε_Φ·‖δ_x‖_{H(x)}.

Define b := H(x)·δ_x and v := H(x)·(δ̃_x − δ_x), and write δ̃_x = H(x)^{−1}·(b + v). Note that we have ‖v‖_{H(x)^{−1}} ≤ ε_x·ε_Φ·‖b‖_{H(x)^{−1}}. Applying Lemma C.10, we get δ̃_x = H̄^{−1}·H(x)·δ_x, where H̄ ∈ R^{n×n} is a positive definite matrix that satisfies

(1 + 20ε_x)^{−1}·H(x) ⪯ H̄ ⪯ (1 + 20ε_x)·H(x).
Therefore, we have

‖δ̃_x − δ_x‖_{H(x)} = ‖(H̄^{−1}H(x) − I)·δ_x‖_{H(x)} ≤ 20ε_x·‖δ_x‖_{H(x)} ≤ 20ε_x·ε_Φ ≤ ε_x,

where the last step holds since ε_Φ ≤ 1/20.

Finally, we have

Φ_{t_new}(x_new, x_new) ≤ (t_new/t)·Φ_t(x_new, x_new) + |t_new/t − 1|·Ψ(x_new, x_new)
≤ (t_new/t)·Φ_t(x_new, x_new) + |t_new/t − 1|·√m
≤ (1 + ε_t)·Φ_t(x_new, x_new) + √m·ε_t
≤ (1 + ε_t)·(4·Φ_t(x, x)² + 16ε_x) + √m·ε_t
≤ (1 + ε_t)·(4ε_Φ² + 16ε_x) + √m·ε_t
≤ ε_Φ,

where the first step is by Lemma E.4, the second step is by Lemma E.5, the fourth step is by Lemma E.6 (using ε_x + ε_Φ ≤ 1/2), the fifth step is by the induction hypothesis, and the last step is by Assumption E.2.

Section | Lemma | LHS | RHS
Section E.2 | Lemma E.3 | Φ_{t_new}(x + δ̃_x, x + δ̃_x) | ε_Φ
Section E.3 | Lemma E.4 | Φ_{t_new}(x, x) | (t_new/t)·Φ_t(x, x) + |t_new/t − 1|·Ψ(x, x)
Section E.4 | Lemma E.5 | Ψ(x + δ̃_x, x + δ̃_x) | √m
Section E.5 | Lemma E.6 | Φ_t(x + δ̃_x, x + δ̃_x) | 4·Φ_t(x, x)² + 16ε_x
Section E.6 | Lemma E.7 | Φ_t(x + δ̃_x, x + δ̃_x) | Φ_t(x + δ_x, x + δ̃_x) + 16ε_x
Section E.7 | Lemma E.8 | Φ_t(x + δ_x, x + δ̃_x) | 4·Φ_t(x + δ_x, x + δ_x)
Section E.8 | Lemma E.9 | Φ_t(x + δ_x, x + δ_x) | Φ_t(x, x)²

Table 3: Summary of the movement of the potential function.

E.3 Bounding the movement of t

The goal of this section is to prove Lemma E.4.

Lemma E.4. Let Φ and Ψ be defined as in Definition C.7 and Definition C.8. Then, for all t, t_new > 0 and feasible x ∈ R^n, we have

Φ_{t_new}(x, x) ≤ (t_new/t)·Φ_t(x, x) + |t_new/t − 1|·Ψ(x, x).

Proof.
We have

∇f_{t_new}(x) = t_new·c + g(x)
= (t_new/t)·(t·c + g(x)) + (1 − t_new/t)·g(x)
= (t_new/t)·∇f_t(x) + (1 − t_new/t)·g(x),

where the first step follows from the definition of f_t(x) (Definition C.5), the second step follows from rearranging terms, and the last step follows from the definition of f_t(x) (Definition C.5).

Finally, we can upper bound Φ_{t_new}(x, x) as follows:

Φ_{t_new}(x, x) = ‖∇f_{t_new}(x)‖_{H(x)^{−1}}
≤ (t_new/t)·‖∇f_t(x)‖_{H(x)^{−1}} + |1 − t_new/t|·‖g(x)‖_{H(x)^{−1}}
= (t_new/t)·Φ_t(x, x) + |t_new/t − 1|·Ψ(x, x),

where the inequality is the triangle inequality. Thus, we complete the proof.

E.4 Upper bounding the potential function

The goal of this section is to prove Lemma E.5.

Lemma E.5. Let the function Ψ: R^n × R^n → R be defined as in Definition C.8. For all feasible x ∈ R^n, we have Ψ(x, x) ≤ √m.

Proof. Let z := (H(x))^{−1}g(x) ∈ R^n, so that Ψ(x, x) = ‖z‖_{H(x)}. We can upper bound ‖z‖²_{H(x)} as follows:

‖z‖²_{H(x)} = z^⊤H(x)z = z^⊤g(x) = −Σ_{i∈[m]} z^⊤a_i/s_i(x)
≤ √m·(Σ_{i∈[m]} (z^⊤a_i)²/s_i(x)²)^{1/2}
= √m·(z^⊤(Σ_{i∈[m]} a_i·a_i^⊤/s_i(x)²)z)^{1/2}
= √m·(z^⊤H(x)z)^{1/2} = √m·‖z‖_{H(x)},

where the third step is by g(x) = −Σ_{i∈[m]} a_i/s_i(x) (Definition C.4), the fourth step is by the Cauchy-Schwarz inequality, and the sixth step is by H(x) = Σ_{i∈[m]} a_i·a_i^⊤/s_i(x)² (Definition C.4). Therefore, we obtain Ψ(x, x) = ‖z‖_{H(x)} ≤ √m.

E.5 Move both: final

The goal of this section is to prove Lemma E.6.

Lemma E.6. For any feasible x ∈ R^n, let δ_x ∈ R^n be defined as in Definition C.6, and suppose ‖δ̃_x − δ_x‖_{H(x)} ≤ ε_x and ‖δ_x‖_{H(x)} ≤ ε.
If ε_x + ε ≤ 1/2, then we have

Φ_t(x + δ̃_x, x + δ̃_x) ≤ 4·Φ_t(x, x)² + 16ε_x.

Proof. The proof is done by showing the following chain:

Φ_t(x + δ̃_x, x + δ̃_x) ≤ Φ_t(x + δ_x, x + δ̃_x) + 16ε_x ≤ 4·Φ_t(x + δ_x, x + δ_x) + 16ε_x ≤ 4·Φ_t(x, x)² + 16ε_x,

where the first step is by Lemma E.7, the second step follows from Lemma E.8, and the third step follows from Lemma E.9.

E.6 Move both: part 1

The goal of this section is to prove Lemma E.7.

Lemma E.7. Suppose ‖δ̃_x − δ_x‖_{H(x)} ≤ ε_x and ‖δ_x‖_{H(x)} ≤ ε. If ε_x + ε ≤ 1/2, then we have

Φ_t(x + δ̃_x, x + δ̃_x) ≤ Φ_t(x + δ_x, x + δ̃_x) + 16ε_x.

Proof. We have

Φ_t(x + δ̃_x, x + δ̃_x) − Φ_t(x + δ_x, x + δ̃_x)
= ‖∇f_t(x + δ̃_x)‖_{H(x+δ̃_x)^{−1}} − ‖∇f_t(x + δ_x)‖_{H(x+δ̃_x)^{−1}}
≤ ‖∇f_t(x + δ̃_x) − ∇f_t(x + δ_x)‖_{H(x+δ̃_x)^{−1}}
= ‖g(x + δ̃_x) − g(x + δ_x)‖_{H(x+δ̃_x)^{−1}}, (12)

where the first step is by the definition of Φ_t (Definition C.7), the second step is by the triangle inequality, and the last step holds since the t·c terms cancel.

Define φ(θ) := g(x + δ_x + θ·(δ̃_x − δ_x)) for θ ∈ [0, 1]. Then we have g(x + δ̃_x) − g(x + δ_x) = φ(1) − φ(0). By the mean value theorem, there exists p ∈ [0, 1] such that

φ(1) − φ(0) = φ′(p) = H(x + δ_x + p·(δ̃_x − δ_x))·(δ̃_x − δ_x).

Since ‖x + δ_x + p·(δ̃_x − δ_x) − x‖_{H(x)} ≤ ‖δ_x‖_{H(x)} + p·‖δ̃_x − δ_x‖_{H(x)} ≤ ε + ε_x ≤ 1/2, by Lemma C.9 we have

(1/4)·H(x) ⪯ H(x + δ_x + p·(δ̃_x − δ_x)) ⪯ 4·H(x).
(13)

Since ‖x + δ̃_x − x‖_{H(x)} ≤ ‖δ_x‖_{H(x)} + ‖δ̃_x − δ_x‖_{H(x)} ≤ ε + ε_x ≤ 1/2, by Lemma C.9 we have

(1/4)·H(x) ⪯ H(x + δ̃_x) ⪯ 4·H(x). (14)

Finally, we have

‖g(x + δ̃_x) − g(x + δ_x)‖_{H(x+δ̃_x)^{−1}}
= ‖H(x + δ_x + p·(δ̃_x − δ_x))·(δ̃_x − δ_x)‖_{H(x+δ̃_x)^{−1}}
≤ 2·‖H(x + δ_x + p·(δ̃_x − δ_x))·(δ̃_x − δ_x)‖_{H(x)^{−1}}
≤ 8·‖δ̃_x − δ_x‖_{H(x)}
≤ 16ε_x,

where the first step is by the mean value computation above, the second step is by Eq. (14), and the third step is by Eq. (13). Combining with Eq. (12) proves the lemma.

E.7 Move both: part 2

The goal of this section is to prove Lemma E.8.

Lemma E.8. Suppose ‖δ̃_x − δ_x‖_{H(x)} ≤ ε_x and ‖δ_x‖_{H(x)} ≤ ε. If ε_x + ε ≤ 1/2, then we have

Φ_t(x + δ_x, x + δ̃_x) ≤ 4·Φ_t(x + δ_x, x + δ_x).

Proof. Since ‖x + δ̃_x − x‖_{H(x)} ≤ ‖δ_x‖_{H(x)} + ‖δ̃_x − δ_x‖_{H(x)} ≤ ε + ε_x ≤ 1/2, by Lemma C.9 we have

(1/4)·H(x) ⪯ H(x + δ̃_x) ⪯ 4·H(x).

Since ‖x + δ_x − x‖_{H(x)} ≤ ε ≤ 1/2, by Lemma C.9 we have

(1/4)·H(x) ⪯ H(x + δ_x) ⪯ 4·H(x).
Therefore,

Φ_t(x + δ_x, x + δ̃_x) = ‖∇f_t(x + δ_x)‖_{H(x+δ̃_x)^{−1}} ≤ 2·‖∇f_t(x + δ_x)‖_{H(x)^{−1}} ≤ 4·‖∇f_t(x + δ_x)‖_{H(x+δ_x)^{−1}} = 4·Φ_t(x + δ_x, x + δ_x),

where the first step is by the definition of Φ (Definition C.7), the second step follows from (1/4)·H(x) ⪯ H(x + δ̃_x), the third step follows from H(x + δ_x) ⪯ 4·H(x), and the last step is by the definition of Φ (Definition C.7).

E.8 Move both: part 3

The goal of this section is to prove Lemma E.9.

Lemma E.9. Φ_t(x + δ_x, x + δ_x) ≤ Φ_t(x, x)².

Proof. For simplicity, we define x_tmp := x + δ_x and z := (H(x_tmp))^{−1}·∇f_t(x_tmp); below we write ‖v‖_y for ‖v‖_{H(y)}. First, we have

∇f_t(x_tmp) = t·c + g(x_tmp)
= −H(x)·δ_x − g(x) + g(x_tmp)
= −Σ_{i∈[m]} a_i·(a_i^⊤δ_x)/s_i(x)² + Σ_{i∈[m]} a_i/s_i(x) − Σ_{i∈[m]} a_i/s_i(x_tmp)
= Σ_{i∈[m]} (a_i·(a_i^⊤δ_x)/(s_i(x)·s_i(x_tmp)) − a_i·(a_i^⊤δ_x)/s_i(x)²)
= −Σ_{i∈[m]} a_i·(a_i^⊤δ_x)²/(s_i(x)²·s_i(x_tmp)), (15)

where the first step is by the definition of f_t (Definition C.5), the second step is by H(x)·δ_x = −t·c − g(x), the third step is by the definitions of g(x) and H(x) (Definition C.4), and the fourth and last steps are by

s_i(x_tmp) − s_i(x) = s_i(x + δ_x) − s_i(x) = a_i^⊤(x + δ_x − x) + (b_i − b_i) = a_i^⊤δ_x.
As a result,

‖z‖²_{x_tmp} = z^⊤H(x_tmp)z = z^⊤∇f_t(x_tmp)
= −Σ_{i∈[m]} (z^⊤a_i)·(a_i^⊤δ_x)²/(s_i(x)²·s_i(x_tmp))
≤ (Σ_{i∈[m]} (z^⊤a_i)²/s_i(x_tmp)²)^{1/2} · Σ_{i∈[m]} (a_i^⊤δ_x)²/s_i(x)²
= (z^⊤H(x_tmp)z)^{1/2} · (δ_x^⊤H(x)δ_x)
= ‖z‖_{x_tmp} · ‖δ_x‖²_x,

where the third step is by Eq. (15), the fourth step follows from the Cauchy-Schwarz inequality, and the fifth step is by the definition of H(x) (Definition C.4). Therefore, we have

Φ_t(x + δ_x, x + δ_x) = ‖z‖_{x_tmp} ≤ ‖δ_x‖²_x = Φ_t(x, x)².

F Running time of IPM

The goal of this section is to prove Lemma F.1.

Lemma F.1 (Running time of IPM). Consider Algorithm 4 with input A ∈ R^{m×n}, b ∈ R^m, c ∈ R^n, x_start ∈ R^n, t_start ∈ R, t_final ∈ R. Let x ∈ R^n be any feasible point. Define H(x) ∈ R^{n×n} and s(x) ∈ R^m as the Hessian and slack variables (Def. C.4, Def. C.3). Suppose we have the following conditions:

C.1 Σ_{i∈[m]} a_i/s_i(x) can be computed in one pass in f(n) space;
C.2 the Hessian H(x) ∈ R^{n×n} can be decomposed as H(x) = L_{G(x)} + D(x), where L_{G(x)} ∈ R^{n×n} is the Laplacian of some graph G(x) with edge weights w, and D(x) ∈ R^{n×n} is a diagonal matrix;
C.3 we can read the edge-weight pairs (e, w_e) in one pass, and we can read the diagonal of D(x) ∈ R^{n×n} in one pass.

Then Algorithm 4 can be implemented in the streaming model with Õ(n) + f(n) space and Õ(√m) passes.

Proof. Consider the iterating part (Lines 12-27).
First, we show that ∇f_t(x) ∈ R^n can be computed in one pass. By Definition C.5,

∇f_t(x) = t·c − Σ_{i∈[m]} a_i/s_i(x),

so by condition C.1, ∇f_t(x) ∈ R^n can be computed in one pass in f(n) space.

Since we have conditions C.2 and C.3, we can first read D in one pass and store its entries in O(n) space, and then apply Theorem G.9 to show that δ̃_x can be computed in Õ(1) passes and Õ(n) space. All other computation can be done in Õ(n) space. Since there are T = O(√m·log(m/ε_ipm)) iterations in total, the total number of passes used is Õ(√m). Since we can reuse the space for x and t, the total space used is Õ(n) + f(n).

G SDD solver in the streaming model

In this section, we present an SDDM solver in the streaming model. Later, in Section J, we reduce the problem of solving an SDD system to solving an SDDM system, which gives an SDD solver in the streaming model.

G.1 SDD and Laplacian systems

We need the definition of a spectral sparsifier, and a streaming algorithm for computing it.

Definition G.1 (δ-spectral sparsifier). Given a weighted undirected graph G and a parameter δ > 0, an edge-reweighted subgraph H of G is a δ-spectral sparsifier of G if

(1 − δ)·x^⊤L_G x ≤ x^⊤L_H x ≤ (1 + δ)·x^⊤L_G x, ∀x ∈ R^n,

where L_G and L_H are the Laplacians of G and H, respectively. We also say that L_H is a δ-spectral sparsifier of L_G.

Lemma G.2 ([KLM+]). Let G be a weighted graph and let δ ∈ (0, 1] be a parameter. There exists a streaming algorithm that takes G as input, uses δ^{−2}·n·poly(log n) space and one pass, and outputs a weighted graph H with δ^{−2}·n·poly(log n) edges such that, with probability at least 1 − 1/poly(n), H is a δ-spectral sparsifier of G.

We will use the classic SDD solver in the sequential model, which is formally described below.

Theorem G.3 ([ST04]).
There is an algorithm which takes as input an SDD matrix A, a vector b ∈ R^n, and a parameter ε ∈ (0, 1/2], such that if there exists x* ∈ R^n with Ax* = b, then with probability 1 − 1/poly(n) the algorithm returns an x ∈ R^n with

‖x − x*‖_A ≤ ε·‖x*‖_A

in nnz(A)·poly(log n)·log(1/ε) time. The returned x is called an ε-approximate solution to the SDD system Ax = b.

G.2 The preconditioner

To prove that our SDDM solver (Algorithm 5) gives the desired accuracy, we need the concept of a preconditioner (and how to compute a preconditioner of an SDDM matrix). We define preconditioners as follows:

Definition G.4 (Preconditioner). For any positive definite matrix A ∈ R^{n×n} and accuracy parameter δ > 0, we say P is a δ-preconditioner of A if

‖P^{−1}Ax − x‖_A ≤ δ·‖x‖_A, ∀x ∈ R^n.

Lemma G.5. Let A be an SDDM matrix (Definition A.3) and let A = L_G + D, where L_G is the Laplacian matrix of a graph G and D ≻ 0 is a diagonal matrix. (By Fact A.4, such a decomposition always exists.) For any δ ∈ (0, 1/2], if H is a δ-spectral sparsifier of G and we define P := L_H + D, then P satisfies the following two conditions:

• P is a (2δ)-preconditioner (Definition G.4) of A;
• P ≻ 0.

Proof. The lemma clearly holds for x = 0_n, so assume x ≠ 0_n in the following. By Definition G.1,

(1 + δ)·x^⊤L_G x ≥ x^⊤L_H x ≥ (1 − δ)·x^⊤L_G x ≥ 0,

so L_H ⪰ 0, as it must be symmetric. Since D ≻ 0, we have A ≻ 0, P ≻ 0, and

(1 + δ)·x^⊤(L_G + D)x ≥ x^⊤(L_H + D)x ≥ (1 − δ)·x^⊤(L_G + D)x > 0,

which is (1 + δ)·A ⪰ P ⪰ (1 − δ)·A. So we obtain

(1 − δ)^{−1}·A^{−1} ⪰ P^{−1} ⪰ (1 + δ)^{−1}·A^{−1},

which, by δ ∈ (0, 1/2], implies

(1 + 2δ)·A^{−1} ⪰ P^{−1} ⪰ (1 − δ)·A^{−1}. (16)

Left multiply by A^{1/2} and right multiply by A^{1/2} on Eq.
(16) to get

(1 + 2δ)·I ⪰ A^{1/2}P^{−1}A^{1/2} ⪰ (1 − δ)·I. (17)

Subtracting I from Eq. (17) and then left multiplying by x^⊤ and right multiplying by x, we obtain

2δ·x^⊤x ≥ x^⊤(A^{1/2}P^{−1}A^{1/2} − I)x ≥ −δ·x^⊤x, (18)

where we write M := A^{1/2}P^{−1}A^{1/2} − I. Since Eq. (18) holds for any x ∈ R^n and M is symmetric, using the spectral theorem we get 2δ ≥ λ_n(M) ≥ λ_1(M) ≥ −δ. Next, we have

‖Mx‖₂ ≤ max{|λ_n(M)|, |λ_1(M)|}·‖x‖₂ ≤ 2δ·‖x‖₂.

Let y := A^{−1/2}x. As rank(A^{−1/2}) = n, we get that for any y ∈ R^n,

2δ·‖A^{1/2}y‖₂ ≥ ‖MA^{1/2}y‖₂ = ‖A^{1/2}P^{−1}Ay − A^{1/2}y‖₂. (19)

Finally, rewriting both sides of Eq. (19) by the definition of the matrix norm, we obtain 2δ·‖y‖_A ≥ ‖P^{−1}Ay − y‖_A for any y ∈ R^n, giving the lemma.

G.3 An iterative solver

In this section, we present a streaming SDDM solver (Algorithm 5) that takes Õ(n) space and O(log(1/ε)/log log n) passes for approximately solving an SDDM system Ax = b with error parameter ε ∈ (0, 1/2].

G.4 An iterative solver: the space

The goal of this section is to prove Lemma G.6.

Lemma G.6. Let A = L_G + D ∈ R^{n×n} be an SDDM matrix, where L_G ∈ R^{n×n} is the Laplacian matrix of a graph G with weights w and D ∈ R^{n×n} is a diagonal matrix. If we can read all edge-weight pairs (e, w_e) in one pass, and if we can read the diagonal of the matrix D in one pass, then StreamLS(A, b, ε) takes O(max{1, log(1/ε)/log log n}) passes and Õ(n) space.

Proof. By Lemma G.2 and δ = 1/log n, the δ-spectral sparsifier H has Õ(n) edges and can be computed in one pass and Õ(n) space with probability 1 − 1/poly(n). Therefore, computing the (2δ)-preconditioner P also takes one pass and Õ(n) space.
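As an aside, Lemma G.5 can be sanity-checked numerically. In the sketch below (a hypothetical dense toy: per-edge (1 ± δ) reweighting stands in for a real spectral sparsifier, which it satisfies by Definition G.1 although it does not reduce the edge count), P = L_H + D behaves as a 2δ-preconditioner of A = L_G + D:

```python
import numpy as np

rng = np.random.default_rng(1)
n, delta = 6, 0.1
edges = [(i, j, 1.0 + rng.random()) for i in range(n) for j in range(i + 1, n)]

def laplacian(edge_list):
    L = np.zeros((n, n))
    for i, j, w in edge_list:
        L[i, i] += w; L[j, j] += w
        L[i, j] -= w; L[j, i] -= w
    return L

L_G = laplacian(edges)
# reweight each edge by a factor in [1 - delta, 1 + delta]: a (non-sparse)
# delta-spectral sparsifier in the sense of Definition G.1
L_H = laplacian([(i, j, w * (1 + delta * (2 * rng.random() - 1)))
                 for i, j, w in edges])
D = np.diag(1.0 + rng.random(n))
A, P = L_G + D, L_H + D

worst = 0.0
for _ in range(100):
    y = rng.standard_normal(n)
    err = np.linalg.solve(P, A @ y) - y
    worst = max(worst, np.sqrt(err @ A @ err / (y @ A @ y)))
assert worst <= 2 * delta   # the 2*delta-preconditioner bound of Lemma G.5
```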
Note that P ≻ 0, and thus the system Py = r_t always has a solution. It remains to prove that each iteration of Lines 9-14 takes one pass and Õ(n) space. Since any iteration t only needs the vectors subscripted by t and t + 1, we can reuse the space, so that the total space is Õ(n). (The algorithm can be implemented in the standard RAM model with finite precision by introducing an Õ(1) factor in the encoding, which translates to a multiplicative factor of Õ(1) in the space [ST14].)

Algorithm 5 StreamLS(A = L_G + D, b, ε). Returns an ε-approximate solution to A^{−1}b
1: procedure StreamLS(A = L_G + D ∈ R^{n×n}, b ∈ R^n, ε ∈ R)
2:   ▷ L_G comes from the input stream; D is a stored diagonal matrix
3:   δ ← 1/log n
4:   δ̃ ← δ/2
5:   T ← O(max{1, log(1/ε)/log log n})  ▷ number of iterations
6:   Compute and store a δ-spectral sparsifier H of L_G  ▷ uses one pass, Lemma G.2
7:   Compute a (2δ)-preconditioner of A as P := L_H + D
8:   r_0 ← b, x_0 ← 0_n  ▷ r_0, x_0 ∈ R^n
9:   for t ← 0 to T − 1 do
10:    Compute a δ̃-approximate solution y_t to Py = r_t by an SDD solver and store y_t in memory  ▷ Theorem G.3
11:    r_{t+1} ← r_t − A·y_t
12:    x_{t+1} ← x_t + y_t
13:  end for
14:  return x_T
15: end procedure

Since nnz(P) = Õ(n) and δ̃ = δ/2 = 1/(2·log n), in Line 10, by Theorem G.3, with probability 1 − 1/poly(n) a δ̃-approximate solution y_t can be found in Õ(n·log(1/δ̃)) = Õ(n) time, and therefore in Õ(n) space. Note that this step does not read the stream. In Line 11, computing r_{t+1} requires computing A·y_t, which is done by reading the stream of L_G and D in one pass, multiplying the corresponding entries, and adding them up to the corresponding coordinates. All vectors are in R^n, so the total space used in each iteration is Õ(n).
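The loop of Algorithm 5 is plain preconditioned iterative refinement. A dense toy version (our sketch: an explicit SPD matrix with a crude additive perturbation playing the role of the preconditioner P, rather than a sparsifier-based one) exhibits the geometric error decay that Lemma G.7 formalizes:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)               # SPD stand-in for L_G + D
P = A + 0.02 * np.diag(np.diag(A))        # crude preconditioner, P ~ A
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)

def a_norm(v):
    return np.sqrt(v @ A @ v)

# r_{t+1} = r_t - A y_t, x_{t+1} = x_t + y_t, as in Lines 9-13 of Algorithm 5
x, r = np.zeros(n), b.copy()
start = a_norm(x - x_star)
for _ in range(10):
    y = np.linalg.solve(P, r)             # inner (here: exact) solve with P
    r = r - A @ y
    x = x + y
assert a_norm(x - x_star) <= 1e-6 * start # geometric decay, cf. Lemma G.7
```

In the streaming setting, the only step that touches the graph stream is forming A·y_t; everything else operates on O(n)-size vectors.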
The lemma then follows immediately from T = O(max{1, log(1/ε)/log log n}) and a union bound.

G.5 An iterative solver: the accuracy

The goal of this section is to prove Lemma G.7, which shows that the solver iteratively improves the accuracy of the solution.

Lemma G.7. Let δ ∈ (0, 1/2]. For any t ∈ [0, T],

‖x_t − A^{−1}b‖_A ≤ (4δ)^t · ‖A^{−1}b‖_A.

Proof. The proof is by induction on t. In the base case t = 0, we have x_0 = 0_n, so the statement clearly holds. Assuming the lemma holds for t, we prove the inductive step for t + 1.

Since y_t is a δ̃-approximate solution (Line 10, Algorithm 5), by Theorem G.3 we have ‖y_t − P^{−1}r_t‖_P ≤ δ̃·‖P^{−1}r_t‖_P. By the definition of the matrix norm, this becomes

(y_t − P^{−1}r_t)^⊤P(y_t − P^{−1}r_t) ≤ δ̃²·(P^{−1}r_t)^⊤P(P^{−1}r_t). (20)

Since P is built from a δ-spectral sparsifier, assuming y_t ≠ P^{−1}r_t and applying Definition G.1 on both sides of Eq. (20), we have

(1 − δ)·(y_t − P^{−1}r_t)^⊤A(y_t − P^{−1}r_t) ≤ (1 + δ)·δ̃²·(P^{−1}r_t)^⊤A(P^{−1}r_t). (21)

If y_t = P^{−1}r_t, then Eq. (21) also holds, since the right-hand side of Eq. (21) is non-negative due to A ⪰ 0. Note that Eq. (21) implies

‖y_t − P^{−1}r_t‖_A ≤ δ̃·√((1 + δ)/(1 − δ))·‖P^{−1}r_t‖_A ≤ 2δ̃·‖P^{−1}r_t‖_A = δ·‖P^{−1}r_t‖_A, (22)

where the second step follows from (1 + δ)/(1 − δ) ≤ 4 when δ ∈ (0, 1/2], and the last step follows from δ̃ = δ/2.

Before continuing, we observe the following, which easily follows from the update rule and an induction on t:

r_t = r_0 − A·Σ_{i=0}^{t−1} y_i = r_0 − A·(x_t − x_0) = b − A·x_t. (23)

Using Eq. (23), we bound the left-hand side of Eq.
(22) from above by the following:

‖y_t − P^{−1}r_t‖_A ≤ δ·‖P^{−1}(b − A·x_t)‖_A
= δ·‖P^{−1}A(A^{−1}b − x_t) − (A^{−1}b − x_t) + (A^{−1}b − x_t)‖_A
≤ δ·(‖P^{−1}A(A^{−1}b − x_t) − (A^{−1}b − x_t)‖_A + ‖A^{−1}b − x_t‖_A)
≤ δ·(2δ·‖A^{−1}b − x_t‖_A + ‖A^{−1}b − x_t‖_A)
≤ 2δ·‖A^{−1}b − x_t‖_A, (24)

where the third step follows from the triangle inequality, the fourth step follows from Lemma G.5, and the last step follows from δ ∈ (0, 1/2].

Finally, we are ready to prove the inductive step:

‖x_{t+1} − A^{−1}b‖_A = ‖x_t − A^{−1}b + y_t‖_A
= ‖(x_t − A^{−1}b + P^{−1}r_t) + (y_t − P^{−1}r_t)‖_A
≤ ‖x_t − A^{−1}b + P^{−1}r_t‖_A + ‖y_t − P^{−1}r_t‖_A
= ‖x_t − A^{−1}b + P^{−1}b − P^{−1}A·x_t‖_A + ‖y_t − P^{−1}r_t‖_A
≤ ‖P^{−1}A(A^{−1}b − x_t) − (A^{−1}b − x_t)‖_A + 2δ·‖A^{−1}b − x_t‖_A
≤ 2δ·‖A^{−1}b − x_t‖_A + 2δ·‖A^{−1}b − x_t‖_A
≤ (4δ)^{t+1}·‖A^{−1}b‖_A,

where the first step follows from x_{t+1} = x_t + y_t, the third step follows from the triangle inequality, the fourth step follows from Eq. (23), the fifth step follows from Eq. (24), the sixth step follows from Lemma G.5, and the last step follows from the induction hypothesis, completing the proof.

G.6 Main result

The goal of this section is to prove Theorem G.9.

Lemma G.8. Let ε ∈ (0, 1] and b ∈ R^n.
Let A = L_G + D ∈ R^{n×n} be an SDDM matrix, where L_G ∈ R^{n×n} is the Laplacian matrix of a graph G with weights w and D ∈ R^{n×n} is a diagonal matrix. If we can read all edge-weight pairs (e, w_e) in one pass, and if we can read the diagonal of the matrix D in one pass, then with probability 1 − 1/poly(n), StreamLS(A, b, ε) returns an ε-approximate solution x, i.e.,

‖x − A^{−1}b‖_A ≤ ε·‖A^{−1}b‖_A,

in O(max{1, log(1/ε)/log log n}) passes and Õ(n) space.

Proof. It follows from Lemma G.6, our choices of δ and T, and Lemma G.7.

By Lemma G.8 and the reduction in Section J, we obtain our main result.

Theorem G.9. There is a streaming algorithm which takes as input an SDD matrix A, a vector b ∈ R^n, and a parameter ε ∈ (0, 1], such that if there exists x* ∈ R^n with Ax* = b, then with probability 1 − 1/poly(n) the algorithm returns an x ∈ R^n with ‖x − x*‖_A ≤ ε·‖x*‖_A in O(max{1, log(1/ε)/log log n}) passes and Õ(n) space.

H Minimum vertex cover

The goal of this section is to prove Lemma H.1 (correctness) and Lemma H.4 (running time).
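It may help to see the constraint matrix used in this section concretely. The sketch below builds it for a hypothetical 4-vertex bipartite graph; the instance and the exact way the box constraints are stacked are our illustration, chosen to match the index set [m + 2n] from which the tight constraints are drawn.

```python
import numpy as np

V_L, V_R = [0, 1], [2, 3]
edges = [(0, 2), (0, 3), (1, 2)]
n, m = len(V_L) + len(V_R), len(edges)

# signed edge-vertex incidence matrix, edges directed from V_L to V_R
A = np.zeros((m, n))
for k, (u, v) in enumerate(edges):
    A[k, u], A[k, v] = 1.0, -1.0

# stack box constraints (x >= 0 and -x >= -1) under the edge constraints;
# the right-hand side b comes from the input stream and is omitted here
A1 = np.vstack([A, np.eye(n), -np.eye(n)])
assert A1.shape == (m + 2 * n, n)
assert all(abs(A[k]).sum() == 2 for k in range(m))  # one +1 and one -1 per edge
```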
H.1 Algorithm

Algorithm 6 Minimum vertex cover
1: procedure MinimumVertexCover(G = (V_L, V_R, E), b ∈ Z^m, c ∈ Z^n, n, m)  ▷ Lemma H.1
2:   Let A be the signed edge-vertex incidence matrix of G with direction from V_L to V_R
3:   L ← the bit complexity of A, b, c
4:   Modify A, b, c (add the constraints x_{V_L} ≥ 0 and x_{V_R} ≤ 1) to get A₁, b₁, c₁
5:   Modify A₁, b₁, c₁ according to Lemma H.2 and get A₂, b₂, c₂  ▷ A₂ = A₁, b₂ = b₁
6:   x_init ← 2^L·(1_{V_L} − 1_{V_R})
7:   t_init ← 1
8:   c_init ← −g(x_init)  ▷ g is the gradient of the barrier function (Def. C.4)
9:   ε_Φ ← 1/100
10:  t₀ ← ε_Φ·(m·2^{2L+10})^{−1}
11:  x_tmp ← InteriorPointMethod(A₂, b₂, c_init, t_init, x_init, t₀, ε_Φ/2, 0)
12:  t₁ ← n²·m·2^{L+10}
13:  x ← InteriorPointMethod(A₂, b₂, c₂, t₀, x_tmp, t₁, ε_Φ, 1)
14:  Use Lemma H.2 to turn x into a set S of tight constraints of the LP given by (A₁, b₁, c₁)  ▷ S ⊆ [m + 2n]
15:  return S ∩ [m]
16: end procedure

H.2 Correctness of Algorithm 6

The goal of this section is to prove Lemma H.1.

Lemma H.1 (Correctness of Algorithm 6). In Algorithm 6, let A be the signed edge-vertex incidence matrix of G with direction from V_L to V_R. Let [n] = V_L ∪ V_R, and let b and c be the input. If the linear program

min_{x∈R^n} c^⊤x (25)
s.t. Ax ≥ b,
x_v ≥ 0, ∀v ∈ V_L,
x_v ≤ 1, ∀v ∈ V_R

is both feasible and bounded, then with probability at least 1/2, Algorithm 6 returns a set of tight constraints on some optimal solution to the LP of Eq. (25).

Proof. First, we need to show that the two calls to InteriorPointMethod satisfy the initial condition, i.e., that (t_start, x_start) is a good starting point.

The first call. In the first call (Line 11), we have

‖t_init·c_init + g(x_init)‖_{H(x_init)^{−1}} = ‖0_n‖_{H(x_init)^{−1}} = 0 ≤ ε_Φ/2,

where the first step is by the definition of c_init and t_init = 1. Since InteriorPointMethod ends up with parameters x_tmp and t₀, by Lemma E.3 we have

‖t₀·c_init + g(x_tmp)‖_{H(x_tmp)^{−1}} ≤ ε_Φ/2. (26)

Note that t₀ and x_tmp are the input to the second call.
The second call. In the second call (Line 13), the desired term can be upper bounded by

$\|t_1 \cdot c_2 + \nabla g(x_{\mathrm{tmp}})\|_{H(x_{\mathrm{tmp}})^{-1}} \le \|t_1 \cdot c_{\mathrm{init}} + \nabla g(x_{\mathrm{tmp}})\|_{H(x_{\mathrm{tmp}})^{-1}} + \|t_1 \cdot c_{\mathrm{init}} - t_1 \cdot c_2\|_{H(x_{\mathrm{tmp}})^{-1}} \le \epsilon_\Phi/2 + t_1 \cdot \|c_{\mathrm{init}} - c_2\|_{H(x_{\mathrm{tmp}})^{-1}}$,

where the first step is by the triangle inequality, and the second step is from Eq. (26).

Now let us bound the term $\|c_{\mathrm{init}} - c_2\|_{H(x_{\mathrm{tmp}})^{-1}}$:

$\|c_{\mathrm{init}} - c_2\|_{H(x_{\mathrm{tmp}})^{-1}} \le \|c_{\mathrm{init}} - c_2\|_2 \cdot \lambda_{\min}(H(x_{\mathrm{tmp}}))^{-1/2} \le \|c_{\mathrm{init}} - c_2\|_2 \cdot 2^{L+3} \le (\|c_{\mathrm{init}}\|_2 + \|c_2\|_2) \cdot 2^{L+3} \le (\sqrt{m} \cdot 2^{L+3} + \|c_2\|_2) \cdot 2^{L+3} \le (\sqrt{m} \cdot 2^{L+3} + m \cdot 2^{L+4}) \cdot 2^{L+3} \le m \cdot 2^{2L+8}$,

where the first step is by $\|x\|_H = \sqrt{x^\top H x} \le \|x\|_2 \cdot \sqrt{\lambda_{\max}(H)}$ together with $\lambda_{\max}(H^{-1}) = \lambda_{\min}(H)^{-1}$, the second step is by Lemma H.3, the fourth step is by $c_{\mathrm{init}} = -\nabla g(x_{\mathrm{init}})$, $x_{\mathrm{init}} = 2^L(\mathbf{1}_{V_L} - \mathbf{1}_{V_R})$, and the definition of $g$ (Def. C.4), and the fifth step is by the definition of $c_2$ (in Lemma H.2).

By our choice of $t_1 := \epsilon_\Phi \cdot (m \cdot 2^{2L+10})^{-1}$ (Line 10), we finally have

$\|t_1 \cdot c_2 + \nabla g(x_{\mathrm{tmp}})\|_{H(x_{\mathrm{tmp}})^{-1}} \le \epsilon_\Phi/2 + \epsilon_\Phi/4 \le \epsilon_\Phi$.

Let OPT be the optimal value of the linear program $\min_{x \in \mathbb{R}^n,\, A_2 x \ge b_2} c_2^\top x$. By $t_2 := n m \cdot 2^{2L+10}$ and Lemma E.1, we know that our solution $x$ (Line 13) is feasible and has value $c_2^\top x - \mathrm{OPT} \le \frac{m}{t_2} \cdot (1 + 2\epsilon_\Phi) \le n^{-1} \cdot 2^{-2L-9}$. Now we can apply Lemma H.2 to show that, after one matrix-vector multiplication, with probability at least $1/2$ we can output the tight constraints $S$ of a basic feasible optimal solution of

$\min_{x \in \mathbb{R}^n,\, A_1 x \ge b_1} c_1^\top x$.  (27)

Note that this LP is exactly the same as Eq. (25), by our construction of $A_1, b_1, c_1$ (Line 4).

Lemma H.2 (Lemma 43 of [LS14]). Given a feasible and bounded linear program

$\min_{x \in \mathbb{R}^n} c^\top x$  (28)
s.t.
$Ax \ge b$,

where $A \in \mathbb{Z}^{m \times n}$, $b \in \mathbb{Z}^m$, and $c \in \mathbb{Z}^n$ all have integer coefficients, let $L$ be the bit complexity of Eq. (28). Let $r \in \mathbb{Z}^n$ be chosen uniformly at random from the integers $\{-2^{L+1} n, \cdots, 2^{L+1} n\}$. Consider the perturbed linear program

$\min_{x \in \mathbb{R}^n} (2^{L+3} n \cdot c + r)^\top x$  (29)
s.t. $Ax \ge b$.

Then with probability at least $1/2$ over the randomness of $r \in \mathbb{Z}^n$, we have the following. Let OPT be defined as the optimal value of the linear program Eq. (29). If we can find one feasible solution for Eq. (29) with value less than $\mathrm{OPT} + n^{-1} \cdot 2^{-2L-9}$, then we can find the tight constraints of a basic feasible optimal solution of Eq. (28) using one matrix-vector multiplication with $A$. Moreover, the bit complexity of Eq. (29) is at most $2L + \log(8n)$.

Lemma H.3. Let $H(x)$ be defined as in Lemma H.6. Then we have $\lambda_{\min}(H(x)) \ge 2^{-2L-6}$.

Proof. We can lower bound $\lambda_{\min}(H(x))$ as follows:

$\lambda_{\min}(H(x)) \ge \lambda_{\min}(D(x)) \ge \min_{i \in [n]} D(x)_{i,i} \ge 2^{-2L-6}$,

where the first step is by Lemma H.6, i.e., $H(x) = L(x) + D(x)$ with both $L(x)$ and $D(x)$ positive semi-definite, the second step is by the fact that $D(x)$ is a diagonal matrix, and the third step is by Lemma H.6, i.e., $D(x)_{i,i} = s_{i+m}(x)^{-2}$, with $s(x) := A_2 x - b_2$ bounded by $\|s\|_\infty \le 2^{L+3}$.

H.3 Running time of Algorithm 6

Lemma H.4 (Running time of Algorithm 6). Suppose there is an oracle $\mathcal{I}$ running in $f(n)$ space that can output $b_i$ given any $i \in [m]$. If the input satisfies that $\|b\|_\infty$ and $\|c\|_\infty$ are polynomially bounded, then Algorithm 6 can be implemented in the streaming model within $\widetilde{O}(n) + f(n) + |S|$ space and $\widetilde{O}(\sqrt{m})$ passes.

Proof. In Line 2, we define $A$ but never explicitly compute and store $A$ in memory.

In Line 3, we let $L \leftarrow O(\log n)$ be the upper bound on the bit complexity. This costs one pass. Indeed, Lemma A.5 implies that the entries of the input are polynomially bounded in magnitude.
So $L$ is an upper bound on the bit complexity.

In Lines 4 and 5, we define $A_1, b_1, c_1$ and $A_2, b_2, c_2$. This does not involve any computation.

Lines 6, 7, 8, 9, 10, and 12 are done locally in memory.

For Lines 11 and 13, we show the following three things in order to invoke Lemma F.1:
1. $\sum_{i \in [m]} a_i / s_i(x)$ can be computed in one pass in $h(n)$ space;
2. the Hessian $H(x) \in \mathbb{R}^{n \times n}$ can be decomposed as $H(x) = L_G(x) + D(x)$, where $L_G(x) \in \mathbb{R}^{n \times n}$ is the Laplacian of some graph $G$ with edge weights $w$ and $D(x) \in \mathbb{R}^{n \times n}$ is a diagonal matrix;
3. we can read the edge-weight pairs $(e, w_e)$ in one pass, and we can read the diagonal of $D(x)$ in one pass.

Part 1: By Lemma H.5 and our assumption on the existence of the oracle $\mathcal{I}$, given a feasible $x \in \mathbb{R}^n$, we can output a stream of all edges $e_i = (u, v)$ together with their slacks $s_i(x) = a_i^\top x - b_i$ in one pass within $O(n) + f(n)$ space. So $\sum_{i \in [m]} a_i / s_i(x)$ can be computed in one pass in $h(n) = O(n) + f(n)$ space.

Part 2: This follows from Lemma H.6.

Part 3: By Lemma H.6, the edge weights are $w_i = s_i(x)^{-2}$ and the diagonal entries are $D(x)_{i,i} = s_{i+m}(x)^{-2}$. So Part 3 is satisfied by Lemma H.5.

By Lemma F.1, Lines 11 and 13 can be implemented in the streaming model within $\widetilde{O}(n) + f(n)$ space and $\widetilde{O}(\sqrt{m})$ passes.

In Line 14, by Lemma H.2, it suffices to perform one matrix-vector multiplication with $A_2$, which can be done in one pass and $O(n)$ space.

Outputting the set $S$ requires $|S|$ space.

Lemma H.5 (Output slack stream). Suppose there is an oracle $\mathcal{I}$ running in $f(n)$ space that can output $b_i$ given any $i \in [m]$. Given a feasible $x \in \mathbb{R}^n$, we can output a stream of all edges $e_i = (u, v)$ together with their slacks $s_i(x) = a_i^\top x - b_i$ in one pass within $O(n) + f(n)$ space.

Proof. W.l.o.g., we assume every edge has a unique id $i \in [m]$ and that we receive this id in the stream. We read all edges in one pass.
Upon receiving an edge $e_i = (u, v)$, we output $x_u - x_v - b_i$, where we use $\mathcal{I}$ to compute $b_i$. (In fact, we can remove the unique-id assumption by observing that our oracle Algorithm 3 also works for $i \in [n^2]$, where we give every edge $e = (u, v)$ the index $\mathrm{id}(e) = u \cdot n + v \in [n^2]$.)

H.4 Building blocks

Lemma H.6 (The Hessian is an SDDM matrix). Let the graph $G = (V, E)$ be the input of Algorithm 6. Let $n = |V|$ and $m = |E|$. Let $A_2 \in \mathbb{R}^{(m+n) \times n}$ and $b_2 \in \mathbb{R}^{m+n}$ be defined as in Line 5 (Algorithm 6). Let $x \in \mathbb{R}^n$ be any feasible point, i.e., $A_2 x > b_2$. Let $s(x) \in \mathbb{R}^{m+n}$ and $H(x) \in \mathbb{R}^{n \times n}$ be the slack and Hessian defined w.r.t. $A_2$ and $b_2$ (Def. C.4). Then $H(x) \in \mathbb{R}^{n \times n}$ can be written as

$H(x) = L(x) + D(x)$,

where $L(x) \in \mathbb{R}^{n \times n}$ is the Laplacian matrix of the graph $G$ with edge weights $\{s_1(x)^{-2}, \cdots, s_m(x)^{-2}\}$, and $D(x) \in \mathbb{R}^{n \times n}$ is the diagonal matrix with diagonal entries $D(x)_{i,i} = s_{i+m}(x)^{-2}$.

Proof. Let $A \in \mathbb{R}^{m \times n}$ be the signed edge-vertex incidence matrix of $G$ with direction $V_L$ to $V_R$. Then $A_2 \cdot x \ge b_2$ can be written as

$\begin{bmatrix} A \\ I_{V_L} - I_{V_R} \end{bmatrix} \cdot x \ge \begin{bmatrix} b \\ -\mathbf{1}_{V_R} \end{bmatrix}$,

where $I_{V_L} \in \mathbb{R}^{n \times n}$ denotes the diagonal matrix with $(I_{V_L})_{i,i} = 1$ if $i \in V_L$ and $0$ otherwise, and $I_{V_R} \in \mathbb{R}^{n \times n}$ denotes the diagonal matrix with $(I_{V_R})_{i,i} = 1$ if $i \in V_R$ and $0$ otherwise.

Denote by $S \in \mathbb{R}^{m \times m}$ the diagonal matrix with diagonal entries $S_{i,i} := s_i(x)$. Thus $H(x)$ can be written as

$H(x) = A^\top S^{-2} A + \sum_{i=m+1}^{m+n} e_{i-m} e_{i-m}^\top \cdot s_i(x)^{-2} = L(x) + D(x)$,

where the first step is by the definition of $H(x)$ (Def. C.4), and the second step is by the definitions of $L(x)$ and $D(x)$.

H.5 A minimum vertex cover solver

Note that Algorithm 6 is actually a high-accuracy fractional minimum vertex cover solver for general graphs (not necessarily bipartite), since we do not use the bipartiteness of the matrix $A$ in the IPM.

Theorem H.7. Let $G$ be a graph with $n$ vertices and $m$ edges.
Consider the fractional minimum vertex cover problem Eq. (31), in which every edge $e$ needs to be covered at least $b_e$ times. Let $x^* \in \mathbb{R}^n$ be the optimal solution of Eq. (31). There exists a streaming algorithm (Algorithm 6) such that for any $\delta > 0$, it can output a feasible fractional vertex cover $x \in \mathbb{R}^n$ with $\mathbf{1}_n^\top x \le \mathbf{1}_n^\top x^* + \delta$ in $\widetilde{O}(\sqrt{m}) \cdot \log(1/\delta)$ passes and $\widetilde{O}(n) \cdot \log(1/\delta)$ space with probability $1 - 1/\mathrm{poly}(n)$.

Proof. The proof follows directly from the proofs of Lemma H.1 and Lemma H.4.

As a byproduct, we obtain a fast semi-streaming algorithm for (exact, integral) minimum vertex cover in bipartite graphs.

Theorem H.8. Given a bipartite graph $G$ with $n$ vertices and $m$ edges, there exists a streaming algorithm that computes a minimum vertex cover of $G$ in $\widetilde{O}(\sqrt{m})$ passes and $\widetilde{O}(n)$ space with probability $1 - 1/\mathrm{poly}(n)$.

Proof. Given a bipartite graph $G$, we use the isolation lemma from [MVV87] to assign a weight chosen uniformly from $[2m]$ to each vertex independently, which takes $\widetilde{O}(n)$ space. The weight vector is denoted $w \in \mathbb{Z}^n$. Next, we write an LP in the form of Eq. (31) for this weighted minimum vertex cover problem, where the objective is $w^\top x$ (instead of $\mathbf{1}_n^\top x$). Since $G$ is bipartite, by Theorem A.5 all extreme points of the polytope are integral. By the isolation lemma, the optimal solution is unique, so the unique solution must be integral. Taking $\delta = 1/\mathrm{poly}(n)$ for a sufficiently large polynomial and applying Theorem H.7, the high-accuracy solution obtained from the IPM can be rounded to the optimal integral solution, which in total takes $\widetilde{O}(\sqrt{m})$ passes and $\widetilde{O}(n)$ space with probability $1 - 1/\mathrm{poly}(n)$.

I Combine

The goal of this section is to prove Theorem I.1.

Theorem I.1 (Main theorem, formal version of Theorem 1.1).
Given a bipartite graph $G$ with $n$ vertices and $m$ edges, there exists a streaming algorithm that computes a maximum weight matching of $G$ in $\widetilde{O}(\sqrt{m})$ passes and $\widetilde{O}(n)$ space with probability $1 - 1/\mathrm{poly}(n)$.

Proof. It follows by combining Lemma I.2 and Lemma I.3.

I.1 Algorithms

I.2 Running time

The goal of this section is to prove Lemma I.2.

Lemma I.2 (Running time). Given a bipartite graph with $n$ vertices and $m$ edges, Algorithm 7 can be implemented in the streaming model in $\widetilde{O}(\sqrt{m})$ passes and $\widetilde{O}(n)$ space.

Proof. In Lines 6 and 7, we do not explicitly calculate $b_0$ and $b$ and store them in memory. Instead, we use the oracle $\mathcal{I}$ stated in Lemma B.1, which outputs $b_i$ bit by bit. So this step does not cost space.

In Line 8, we calculate and store $c \in \mathbb{Z}^n$ in $\widetilde{O}(n)$ space.

In Line 9, we call MinimumVertexCover. In order to use Lemma H.4, we need to prove the following properties.
1. By Lemma B.2, $\|b_0\|_\infty \le \mathrm{poly}(n)$, so $\|b\|_\infty \le \mathrm{poly}(n)$ (recall that the integer edge weights $w$ are polynomially bounded). Moreover, $\|c\|_\infty = 1$. So both $\|b\|_\infty$ and $\|c\|_\infty$ are polynomially bounded.
2. By Lemma B.1, there is an oracle $\mathcal{I}$ that uses $O(\log(n^n) + \log(m)) = \widetilde{O}(n)$ space and can output $(b_0)_i$. Since each edge $e_i$ comes with its weight $w_i$ in the stream, we can output $b_i = (b_0)_i + n^\alpha \cdot w_i$ when given $i \in [m]$, where $n^\alpha$ is the polynomial scaling factor of Line 7.
3. We can always assume $|S| \le n$, since otherwise we never enter the branch at Line 10.

By applying Lemma H.4, this call can be done in $\widetilde{O}(n)$ space and $\widetilde{O}(\sqrt{m})$ passes.

Algorithm 7 Main($G = (V_L, V_R, E, w)$)
1: procedure Main($G = (V_L, V_R, E, w)$) $\triangleright$ Theorem I.1
2:   $\triangleright$ $w$ denotes the edge weights, which are integers.
3:   $n \leftarrow |V_L| + |V_R|$, $m \leftarrow |E|$
4:   $M \leftarrow \emptyset$ $\triangleright$ $M$ is the maximum matching
5:   for $i = 1 \to O(\log n)$ do
6:     $b_0 \leftarrow \textsc{Isolation}(m, n^n)$ $\triangleright$ $b_0 \in \mathbb{Z}^m$, Algorithm 2, Lemma B.2
7:     $b \leftarrow b_0 + n^\alpha \cdot w$ $\triangleright$ $n^\alpha$ is a sufficiently large polynomial scaling factor
8:     $c \leftarrow \mathbf{1}_{V_L} - \mathbf{1}_{V_R}$
9:     $S \leftarrow \textsc{MinimumVertexCover}(G, b, c, n, m)$ $\triangleright$ $S \subseteq [m]$, Algorithm 6
10:    if $|S| \le n$ then
11:      Let $M'$ be a maximum weight matching found in the edge set $S$
12:      if $w(M') > w(M)$ then $\triangleright$ Update the maximum weight matching
13:        $M \leftarrow M'$
14:      end if
15:    end if
16:   end for
17:   return $M$
18: end procedure

In Line 11, since we are finding a maximum matching in a graph with $n$ vertices and at most $n$ edges, we can store these edges in memory and then check all $2^n$ possible sets of edges. This costs $\widetilde{O}(n)$ space and no passes. We find the set of edges that is a matching and has the maximum weight.

With the $O(\log n)$ iterations of Line 5, the number of passes blows up by an $O(\log n)$ factor. So overall, we use $\widetilde{O}(n)$ space and $\widetilde{O}(\sqrt{m})$ passes.

I.3 Correctness

The goal of this section is to prove Lemma I.3.

Lemma I.3 (Correctness). Given a bipartite graph $G$ with $n$ vertices and $m$ edges, with probability at least $1 - 1/\mathrm{poly}(n)$, Algorithm 7 outputs a maximum matching.

Proof. Consider the linear program Eq. (30). According to part 1 of Lemma I.7, with probability at least $1/2$ there is a unique optimal solution $y^*$ to Eq. (30).

By Lemma H.1, with probability at least $1/4$ the algorithm MinimumVertexCover successfully returns a subset $S$ of tight constraints which corresponds to an optimal solution $x^*, s^*$ of the dual problem Eq. (31) (note that the linear program in Lemma H.1 and Eq. (31) differ only in signs). This means $s^*_i = 0$ if and only if $i \in S$. According to parts 1 and 2 of Lemma I.7, $|S| \le n$ and $y^*_i = 0$ for all $i \notin S$. Therefore, there exists a maximum matching using only edges in $S$, and we find it in Line 11.

Overall, in each iteration of the for loop (Line 5), with probability at least $(1/2) \cdot (1/4) = 1/8$ we find a maximum matching.
After $O(\log n)$ iterations, we find a maximum matching with probability at least $1 - 1/\mathrm{poly}(n)$.

I.4 Primal to dual

This section provides the preliminaries for the correctness proof (Lemma I.3). We give Theorem I.6, which is later used in Lemma I.7.

Definition I.4 (Maximum weight matching). Given a bipartite graph $G = (V, E)$ with $|V| = n$ and $|E| = m$, let $A \in \{0, 1\}^{m \times n}$ be the unsigned edge-vertex incidence matrix. Given weights $b \in \mathbb{Z}^m$ on the edges, the maximum weight matching problem can be written as the following linear program:

Primal $\max_{y \in \mathbb{R}^m} b^\top y$  (30)
s.t. $A^\top y \le \mathbf{1}_n$, $y \ge \mathbf{0}_m$.

Its dual form is the following.

Definition I.5 (Fractional minimum vertex cover). Let $A \in \mathbb{Z}^{m \times n}$ and $b \in \mathbb{Z}^m$ be defined as in Def. I.4. The dual form of Eq. (30) is

Dual $\min_{x \in \mathbb{R}^n} \mathbf{1}_n^\top x$  (31)
s.t. $Ax \ge b$, $x \ge \mathbf{0}_n$.

Theorem I.6 (Strong duality from complementary slackness [PS98]). Let $y \in \mathbb{R}^m$ be a feasible solution to the primal Eq. (30), and let $x \in \mathbb{R}^n$ be a feasible solution to the dual Eq. (31). Let $s := Ax - b \in \mathbb{R}^m$. Then $x \in \mathbb{R}^n$, $s \in \mathbb{R}^m$, $y \in \mathbb{R}^m$ satisfy $y^\top s = 0$ and $x^\top(\mathbf{1}_n - A^\top y) = 0$ if and only if $(x, s)$ is optimal for the dual and $y$ is optimal for the primal.

I.5 Properties of primal and dual LP solutions

The goal of this section is to prove Lemma I.7.

Lemma I.7 (Properties of LP solutions). Given a bipartite graph $G$ with $n$ vertices and $m$ edges, let $w \in \mathbb{N}^m$ be the edge weights. Let $b_0 \in \mathbb{Z}^m$ be the output of $\textsc{Isolation}(m, Z)$ where $Z := n^n$ (Algorithm 2). Let $b := b_0 + n^\alpha \cdot w \in \mathbb{Z}^m$ for a sufficiently large constant $\alpha$. Let $A \in \{0, 1\}^{m \times n}$ be the (unsigned) edge-vertex incidence matrix of $G$. Consider the linear programs Eq. (30) and Eq. (31) with parameters $A$ and $b$. With probability at least $1/2$, we have the following.
1. There is a unique optimal solution $y^* \in \mathbb{R}^m$ to the primal LP (Eq. (30)). Furthermore, $y^* \in \{0, 1\}^m$, and $y^*$ is the indicator vector of a maximum weight matching of $G$.
2. Let $x^* \in \mathbb{R}^n$, $s^* \in \mathbb{R}^m$ be an optimal solution to the dual LP (Eq. (31)).
Then we have the following properties of $s^*$:
(a) for any $i \in [m]$, if $s^*_i > 0$ then $y^*_i = 0$;
(b) $\|s^*\|_0 \ge m - n$.

Proof. Part 1. Let the feasible region of $y$ be $\mathcal{S} := \{y \in \mathbb{R}^m \mid A^\top y \le \mathbf{1}_n, y \ge \mathbf{0}_m\}$. We implicitly have $y \le \mathbf{1}_m$, so $\mathcal{S}$ is a bounded region. Let $\mathcal{S}_0$ denote the set of extreme points of $\mathcal{S}$; since the LP is feasible and bounded, some extreme point attains the optimal value. (The dual LP is a generalized version of the minimum vertex cover problem: each edge $i$ needs to be covered at least $b_i$ times, where the case $b = \mathbf{1}_m$ is the classic minimum vertex cover.)

By Lemma A.5, we know all extreme points are integral. Since $\mathbf{0}_m \le y \le \mathbf{1}_m$, all extreme points are in $\{0, 1\}^m$, and they correspond to matchings. Because we set $b := b_0 + n^\alpha \cdot w$ as our objective vector, we can write $\langle b, y \rangle = \langle b_0, y \rangle + n^\alpha \cdot w^\top y$. Since $\|b_0\|_\infty$ is polynomially bounded by Lemma B.2 and $\alpha$ is sufficiently large, the extreme point with the optimal objective value must be a maximum weight matching. Let $\mathcal{F}$ be the set of all possible matchings. We have $|\mathcal{F}| \le n^n = Z$.

By applying Lemma B.2, with probability at least $1/2$ there is a unique extreme point $y^*$ of $\mathcal{S}$ attaining the optimal value.

Because the feasible region $\mathcal{S}$ is bounded, every point $y \in \mathcal{S} \setminus \mathcal{S}_0$ can be written as a convex combination of extreme points of $\mathcal{S}$. That is, writing $\mathcal{S}_0 = \{y^{(1)}, \cdots, y^{(s)}\}$ with $s := |\mathcal{S}_0|$, every point $y \in \mathcal{S} \setminus \mathcal{S}_0$ can be written as

$y = \sum_{i \in [s]} a_i y^{(i)}$, where $0 \le a_i < 1$ for all $i \in [s]$ and $\sum_{i \in [s]} a_i = 1$.

Therefore, we have

$b^\top y = \sum_{i \in [s]} a_i \cdot (b^\top y^{(i)}) < \max_{i \in [s]} b^\top y^{(i)}$.

So $y^*$ is in fact the unique optimal solution among all points of $\mathcal{S}$.

Part 2. Part (a) follows directly from Theorem I.6.

Now we prove part (b). Assume $\|s^*\|_0 < m - n$. Let $x^* \in \mathbb{R}^n$ be any optimal dual solution that realizes $s^*$, i.e., $Ax^* - b = s^*$.
We will show that there exists a feasible solution to the primal $y' \in \mathbb{R}^m$ such that $y' \ne y^*$, $\langle y', s^* \rangle = 0$, and $\langle x^*, \mathbf{1}_n - A^\top y' \rangle = 0$. By Theorem I.6, $y'$ is then also an optimal solution to the primal LP, contradicting the uniqueness of $y^*$. (In fact, we will prove that there are infinitely many such $y'$.)

Consider the following linear system:

$y_i = 0$, for all $i \in [m]$ such that $s^*_i \ne 0$,  (32)
$A^\top y = A^\top y^*$.  (33)

Constraint Eq. (32) contains fewer than $\|s^*\|_0 < m - n$ equalities, while constraint Eq. (33) contains $n$ equalities. So if we write the above system in matrix form $\widetilde{A} y = \widetilde{c}$, $y \ge \mathbf{0}_m$, it must be that $\mathrm{rank}(\widetilde{A}) \le \|s^*\|_0 + n < m$. We obtain

$y = \widetilde{A}^\dagger \widetilde{c} + (I - \widetilde{A}^\dagger \widetilde{A}) z$,

where $z \in \mathbb{R}^m$ is a free variable. Since $\mathrm{rank}(\widetilde{A}^\top \widetilde{A}) \le \mathrm{rank}(\widetilde{A}) < m$, it must be that $I_m - \widetilde{A}^\dagger \widetilde{A} \ne 0_{m \times m}$.

Observe that $f(z) := \widetilde{A}^\dagger \widetilde{c} + (I - \widetilde{A}^\dagger \widetilde{A}) z$ is an affine function passing through the point $y^*$. Also note that $y^* \ge \mathbf{0}_m$ and $y^* \ne \mathbf{0}_m$ (otherwise, if $y^* = \mathbf{0}_m$, we could increase an arbitrary component of $y^*$ to increase $b^\top y$, contradicting optimality). As a result, $f(z)$ must pass through infinitely many points in the region $y \ge \mathbf{0}_m$. Let $y'$ be any such point other than $y^*$. By Eq. (32), we have $\langle y', s^* \rangle = 0$. By Eq. (33) and Theorem I.6, we have $\langle x^*, \mathbf{1}_n - A^\top y' \rangle = \langle x^*, \mathbf{1}_n - A^\top y^* \rangle = 0$. By Theorem I.6, $y'$ is also an optimal solution, contradicting the uniqueness of $y^*$.

J Solver reductions

J.1 From SDDM solver to SDD solver

We recall Gremban's reduction [ST04], which reduces the problem of solving an SDD system to that of solving an SDDM system.
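Gremban's reduction can be sketched in a few lines of Python. This is our own toy dense version under invented names (`gremban`, `solve_dense`); the actual streaming algorithm never materializes $\widehat{A}$ in memory but streams its entries, as described next.

```python
# Toy dense sketch of Gremban's reduction (our illustration): given an SDD
# matrix A = D + A_neg + A_pos, build the SDDM matrix
#   hatA = [[D + A_neg, -A_pos], [-A_pos, D + A_neg]],
# solve hatA [x1; x2] = [b; -b] with any SDDM solver, and return
# x = (x1 - x2) / 2.

def solve_dense(M, v):
    # Gauss-Jordan elimination with partial pivoting, a stand-in
    # for the (streaming) SDDM solver
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * t for a, t in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def gremban(A, b, solve):
    n = len(b)
    hatA = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                hatA[i][j] = hatA[n + i][n + j] = A[i][j]
            elif A[i][j] < 0:   # negative off-diagonal entries stay in place
                hatA[i][j] = hatA[n + i][n + j] = A[i][j]
            elif A[i][j] > 0:   # positive entries move to the off-block, negated
                hatA[i][n + j] = hatA[n + i][j] = -A[i][j]
    y = solve(hatA, b + [-bi for bi in b])
    return [(y[i] - y[n + i]) / 2 for i in range(n)]

# A is SDD but not SDDM (positive off-diagonal); x = [1, 1] solves Ax = b.
A = [[2.0, 1.0], [1.0, 2.0]]
b = [3.0, 3.0]
x = gremban(A, b, solve_dense)
```

Note that every off-diagonal entry of `hatA` is non-positive while row sums still dominate, which is exactly what makes the doubled system SDDM.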
Let $A$ be an SDD matrix, and decompose $A$ into $D + A_{\mathrm{neg}} + A_{\mathrm{pos}}$, where $D$ is the diagonal of $A$, $A_{\mathrm{neg}}$ contains all the negative off-diagonal entries of $A$ (with the same size), and $A_{\mathrm{pos}}$ contains all the positive off-diagonal entries of $A$ (with the same size). Consider the following linear system:

$\widehat{A} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \widehat{b}$, where $\widehat{A} = \begin{bmatrix} D + A_{\mathrm{neg}} & -A_{\mathrm{pos}} \\ -A_{\mathrm{pos}} & D + A_{\mathrm{neg}} \end{bmatrix}$ and $\widehat{b} = \begin{bmatrix} b \\ -b \end{bmatrix}$.

The matrix $\widehat{A}$ can be (implicitly) computed in the streaming model: in one pass we compute and store the diagonal matrix $D$ by adding up the edge weights incident on each vertex; then $\widehat{A}$ is given as a stream of edges (entries), since whenever an edge (an entry of $A$) arrives, we immediately know its positions in $\widehat{A}$. Note that if $Ax = b$ admits a solution, then $x = (x_1 - x_2)/2$ is exactly such a solution. Moreover, if

$\left\| \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \widehat{A}^\dagger \widehat{b} \right\|_{\widehat{A}} \le \epsilon \, \|\widehat{A}^\dagger \widehat{b}\|_{\widehat{A}}$,

then $x$ satisfies $\|x - A^\dagger b\|_A \le \epsilon \, \|A^\dagger b\|_A$. So we obtain an SDD solver with asymptotically the same number of passes and space as the SDDM solver.

J.2 From SDDM solver to Laplacian solver

In this section, we show that to approximately solve a (possibly singular) SDDM system $Ay = b$, it suffices to pre-process the input in $O(1)$ passes, approximately solve a positive definite SDDM system $\widetilde{A} y = \widetilde{b}$ of at most the same size, and possibly post-process in $O(1)$ passes.

If $A$ is positive definite, then we can solve the system by an SDDM solver directly, so assume in the following that it is not. From Fact A.4, we know that $A$ must then be a Laplacian matrix. Therefore, it remains to reduce the problem of (approximately) solving a Laplacian system $Ly = b$ to (approximately) solving a positive definite SDDM system. The following facts are well known.

Fact J.1.
Given a Laplacian matrix $L$ corresponding to a graph $G$, the following hold:
• $L \succeq 0$;
• $L \mathbf{1}_n = 0$, and thus $\lambda_1(L) = 0$;
• $G$ is connected iff $\lambda_2(L) > 0$.

Given a Laplacian matrix $L$ as a stream of entries, it is equivalent to treat it as a stream of edges of $G$. In one pass, we can identify all the connected components of $G$ using $\widetilde{O}(n)$ space (e.g., by maintaining a spanning forest of $G$). After that, any entry in the stream is identified and assigned to the subproblem corresponding to the connected component that contains it. This does not influence the worst-case pass and space complexity, because each subproblem uses space proportional to the size of its connected component, and the total number of passes is determined by the connected component that takes the most passes. Therefore, we can assume that $G$ is connected, which implies that $\mathrm{rank}(L) = n - 1$ by Fact J.1.

The goal of approximately solving $Ly = b$ is, for a given error parameter $\epsilon > 0$, to find an $\epsilon$-approximate solution $x$ satisfying $\|x - L^\dagger b\|_L \le \epsilon \, \|L^\dagger b\|_L$.

If $y$ is an (exact) solution to the system $Ly = b$, then $y' := y - y_1 \mathbf{1}_n$ is also a solution, where $y_1$ is the first entry of $y$. So we can assume that the first entry of $L^\dagger b$ is $0$. (There might be many solutions, but we fix one whose first entry is $0$.) Let $\widetilde{A}$ be the matrix $L$ with the first row and column deleted, and let $\widetilde{b}$ be the vector $b$ with the first entry deleted. Note that $\widetilde{A} \succ 0$.

Let $\widetilde{x}$ be an $\epsilon$-approximate solution to the system $\widetilde{A} x = \widetilde{b}$, and let $x$ be the vector $\widetilde{x}$ with $0$ inserted as its first entry.
It must be that $\|x - L^\dagger b\|_L = \|\widetilde{x} - \widetilde{A}^{-1} \widetilde{b}\|_{\widetilde{A}}$, because the vector $\widetilde{A}^{-1} \widetilde{b}$ is the vector $L^\dagger b$ with the first entry deleted. Finally, we have $\|x - L^\dagger b\|_L \le \epsilon \, \|L^\dagger b\|_L$, since $\|\widetilde{x} - \widetilde{A}^{-1} \widetilde{b}\|_{\widetilde{A}} \le \epsilon \, \|\widetilde{A}^{-1} \widetilde{b}\|_{\widetilde{A}}$ and $\|L^\dagger b\|_L = \|\widetilde{A}^{-1} \widetilde{b}\|_{\widetilde{A}}$. This gives an $\epsilon$-approximate solution to the original system $Ly = b$. The process above is equivalent to partitioning $L$ into block-diagonal submatrices (one per connected component), solving each linear system with respect to its submatrix and the corresponding entries of $b$, and combining the results.
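The reduction of this subsection can be sketched in a few lines of Python. This is our own dense toy version with invented names (`laplacian_solve`, `solve_dense`); the paper's version works over the edge stream instead of a stored matrix.

```python
# Toy dense sketch of the Section J.2 reduction (our illustration): delete
# the first row and column of a connected Laplacian L to obtain a positive
# definite SDDM system, solve it, and re-insert 0 as the first coordinate.

def solve_dense(M, v):
    # Gauss-Jordan elimination, a stand-in for the SDDM solver
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * t for a, t in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def laplacian_solve(L, b):
    # Solve L y = b for a connected Laplacian L (with b orthogonal to 1_n),
    # normalized so that the first coordinate of the solution is 0.
    tildeL = [row[1:] for row in L[1:]]   # first row/column deleted: PD SDDM
    tildeb = b[1:]
    return [0.0] + solve_dense(tildeL, tildeb)

# Path graph on 3 vertices; b sums to zero, so L y = b is solvable.
L = [[1.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 1.0]]
b = [1.0, 0.0, -1.0]
y = laplacian_solve(L, b)   # y = [0, -1, -2]
```

Any constant shift of `y` is also a solution; fixing the first coordinate to 0 is exactly the normalization used in the proof above.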