A Potential Reduction Inspired Algorithm for Exact Max Flow in Almost $\tilde{O}(m^{4/3})$ Time
Tarun Kathuria ∗ September 8, 2020
Abstract
We present an algorithm for computing $s$-$t$ maximum flows in directed graphs in $\tilde{O}(m^{4/3+o(1)} U^{1/3})$ time. Our algorithm is inspired by potential reduction interior point methods for linear programming. Instead of using scaled gradient/Newton steps of a potential function, we take the step which maximizes the decrease in the potential value subject to advancing a certain amount on the central path, which can be efficiently computed. This allows us to trace the central path with our progress depending only on $\ell_\infty$ norm bounds on the congestion vector (as opposed to the $\ell_4$ norm required by previous works), and runs in $O(\sqrt{m})$ iterations. To improve the number of iterations by establishing tighter bounds on the $\ell_\infty$ norm, we then consider the weighted central path framework of Madry [Mad13, Mad16, CMSV17] and Liu-Sidford [LS20b]. Instead of changing weights to maximize energy, we consider finding weights which maximize the decrease in potential value. Finally, similar to finding weights which maximize energy as done in [LS20b], this problem can be solved by the iterative refinement framework for smoothed $\ell_2$-$\ell_p$ norm flow problems [KPSW19], completing our algorithm. We believe our potential reduction based viewpoint provides a versatile framework which may lead to faster algorithms for max flow.

∗U.C. Berkeley, [email protected], supported by NSF Grant CCF 1718695.
Introduction
The $s$-$t$ maximum flow problem and its dual, the $s$-$t$ minimum cut problem, are amongst the most fundamental problems in combinatorial optimization, with a wide range of applications. Furthermore, they serve as a testbed for new algorithmic concepts which have found uses in other areas of theoretical computer science and optimization. This is because the max-flow and min-cut problems demonstrate the prototypical primal-dual relation in linear programs. In the well-known $s$-$t$ maximum flow problem we are given a graph $G = (V, E)$ with $m$ edges and $n$ vertices with edge capacities $u_e \le U$, and aim to route as much flow as possible from $s$ to $t$ while restricting the magnitude of the flow on each edge to its capacity.

Several decades of work on combinatorial algorithms for this problem led to a large set of results, culminating in the work of Goldberg-Rao [GR98], which gives a running time bound of $O(m \min\{m^{1/2}, n^{2/3}\} \log(n^2/m) \log U)$. This bound remained unimproved for many years. In a breakthrough paper, Christiano et al. [CKM+11] show how to compute approximate maximum flows in $\tilde{O}(mn^{1/3} \log(U)\,\mathrm{poly}(1/\varepsilon))$ time. Their new approach uses electrical flow computations, which are Laplacian linear system solves and can be carried out in nearly-linear time [ST14], to take steps that minimize a softmax approximation of the congestion of edges via a second order approximation. A straightforward analysis leads to an $O(\sqrt{m})$ iteration algorithm. However, by trading off against another potential function, they show that $O(m^{1/3})$ iterations suffice. This work led to an extensive line of work exploiting Laplacian system solving and continuous optimization techniques for faster max flow algorithms. Lee et al. [LRS13] also present another $O(n^{1/3}\,\mathrm{poly}(1/\varepsilon))$ iteration algorithm for unit-capacity graphs, again using electrical flow primitives. Finally, Kelner et al. [KLOS14] and Sherman [She13, She17b] present algorithms achieving $O(m^{o(1)}\,\mathrm{poly}(1/\varepsilon))$ iterations for max-flow and its variants, which are based on congestion approximators and oblivious routing schemes as opposed to electrical flow computations. This has now been improved to near linear time [Pen16, She17a]. Crucially, this line of work can only guarantee weak approximations to max flow due to the $\mathrm{poly}(1/\varepsilon)$ term in the iteration complexity.

In order to get highly accurate solutions, which depend only polylogarithmically on $1/\varepsilon$, work has relied on second-order optimization techniques which use first and second-order information (the Hessian of the optimization function). To solve the max flow problem to high accuracy, several works have used interior point methods (IPMs) for linear programming [NN94, Ren01]. These algorithms handle non-negativity/$\ell_\infty$ constraints by approximating them with a self-concordant barrier, an approximation to an indicator function of the set which satisfies local smoothness and strong convexity properties and hence can be optimized using Newton's method.
In particular, Daitch and Spielman [DS08] show how to combine standard path-following IPMs and Laplacian linear system solves to obtain an $\tilde{O}(m^{3/2} \log(U/\varepsilon))$ time algorithm, matching Goldberg and Rao up to logarithmic factors. The $O(\sqrt{m})$ iteration count is a crucial bottleneck here, due to the $\ell_\infty$ norm being approximated by the $\ell_2$ norm up to a factor of $\sqrt{m}$. Lee and Sidford [LS14] then devised a faster IPM using weighted logarithmic barriers to achieve an $\tilde{O}(m\sqrt{n} \log(U/\varepsilon))$ time algorithm. Madry [Mad13, Mad16] opened up the weighted barrier based IPM algorithms for max flow to show that instead of the $\ell_2$ norm governing the progress of each iteration, one can make progress while only maintaining bounds on the $\ell_4$ norm. Combining this with insights from [CKM+11], he obtains an $\tilde{O}(m^{3/7})$ iteration algorithm, which leads to an $\tilde{O}(m^{10/7} U^{1/7} \log(m/\varepsilon))$ time bound. Note that the algorithm depends polynomially on the maximum edge capacity $U$ and hence is mainly an improvement for mildly large edge capacities. This work can also be used to solve min cost flow problems in the same running time [CMSV17].

Another line of work beyond IPMs is to solve $p$-norm regression problems on graphs. Such problems interpolate between electrical flow problems ($p = 2$), maximum flow problems ($p = \infty$) and transshipment problems ($p = 1$). While these problems can also be solved in $O(\sqrt{m})$ iterations to high accuracy using IPMs [NN94], it was unclear if this iteration complexity could be improved depending on the value of $p$. Bubeck et al. [BCLL18] showed that for any self-concordant barrier for the $\ell_p$ ball, the iteration complexity has to be at least $\Omega(\sqrt{m})$, thus making progress using IPMs unlikely. They however showed that another homotopy-based method, a family which also includes IPMs, can be used to solve the problem in $\tilde{O}_p(m^{|1/2 - 1/p|} \log(1/\varepsilon))$ iterations, where $\tilde{O}_p$ hides dependencies on $p$ in the runtime. This leads to improvements in the runtime for constant values of $p$. Next, Adil et al.
[AKPS19], inspired by the work of [BCLL18], showed that one can measure the change in the $p$-norm using a second order term based on a different function, which allows them to obtain approximations to the $p$-norm function in different norms with strong condition number bounds. These results can be viewed in the framework of relative convexity [LFN18]. Thus, they can focus on just solving the optimization problem arising from the residual. Using insights from [CKM+11], they arrive at an $\tilde{O}_p(m^{4/3} \log(1/\varepsilon))$-time algorithm. Follow-up work by Kyng et al. [KPSW19] opened up the tools used by Spielman and Teng [ST14] for $\ell_2$-norm flow problems to show that one can construct strong preconditioners for the residual problems of mixed $\ell_2$-$\ell_p$-norm flow problems, a generalization of $\ell_p$-norm flow, and obtain an $\tilde{O}_p(m^{1+o(1)} \log(1/\varepsilon))$ algorithm. These results however do not lead to faster max flow algorithms, due to their large dependence on $p$.

However, Liu and Sidford [LS20b], improving on Madry [Mad16], showed that instead of carefully tuning the weights based on the electrical energy, one can consider the separate problem of finding a new set of weights under a certain budget constraint to maximize the energy. They showed that a version of this problem reduces to solving $\ell_2$-$\ell_p$ norm flow problems and hence can be solved in almost-linear time using the work of [KPSW19, AS20]. This leads to an $O(m^{11/8+o(1)} U^{1/4})$-time algorithm for max flow. However, this result still relies on the amount of progress one can take in each iteration being limited by the bounds one can ensure on the $\ell_4$ norm of the congestion vector, as opposed to the ideal $\ell_\infty$ norm. We remark here that there are IPMs for linear programming which only measure centrality in the $\ell_\infty$ norm, as opposed to the $\ell_2$ or $\ell_4$ norm. In particular, [CLS19, LSZ19, vdBLSS20] show how to take a step with respect to a softmax function of the duality gap and trace the central path only maintaining $\ell_\infty$ norm bounds.
[Tun95, Tun94] also designed potential reduction based IPMs which trace the central path while only maintaining centrality in $\ell_\infty$.

In this paper, we devise a faster interior point method for $s$-$t$ maximum flow in directed graphs. Precisely, our algorithm runs in time $\tilde{O}(m^{4/3+o(1)} U^{1/3})$. During the process of writing this paper, we were informed by Yang Liu and Aaron Sidford [LS20a] that they have also obtained an algorithm achieving the same runtime. They also end up solving the same subproblems that we solve, although they arrive at them from the perspective of the Bregman divergence of the barrier, as opposed to the potential function that is the inspiration for our work. Our algorithm builds on top of both Madry [Mad16] and Liu-Sidford [LS20b] and is arguably simpler than both in some regards.

In particular, our algorithm is based on potential reduction algorithms, a kind of interior point method for linear programs. These algorithms are based on a potential function which measures both the duality gap as well as closeness to the boundary via a barrier function. They differ from path-following IPMs in that they need not follow the central path closely but only trace it loosely, which is also experimentally observed. Usually, the step taken is a scaled gradient step/Newton step on the potential function. Provided that we can guarantee sufficient decrease of the potential function and relate the potential function to closeness to optimality, we can show convergence. We refer to [Ans96, Tod96, NN94] for excellent introductions to potential reduction IPMs.

We will however use a different step; instead of a Newton step, we consider taking the step, subject to augmenting a certain amount of flow in each iteration, which maximizes the decrease in the potential function after taking the step.
We then show that this optimization problem can be efficiently solved in $\tilde{O}(m)$ time using electrical flow computations. While we can show that the potential function decreases by a large amount, which would guarantee that we can solve the max flow problem in $O(\sqrt{m})$ iterations, we forego writing it in this manner as we are unable to argue such a statement when the weights, and hence the potential function, are also changed. Instead, we stick to keeping track of the centrality of our flow vector while making sufficient progress. Crucially however, the amount of progress made by our algorithm only depends on bounds on the $\ell_\infty$ norm of the congestion vector of the update step, rather than the traditional $\ell_2$ or $\ell_4$ norm bounds of [Mad16, LS20b]. In order to improve the iteration complexity by obtaining stronger bounds on the $\ell_\infty$ norm of the congestion vector, we show that, as in Liu-Sidford [LS20b], we can change the weights on the barrier term for each edge. Instead of using energy as the potential function to be maximized, inspired by oracles designed for multiplicative weights algorithms, we use the change in the potential function itself as the quantity to be maximized, subject to an $\ell_1$ budget constraint on the change in weights. While we are unaware of how to maximize the $\ell_1$ constrained problem, we relax it to an $\ell_q$ constrained problem, which we solve via a mixed $\ell_2$-$\ell_p$ norm flow problem using the work of [KPSW19, AS20]. Combining this with an application of Hölder's inequality gives us sufficiently good control on the $\ell_1$-norm of the weight change, while ensuring that our step has significantly better $\ell_\infty$ norm bounds on the congestion vector. We believe our potential reduction framework, as well as the concept of changing weights based on the update step, might be useful in designing faster algorithms for max flow beyond our $m^{4/3}$ running time.

Preliminaries

Throughout this paper, we will view graphs as having both forward and backward capacities.
Specifically, we will denote by $G = (V, E, u)$ a directed graph with vertex set $V$ of size $n$, edge set $E$ of size $m$, and two non-negative capacities $u^-_e$ and $u^+_e$ for each edge $e \in E$. For the purpose of this paper, all edge capacities are bounded by $U$. Each edge $e = (u, v)$ has a head vertex $u$ and a tail vertex $v$. For a vector $v \in \mathbb{R}^m$, we define $\|v\|_p = \left(\sum_{i=1}^m |v_i|^p\right)^{1/p}$ and $\|v\|_\infty = \max_{i=1}^m |v_i|$, and refer to $\mathrm{Diag}(v) \in \mathbb{R}^{m \times m}$ as the diagonal matrix with the $i$th diagonal entry equal to $v_i$.

Maximum Flow Problem
Given a graph $G$, we call any assignment of real values to the edges of $E$, i.e., $f \in \mathbb{R}^m$, a flow. For a flow vector $f$, we view $f_e$ as the amount of flow on edge $e$, and if this value is negative, we interpret it as a flow of $|f_e|$ flowing in the direction opposite to the edge's orientation. We say that a flow $f$ is a $\sigma$-flow, for some demands $\sigma \in \mathbb{R}^n$, iff it satisfies the flow conservation constraints with respect to those demands. That is, we have
$$\sum_{e \in E^+(v)} f_e - \sum_{e \in E^-(v)} f_e = \sigma_v$$
for every vertex $v \in V$, where $E^+(v)$ and $E^-(v)$ are the sets of edges of $G$ entering and leaving vertex $v$ respectively. We will require $\sum_{v \in V} \sigma_v = 0$.

Furthermore, we say that a $\sigma$-flow $f$ is feasible in $G$ iff $f$ satisfies the capacity constraints $-u^-_e \le f_e \le u^+_e$ for each edge $e \in E$.

One type of flow that will be of interest to us are $s$-$t$ flows, where $s$ (the source) and $t$ (the sink) are two distinguished vertices of $G$. Formally, an $s$-$t$ flow is a $\sigma$-flow whose demand vector is $\sigma = F\chi_{s,t}$, where $F$ is the value of the flow and $\chi_{s,t}$ is the vector with $-1$ and $+1$ at the coordinates corresponding to $s$ and $t$ respectively and zero elsewhere.

Now, the maximum flow problem corresponds to the problem in which we are given a directed graph $G = (V, E, u)$ with integer capacities as well as a source vertex $s$ and a sink vertex $t$, and want to find a feasible $s$-$t$ flow of maximum value. We will denote this maximum value by $F^*$.

Residual Graphs
A fundamental object in many maximum flow algorithms is the notion of a residual graph. Given a graph $G$ and a feasible $\sigma$-flow $f$ in that graph, we define the residual graph $G_f = (V, E, \hat{u}(f))$ over the same vertex and edge set as $G$, such that, for each edge $e = (u, v)$, its forward and backward residual capacities are defined as
$$\hat{u}^+_e(f) = u^+_e - f_e \quad \text{and} \quad \hat{u}^-_e(f) = u^-_e + f_e.$$
We will also denote $\hat{u}_e(f) = \min\{\hat{u}^+_e(f), \hat{u}^-_e(f)\}$. When the value of $f$ is clear from context, we will omit writing it explicitly. Observe that the feasibility of $f$ implies that all residual capacities are always non-negative.

Electrical Flows and Laplacian Systems
Let $G$ be a graph and let $r \in \mathbb{R}^m_{++}$ be a vector of edge resistances, where the resistance of edge $e$ is denoted by $r_e$. For a flow $f \in \mathbb{R}^E$ on $G$, we define the energy of $f$ to be $\mathcal{E}_r(f) = f^\top R f = \sum_{e \in E} r_e f_e^2$, where $R = \mathrm{Diag}(r)$. For a demand $\chi$, we define the electrical $\chi$-flow $f_r$ to be the $\chi$-flow which minimizes energy, $f_r = \arg\min_{B^\top f = \chi} \mathcal{E}_r(f)$, where $B \in \mathbb{R}^{m \times n}$ is the edge-vertex incidence matrix. This flow is unique, as the energy is a strictly convex function. The Laplacian of a graph $G$ with resistances $r$ is defined as $L = B^\top R^{-1} B$. The electrical $\chi$-flow is given by the formula $f_r = R^{-1} B L^\dagger \chi$. We also define the electrical potentials $\phi = L^\dagger \chi$. There is a long line of work, starting from Spielman and Teng, which shows how to solve $L\phi = \chi$ in nearly linear time [ST14, KMP14, KOSA13, PS14, CKM+14, KS16, KLP+…].

$p$-Norm Flows

As mentioned above, a line of work [BCLL18, AKPS19, KPSW19] shows how to solve more general $p$-norm flow problems. Precisely, given a "gradient" vector $g \in \mathbb{R}^E$, resistances $r \in \mathbb{R}^E_+$ and a demand vector $\chi$, the problem under consideration is
$$OPT = \min_{B^\top f = \chi} \sum_{e \in E} g_e f_e + r_e f_e^2 + |f_e|^p.$$
[KPSW19] call such a problem a mixed $\ell_2$-$\ell_p$-norm flow problem and denote the expression inside the min as $val(f)$. The main result of the paper is

Theorem 2.1 (Theorem 1.1 in [KPSW19]). For any even $p \in [\omega(1), o(\log^{2/3-o(1)} n)]$ and an initial solution $f^{(0)}$ such that all parameters are bounded by $\mathrm{poly}(\log(n))$, we can compute a flow $\tilde{f}$ satisfying the demands $\chi$ such that
$$val(\tilde{f}) - OPT \le \frac{1}{2^{\mathrm{poly}(\log m)}}\left( val(f^{(0)}) - OPT \right) + \frac{1}{2^{\mathrm{poly}(\log m)}}$$
in $2^{O(p^{3/2})} m^{1+O(1/\sqrt{p})}$ time.
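Returning to the electrical flow primitives above, the formula $f_r = R^{-1} B L^\dagger \chi$ can be sketched numerically. The following is a minimal illustration on a toy graph (all data here is assumed for illustration; the sign convention $B_{e,u} = -1$, $B_{e,v} = +1$ for $e = (u,v)$ is one of the two standard choices):

```python
import numpy as np

# Toy illustration of the electrical chi-flow f_r = R^{-1} B L^dagger chi
# with L = B^T R^{-1} B.  Sign convention assumed: B[e, u] = -1, B[e, v] = +1
# for edge e = (u, v), so B^T f gives the net demand met at each vertex.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
n, m = 4, len(edges)
B = np.zeros((m, n))
for e, (u, v) in enumerate(edges):
    B[e, u], B[e, v] = -1.0, 1.0

r = np.array([1.0, 2.0, 1.0, 3.0, 1.0])            # resistances
chi = np.array([-1.0, 0.0, 0.0, 1.0])              # one unit from vertex 0 to 3
L = B.T @ np.diag(1.0 / r) @ B                     # graph Laplacian
f_r = (1.0 / r) * (B @ (np.linalg.pinv(L) @ chi))  # electrical chi-flow

print(np.allclose(B.T @ f_r, chi))                 # demands are met

# Energy minimality: adding a circulation can only increase f^T R f.
c = np.array([1.0, -1.0, 1.0, 0.0, 0.0])           # a circulation (B^T c = 0)
print(f_r @ (r * f_r) <= (f_r + 0.1 * c) @ (r * (f_r + 0.1 * c)))
```

In a real solver the dense pseudo-inverse would of course be replaced by a nearly-linear-time Laplacian system solve, which is the entire point of the works cited above.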
We remark that, strictly speaking, the theorem in [KPSW19] states the error to be polynomially small, but [LS20b] observe that their proof actually implies the quasi-polynomially small error stated above. While the subproblems that we need to solve to change weights cannot be put exactly into this form, we show that mild modifications of their techniques allow us to use their algorithm as a black box. Hence, we elaborate on their approach below.

One main thing established in their paper is how the $p$-norm changes when we move from $f$ to $f + \delta$.

Lemma 2.2 (Lemma in [KPSW19]). For any $f \in \mathbb{R}^E$ and $\delta \in \mathbb{R}^E$ we have, coordinate-wise,
$$f_i^p + p f_i^{p-1} \delta_i + 2^{-O(p)} h_p(f_i^{p-2}, \delta_i) \le (f_i + \delta_i)^p \le f_i^p + p f_i^{p-1} \delta_i + 2^{O(p)} h_p(f_i^{p-2}, \delta_i),$$
where $h_p(x, \delta) = x\delta^2 + |\delta|^p$.

Hence, given an initial solution, it suffices to solve a residual problem of the form
$$\min_{B^\top \delta = 0} \; g(f)^\top \delta + \sum_{e \in E} h_p(f_e^{p-2}, \delta_e)$$
where $g(f)_i = p f_i^{p-1}$. Next, they notice that bounding the condition number with respect to the function $h_p(\cdot, \cdot)$ actually suffices to get linear convergence and hence to tolerate quasi-polynomially low errors. The rest of the paper goes into designing good preconditioners which allow them to solve the above subproblem quickly.

We will also need some basics about min-max saddle point problems [BNO03]. Consider a function $f(x, y)$ with $\mathrm{dom}(f, x) = \mathcal{X}$ and $\mathrm{dom}(f, y) = \mathcal{Y}$. The problem we will be interested in is of the form
$$\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} f(x, y).$$
Define the functions $f_y(y) = \min_{x \in \mathcal{X}} f(x, y)$ for every fixed $y \in \mathcal{Y}$, and $f_x(x) = \max_{y \in \mathcal{Y}} f(x, y)$ for every fixed $x \in \mathcal{X}$. We have the following theorem from Section 2.6 in [BNO03].

Theorem 2.3. Suppose $f(x, y)$ is convex in $x$ and concave in $y$, and let $\mathcal{X}$, $\mathcal{Y}$ be convex and closed. Then $f_x$ is a convex function and $f_y$ is a concave function.

$\sqrt{m}$ Iteration Algorithm
In this section, we first set up our IPM framework and show how to recover the $\sqrt{m}$ iteration bound for max flow. In the next section, we will change the weights to obtain our improved runtime. Our framework is largely inspired by [Mad16] and [LS20b], and indeed a lot of the arguments can be reused with some modifications.

For every edge $e = (u, v)$, we consider assigning two non-negative weights $w^+_e$ and $w^-_e$ to the forward and backward directions. Based on the weights and the edge capacities, for any feasible flow, we define a barrier functional
$$\phi_w(f) = -\sum_{e \in E} \left[ w^+_e \log(u^+_e - f_e) + w^-_e \log(u^-_e + f_e) \right].$$
The method maintains a flow satisfying $B^\top f = F\chi$, with the proximity of the point to the constraints measured through the barrier $\phi_w(f)$, known as centrality. Previous IPMs proceed by taking a Newton step with respect to the barrier, with a step size which ensures that we increase the value of the flow $F$ by a certain amount. Since a Newton step is the minimization of a second order optimization problem, it can be shown that the step can be computed via electrical flow computations. Typically, taking a Newton step can be decomposed into progress and centering steps: one first takes a progress step which increases the flow value, causing us to lose centrality by some amount; then one takes a centering step which improves the centrality without increasing the flow value. The amount of progress we can make in each iteration while still being able to recenter determines the number of iterations our algorithm will take. [Mad16, LS20b] follow this prototype, and loosely speaking the amount of flow value we can add in each iteration's progress step depends on the $\ell_\infty$ norm of the congestion vector, which measures how much flow we can add before we saturate an edge.
However, the bottleneck ends up being the centering step, which requires that the flow value only be increased by an amount depending on the $\ell_2$ norm of the congestion vector, a stronger requirement than an $\ell_\infty$ norm bound.

Madry [Mad13, Mad16] notes that when the $\ell_\infty$ and $\ell_2$ norms of the congestion vector are large, then increasing the resistances of the congested edges increases the energy of the resulting electrical flow. So he repeatedly increases the weights of the congested edges (called boosting) until the congestion vector has sufficiently small norm. By using the electrical energy of the resulting step as a global potential function and analyzing how it evolves over the progress, centering and boosting steps, he can control the amount of weight change and the number of boosting steps necessary to reduce the norm of the congestion vector. Carefully trading off these quantities yields his runtime of $\tilde{O}(m^{10/7})$. To improve on this, Liu and Sidford [LS20b] consider the problem of finding a set of weight increases which maximizes the energy of the resulting flow. As they need to ensure that the weights don't increase by too much, they place a budget constraint on the weight vector, and show that a small amount of weight change suffices to obtain good bounds on the congestion vector. Fortunately, this optimization problem ends up being efficiently solvable in almost linear time by using the mixed $\ell_2$-$\ell_p$ norm flow algorithm of [KPSW19]. However, this step still essentially requires $\ell_4$-norm bounds to ensure centering is possible.

In this paper, we will consider taking steps with respect to a potential function. The potential function $\Phi_w$ comes from potential reduction IPM schemes and trades off the duality gap with the barrier:
$$\Phi_w(f, s) = m \log\left(\frac{f^\top s}{m}\right) + \phi_w(f).$$
For self-concordant barriers such as weighted log barriers, the negative gradient $-\nabla \phi_w(f)$ is feasible for the dual [Ren01], and so for any $f'$ feasible for the primal, we have $f'^\top(-\nabla \phi_w(f)) \ge 0$.
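The barrier and the potential above are simple to evaluate. The following toy sketch (all data assumed for illustration) checks the gradient formula $(\nabla\phi_w(f))_e = \frac{w^+_e}{u^+_e - f_e} - \frac{w^-_e}{u^-_e + f_e}$ by finite differences, and evaluates $\Phi_w$ at the dual slack $s = -\nabla\phi_w(f)$:

```python
import numpy as np

# Toy evaluation (assumed data) of the barrier phi_w and potential
# Phi_w(f, s) = m*log(f^T s / m) + phi_w(f) at s = -grad phi_w(f).
u_p = np.ones(4); u_m = np.ones(4)                 # unit capacities
w_p = np.array([1.0, 2.0, 1.0, 3.0])               # forward weights
w_m = np.array([3.0, 1.0, 2.0, 1.0])               # backward weights
f = np.array([0.3, -0.2, 0.1, -0.4])               # a feasible flow
m = len(f)

phi = lambda g: -np.sum(w_p * np.log(u_p - g) + w_m * np.log(u_m + g))
grad = w_p / (u_p - f) - w_m / (u_m + f)           # closed-form gradient

eps = 1e-6                                         # finite-difference check
fd = np.array([(phi(f + eps * np.eye(m)[e]) - phi(f - eps * np.eye(m)[e])) / (2 * eps)
               for e in range(m)])
print(np.allclose(grad, fd, atol=1e-5))

s = -grad                                          # candidate dual slack
gap = f @ s                                        # duality-gap term, here > 0
Phi = m * np.log(gap / m) + phi(f)
print(gap > 0 and np.isfinite(Phi))
```

The data was chosen so that $f^\top s > 0$ and the logarithm in $\Phi_w$ is defined; the nonnegativity of the gap for feasible primal points is exactly the duality fact quoted above.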
We will consider dual "potential" variables $y \in \mathbb{R}^V$. Now, as in [Mad16, LS20b], we consider a centrality condition
$$y_v - y_u = \frac{w^+_e}{u^+_e - f_e} - \frac{w^-_e}{u^-_e + f_e} \quad \text{for all } e = (u, v). \tag{3.1}$$
If $(f, y, w)$ satisfy the above condition, we call the triple well-coupled. Also, given a tuple $(f, y, w)$ and a candidate step $\hat{f}$, define the forward and backward congestion vectors $\rho^+, \rho^- \in \mathbb{R}^E$ as
$$\rho^+_e = \frac{|\hat{f}_e|}{u^+_e - f_e} \quad \text{and} \quad \rho^-_e = \frac{|\hat{f}_e|}{u^-_e + f_e} \quad \text{for all } e \in E. \tag{3.2}$$
We can now assume via binary search that we know the optimal flow value $F^*$ [Mad16]. [Mad16, LS20b] consider preconditioning the graph, which allows them to ensure sufficient progress from a well-coupled point. The preconditioning strategy is to add $m$ extra (undirected) edges between $s$ and $t$ of capacity $U$ each, so the max flow value increases by at most $mU$. The following lemma can be seen from the proof of Lemma 4.5 in [LS20b].

Theorem 3.1.
Let $(f, y, w)$ be a well-coupled point for flow value $F$ in a preconditioned graph $G$. Then for every preconditioned edge $e$ we have
$$\hat{u}_e(f) = \min\{u^+_e - f_e,\; u^-_e + f_e\} \ge \frac{F^* - F}{\|w\|_1}.$$
In particular, if $\|w\|_1 \le 2m$, then we have $\hat{u}_e(f) \ge \frac{F^* - F}{2m}$. If we also have $F^* - F \ge m^{1/2 - \eta}$, then $\hat{u}_e(f) \ge m^{-(1/2 + \eta)}/2$.

Now that our setup is complete, we can focus on the step that we will be taking. In this section, we will keep the weights all fixed to 1, i.e., $w^+_e = w^-_e = 1$ for all $e \in E$; hence $\|w\|_1 = 2m$. Consider the change in the potential function when we move from $f$ to $f + \hat{f}$ while keeping the dual slack $-\nabla \phi_w(f) = By$ fixed. This change is
$$m \log\left(\frac{-(f + \hat{f})^\top \nabla \phi_w(f)}{m}\right) - m \log\left(\frac{-f^\top \nabla \phi_w(f)}{m}\right) + \phi_w(f + \hat{f}) - \phi_w(f).$$
We are interested in minimizing this quantity, which corresponds to maximizing the decrease in the potential function value while guaranteeing that we send, say, $\delta$ more units of flow $\hat{f}$. Hence the problem is
$$\arg\min_{B^\top \hat{f} = \delta\chi} \; m \log\left(\frac{-(f + \hat{f})^\top \nabla \phi_w(f)}{m}\right) + \phi_w(f + \hat{f}).$$
Unfortunately, this problem is not convex, as the duality gap term is concave in $\hat{f}$. However, we can instead minimize an upper bound to this term which is convex:
$$\arg\min_{B^\top \hat{f} = \delta\chi} \; \phi_w(f + \hat{f}) - (f + \hat{f})^\top \nabla \phi_w(f) = \arg\min_{B^\top \hat{f} = \delta\chi} \; \sum_{e \in E} \left[ -w^+_e \log\left(1 - \frac{\hat{f}_e}{u^+_e - f_e}\right) - w^-_e \log\left(1 + \frac{\hat{f}_e}{u^-_e + f_e}\right) - \hat{f}_e \left( \frac{w^+_e}{u^+_e - f_e} - \frac{w^-_e}{u^-_e + f_e} \right) \right],$$
as $\log(1 + x) \le x$ for non-negative $x$, which applies here by the duality mentioned above. We will refer to the value of the problem in the last line as the potential decrement, and will henceforth denote the function inside the minimization by $\Delta\Phi_w(f, \hat{f})$.
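The per-edge structure of the potential decrement is worth noting: each edge term vanishes at $\hat{f}_e = 0$, is convex, and is nonnegative, so only the constraint $B^\top \hat{f} = \delta\chi$ forces progress. A toy sketch (data assumed for illustration):

```python
import numpy as np

# Sketch (toy data) of the per-edge potential decrement Delta Phi_w(f, fhat).
# Each edge term equals 0 at fhat_e = 0 and is convex and nonnegative.
def delta_phi(w_p, w_m, uhat_p, uhat_m, fhat):
    return (-w_p * np.log(1 - fhat / uhat_p)
            - w_m * np.log(1 + fhat / uhat_m)
            - fhat * (w_p / uhat_p - w_m / uhat_m))

w_p = w_m = np.ones(5)                          # unit weights, as in this section
uhat_p = np.array([0.5, 1.0, 0.7, 0.9, 0.6])    # residual capacities u+ - f
uhat_m = np.array([1.5, 1.0, 1.3, 1.1, 1.4])    # residual capacities u- + f
fhat = np.array([0.1, -0.2, 0.05, 0.3, -0.1])   # a candidate step

terms = delta_phi(w_p, w_m, uhat_p, uhat_m, fhat)
print(np.all(terms >= 0))
print(np.allclose(delta_phi(w_p, w_m, uhat_p, uhat_m, 0 * fhat), 0))
```

The nonnegativity follows because each term has zero derivative at the origin and is convex there, which is also why the decrement serves as a useful progress measure below.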
It is instructive to first see how the coupling condition changes if we take the optimal step of the above problem while remaining feasible. From the optimality conditions of the above program, there exists a $\hat{y}$ such that for all $e = (u, v)$
$$\hat{y}_v - \hat{y}_u = \left( \frac{w^+_e}{u^+_e - f_e - \hat{f}_e} - \frac{w^-_e}{u^-_e + f_e + \hat{f}_e} \right) - \left( \frac{w^+_e}{u^+_e - f_e} - \frac{w^-_e}{u^-_e + f_e} \right) = \left( \frac{w^+_e}{u^+_e - f_e - \hat{f}_e} - \frac{w^-_e}{u^-_e + f_e + \hat{f}_e} \right) - (y_v - y_u).$$
Hence, if we update $y$ to $y + \hat{y}$ and $f$ to $f + \hat{f}$, we get a flow of value $F + \delta$ such that the coupling condition with respect to the new $y$ and $f$ still holds.

Hence, we can now focus on actually computing the step and showing what $\delta$ we can take while still satisfying feasibility, i.e., bounds on the $\ell_\infty$ norm of the congestion vector. The function we are trying to minimize comprises a self-concordant barrier term and a linear term. Unfortunately, we cannot control the condition number of such a function in order to optimize it efficiently over the entire space, as this is arguably as hard as the original problem itself. However, due to self-concordance, the function behaves smoothly enough (good condition number) in a box around the origin, but that seemingly doesn't help us solve the problem over the entire space. Fortunately, a fix for this was already found in [BCLL18]. In particular, they (smoothly) extend the function quadratically outside a box, ensuring that the (global) smoothness and strong convexity properties inside the box carry over to the outside as well, while also arguing that the minimizer is unchanged provided the minimizer of the original problem was inside the box. Specifically, the following lemma can be inferred from Section 2.2 of [BCLL18].

Lemma 3.2.
Given a function $f(x)$ which is $L$-smooth and $\mu$-strongly convex inside an interval $[-\ell, \ell]$, we define the quadratic extension of $f$ as
$$f_\ell(x) = \begin{cases} f(x), & -\ell \le x \le \ell \\ f(-\ell) + f'(-\ell)(x + \ell) + \frac{f''(-\ell)}{2}(x + \ell)^2, & x < -\ell \\ f(\ell) + f'(\ell)(x - \ell) + \frac{f''(\ell)}{2}(x - \ell)^2, & x > \ell. \end{cases}$$
The function $f_\ell$ is $C^1$, $L$-smooth and $\mu$-strongly convex. Furthermore, for any convex function $\psi(x)$, provided $x^* = \arg\min_{x \in \mathcal{X}} \psi(x) + \sum_{i=1}^n f(x_i)$ lies inside $\prod_{i=1}^n [-\ell_i, \ell_i]$, we have $\arg\min_{x \in \mathcal{X}} \psi(x) + \sum_{i=1}^n f_{\ell_i}(x_i) = x^*$.

Hence, it suffices to consider a $\delta$ small enough that the minimizer is the same as for the original problem, and we can focus on minimizing this quadratic extension of the function. For the minimization, we can use Accelerated Gradient Descent or Newton's method.

Theorem 3.3 ([Nes04]). Given a convex function $f$ which satisfies $D \preceq \nabla^2 f(x) \preceq \kappa D$ for all $x \in \mathbb{R}^n$, for some fixed diagonal matrix $D$ and some fixed $\kappa$, an initial point $x_0$ and an error parameter $0 < \varepsilon < 1/2$, accelerated gradient descent (AGD) outputs $x$ such that
$$f(x) - \min_{x'} f(x') \le \varepsilon \left( f(x_0) - \min_{x'} f(x') \right)$$
in $O(\sqrt{\kappa} \log(\kappa/\varepsilon))$ iterations. Each iteration involves computing $\nabla f$ at some point $x$, projecting onto the subspace defined by the constraints, and some linear-time calculations.

Notice that the Hessian of the function in the potential decrement problem is a diagonal matrix with the $e$th entry being
$$\frac{w^+_e}{(u^+_e - f_e - \hat{f}_e)^2} + \frac{w^-_e}{(u^-_e + f_e + \hat{f}_e)^2}.$$
So provided $\rho^+_e, \rho^-_e$ are less than some small constant, the condition number $\kappa$ of the Hessian is constant with respect to the diagonal matrix $\nabla^2 \phi_w(f)$, and hence we can use Theorem 3.3 to solve the problem in $\tilde{O}(1)$ iterations to quasi-polynomially good error.
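The quadratic extension of Lemma 3.2 is mechanical to implement. The sketch below applies it to the scalar barrier piece $f(x) = -\log(1-x)$ (an illustrative choice, not the full objective) and numerically checks that the extension agrees with $f$ inside the box and is $C^1$ at the boundary:

```python
import numpy as np

# Quadratic extension of Lemma 3.2: outside [-l, l] the function continues as
# its second-order Taylor expansion at the nearest boundary point, preserving
# smoothness and strong convexity while staying C^1.
def quad_ext(func, dfunc, d2func, l):
    def fl(x):
        b = np.clip(x, -l, l)            # nearest point of the box
        return func(b) + dfunc(b) * (x - b) + 0.5 * d2func(b) * (x - b) ** 2
    return fl

f   = lambda x: -np.log(1 - x)           # illustrative barrier piece
df  = lambda x: 1 / (1 - x)
d2f = lambda x: 1 / (1 - x) ** 2

fl = quad_ext(f, df, d2f, 0.5)
xs = np.linspace(-2, 0.49, 500)
inside = np.abs(xs) <= 0.5
print(np.allclose(fl(xs)[inside], f(xs[inside])))   # agrees inside the box

eps = 1e-6                                          # C^1 across the boundary
print(abs((fl(0.5 + eps) - fl(0.5 - eps)) / (2 * eps) - df(0.5)) < 1e-3)
```

Note that `quad_ext` is defined for any scalar function with first and second derivatives, matching the per-coordinate use in the potential decrement problem.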
Furthermore, notice that the algorithm is just computing a gradient and then doing a projection, so each iteration can be carried out using a Laplacian linear system solve and hence runs in nearly linear time. Quasi-polynomially small error will suffice for our purposes [Mad13, Mad16, LS20b].

Now, we just need to ensure that we can control the $\ell_\infty$-norm of the congestion vector, as that controls how much flow we can still send without violating constraints. Note further that we need to set $\ell$ while solving the quadratic extension of the potential decrement problem so that it is greater than the $\ell_\infty$ norm that we can guarantee. We will want both of these to be constants.

As mentioned above, the point of preconditioning the graph is to ensure that the preconditioned edges themselves can facilitate sufficient progress. To bound the congestion, we show an analog of Lemma 3.9 in [Mad16].

Lemma 3.4.
Let $(f, y, w)$ be a well-coupled solution with value $F$ and let $\delta = \frac{F^* - F}{20\sqrt{m}}$. Let $\hat{f}$ be the solution to the potential decrement problem. Then we have $\rho^+_e, \rho^-_e \le 1/2$ for all edges $e$.

Proof. Consider the flow $f'$ which sends $\delta/m$ units of flow on each of the $m$ preconditioned edges. Certainly the potential decrement flow $\hat{f}$ has a smaller potential decrement than $f'$, whose decrement is
$$\Delta\Phi_w(f, f') = \sum_{e \in E}\left[ -w^+_e \log\left(1 - \frac{f'_e}{u^+_e - f_e}\right) - w^-_e \log\left(1 + \frac{f'_e}{u^-_e + f_e}\right) - f'_e\left(\frac{w^+_e}{u^+_e - f_e} - \frac{w^-_e}{u^-_e + f_e}\right)\right] \le \sum_{e \in E} w^+_e \left(\frac{f'_e}{\hat{u}^+_e(f)}\right)^2 + w^-_e \left(\frac{f'_e}{\hat{u}^-_e(f)}\right)^2 \le \|w\|_1 \left(\frac{2\delta}{F^* - F}\right)^2 = \frac{1}{50},$$
where the first inequality follows from $-\log(1 - x) \le x + x^2$ and $-\log(1 + x) \le -x + x^2$ for $0 \le x \le 1/2$, and the second follows from plugging in the value of the flow on the preconditioned edges and using Theorem 3.1, which gives $f'_e / \hat{u}_e(f) \le \frac{\delta}{m} \cdot \frac{2m}{F^* - F} = \frac{2\delta}{F^* - F}$. Finally we use $\|w\|_1 = 2m$.

Now it suffices to prove a lower bound on the potential decrement in terms of the congestion vector. For this, we start by considering the inner product of $\hat{f}$ with the gradient of $\Delta\Phi_w(f, \hat{f})$, assuming for notational simplicity that $\hat{f}_e \ge 0$ (the other case is symmetric, with the roles of $\hat{u}^+$ and $\hat{u}^-$ swapped):
$$\sum_{e \in E} \hat{f}_e \left( \frac{w^+_e}{u^+_e - f_e - \hat{f}_e} - \frac{w^-_e}{u^-_e + f_e + \hat{f}_e} - \frac{w^+_e}{u^+_e - f_e} + \frac{w^-_e}{u^-_e + f_e} \right) = \sum_{e \in E}\left( \frac{w^+_e \hat{f}_e^2}{(\hat{u}^+_e - \hat{f}_e)\hat{u}^+_e} + \frac{w^-_e \hat{f}_e^2}{(\hat{u}^-_e + \hat{f}_e)\hat{u}^-_e}\right) \le \sum_{e \in E}\left( \frac{2 w^+_e \hat{f}_e^2}{(\hat{u}^+_e)^2} + \frac{w^-_e \hat{f}_e^2}{(\hat{u}^-_e)^2}\right) \le 4\,\Delta\Phi_w(f, \hat{f}) \le 4\,\Delta\Phi_w(f, f') \le \frac{4}{50},$$
where the second-to-last inequality follows from $x + x^2/2 \le -\log(1 - x)$ and $-x + x^2/3 \le -\log(1 + x)$. Strictly speaking, the first inequality only holds for $|\hat{f}_e| \le \hat{u}_e(f)/2$.
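The elementary logarithm bounds invoked in this argument can be checked on a grid; a quick sketch (the particular constants $1/2$, $1$, $1/3$ are choices that suffice on $0 < x \le 1/2$, not canonical ones):

```python
import numpy as np

# Grid check of the -log(1 -/+ x) bounds used in the proof, for 0 < x <= 1/2.
x = np.linspace(1e-6, 0.5, 100000)
print(all([
    np.all(x + x**2 / 2 <= -np.log(1 - x)),    # lower bound, forward terms
    np.all(-np.log(1 - x) <= x + x**2),        # upper bound, forward terms
    np.all(-x + x**2 / 3 <= -np.log(1 + x)),   # lower bound, backward terms
    np.all(-np.log(1 + x) <= -x + x**2),       # upper bound, backward terms
]))
```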
However, instead of considering the inner product of $\hat{f}$ with the gradient of $\Delta\Phi_w(f, \hat{f})$ itself, we consider its quadratic extension with $\ell_e = \hat{u}_e(f)/2$ for each edge $e$. It is easy to see that if $\hat{f}$ is outside the box, the desired inequality still holds (by computing the value the quadratic extension takes on $f'$ in the cases outside the box). To finish the proof,
$$\sum_{e \in E} \hat{f}_e \left( \frac{w^+_e}{u^+_e - f_e - \hat{f}_e} - \frac{w^-_e}{u^-_e + f_e + \hat{f}_e} - \frac{w^+_e}{u^+_e - f_e} + \frac{w^-_e}{u^-_e + f_e} \right) = \sum_{e \in E}\left( \frac{w^+_e \hat{f}_e^2}{(\hat{u}^+_e - \hat{f}_e)\hat{u}^+_e} + \frac{w^-_e \hat{f}_e^2}{(\hat{u}^-_e + \hat{f}_e)\hat{u}^-_e}\right) \ge \frac{2}{3}\sum_{e \in E}\left( \frac{w^+_e \hat{f}_e^2}{(\hat{u}^+_e)^2} + \frac{w^-_e \hat{f}_e^2}{(\hat{u}^-_e)^2}\right) \ge \frac{2}{3}\|\rho\|_\infty^2.$$
Hence, combining the two bounds, we get that $\|\rho\|_\infty^2$ is bounded by a small constant, and in particular $\|\rho\|_\infty \le 1/2$.

Notice that since $\|\rho\|_\infty \le 1/2$, the minimizer of the quadratically smoothed function is the same as that of the function without smoothing, and hence the new step is well-coupled as per the argument above. Hence, in every iteration, we decrease the amount of flow left to send multiplicatively by a factor of $1 - \Omega(1/\sqrt{m})$, and so within $\tilde{O}(\sqrt{m})$ iterations the remaining flow is small enough that we can round to an exact maximum flow using augmenting paths. This completes our $\sqrt{m}$ iteration algorithm.

$m^{4/3+o(1)} U^{1/3}$ Time Algorithm
In this section, we show how to change weights to improve the number of iterations of our algorithm. We follow the framework of Liu and Sidford [LS20b]: find a set of weights to add, under a norm constraint, such that the step one would take with respect to the new weights maximizes a potential function. In their case, since the step they take is an electrical flow, the potential function considered is the energy of that flow. As our step is different, we instead take the potential decrement with respect to the new weights as the potential function. Perhaps surprisingly, almost all of their arguments go through with minor modifications. Let the initial weights be $w$ and say we would like to add a set of weights $w'$. Then we are interested in maximizing the potential decrement with respect to the new weights. This can be seen as similar to designing oracles for multiplicative weight algorithms for two-player games, where a player plays the move which penalizes the other player the most given the current move. Our algorithm first finds a new set of weights and then takes the potential decrement step with respect to the new weights. Finally, for better control of the congestion vector, we show that one can decrease some of the weight increase, as in [LS20b]. We first focus on the problem of finding the new set of weights. We introduce a vector $r' \in \mathbb{R}^E_{++}$ of "resistances", optimize over these resistances, and then obtain the weights from them. Let $w$ be the current set of weights and $w'$ the set of desired changes. Without loss of generality, assume that $\hat u_e(f) = \hat u^+_e(f)$. Given a resistance vector $r'$, we define the weight changes as
$$(w^+_e)' = r'_e\,(\hat u^+_e(f))^2 \qquad \text{and} \qquad (w^-_e)' = (w^+_e)'\,\frac{\hat u^-_e(f)}{\hat u^+_e(f)}.$$
This is the same set of weight changes as in [LS20b], in the context of energy maximization.
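These weight changes have a simple invariant worth making explicit: both ratios $(w^+_e)'/\hat u^+_e$ and $(w^-_e)'/\hat u^-_e$ equal $r'_e\,\hat u^+_e$, which is exactly what preserves well-coupling. A minimal numeric sketch with hypothetical values:

```python
# hypothetical residual capacities and resistance increment for one edge
u_plus, u_minus, r_prime = 3.0, 5.0, 0.25

w_plus_new = r_prime * u_plus ** 2             # (w_e^+)' = r_e' * (u_e^+)^2
w_minus_new = w_plus_new * u_minus / u_plus    # (w_e^-)' = (w_e^+)' * u_e^- / u_e^+

# both ratios equal r_e' * u_e^+, so the coupling condition is preserved
assert abs(w_plus_new / u_plus - w_minus_new / u_minus) < 1e-12
assert abs(w_plus_new / u_plus - r_prime * u_plus) < 1e-12
```

The invariant holds identically in the residual capacities, so any nonnegative choice of $r'$ keeps the point well-coupled.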
This set of weights ensures that our point $(f, y, w)$ is well-coupled with respect to $w + w'$ as well, i.e.,
$$\frac{(w^+_e)'}{\hat u^+_e(f)} = \frac{(w^-_e)'}{\hat u^-_e(f)}.$$
We would like to solve
$$g(W) = \max_{r' > 0,\ \|r'\|_1 \le W}\ \min_{B^\top \hat f = \delta\chi}\ \Delta\Phi_{w + w'}(f, \hat f),$$
where $w'$ is obtained from $r'$ in the form written above. While this is the optimization problem we would like to solve, we are unable to do so due to the $\ell_1$ norm constraint on the resistances. We will however be able to solve a relaxed $q$-norm version of the problem:
$$g_q(W) = \max_{r' > 0,\ \|r'\|_q \le W}\ \min_{B^\top \hat f = \delta\chi}\ \Delta\Phi_{w + w'}(f, \hat f) = \max_{r' > 0,\ \|r'\|_q \le W}\ \min_{B^\top \hat f = \delta\chi}\ \Delta\Phi_w(f, \hat f) + \sum_{e \in E}\left[-(w^+_e)'\log\left(1 - \frac{\hat f_e}{\hat u^+_e}\right) - (w^-_e)'\log\left(1 + \frac{\hat f_e}{\hat u^-_e}\right) - \hat f_e\left(\frac{(w^+_e)'}{\hat u^+_e} - \frac{(w^-_e)'}{\hat u^-_e}\right)\right].$$
Notice that the objective is a linear (and hence concave) function of $w'$, and hence of $r'$, and is closed and convex in $\hat f$; the constraints are convex, as they are only linear and norm-ball constraints. Hence, using Theorem 2.3, we can say that the minimum over $B^\top \hat f = \delta\chi$ of the expression above is concave in $r'$, and the maximum over $r' > 0$, $\|r'\|_q \le W$ of the same expression is convex in $\hat f$. Now, as in [LS20b], we use Sion's minimax lemma to get
$$\min_{B^\top \hat f = \delta\chi}\ \Delta\Phi_w(f, \hat f) + \max_{r' > 0,\ \|r'\|_q \le W} \sum_{e \in E}\left[-(w^+_e)'\log\left(1 - \frac{\hat f_e}{\hat u^+_e}\right) - (w^-_e)'\log\left(1 + \frac{\hat f_e}{\hat u^-_e}\right) - \hat f_e\left(\frac{(w^+_e)'}{\hat u^+_e} - \frac{(w^-_e)'}{\hat u^-_e}\right)\right] = \min_{B^\top \hat f = \delta\chi}\ \Delta\Phi_w(f, \hat f) + W\left[\sum_{e \in E} g_e(\hat f)^p\right]^{1/p} \qquad (4.1)$$
where $g_e(\hat f) = -(\hat u^+_e(f))^2\log\left(1 - \frac{\hat f_e}{\hat u^+_e(f)}\right) - \hat u^+_e(f)\,\hat u^-_e(f)\log\left(1 + \frac{\hat f_e}{\hat u^-_e(f)}\right)$, and we plugged in the value of $w'$ in terms of $r'$ and used that $\max_{\|x\|_q \le W} y^\top x = W\|y\|_p$ with $1/p + 1/q = 1$. As mentioned above, the function inside the minimization problem is convex. Furthermore, from the proof of Theorem 2.3, it can be inferred that any smoothness and strong convexity properties of the function inside the min-max carry over to the function after the maximization. Hence, as in Section 3, we will consider the quadratic extension of the function inside the min-max (as a function of $\hat f$) with $\ell_e = \hat u_e(f)/2$; this is just the quadratic extension of $\Delta\Phi_w(f, \hat f)$ plus the quadratic extension of $g_e(\hat f)$. The strategy will be to add flow using this step while the remaining flow to be routed satisfies $F^* - F \ge m^{1/2 - \eta}$, after which running $m^{1/2 - \eta}$ iterations of augmenting paths gets us to the optimal solution. We will need to ensure that the $\ell_1$ norm of the weights does not get too large throughout the course of the algorithm. For that, we will first compute the weight changes and then do a weight reduction procedure [LS20b] so as to always ensure that $\|w\|_1 \le 3m$. We will take $\eta = 1/6 - o(1) - \frac{1}{3}\log_m U$ and $W = m^{4\eta}$. Provided we can ensure that $\|w\|_1 \le 3m$ throughout the course of the algorithm, that the $\ell_\infty$ norm of the congestion vector is always bounded by a constant, and that we can solve the resulting step in almost-linear time, we will obtain an algorithm which runs in $m^{4/3 + o(1)}U^{1/3}$ time.

Theorem 4.1. There exists an algorithm for solving $s$-$t$ maximum flow in directed graphs in $m^{4/3 + o(1)}U^{1/3}$ time.

To summarize, our algorithm starts off with $(f, y) = (0, 0)$ and $w^+_e = w^-_e = 1$ for all edges $e$.
Then, in each iteration, starting with a well-coupled $(f, y, w)$ with flow value $F$, we set $\delta = \frac{F^* - F}{10\,m^{1/2 - \eta}}$ and $W = m^{4\eta}$, and solve Equation 4.1 (which is the potential decrement problem with the new weights) to obtain $\hat f$, which is the step we take (it advances the flow value to $F + \delta$). All that remains is to actually find the weight updates $w'$, which have a closed-form expression in terms of $\hat f$; we then perform a weight reduction step to obtain the new weights, which ensure that we remain well-coupled for $\hat f$, and repeat while $F^* - F \ge m^{1/2 - \eta}$. Finally, we round the remaining flow using $m^{1/2 - \eta}$ iterations of augmenting paths. We first state the following lemma, whose proof is similar to that of Lemma 3.4.

Lemma 4.2.
Let $(f, y, w)$ be a well-coupled solution with value $F$ and let $\delta = \frac{F^* - F}{10\,m^{1/2 - \eta}}$. Let $\hat f$ be the solution to the potential decrement problem considered in Equation 4.1. Then, for all edges $e$, we have $\rho^+_e, \rho^-_e \le 1/2$ and $|\hat f_e| \le m^{-\eta}$.

We prove this lemma in Appendix A. Next, notice that $(f, y)$ is still a well-coupled solution with respect to the new weights $w + w'$, as the weights were chosen to ensure that the coupling condition is unchanged.

Lemma 4.3.
Our new weights, after weight reduction, satisfy $\|w''\|_1 \le m^{3\eta + o(1)}U \le m/2$, and $(f + \hat f, y + \hat y)$ is well-coupled with respect to $w + w''$.

Proof.
Using the optimality conditions of the program in Equation 4.1, we see that there exists a $\hat y$ such that
$$\hat y_v - \hat y_u = \hat f_e\left(\frac{w^+_e}{(\hat u^+_e - \hat f_e)\,\hat u^+_e} + \frac{w^-_e}{(\hat u^-_e + \hat f_e)\,\hat u^-_e}\right) + W\,\frac{g_e^{p-1}}{\|g\|_p^{p-1}}\,\hat u^+_e\left(\frac{\hat u^+_e}{\hat u^+_e - \hat f_e} - \frac{\hat u^-_e}{\hat u^-_e + \hat f_e}\right),$$
where $g \in \mathbb{R}^E$ is the vector whose $e$-th coordinate is $g_e(\hat f)$. We will take
$$r'_e = W\,\frac{g_e^{p-1}}{\|g\|_p^{p-1}}, \qquad (w^+_e)' = W\,\frac{g_e^{p-1}}{\|g\|_p^{p-1}}\,(\hat u^+_e)^2, \qquad (w^-_e)' = W\,\frac{g_e^{p-1}}{\|g\|_p^{p-1}}\,\hat u^+_e\,\hat u^-_e,$$
which satisfies the well-coupling condition we want to ensure. Also notice that $\|r'\|_q = W$, so we satisfy the norm-ball condition as well. Now, we need to upper bound the $\ell_1$ norm of $w'$. We will take $p = \sqrt{\log m}$:
$$\|w'\|_1 \le m^{1/p}\,\|w'\|_q \le m^{o(1)}\left[\sum_{e \in E} \left((w^+_e)' + (w^-_e)'\right)^q\right]^{1/q} \le m^{o(1)}\,W\,U^2 = O(m^{4\eta + o(1)}U^2),$$
as $\hat u^+_e, \hat u^-_e \le 2U$. Plugging in the value of $\eta$, we get that this is less than $m/2$. Now, we perform weight reduction to obtain a new set of weights $w''$ which still ensures that the coupling condition does not change, while establishing better control on the weights. The weight reduction procedure is the same as that in [LS20b]: we find the smallest non-negative $w''$ such that for all edges
$$\frac{(w^+_e)'}{\hat u^+_e - \hat f_e} - \frac{(w^-_e)'}{\hat u^-_e + \hat f_e} = \frac{(w^+_e)''}{\hat u^+_e - \hat f_e} - \frac{(w^-_e)''}{\hat u^-_e + \hat f_e}.$$
Since
$$\frac{(w^+_e)'}{\hat u^+_e} = \frac{(w^-_e)'}{\hat u^-_e} \qquad \text{and} \qquad \frac{\hat u^+_e - \hat f_e}{\hat u^-_e + \hat f_e} = \left(1 \pm O(\max\{\rho^+_e, \rho^-_e\})\right)\frac{\hat u^+_e}{\hat u^-_e},$$
it follows that
$$(w^+_e)'' + (w^-_e)'' \le O\left(\max\{\rho^+_e, \rho^-_e\}\right)\left((w^+_e)' + (w^-_e)'\right).$$
As $|\hat f_e| \le m^{-\eta}$ from Lemma 4.2, we get
$$\|w''\|_1 \le m^{-\eta}\sum_{e \in E} W\,\frac{g_e^{p-1}}{\|g\|_p^{p-1}}\,O(\hat u^+_e + \hat u^-_e) \le O(m^{3\eta + o(1)}U) \le m/2.$$
As before, while this argument is carried out for the non-quadratically-extended function while we are optimizing the quadratically extended one, since our $\rho^+_e, \rho^-_e \le 1/2$
the minimizers are the same, and hence the above argument works.

Now, provided that we can show how to solve Equation 4.1 in almost-linear time, we are done. This is because we run the algorithm for $m^{1/2 - \eta}$ iterations and the $\ell_1$ norm of the weights increases by at most $m^{3\eta + o(1)}U$ in each iteration. Hence the final weights satisfy $\|w\|_1 \le 2m + m^{1/2 + 2\eta + o(1)}U \le 3m$, so we can use Lemma 3.1 throughout the course of our algorithm. Also, as mentioned above, notice that the flow $\hat f$ that we augment by in every iteration is just the solution to the potential decrement problem with the new weights. Hence, from the argument in Section 3, we always maintain the well-coupling condition.

To show that we can solve the problem in Equation 4.1, we appeal to the work of [KPSW19]. As mentioned above, their work establishes Lemma 2.2 and then shows that for any function which can be sandwiched in that form, plus a quadratic term which is the same on both sides, one can just minimize the resulting upper bound to get a solution to the optimization problem with quasi-polynomially low error. Hence, we will focus on showing that the objective function of our problem can also be sandwiched into terms of this form; appealing to their algorithm then gives a high-accuracy solution to our problem in almost-linear time. The first issue that arises is that, strictly speaking, their algorithm only works for minimizing objectives of the form
$$OPT = \min_{B^\top f = \chi} \sum_{e \in E} g_e f_e + r_e f_e^2 + |f_e|^p,$$
whereas in our objective the $p$-norm part is not raised to the $p$-th power but is just the $p$-norm itself. The solution for this, however, was already given by Liu-Sidford [LS20b], who show (Lemma B.3 in their paper) that for sufficiently nice functions, minimizers of problems of the form $\min f(x) + h(g(x))$ can be obtained to high accuracy if we can obtain minimizers of functions of the form $f(x) + g(x)$.
The conditions they require on the functions are also satisfied by ours; verifying this is a straightforward calculation following the proof in their paper [LS20b]. Hence, we can focus on just showing how to solve the following problem:
$$OPT = \min_{B^\top \hat f = \chi} \sum_{e \in E} \left(- w^+_e \log\left(1 - \frac{\hat f_e}{\hat u^+_e}\right) - w^-_e \log\left(1 + \frac{\hat f_e}{\hat u^-_e}\right) - \hat f_e\left(\frac{w^+_e}{\hat u^+_e} - \frac{w^-_e}{\hat u^-_e}\right)\right) + W\left(\sum_{e \in E} g_e(\hat f)^p\right)^{1/p},$$
where we solve the quadratically smoothed function with box size $\hat u_e(f)/2$ for each $e$, and $g_e(\hat f) = -(\hat u^+_e(f))^2 \log\left(1 - \frac{\hat f_e}{\hat u^+_e(f)}\right) - \hat u^+_e(f)\,\hat u^-_e(f)\log\left(1 + \frac{\hat f_e}{\hat u^-_e(f)}\right)$. Call the term in the sum for a given edge $e$ $val_e(\hat f)$, and the overall objective function $val(\hat f)$. In particular, we consider a single edge and prove the following lemma.

Lemma 4.4.
We have the following for any feasible $f$ and $\delta \ge 0$:
$$val_e(f) + \delta\,\partial_f val_e(f) + \frac{9}{20}\,\delta^2\left(\frac{w^+_e}{(\hat u^+_e - f)^2} + \frac{w^-_e}{(\hat u^-_e + f)^2}\right) + 2^{-O(p)}\left(f_e^{p-2}\delta^2 + \delta^p\right) \le val_e(f + \delta)$$
and
$$val_e(f + \delta) \le val_e(f) + \delta\,\partial_f val_e(f) + \frac{11}{20}\,\delta^2\left(\frac{w^+_e}{(\hat u^+_e - f)^2} + \frac{w^-_e}{(\hat u^-_e + f)^2}\right) + 2^{O(p)}\left(f_e^{p-2}\delta^2 + \delta^p\right),$$
where $\partial_x$ denotes the derivative of a function with respect to $x$.

We prove this lemma in Appendix A. Let $r_e = \frac{w^+_e}{(\hat u^+_e - f)^2} + \frac{w^-_e}{(\hat u^-_e + f)^2}$.

Lemma 4.5.
Given an initial point $f_0$ such that $B^\top f_0 = \chi$ and an almost-linear-time solver for the problem
$$\min_{B^\top \delta = 0} \sum_{e \in E} \delta_e \alpha_e + \frac{11}{20}\,2^{O(p)}\left((r_e + f_e^{p-2})\,\delta_e^2 + \delta_e^p\right),$$
where the vector $\alpha$ is the gradient of $val$ at the given point $f$, we can obtain an $\hat f$ in $\tilde O_p(1)$ calls to the solver such that $val(\hat f) \le OPT + 1/\mathrm{poly}\log(m)$.

The proof is similar to the proof of the iteration complexity of gradient descent for smooth and strongly convex functions, and follows from [LFN18, KPSW19]. Note that [KPSW19] give an almost-linear-time solver for exactly the subproblem in the above lemma provided the resistances are quasipolynomially bounded, so we are done: Section D.1 of [LS20b] already proves that the resistances are quasipolynomially bounded.
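The loop behind Lemma 4.5 is ordinary descent with a regularized subproblem oracle. A one-dimensional caricature, with no flow constraint and illustrative constants (ternary search stands in for the almost-linear-time solver of [KPSW19]; `K` is not their constant):

```python
# Minimize h(x) = g*x + r*x^2 + |x|^p by repeatedly solving the
# regularized subproblem  min_d  h'(x)*d + K*(r*d^2 + |d|^p).
g_coef, r, p, K = 1.0, 2.0, 4.0, 120.0

def h(x):
    return g_coef * x + r * x * x + abs(x) ** p

def h_prime(x):
    return g_coef + 2 * r * x + p * abs(x) ** (p - 1) * (1 if x >= 0 else -1)

def solve_subproblem(alpha):
    # ternary search on the strictly convex 1-D subproblem
    lo, hi = -10.0, 10.0
    sub = lambda d: alpha * d + K * (r * d * d + abs(d) ** p)
    for _ in range(200):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if sub(m1) < sub(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

x = 5.0                                   # feasible starting point
for _ in range(1000):
    x += solve_subproblem(h_prime(x))

# compare against a brute-force grid minimum
x_opt = min((h(t / 1000.0), t / 1000.0) for t in range(-2000, 2001))[1]
assert abs(h(x) - h(x_opt)) < 1e-3
```

Because the regularizer majorizes the Taylor remainder of $h$ on the relevant range, every step decreases $h$, giving linear convergence at a rate governed by `K`; this is the smooth/strongly-convex mechanism the lemma invokes via [LFN18].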
In this paper, we showed how to use steps inspired by potential reduction IPMs to solve max flow in directed graphs in $m^{4/3 + o(1)}U^{1/3}$ time. We believe our framework of taking the step corresponding to the maximum decrease of the potential function may be useful for other problems, including $\ell_p$ norm minimization. In particular, can one set up a homotopy path for which steps are taken according to a potential function? Presumably, if this can be done, it might also offer hints for how to use ideas corresponding to different homotopy paths induced by other potential functions (rather than the central path we consider) to solve max flow faster. Finally, there is no reason to believe that selecting weight changes so as to maximize the potential decrement is the best way to change weights; one might obtain a faster algorithm from another strategy which establishes tighter control on weight changes. A question along the way to such a strategy might be to understand how the potential decrement optimum changes as we change weights/resistances. Such an analog for the change in energy of an electrical flow as we change resistances is used in [CKM+
11, Mad16, LS20b]. Another open problem that remains is obtaining faster algorithms for max flow on weighted graphs with logarithmic dependence on $U$, as opposed to the polynomial dependence in this paper.

Acknowledgements

We would like to thank Jelena Diakonikolas, Yin Tat Lee, Yang Liu, Aaron Sidford and Daniel Spielman for helpful discussions. We also thank Jelani Nelson for several helpful suggestions regarding the presentation of the paper.
References

[AKPS19] Deeksha Adil, Rasmus Kyng, Richard Peng, and Sushant Sachdeva, Iterative refinement for ℓp-norm regression, Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, 2019.
[Ans96] Kurt Anstreicher, Potential reduction algorithms.
[AS20] Deeksha Adil and Sushant Sachdeva, Faster p-norm minimizing flows, via smoothed q-norm problems, Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, 2020.
[BCLL18] Sébastien Bubeck, Michael B. Cohen, Yin Tat Lee, and Yuanzhi Li, An homotopy method for ℓp regression provably beyond self-concordance and in input-sparsity time, Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC, 2018.
[BNO03] Dimitri P. Bertsekas, Angelia Nedić, and Asuman E. Ozdaglar, Convex analysis and optimization, Athena Scientific, 2003.
[CKM+11] Paul Christiano, Jonathan A. Kelner, Aleksander Madry, Daniel A. Spielman, and Shang-Hua Teng, Electrical flows, laplacian systems, and faster approximation of maximum flow in undirected graphs, Proceedings of the 43rd ACM Symposium on Theory of Computing, STOC, 2011.
[CKM+14] Michael B. Cohen, Rasmus Kyng, Gary L. Miller, Jakub W. Pachocki, Richard Peng, Anup B. Rao, and Shen Chen Xu, Solving SDD linear systems in nearly m log^{1/2} n time, Symposium on Theory of Computing, STOC, 2014.
[CLS19] Michael B. Cohen, Yin Tat Lee, and Zhao Song, Solving linear programs in the current matrix multiplication time, Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC, 2019.
[CMSV17] Michael B. Cohen, Aleksander Madry, Piotr Sankowski, and Adrian Vladu, Negative-weight shortest paths and unit capacity minimum cost flow in Õ(m^{10/7} log W) time, Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, 2017.
[DS08] Samuel I. Daitch and Daniel A. Spielman, Faster approximate lossy generalized flow via interior point algorithms, Proceedings of the 40th Annual ACM Symposium on Theory of Computing, STOC, 2008.
[GR98] Andrew V. Goldberg and Satish Rao, Beyond the flow decomposition barrier, J. ACM 45 (1998), no. 5, 783–797.
[KLOS14] Jonathan A. Kelner, Yin Tat Lee, Lorenzo Orecchia, and Aaron Sidford, An almost-linear-time algorithm for approximate max flow in undirected graphs, and its multicommodity generalizations, Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, 2014.
[KLP+16] Rasmus Kyng, Yin Tat Lee, Richard Peng, Sushant Sachdeva, and Daniel A. Spielman, Sparsified cholesky and multigrid solvers for connection laplacians, Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC, 2016.
[KMP14] Ioannis Koutis, Gary L. Miller, and Richard Peng, Approaching optimality for solving SDD linear systems, SIAM J. Comput. (2014), no. 1, 337–354.
[KOSA13] Jonathan A. Kelner, Lorenzo Orecchia, Aaron Sidford, and Zeyuan Allen Zhu, A simple, combinatorial algorithm for solving SDD systems in nearly-linear time, Symposium on Theory of Computing Conference, STOC, 2013.
[KPSW19] Rasmus Kyng, Richard Peng, Sushant Sachdeva, and Di Wang, Flows in almost linear time via adaptive preconditioning, Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC, 2019.
[KS16] Rasmus Kyng and Sushant Sachdeva, Approximate gaussian elimination for laplacians: fast, sparse, and simple, IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS, 2016.
[LFN18] Haihao Lu, Robert M. Freund, and Yurii Nesterov, Relatively smooth convex optimization by first-order methods, and applications, SIAM Journal on Optimization 28 (2018), no. 1, 333–354.
[LRS13] Yin Tat Lee, Satish Rao, and Nikhil Srivastava, A new approach to computing maximum flows using electrical flows, Symposium on Theory of Computing Conference, STOC, 2013.
[LS14] Yin Tat Lee and Aaron Sidford, Path finding methods for linear programming: Solving linear programs in Õ(√rank) iterations and faster algorithms for maximum flow, 55th IEEE Annual Symposium on Foundations of Computer Science, FOCS, 2014.
[LS20a] Yang Liu and Aaron Sidford, Faster divergence maximization for faster maximum flow, arXiv preprint, 2020.
[LS20b] Yang P. Liu and Aaron Sidford, Faster energy maximization for faster maximum flow, Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC, 2020.
[LSZ19] Yin Tat Lee, Zhao Song, and Qiuyi Zhang, Solving empirical risk minimization in the current matrix multiplication time, Conference on Learning Theory, COLT, 2019.
[Mad13] Aleksander Madry, Navigating central path with electrical flows: From flows to matchings, and back, 54th Annual IEEE Symposium on Foundations of Computer Science, FOCS, 2013.
[Mad16] Aleksander Madry, Computing maximum flow with augmenting electrical flows, IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS, 2016.
[Nes04] Yurii Nesterov, Introductory lectures on convex optimization: A basic course, Kluwer Academic Publishers, 2004.
[NN94] Yurii E. Nesterov and Arkadii Nemirovskii, Interior-point polynomial algorithms in convex programming, SIAM Studies in Applied Mathematics, vol. 13, SIAM, 1994.
[Pen16] Richard Peng, Approximate undirected maximum flows in O(m polylog(n)) time, Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, 2016, pp. 1862–1867.
[PS14] Richard Peng and Daniel A. Spielman, An efficient parallel solver for SDD linear systems, Symposium on Theory of Computing, STOC, 2014.
[Ren01] James Renegar, A mathematical view of interior-point methods in convex optimization, MPS-SIAM Series on Optimization, SIAM, 2001.
[She13] Jonah Sherman, Nearly maximum flows in nearly linear time, 54th Annual IEEE Symposium on Foundations of Computer Science, FOCS, 2013.
[She17a] Jonah Sherman, Area-convexity, ℓ∞ regularization, and undirected multicommodity flow, Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC, 2017.
[She17b] Jonah Sherman, Generalized preconditioning and undirected minimum-cost flow, Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, 2017.
[ST14] Daniel A. Spielman and Shang-Hua Teng, Nearly linear time algorithms for preconditioning and solving symmetric, diagonally dominant linear systems, SIAM J. Matrix Analysis Applications (2014), no. 3, 835–885.
[Tod96] Michael J. Todd, Potential-reduction methods in mathematical programming, Math. Program. (1996), 3–45.
[Tun94] Levent Tunçel, Constant potential primal-dual algorithms: A framework, Math. Program. (1994), 145–159.
[Tun95] Levent Tunçel, On the convergence of primal-dual interior-point methods with wide neighborhoods, Comp. Opt. and Appl. (1995), no. 2, 139–158.
[vdBLSS20] Jan van den Brand, Yin Tat Lee, Aaron Sidford, and Zhao Song, Solving tall dense linear programs in nearly linear time, STOC, 2020.
A Missing Proofs
Proof. [of Lemma 4.2] We follow the strategy used in the proof of Lemma 3.4. Recall that the problem we are trying to understand is
$$\min_{B^\top \hat f = \delta\chi}\ \Delta\Phi_w(f, \hat f) + W\left[\sum_{e \in E} g_e(\hat f)^p\right]^{1/p}$$
where $g_e(\hat f) = -(\hat u^+_e(f))^2\log\left(1 - \frac{\hat f_e}{\hat u^+_e(f)}\right) - \hat u^+_e(f)\,\hat u^-_e(f)\log\left(1 + \frac{\hat f_e}{\hat u^-_e(f)}\right)$. As in Lemma 3.4, we consider a flow $f'$ which sends $2\delta/m$ units of flow on each of the $m/2$ preconditioned edges. Certainly, the objective value of the above function at $\hat f$ is no larger than its value at $f'$. For the first term $\Delta\Phi_w(f, f')$, running the same argument as in Lemma 3.4, we get that
$$\Delta\Phi_w(f, f') \le \|w\|_1\left(\frac{\delta}{F^* - F}\right)^2 \le 0.03\, m^{2\eta}.$$
For the second term, we use $-\log(1-x) \le x + x^2$ and $-\log(1+x) \le -x + x^2$ to get that $g_e(f') \le (f'_e)^2\left(1 + \frac{\hat u^+_e(f)}{\hat u^-_e(f)}\right) \le 2(f'_e)^2$, where we have used that $\hat u^+_e(f) \le \hat u^-_e(f)$. Now, since there is non-zero flow only on the preconditioned edges, we get that
$$W\left[\sum_{e \in E} g_e(f')^p\right]^{1/p} \le W\left(\frac{2\delta}{m}\right)^2 m^{o(1)} \le m^{4\eta + o(1)}\left(\frac{F^* - F}{m \cdot m^{1/2 - \eta}}\right)^2 \le 0.01\, m^{2\eta},$$
using $p = \sqrt{\log m}$, the fact that $F^* - F \le 2mU$, the value of $\delta$, and the value of $\eta$. Hence, combining the two, we get that the objective value at $\hat f$ is at most $0.04\, m^{2\eta}$. As the objective function is made up of two non-negative quantities, we can obtain two inequalities from this upper bound by dropping one term from the objective value each time. For the second part of the lemma, we ignore the first term of the objective function and lower bound the second using $x + x^2/4 \le -\log(1-x)$ and $-x + x^2/4 \le -\log(1+x)$:
$$0.04\, m^{2\eta} \ge W\left[\sum_{e \in E} g_e(\hat f)^p\right]^{1/p} \ge W\,|g_e(\hat f)| \ge W\,\frac{\hat f_e^2}{4}\left(1 + \frac{\hat u^+_e(f)}{\hat u^-_e(f)}\right) \ge W\,\frac{\hat f_e^2}{4}.$$
This gives us that $|\hat f_e| \le 0.4\, m^{-\eta}$ by plugging in the value of $W = m^{4\eta}$. For the first part, assume for the sake of contradiction that $\rho_e > 1/2$
; otherwise we are done. Now, dropping the second term, we want to establish that $1/\hat u_e(f) \le 2\,m^{\eta}$, which we do by a proof similar to that of Lemma 4.3 in [Mad16]. Using the argument as in Lemma 3.4, we get for an edge $e = (u, v)$,
$$0.04\, m^{2\eta} \ge \Delta\Phi_w(f, \hat f) \ge \frac{1}{8}\sum_{e \in E} \hat f_e\left(\frac{w^+_e}{u^+_e - f_e - \hat f_e} - \frac{w^-_e}{u^-_e + f_e + \hat f_e} - \frac{w^+_e}{u^+_e - f_e} + \frac{w^-_e}{u^-_e + f_e}\right) = \frac{1}{8}\,\hat f^\top B\hat y = \frac{\delta}{8}\,\chi^\top \hat y,$$
where the equalities follow from the optimality and feasibility conditions of the potential decrement problem respectively, and $\delta \ge 1/10$ by the condition that we run the program while the flow left to augment is at least $m^{1/2 - \eta}$. Moreover, for the edge in question,
$$\hat y_u - \hat y_v = \hat f_e\left(\frac{w^+_e}{(\hat u^+_e - \hat f_e)\,\hat u^+_e} + \frac{w^-_e}{(\hat u^-_e + \hat f_e)\,\hat u^-_e}\right) \ge \frac{\rho_e}{2\,\hat u_e(f)} > \frac{1}{4\,\hat u_e(f)}$$
under the assumption $\rho_e > 1/2$. Combining these bounds with $\hat y_s - \hat y_t \ge \hat y_u - \hat y_v$ (argued below) and the value of $\eta$, this implies that $1/\hat u_e(f) \le 2\,m^{\eta}$. Multiplying this with $|\hat f_e| \le 0.4\, m^{-\eta}$, we get that $\rho_e \le 1/2$, a contradiction, which finishes the proof. We still need to argue the inequality $\hat y_s - \hat y_t \ge \hat y_u - \hat y_v$. The optimality conditions give
$$\hat y_u - \hat y_v = \hat f_e\left(\frac{w^+_e}{(u^+_e - f_e - \hat f_e)(u^+_e - f_e)} + \frac{w^-_e}{(u^-_e + f_e)(u^-_e + f_e + \hat f_e)}\right),$$
and noticing that the quantity in brackets on the right-hand side is non-negative tells us that the potential falls along the direction of the flow. This, along with the observation that the sum of potential differences around a directed cycle is zero, tells us that the graph induced by the flow $\hat f$ alone is a DAG. Since it is a DAG, it can be decomposed into disjoint $s$-$t$ paths along which flow is sent, and every edge belongs to one of these paths.
Hence, the potential difference across an edge is at most the potential difference across the whole path, which is the potential difference between $s$ and $t$, and we are done. As before, all these arguments go through with the quadratically smoothed function in place of the original one and still yield the same bounds, and since $\rho_e \le 1/2$, the minimizers of the two are the same, which completes the proof.

Proof. [of Lemma 4.4] Note that while we are solving the quadratically smoothed version of the problem, we can assume we solve the non-smoothed version in the box corresponding to a congestion of at most $1/2$, as the extension is $C^2$ and ensures that any inequalities we need henceforth (up to the second-order terms) are bounded as well. There are two terms, one corresponding to the potential decrement and the other a similar expression raised to the $p$-th power. We tackle the first term first; this is easily done using Taylor's theorem. The function is $g(x + y) = -\log(1 - (x+y)/u) - (x+y)/u$. Computing the first two derivatives with respect to $y$, we get $g'(x+y) = \frac{1}{u - x - y} - \frac{1}{u}$ and $g''(x+y) = \frac{1}{(u - x - y)^2}$. Now, using Taylor's theorem, we get that
$$g(x + y) = g(x) + g'(x)\,y + \frac{1}{2}\,g''(x + \zeta)\,y^2 = g(x) + y\left(\frac{1}{u - x} - \frac{1}{u}\right) + \frac{y^2}{2}\cdot\frac{1}{(u - x - \zeta)^2}$$
for some $\zeta$ between $0$ and $y$, which easily gives us the bound
$$g(x) + y\left(\frac{1}{u - x} - \frac{1}{u}\right) + \frac{9}{20}\,\frac{y^2}{(u - x)^2} \le g(x + y) \le g(x) + y\left(\frac{1}{u - x} - \frac{1}{u}\right) + \frac{11}{20}\,\frac{y^2}{(u - x)^2}.$$
Similarly for $-\log(1 + x/u) + x/u$. Now, for the second term, we largely follow the strategy of [KPSW19]. For the $p$-th order term, we have the function $g(x) = -(u^+)^2\log(1 - x/u^+) - u^+u^-\log(1 + x/u^-)$.
We first use Lemma 2.2 with $f_i = g(x)$ and $\delta_i = g(x+y) - g(x)$ to get
$$g(x+y)^p \le g(x)^p + p\,g(x)^{p-1}\left(g(x+y) - g(x)\right) + 2^{O(p)}\left(g(x)^{p-2}(g(x+y) - g(x))^2 + (g(x+y) - g(x))^p\right).$$
Now, adding and subtracting $p\,g(x)^{p-1}\,y\,g'(x)$ on the right-hand side, and noticing that $g(x+y) - g(x) - y\,g'(x)$ is second order in $y$ and can be absorbed into the error term, we get
$$g(x+y)^p \le g(x)^p + p\,y\,g(x)^{p-1}g'(x) + 2^{O(p)}\left(g(x)^{p-2}(g(x+y) - g(x))^2 + (g(x+y) - g(x))^p\right).$$
Now, notice that using the inequalities for $\log(1 - x/u)$ and $\log(1 + x/u)$, we get $x^2/4 \le g(x) \le 2x^2$, and we also use Taylor's theorem to get $g(x+y) - g(x) \le 2(|xy| + y^2)$, so that
$$g(x+y)^p \le g(x)^p + p\,y\,g(x)^{p-1}g'(x) + 2^{O(p)}\left(x^{2(p-2)}(x^2y^2 + y^4) + 2^{p-1}(x^py^p + y^{2p})\right) \le g(x)^p + p\,y\,g(x)^{p-1}g'(x) + 2^{O(p)}\left(x^{2p-2}y^2 + y^{2p}\right),$$
where we have used $(a+b)^p \le 2^{p-1}(a^p + b^p)$ and that $|y| \le |x|$.