Efficiently Computing Maximum Flows in Scale-Free Networks
Thomas Bläsius, Tobias Friedrich, Christopher Weyand
Abstract
We study the maximum-flow/minimum-cut problem on scale-free networks, i.e., graphs whose degree distribution follows a power-law. We propose a simple algorithm that capitalizes on the fact that often only a small fraction of such a network is relevant for the flow. At its core, our algorithm augments Dinitz's algorithm with a balanced bidirectional search. Our experiments on a scale-free random network model indicate sublinear run time. On scale-free real-world networks, we outperform the commonly used highest-label Push-Relabel implementation by up to two orders of magnitude. Compared to Dinitz's original algorithm, our modifications reduce the search space, e.g., by a factor of 275 on an autonomous systems graph.

Beyond these good run times, our algorithm has an additional advantage compared to Push-Relabel. The latter computes a preflow, which makes the extraction of a minimum cut potentially more difficult. This is relevant, for example, for the computation of Gomory-Hu trees. On a social network with 70 000 nodes, our algorithm computes the Gomory-Hu tree in 3 seconds compared to 12 minutes when using Push-Relabel.
The maximum flow problem is arguably one of the most fundamental graph problems and regularly appears as a subtask in various applications [2, 32, 35]. The go-to general-purpose algorithm for computing flows in practice is the highest-label Push-Relabel algorithm by Cherkassky and Goldberg [10], which is also part of the boost graph library [33]. Beyond that, the BK-algorithm by Boykov and Kolmogorov [7] or its later iteration [17] should be used for instances appearing in computer vision. Our main goal in this paper is to provide a flow algorithm tailored towards scale-free networks. Such networks are characterized by their heavy-tailed degree distribution resembling a power-law, i.e., they are sparse with few vertices of comparatively high degree and many vertices of low degree.

At its core, our algorithm is a variant of Dinitz's algorithm [12]. Dinitz's algorithm is an augmenting path algorithm that iteratively increases the flow along collections of shortest paths in the residual network. In each iteration, at least one edge on every shortest path gets saturated, thereby increasing the distance between source and sink in the residual network. To exploit the structure of scale-free networks, we make use of the facts that, firstly, shortest paths tend to span only a small fraction of such networks, and secondly, a balanced bidirectional breadth-first search is able to find the shortest paths very efficiently [6, 5]. Using a bidirectional search to compute the collection of shortest paths in Dinitz's algorithm directly translates this efficiency to the first iteration, as the residual network initially coincides with the flow network.
Though the structure of the residual network changes in later iterations, our experiments show that the run time improvements achieved by using a bidirectional search remain high. Scaling experiments with geometric inhomogeneous random graphs (GIRGs) [8] in fact indicate that the flow computation of our algorithm runs in sublinear time. In comparison, previous algorithms (Push-Relabel, BK, and unidirectional Dinitz) require slightly super-linear time. This is also reflected in the high speedups we achieve on real-world scale-free networks.

With the flow computation itself being so efficient, the total run time for computing the maximum flow for a single source-sink pair in a scale-free network is heavily dominated by loading the graph and building data structures. Thus, our algorithm is particularly relevant when we have to compute multiple flows in the same network. This is, e.g., the case when computing the Gomory-Hu tree [20] of a network. The Gomory-Hu tree is a compact representation of the minimum s-t cuts for all source-sink pairs (s, t). It can be computed with Gusfield's algorithm [21] using n − 1 flow computations. Using our bidirectional flow algorithm as the subroutine for flow computations in Gusfield's algorithm lets us compute the Gomory-Hu tree of, e.g., the soc-slashdot instance with 70 k nodes and 360 k edges in only a few seconds.

[Footnote: GIRGs are a generative network model closely related to hyperbolic random graphs [25]. They resemble real-world networks in regards to important properties such as degree distribution, clustering, and distances.]

Unsurprisingly, our algorithm is outperformed by the BK-algorithm on a segmentation instance from computer vision. Moreover, Push-Relabel performs best on a layered network that was specifically constructed to evaluate flow algorithms. However, we would argue that this type of instance is rather artificial. Our findings can be summarized in the following main contributions.
• We provide a simple and efficient flow algorithm that significantly outperforms previous algorithms on scale-free networks.
• Its efficiency on non-scale-free instances makes it a potential replacement for the Push-Relabel algorithm for general-purpose flow computations.
• Our algorithm is well suited to compute the Gomory-Hu tree of comparatively large instances.
• There are situations where computing a flow with the Push-Relabel algorithm is significantly more expensive than computing a preflow. This stands in contrast to previous observations [10, 11].
The maximum flow problem has long been, and still is, a subject of active research. In the following, we briefly discuss only the work most related to our result. For a more extensive overview on the topic of flows, we refer to the survey by Goldberg and Tarjan [19].

Our algorithm is based on Dinitz's algorithm [12], which belongs to the family of augmenting path algorithms originating from the Ford-Fulkerson algorithm [16]. Augmenting path algorithms use the residual network to represent the remaining capacities and iteratively increase the flow by augmenting it with paths from source to sink in the residual network, until no such path exists. At every point in time, a valid flow is known, and at the end of execution, non-reachability in the residual network certifies maximality.

From this perspective, the
Push-Relabel algorithm [18] does the reverse. At every point in time, the sink is not reachable from the source in the residual network, thereby guaranteeing maximality, while the object maintained throughout the algorithm is a so-called preflow, and the algorithm stops once the preflow is actually a flow. This is achieved using the two operations push and relabel; hence the name. Different variants of the Push-Relabel algorithm mainly differ with regards to the order in which operations are applied. A strategy performing well in practice is the highest-label strategy [10]. The extensive empirical study by Ahuja et al. [1] on ten different algorithms shows that the highest-label Push-Relabel algorithm indeed performs the best out of the ten. The only small caveat with these experiments is the fact that they are based on artificial networks that are specifically generated to pose difficult instances. Our experiments show that the structure of the instance matters in the sense that it impacts different algorithms differently, potentially yielding different rankings on different types of instances. The so-called pseudoflow algorithm by Hochbaum [23] was later shown to slightly outperform (low single-digit speedups on most instances) the highest-label Push-Relabel algorithm, again based on artificial instances [9].

Boykov and Kolmogorov [7] gave an algorithm tailored specifically towards instances that appear in computer vision, outperforming Push-Relabel on these instances. It was later refined by Goldberg et al. [17]. Most related to our studies is the work by Halim et al. [22], who developed a distributed flow algorithm for MapReduce to compute flows on huge social networks.
In this section we introduce the concept of network flow and describe Dinitz's algorithm [12].
Network Flows.
A flow network is a directed graph G = (V, E) with source and sink vertices s, t ∈ V, and a capacity function c : V × V → ℕ with c(u, v) = 0 if (u, v) ∉ E. A flow f on G is a function over vertex pairs f : V × V → ℤ satisfying three constraints: (I) capacity: f(u, v) ≤ c(u, v), (II) antisymmetry: f(u, v) = −f(v, u), and (III) conservation: ∑_{v∈V} f(u, v) = 0 for u ∈ V \ {s, t}. We call an edge (u, v) ∈ E saturated if f(u, v) = c(u, v). Denote the value of a flow f as ∑_{v∈V} f(s, v). The maximum flow problem, max-flow for short, is the problem of finding a flow of maximum value.

Given a flow f in G, we define a network G_f called the residual network. G_f has the same set of nodes and contains the directed edge (u, v) if f(u, v) < c(u, v). The capacity c′ of edges in G_f is given by the residual capacity in the original network, i.e., c′(u, v) = c(u, v) − f(u, v). An s-t path in G_f is called an augmenting path.

Dinitz's Algorithm.
One can solve max-flow by iteratively increasing the flow on augmenting paths, yielding the famous Ford-Fulkerson algorithm [16]. Dinitz's algorithm belongs to the family of augmenting path algorithms [2]. In contrast to the Ford-Fulkerson algorithm, Dinitz groups augmentations into rounds.

Let d_s(v) be the distance from s to vertex v in G_f. We define a subgraph of G_f called the layered network by restricting the edge set to edges (u, v) of G_f for which d_s(u) + 1 = d_s(v), i.e., edges that increase the distance from the source. We call a flow of some network a blocking flow if every s-t path contains at least one edge that is saturated by this flow, i.e., there is no augmenting path.

Each round, Dinitz's algorithm (see Algorithm 1) augments a set of edges that constitutes a blocking flow of the layered network. One can find such a set of edges by iteratively augmenting s-t paths in the layered network until source and sink become disconnected. After augmenting a blocking flow, the distance between the terminals in the residual network strictly increases.

Algorithm 1:
Dinitz's Algorithm.

    while there is an s-t path in the residual network do
        build the layered network
        while there is an s-t path in the layered network do
            augment the flow with an s-t path

Asymptotic Running Time.
To better understand how our modifications impact the run time, we briefly sketch how Dinitz's running time of O(n^2 m) is obtained. Since d_s(t) increases each round, the number of rounds is bounded by n − 1. Each round consists of two stages: building the layered network and augmenting a blocking flow. To build the layered network, the distances from the source to every vertex in the residual network are needed. The layered network can be constructed in O(m) using a breadth-first search (BFS). Asymptotically, however, this is dominated by the time to find the blocking flow. Finding the paths of the blocking flow is done with a repeated graph traversal, usually using a depth-first search (DFS). The number of found paths is bounded by m, because each found path saturates at least one edge, removing it from the layered network. A single DFS can be done in amortized O(n) time as follows. Edges that are not part of an s-t path in the layered network do not need to be looked at more than once during one round. This is achieved by remembering for each node which edges of the layered network were already found to have no remaining path to the sink. Each subsequent DFS will start where the last one left off. Thus, per round, the depth-first searches have a combined search space of O(m), while each individual search additionally visits the nodes on one s-t path, which is O(n).

Efficient Dinitz Implementation.
Typical implementations represent the graph by adding a reversed twin for each edge. Furthermore, neither the residual network nor the layered network are constructed explicitly. The residual network is implicitly defined by the capacities and flow values on edges, and the layered network by a distance labeling. This conveniently eliminates the need to modify the network structure during the algorithm. When, e.g., saturating an edge during augmentation, this implicitly removes the edge from the residual network and layered network. However, with this representation, the BFS and DFS are performed on all edges and must check if edges are part of the residual or layered network when they are encountered. The bound for the BFS is unaffected, and the amortization argument for the DFS extends to edges that are not part of the layered and/or residual network. During the augmentation of the blocking flow, a counter into the adjacency list of each vertex indicates which outgoing edges were already processed this round.
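The implicit representation described above can be sketched as follows. This is a minimal Python sketch, not the authors' implementation: twin edges sit at adjacent indices (edge id e and its twin e ^ 1), cap stores residual capacities directly so flow values remain implicit, the distance labels from the BFS define the layered network, and the per-vertex counter ptr realizes the amortization argument.

```python
from collections import deque


class Dinitz:
    def __init__(self, n):
        self.n = n
        self.adj = [[] for _ in range(n)]  # adjacency lists of edge ids
        self.to = []                       # edge target
        self.cap = []                      # remaining (residual) capacity

    def add_edge(self, u, v, c, c_rev=0):
        # Each edge gets a reversed twin; edge id e and e ^ 1 are twins.
        self.adj[u].append(len(self.to)); self.to.append(v); self.cap.append(c)
        self.adj[v].append(len(self.to)); self.to.append(u); self.cap.append(c_rev)

    def _bfs(self, s, t):
        # Distance labels implicitly define the layered network.
        self.dist = [-1] * self.n
        self.dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for e in self.adj[u]:
                v = self.to[e]
                if self.cap[e] > 0 and self.dist[v] == -1:
                    self.dist[v] = self.dist[u] + 1
                    queue.append(v)
        return self.dist[t] != -1

    def _dfs(self, u, t, pushed):
        if u == t:
            return pushed
        # ptr[u] remembers which outgoing edges are already exhausted this
        # round, giving the amortized bound for the repeated searches.
        while self.ptr[u] < len(self.adj[u]):
            e = self.adj[u][self.ptr[u]]
            v = self.to[e]
            if self.cap[e] > 0 and self.dist[u] + 1 == self.dist[v]:
                got = self._dfs(v, t, min(pushed, self.cap[e]))
                if got > 0:
                    self.cap[e] -= got      # saturating an edge implicitly
                    self.cap[e ^ 1] += got  # removes it from the layered network
                    return got
            self.ptr[u] += 1
        return 0

    def max_flow(self, s, t):
        flow = 0
        while self._bfs(s, t):              # one round per distance value
            self.ptr = [0] * self.n
            while True:
                pushed = self._dfs(s, t, float("inf"))
                if pushed == 0:
                    break
                flow += pushed
        return flow
```

Saturating an edge only updates the residual capacities of the edge and its twin, which removes it from the residual and layered network without any change to the graph structure, exactly as described above.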
Practical Performance.
The practical performance of Dinitz's algorithm is far better than its worst-case bound. Actually, O(n) as the length of the found augmenting path is very unrealistic. In our experiments, d_s(t) remains mostly below 10, implying that the number of rounds is significantly lower than n − 1. Also, the number of found augmenting paths during one round is far below O(m). In unweighted networks, for example, a DFS saturates all edges of the found path, resulting in a bound of O(m) to find a blocking flow. In fact, Dinitz's algorithm has a tight upper bound of O(n^{2/3} m) in unweighted networks [14, 24].

We adapt a common Dinitz implementation to exploit the specific structure of scale-free networks. We achieve a significant speedup by using the fact that a flow and cut, respectively, often depend only on a small fraction of the network. The following three modifications each tackle a performance bottleneck.

Bidirectional Search.
Recently, sublinear running time was shown for balanced bidirectional search in a scale-free network model [5, 6]. We use a bidirectional breadth-first search to compute the distances that define the layered network during each round of Dinitz's algorithm. A forward search is performed from the source and a backward search from the sink, each time advancing the search that incurs the lower cost to advance one layer. A shortest s-t path is found when a vertex is discovered that was already seen from the other direction. Note that, for our purpose, the bidirectional search has to finish the current layer when such a vertex is discovered, because all shortest paths must be found. Figure 1 visualizes the difference in explored vertices between a normal and a bidirectional BFS. The augmentations with DFS are restricted to the visited part of the layered network, meaning the search space of the BFS plus the next layer.

The distance labeling obtained by the bidirectional BFS requires a change to the DFS. The purpose of the layered network is to contain all edges on shortest s-t paths. The DFS identifies edges (u, v) of the layered network by checking if they increase the distance from the source, i.e., d_s(u) + 1 = d_s(v). However, we no longer obtain the distances from the source for all relevant vertices. For vertices processed by the backward search, distances to the sink d_t(v) are known instead.

[Footnote: https://cp-algorithms.com/graph/dinic.html]

Figure 1: Search space of a breadth-first search from a source s to a sink t, unidirectional (left) and bidirectional (right). The blue area represents the vertices that are explored, i.e., whose outgoing edges were scanned, by the forward search, and the green area the backward search. In the gray area are vertices that are seen during exploration of the last layer, but not yet explored. Vertices in the intersection of the upcoming layers of the backward and forward search are marked orange.
To resolve the problem, we allow edges that either increase the distance from the source or decrease the distance to the sink, i.e., d_s(u) + 1 = d_s(v) or d_t(u) − 1 = d_t(v). This deviates from the definition of the layered network. But since edges on shortest s-t paths must both increase the distance from the source and decrease the distance to the sink, we do not miss any relevant edges.

Time Stamps.
The bidirectional search reduces the search space of the breadth-first search and depth-first search substantially, potentially to sublinear. The initialization, however, still requires linear time. It includes the following. For the BFS, distances from the source and to the sink must be initialized to infinity. For the augmentations, one counter per node has to be initialized to zero.

To avoid the linear initializations, we introduce time stamps to indicate if a vertex was seen during the current round. The initialization of distances and counters is done lazily as vertices are discovered during the BFS. Another detail of our implementation is that we use begin and end indices into an array instead of a dynamically growing queue for the BFS. We allocate this memory in advance and override the data each round.
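The balanced bidirectional layering together with the time-stamped lazy initialization might look as follows. This is an illustrative Python sketch under simplifying assumptions, not the paper's implementation: it operates on a plain adjacency list rather than the residual network, estimates the cost of advancing a frontier by its summed degree, and uses a round counter as time stamp so that no per-vertex reset is needed between searches.

```python
class BiBFS:
    def __init__(self, n):
        self.n = n
        self.ds = [0] * n                         # distance from source
        self.dt = [0] * n                         # distance to sink
        self.seen_s = [0] * n                     # time stamps, forward
        self.seen_t = [0] * n                     # time stamps, backward
        self.round = 0

    def run(self, adj, s, t):
        """Balanced bidirectional BFS: advance, one full layer at a time,
        the search whose frontier is cheaper to expand. Stop once the
        frontiers meet, but finish the current layer so that all shortest
        paths are found. Returns True iff s and t are connected."""
        self.round += 1                           # invalidates old labels lazily
        fs, ft = [s], [t]
        self.seen_s[s] = self.round; self.ds[s] = 0
        self.seen_t[t] = self.round; self.dt[t] = 0
        if s == t:
            return True
        met = False
        while fs and ft and not met:
            cost_s = sum(len(adj[u]) for u in fs)
            cost_t = sum(len(adj[u]) for u in ft)
            if cost_s <= cost_t:                  # advance the cheaper side
                frontier, seen, other, dist = fs, self.seen_s, self.seen_t, self.ds
            else:
                frontier, seen, other, dist = ft, self.seen_t, self.seen_s, self.dt
            nxt = []
            for u in frontier:
                for v in adj[u]:
                    if seen[v] != self.round:
                        seen[v] = self.round      # lazy init via time stamp
                        dist[v] = dist[u] + 1
                        nxt.append(v)
                        if other[v] == self.round:
                            met = True            # finish the layer anyway
            if cost_s <= cost_t:
                fs = nxt
            else:
                ft = nxt
        return met
```

After a successful run, every explored vertex carries either a valid d_s or a valid d_t label for the current round, which is exactly the labeling the modified DFS admissibility check works on.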
Skip Next Forward Layer.
Recall that we identify edges of the layered network by checking if they increase the distance from the source or decrease the distance to the sink. Therefore, the DFS proceeds along edges outgoing from the last forward search layer independent of whether the target vertex was seen only by the forward search (gray in Figure 1) or also by the backward search (orange in Figure 1). However, the former type of vertex cannot be part of a shortest s-t path. By saving the number of explored layers of the forward search, we can avoid the exploration of such vertices, thus limiting the DFS to vertices colored blue, green, or orange in Figure 1. With this optimization, the combined search space during augmentation (lines 3, 4 in Algorithm 1) is almost limited to the search space of the BFS. The only additional edges that are visited originate from the intersection of the forward and backward search.

In this section, we investigate the performance of our algorithm
DinitzOPT. First, we compare it to established approaches on real-world networks in Section 4.1. We additionally examine the scaling behavior and how the comparison is affected by problem size, i.e., is there an asymptotic improvement over other algorithms? Then, Section 4.2 evaluates to which extent the different optimizations contribute to better run times and search space. In Section 4.3 we analyze the algorithms in a specific application (Gomory-Hu trees) and compare their usability beyond the speed of the actual flow computation. To this end, we test three different approaches to obtain a cut with the Push-Relabel algorithm. Lastly, we extend our considerations to other types of networks in Section 4.4 and discuss why the results on scale-free networks differ from previous studies. Recall that bidirectional search was found to perform particularly well on heterogeneous networks.

In this section we compare our new approach to three existing algorithms: Dinitz [12], Push-Relabel [18], and the Boykov-Kolmogorov (BK) algorithm [7]. We modified their respective implementations to support our experiments. This also includes some minor performance-relevant changes listed in the appendix (see Section A.1). The experiments include two synthetic and eight real-world networks. All networks are undirected and all but visualize-us and actors are unweighted. Further details regarding the datasets can be found in Table 2. We restrict our experiments in this section to the flow computation only. That is, the measurements exclude the time it takes to initialize intermediate data structures before and after flow computations as well as the creation of the graph structure. For Push-Relabel we only measure the computation of the preflow, which is sufficient to determine the value of the flow/cut.

[Footnote: The code will be available upon publication.]

Figure 2: Runtime comparison of flow computations. The 20 computed flows per instance are divided into low and high terminal pairs. For low, the terminal degree is between 0.75 and 1.25 times the average degree. For high, it is between 10 and 100 times the average degree. Pairs are chosen uniformly at random from all vertices with the respective degree.

Figure 2 shows the resulting run times. For this plot, the terminals were chosen uniformly at random from the set of vertices with degree close to the average (low) or considerably higher degree (high).

One can see that Dinitz and Push-Relabel display comparable times while BK is slightly slower on most large instances. DinitzOPT consistently outperforms the other algorithms by one to three orders of magnitude. The variance is also higher for DinitzOPT, with low pairs approximately one order of magnitude faster on average than high pairs. This is best seen in the girg100000 instance and suggests that DinitzOPT is able to better exploit easy problem instances. For all other algorithms the effect of the terminal degree on the run time is barely noticeable. Another observation is that all algorithms display drastically lower run times than their respective worst-case bounds would suggest.

The times in our experiments are close to what one might expect from linear algorithms. For example, Dinitz computes a flow on the as-skitter instance in one second. Considering the tight O(m n^{2/3}) bound in unweighted networks and assuming the throughput per second to be around 10^8 (a generous guess for graph algorithms) would result in an estimate of 30 minutes per flow. In this context, there are also experimental results that appear to conflict with our results. Earlier studies found Dinitz to be slower than Push-Relabel and both algorithms clearly super-linear on a series of synthetic instances [1]. However, these synthetic instances exhibit specifically crafted hard structures that are placed between designated source and sink vertices.
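The back-of-the-envelope estimate can be reproduced in a few lines. The as-skitter sizes below are the commonly reported ones, and both the Even-Tarjan O(m · n^{2/3}) unit-capacity bound (with constant factor 1) and the throughput of 10^8 edge operations per second are assumptions for illustration.

```python
# Rough worst-case estimate for one Dinitz flow computation on as-skitter.
n, m = 1_696_415, 11_095_298   # as-skitter (commonly reported size)
throughput = 1e8               # assumed edge operations per second

ops = m * n ** (2 / 3)         # O(m * n^(2/3)) unit-capacity bound
minutes = ops / throughput / 60
print(f"about {minutes:.0f} minutes per flow")
```

Compared to the measured one second per flow, this illustrates how far below the worst-case bound the practical behavior lies.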
These instances thus present substantially more challenging flow problems. We assume the low times in our experiments to be caused by the scale-free network structure and, to a lesser degree, the simplicity of the problem instances when choosing a random pair of nodes as terminals. Furthermore, most of our instances are unweighted and undirected.

Effect of the Terminal Degree.
In the following, we discuss the effect of the terminal degree and the structure of the cut on the run time of Dinitz and DinitzOPT. Note that the terminal degree is an upper bound on the size of the cut in unweighted networks. Moreover, the terminal degree in our experiments is based on the average degree, which is assumed to be constant in many real-world networks [3]. Thus, the O(mC) bound for augmenting path based algorithms, with C being the size of the cut, implies not only a linear bound for the eight unweighted networks in our experiments, but would also explain faster low pairs. Surprisingly, DinitzOPT exploits low terminal degrees much more than Dinitz. Another explanation for faster low pairs is that many cuts are close around one terminal, which is consistent with previous observations about cuts in scale-free networks [29, 34]. Moreover, Dinitz tends to perform well when the source side of the cut is small [30]. Although this does not fully explain why DinitzOPT is more sensitive to the terminal degree, we observe in Section 4.3 that Dinitz slows down massively when the source degree is high, even with a low sink degree. Since DinitzOPT always advances the side with smaller volume during the bidirectional search, it does not matter which terminal has the higher degree.

Scaling.
We perform additional experiments to analyze the scaling behavior of the algorithms. Since real networks are scarce and fixed in size, we generate synthetic networks to gradually increase the size while keeping the relevant structural properties fixed. Geometric Inhomogeneous Random Graphs (GIRGs) [8], a generalization of Hyperbolic Random Graphs [25], are a scale-free generative network model that captures many properties of real-world networks. The efficient generator [4] allows us to benchmark our algorithms on differently sized networks with similar structure. Figure 3 and Figure 4 show the results.

Figure 3: Runtime scaling of flow algorithms. The plot shows the average time per flow over multiple GIRGs and terminal pairs. Two linear and a quadratic function were added for reference.

We measure the run time over a series of GIRGs with the number of nodes growing exponentially from 1000 to 1 024 000 with 10 iterations each. In each iteration, we sample a new random graph with average degree 10, power-law exponent 2.8, dimension 1, and temperature 0. The run time for each algorithm is then averaged over 10 uniform random pairs of vertices with degree between 10 and 20. Standard deviation is shown as error bars. The lower half of the symmetric error bars seems longer due to the log-axis. We add five functions in black as reference: a quadratic and two linear functions in Figure 3 and two sublinear power functions in Figure 4.

Dinitz, Push-Relabel, and BK show a near-linear running time. Compared to the linear functions in Figure 3, Dinitz and Push-Relabel seem to scale slightly worse than linear, while DinitzOPT scales better than linear. In a construction with super-sink and super-source, a similar scaling was observed for Push-Relabel on the Yahoo Instant Messenger graph [27]. We added one of the two power functions to Figure 4 because it is the theoretical upper bound for the bidirectional search on hyperbolic random graphs with the chosen power-law exponent [5]. Also, it appears to be a good estimate for Dinitz's running time with just the first optimization of bidirectional search (DinitzBi). It was previously observed that bidirectional search on hyperbolic random graphs with the chosen parameters usually scales like the second power function
[5], which fits the run time of DinitzOPT in our experiments.

Figure 4: Scaling of Dinitz variants. This plot differs from Figure 3 only in the set of displayed algorithms.

Finally, the standard deviation and shape of the curve confirm our claim that the run time of DinitzOPT is more sensitive to the graph structure. In fact, a comparison with our intermediate versions of Dinitz shows that, while the bidirectional search improved the run time the most, each successive optimization increased the sensitivity to the graph structure.
In this section we evaluate the performance impact of the changes discussed in Section 3. We present a search space analysis and in-depth profiler results. In addition to the unmodified Dinitz, we consider four incrementally more optimized versions of the algorithm: DinitzBi, DinitzReset, DinitzStamp, and DinitzOPT. Each algorithm corresponds to adding one optimization to the previous ones.

Experimental Setup.
All optimizations can be applied in any order and combination. Instead of considering all combinations, we individually add them in a specific order, such that the next change always tackles a performance bottleneck. In fact, additional benchmarks reveal that the current optimization speeds up the computation more than enabling all other remaining changes together.

The experiments and benchmarks in this section consider 1000 uniform random terminal pairs close to the average degree on the as-skitter instance. The average distance between source and sink in the initial network is 4.2. The average number of rounds until a maximum flow is found is 4.8, where the last round runs only the BFS to verify that no augmenting path exists. Only counting rounds before the last round, 2.9 units of flow are found on average per round. Out of the 1000 cuts, 882 have value equal to the degree of the smaller terminal. Table 1 shows profiler results and search space for Dinitz and the optimized versions of the algorithm.

[Footnote: We used the Intel VTune profiler.]

Table 1: Total run times and search space of visited edges for the five intermediate versions of our Dinitz implementation during the computation of 1000 flows in as-skitter. Terminals are chosen like low pairs in Figure 2. The first seven columns show times in seconds accumulated over all flow computations. BUILD is the construction of the residual network that is reused for all flow computations, RESET means clearing flow on edges between computations, INIT includes initialization of distances and counters per round, BFS and DFS refer to the respective subroutines, FLOW is the summed time during flow computations (sum of BFS, DFS, INIT), and TOTAL is the run time of the whole application including reading the graph from file. The last three columns contain the search space relative to the number of edges in the graph in percent. Search space columns for BFS and DFS are per round, while the FLOW column lists the search space per flow, e.g., Dinitz visits on average 65.66% of all edges per BFS and every edge is visited about 5.58 times on average in one flow computation.

                              MaxFlow [s]                        Search Space [%]
             BUILD  RESET   INIT     BFS     DFS    FLOW   TOTAL    BFS    DFS    FLOW
Dinitz        0.50  56.79  14.87  405.46  426.80  847.13  904.85  65.66  63.64  558.04
DinitzBi      0.55  58.15  21.02    2.78    8.94   32.73   91.82   0.26   1.87    8.38
DinitzReset   0.50      —  20.73    2.47    8.01   31.20   32.06   0.26   1.87    8.38
DinitzStamp   0.55      —      —    2.51   10.30   12.81   13.72   0.26   1.87    8.38
DinitzOPT     0.55      —      —    2.40    1.06    3.46    4.22   0.26   0.20    2.03

Additionally, Figure 5 compares the search space with and without bidirectional search.
Bidirectional Search.
Dinitz takes 15 minutes to compute the 1000 flows, and the search space per flow is more than five times the number of edges on average. Almost all of that time is spent in BFS or DFS. The bidirectional Dinitz reduces the flow-time from 14 minutes to 30 seconds, an improvement by a factor of 25. The search space is reduced by factors of 252 for BFS, 34 for DFS, and 67 per flow. It is interesting to note that the search space of the BFS during the last round of each flow changes even more. In this round the BFS finds no s-t path. The bidirectional search visits 39 edges on average, while the normal breadth-first search visits 44% of the graph. This not only emphasizes that the cuts are close around one terminal, but also shows that the bidirectional search heavily exploits this structure.

The run time does not fully reflect this drastic reduction in search space, because DFS and BFS no longer dominate the flow computation. The initialization time per round increased by 50%, which can be explained by the additional distance label per node to store the distance to the sink (now 3 ints instead of 2). Although the initialization is a simple linear operation in the number of nodes, it takes twice as long as BFS and DFS combined. Actually, the performance of initialization heavily depends on the data layout. We decided to store node data interleaved instead of in separate buffers. This data layout reduces memory loads and facilitates cache locality because all data for one node is fetched at once. On the other hand, the choice hinders efficient initialization with SIMD instructions.

The real bottleneck, however, is to reset the flow values between computations. RESET takes almost a full minute, which is twice as long as computing the flows.
Reset Flow Between Computations.
Between flow computations, the residual capacity of all edges has to be reset before another flow can be found. After changing the BFS to a bidirectional search, resetting the flow on all edges between computations dominates the run time. To reduce the time of our benchmarks, and to make the code more efficient in situations where multiple flows are computed in the same network, we address this bottleneck. Instead of explicitly resetting flow values for all edges, we remember the edges that contain flow and reset only those. The number of edges with positive flow is typically very small in comparison to the whole network. Additionally, edges that contain flow are visited during the algorithm anyway. By storing changed edges during the DFS, resetting the flow takes at most as long as augmenting the flow in the first place. In fact, the time to reset the flow is so low that it is not detected by the profiler. This change is not mentioned in Section 3 because it does not speed up a single flow computation.

This change completely eliminates the time for RESET, while other operations are not affected. The total time to compute all 1000 flows is thus three times lower, with the flow computation making up almost all spent time. The slowest part of the flow computation itself is still the initialization, with 21 of the 31 seconds.
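The bookkeeping behind this optimization is simple. The following is a hypothetical Python sketch with invented names: record an edge the first time its flow deviates from zero during a computation, and clear exactly those entries afterwards.

```python
class FlowReset:
    """Sketch: instead of clearing flow on all m edges between flow
    computations, remember which edges changed and reset only those."""

    def __init__(self, m):
        self.flow = [0] * m   # flow per directed edge
        self.touched = []     # edges changed since the last reset

    def augment_edge(self, e, amount):
        if self.flow[e] == 0:          # first change this computation
            self.touched.append(e)     # costs O(1) on an edge we visit anyway
        self.flow[e] += amount

    def reset(self):
        # Cost proportional to the edges actually carrying flow,
        # not to the size of the whole network.
        for e in self.touched:
            self.flow[e] = 0
        self.touched.clear()
```

Since every recorded edge was visited during augmentation anyway, the reset can never take longer than finding the flow did in the first place.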
Time Stamps.
The distance labels and counters per node are initialized each round. Using time stamps eliminates the need for this initialization completely, while adding a small overhead to the DFS. The flow computation gets 2.4 times faster, taking 13 seconds instead of 31. After introducing the time stamps, the DFS is the new bottleneck and makes up about 80% of the flow time.
Figure 5: Average number of edges visited per flow computation for the terminal pairs used in Table 1, partitioned as in Figure 1. Forward/Backward Search represent the edges explored by the respective search. Next Forward/Backward Layer denote the edges that would be explored in the next step of the BFS. Edges in the Intersection originate from vertices in both upcoming BFS layers. The BFS and DFS bars show the edges that are actually visited by the algorithm. The shaded area indicates the edges skipped by our last optimization (from DinitzStamp to DinitzOPT in Table 1) and is excluded from the sum on the right.
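The time-stamp trick can be sketched as follows (illustrative names, not our exact code): instead of refilling the distance array at the start of every round, each label carries the round in which it was written, and labels from older rounds read as infinity.

```cpp
#include <limits>
#include <vector>

// Hypothetical time-stamped label array: avoids O(n) clearing per round.
struct Labels {
    std::vector<int> dist, stamp;
    int now = 0;
    explicit Labels(int n) : dist(n, 0), stamp(n, -1) {}

    // O(1) per round instead of filling dist with infinity.
    void newRound() { ++now; }

    bool visited(int v) const { return stamp[v] == now; }

    // Unvisited nodes implicitly read as "infinity".
    int get(int v) const {
        return visited(v) ? dist[v] : std::numeric_limits<int>::max();
    }
    void set(int v, int d) { dist[v] = d; stamp[v] = now; }
};
```

The small overhead is the extra stamp comparison on every read, which is the price paid in the DFS.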
Skip Next Forward Layer.
As discussed in Section 3, this change prevents the DFS from visiting vertices beyond the last layer of the forward search that are not also seen by the backward search. In Figure 5 the skipped part is shaded. This optimization reduces the average search space of the DFS during one round from almost 2% of all edges to just 0.2%. The improvement in search space is reflected by the profiler results. The DFS is sped up from 10 seconds to just one second, which is faster than the BFS. The resulting time to compute all 1000 flows is 3.46 seconds, which is only 7 times slower than building the adjacency list in the beginning. In total, the time to compute the flows with the optimized Dinitz is 245 times faster than with the unmodified Dinitz.
Misc.
Since the BFS is the slowest part of the final algorithm, we add another low-level optimization for undirected networks. Line-by-line load analysis shows that more time is spent during the backward search than the forward search. The backward search from the sink has to consider incoming instead of outgoing edges, but our implementation only maintains an adjacency list of outgoing edges. However, for each incoming edge there is an outgoing twin edge with a reference to the incoming edge. This reference is used to determine the residual capacity of the incoming edge, to check whether the incoming edge is part of the residual network.
We can save a memory lookup in the hot code of the algorithm by determining the residual capacity of the incoming edge without loading it into memory. The residual capacity of an edge is obtained by subtracting the flow from the capacity. In undirected networks, the capacity of an edge is the same as that of its twin. Additionally, flow consistency links the flow of both edges. Thus we can compute the residual capacity of incoming edges by looking only at the outgoing edges. This change improves performance by 20 to 40 percent on undirected networks. Note that a similar optimization is possible for directed networks: one can cache the capacity of the back edge in each twin. This concept is known and was applied in previous flow implementations; however, we only use the optimization for undirected networks.
In the last sections we observed that the heterogeneous network structure yields easy flow problems that can be solved significantly faster than the construction of the adjacency list. This performance becomes important in applications that require multiple flows to be found in the same network. Gomory-Hu trees [20] fit this setting and have applications in graph clustering [15].
A Gomory-Hu tree (GH-tree) of a network is a weighted tree on the same set of vertices that preserves minimum cuts, i.e., each minimum cut between any two vertices s and t in the tree is also a minimum s-t cut in the original network. Thus, GH-trees compactly represent s-t cuts for all vertex pairs of a graph. For the construction of a GH-tree, there exists a very simple algorithm by Gusfield [21] that requires n − 1 maximum flow computations.
Flow Computation on Gusfield Pairs.
Figure 6 shows the same networks and algorithms as in Figure 2, but with terminal pairs sampled from the n − 1 pairs used by Gusfield's algorithm. Our implementation is available at https://github.com/Zagrosss/maxflow.
Figure 6: Runtime comparison of flow computations. The 10 terminal pairs per instance are uniformly chosen out of the n − 1 gh pairs.
Some gh pairs measured for the soc-slashdot instance are solved by DinitzOPT and Push-Relabel in less than one microsecond, which is the precision of our measurements. This suggests that these algorithms are more sensitive to the varying difficulty of the flow computations for gh pairs. Our speedup over the Push-Relabel algorithm on gh pairs is not as pronounced as for the random pairs in Section 4.1. On the dogster instance, PR is even faster than DinitzOPT.
To further investigate why gh pairs are this easy to solve, we analyze a complete run of all pairs needed by Gusfield's algorithm on the soc-slashdot instance. In Gusfield's algorithm each vertex is the source once, thus the average degree of the source is the average degree of the graph (10.24). In contrast, the average degree of the sink is ca. 1500, which diminishes the benefit of the bidirectional search. Uni-directional Dinitz slows down by a factor of 15 when computing the flows with switched terminals. The average distance between two vertices in the original network is 4.16, but interestingly the average distance from source to sink is only 1.78. Out of the 70 k flow computations, 56 k are trivial cuts around one terminal. Computing a flow for a single s-t pair takes 2.76 rounds on average, with the last round only confirming that the flow is optimal. Before the last round, 5.56 units of flow are found per round on average.
DinitzOPT and Push-Relabel are both extremely fast on gh pairs. DinitzOPT takes 2.5 seconds to compute all n = 70 k required flows, while PR needs 5 seconds. To obtain the 5 seconds for PR we exclusively measured the preflow computation, but PR is not limited by the time to compute the preflow. Actually, the entire computation of the Gomory-Hu tree on the soc-slashdot instance takes 12 minutes with Push-Relabel and 2.6 seconds with DinitzOPT.
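For reference, Gusfield's construction [21] can be sketched with the min-cut computation hidden behind an oracle. The interface and names here are illustrative, not our actual code; the oracle stands in for any max-flow/min-cut routine (e.g., a Dinitz or Push-Relabel variant) that returns the cut value and the source side of a minimum s-t cut.

```cpp
#include <functional>
#include <set>
#include <utility>
#include <vector>

// Hypothetical oracle interface: oracle(s, t) returns the value of a
// minimum s-t cut and the source side S of that cut (with s in S).
using CutOracle =
    std::function<std::pair<double, std::set<int>>(int s, int t)>;

// Gusfield's algorithm: returns parent[] and weight[] of the Gomory-Hu
// tree (parent[0] and weight[0] are unused).
std::pair<std::vector<int>, std::vector<double>>
gusfield(int n, const CutOracle& oracle) {
    std::vector<int> parent(n, 0);
    std::vector<double> weight(n, 0.0);
    for (int i = 1; i < n; ++i) {            // n - 1 min-cut computations
        auto [f, S] = oracle(i, parent[i]);
        weight[i] = f;
        // Re-hang later vertices that share i's parent and fell on i's side.
        for (int j = i + 1; j < n; ++j)
            if (parent[j] == parent[i] && S.count(j)) parent[j] = i;
    }
    return {parent, weight};
}
```

Note that each vertex i appears as the source exactly once, which is why the source degree averages to the graph's average degree while the sink degree does not.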
The bottleneck when using PR as a cut oracle is not the flow computation but the initialization and the extraction of the cut; the Gusfield logic itself makes up less than 3% of the run time when using DinitzOPT as the oracle. The drastic difference in run time is in part due to the optimizations we added to DinitzOPT to reduce the time between flow computations, while the Push-Relabel implementation recreates its auxiliary data structures, except the adjacency list, before each flow. However, in the following we will see that a large amount of Push-Relabel's run time is actually necessary to extract the cuts for Gusfield's algorithm.
Measuring Gusfield's Algorithm.
In Gusfield's algorithm we have to iterate over all vertices in the source side of the cut. Extracting these with the Push-Relabel algorithm is slower than with Dinitz. We outline the three approaches to extract the cut with Push-Relabel and show that each has major drawbacks.
The PR algorithm is executed in two stages. The first stage computes a preflow and the second stage converts the preflow into a flow. Often, computing a preflow is sufficient, because one obtains the value of the max-flow/min-cut and can determine a cut by finding all sink-reaching vertices in the residual network. Since Gusfield requires the source side of the cut, the complement of the found set of vertices can be used. This approach is computationally expensive because of the high sink degree.
Given a max-flow, one partition of a min-cut can also be identified by reachability from the source in the residual network. Since the source usually has a smaller degree during Gusfield's algorithm, the source side of the cut is small. This approach is efficient and can be used for Dinitz. However, for PR it requires the preflow to be converted into a flow. Asymptotically, the first stage (preflow) dominates the second (convert) stage, but in practice this is not always the case. In the paper that proposed the current PR implementation [10], the authors experiment with different implementations of the conversion and find a method whose "running time [...] is a small fraction of the running time of the first stage". Other works find that 95% of the time is spent in stage one [11]. Our experiments in Section 4.1 are in line with these findings and thus only the time for the first stage of PR is shown there. However, Gusfield pairs pose easily solvable flow instances due to the low distance between source and sink. This simplicity causes the second stage of PR to dominate the first.
The drawbacks of the two previous approaches can be avoided in undirected networks by computing the preflow from sink to source. Without preflow conversion, a cut can then be extracted by determining the vertices that can reach the original source in the residual network. The drawback of this method is that the preflow computation slows down massively.
Figure 7: Distribution of spent time during Gusfield's algorithm on the soc-slashdot instance with three approaches to use the Push-Relabel algorithm as a min-cut oracle. We split the measurements into initialization, preflow, conversion, and cut identification. The time overhead for measurement, logging, and the logic of Gusfield's algorithm is included in the numbers on the right but excluded in the bars.
In short, the three approaches to extract the source side of a min-cut with the Push-Relabel algorithm are:
Convert.
Compute a preflow from the source, convert it into a flow, then run a BFS from the source.
T-Side.
Compute a preflow from the source, run a BFS backwards from the sink, then take the complement.
Swap.
Compute a preflow from the sink to the source, then run a BFS backwards from the source.
Figure 7 shows the distribution of run time when using these approaches to run Gusfield's algorithm on the soc-slashdot instance. The convert approach is the fastest with just above 12 minutes, followed by
T-Side with 18 minutes and Swap with almost an hour. The initialization time provides a reference, as it is approximately 7 minutes for all approaches. We note here that the initialization for PR creates some Boost-related data and performs an operation linear in the number of edges.
We see that the flow computation is actually really fast for convert: it takes only about 5 seconds of these 12 minutes. The initialization dominates this time, and the conversion is also far slower than the flow itself. Surprisingly, the flow takes twice as long for T-Side than for convert, although only the way to identify the cut was changed. This is because we find other min-cuts, and thus obtain a different GH-tree, while processing different terminal pairs. We also implemented the T-Side approach for DinitzOPT to verify the correctness of the computed cuts and trees. Interestingly, running this takes 4.5 minutes, which is a factor of 100 slower than identifying the cut via the source side for DinitzOPT. Similarly for PR, we observe that the cut identification, which was almost unnoticeable for convert, makes up most of the computation time for T-Side.
Lastly, the Swap approach takes more than 4 times as long as the convert approach. As the degree of the sink is significantly larger than that of the source, the flow computation slows down massively; it goes from 5 seconds to 47 minutes. Recall that the unmodified Dinitz slows down by a factor of 15 when switching source and sink.
In conclusion, all three methods perform significantly worse than DinitzOPT, not because PR flow computations are slow, but because the initialization and cut identification already take orders of magnitude longer than the complete process with DinitzOPT. Both methods that avoid the four-minute run time of stage two of the Push-Relabel algorithm imply an even worse performance cost: either a breadth-first search that has to traverse almost the whole graph (T-Side) or significantly slower preflow computations (Swap).
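For Dinitz, extracting the source side of the cut is the simple residual-network BFS sketched below (illustrative adjacency layout and names, not our exact code): given a maximum flow, the source side is the set of vertices reachable from s via edges with positive residual capacity.

```cpp
#include <queue>
#include <vector>

// Hypothetical edge record; adj[u] holds indices of u's outgoing edges.
struct Edge { int to; double cap, flow; };

std::vector<int> sourceSideCut(int s, int n,
                               const std::vector<std::vector<int>>& adj,
                               const std::vector<Edge>& edges) {
    std::vector<char> seen(n, 0);
    std::queue<int> q;
    seen[s] = 1;
    q.push(s);
    while (!q.empty()) {
        int u = q.front(); q.pop();
        for (int e : adj[u]) {
            const Edge& ed = edges[e];
            if (!seen[ed.to] && ed.cap - ed.flow > 0) {  // residual edge
                seen[ed.to] = 1;
                q.push(ed.to);
            }
        }
    }
    std::vector<int> side;
    for (int v = 0; v < n; ++v)
        if (seen[v]) side.push_back(v);
    return side;
}
```

Since the source side is small on Gusfield pairs, this BFS touches only a tiny part of the network, which is exactly what makes the Dinitz-based cut extraction cheap.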
After evaluating the performance on heterogeneous networks, we extend our experiments to networks of different structure. We consider the following networks: an Erdős-Rényi random graph [13] (er100000), an Erdős-Rényi random graph with uniform random edge weights (er100000_weighted), an Erdős-Rényi random graph with super terminals (er100000_super), a generated layered network [1] (layered10000), the road network of Pennsylvania (roadNet-PA), and a liver CT scan as a regular 6-connected grid (liver.n6c100). Further details regarding the datasets can be found in Section A.2.
Figure 8: Run time of max-flow computations for various networks. Each point corresponds to one s-t flow. For each instance we computed 50 s-t flows. The instances er100000_super, layered10000, and liver.n6c100 have designated terminals. For er100000, er100000_weighted, and roadNet-PA, terminals are chosen uniformly at random. Unlike the experiments in Section 4.1, the algorithms rebuild their internal data structures, including the adjacency list, before each flow computation. This was necessary to prevent the BK-algorithm from reusing search trees, which makes the instances with given terminal pairs trivial after the first run.
Figure 8 shows the performance of the flow algorithms on these instances. The performance on the Erdős-Rényi graphs is similar to our results for heterogeneous networks; the BK-algorithm is the slowest, followed by Dinitz, Push-Relabel, and DinitzOPT in this order. Note that a running time close to O(√n) was shown for bidirectional search on Erdős-Rényi random graphs [6]. Neither weights nor higher-degree terminals change how the algorithms compare among each other.
The layered network, which is specifically constructed to produce a computationally difficult flow instance [1], is indeed more difficult than the others. In the layered network, Push-Relabel is at least five times faster than Dinitz. DinitzOPT is 10-20% slower than Dinitz. After all, our optimizations trade a small overhead during flow computation for the possibility of sublinear running time on particularly easy instances.
For the road network, the choice of the algorithm does not matter as much as for the other instances. The choice of the terminal pair, however, affects the performance immensely. With a diameter of almost 800 and a very homogeneous degree distribution, the uniform random choice of terminal pairs produces problems of varying difficulty. Dinitz, BK, and DinitzOPT capitalize on the easier pairs, while Push-Relabel shows less variance between pairs.
Lastly, the liver scan produces different results than the previous instances. The BK-algorithm was specifically designed for this kind of network structure and application.
Unsurprisingly, the BK-algorithm performs best, followed by Push-Relabel, Dinitz, and DinitzOPT.
We presented a modified version of Dinitz's algorithm with greatly improved run time and search space on real-world and generated scale-free networks. The scaling behavior appears to be sublinear, which matches previous theoretical and empirical observations about the running time of balanced bidirectional search in scale-free random networks. While these theoretical bounds apply during the first round of our algorithm, it is still unknown whether the analysis can be extended to account for the changes in the residual network. Our experiments, however, indicate that the search space remains small in subsequent rounds.
We observe that the low diameter and heterogeneous degree distribution lead to small and unbalanced cuts that our algorithm finds very efficiently. The flow computations required to compute a Gomory-Hu tree are even easier, making usually insignificant parts of the tested algorithms the bottleneck. For example, the preflow conversion leads to Push-Relabel being greatly outperformed by our algorithm in this setting.
Our results on other types of instances show that their structural properties play a huge role when comparing flow algorithms. It is not surprising that our algorithm is outperformed by the BK-algorithm, which was specifically designed for vision problems, on liver.n6c100. Moreover, the experiments on the artificial layered10000 instance indicate that Push-Relabel is more robust regarding hard instances. On scale-free networks, however, we drastically improve performance over existing algorithms.
References
[1] Ravindra K. Ahuja, Murali Kodialam, Ajay K. Mishra, and James B. Orlin. Computational investigations of maximum flow algorithms.
European Journal of Operational Research, 97(3):509–542, 1997. doi:10.1016/S0377-2217(96)00269-X.
[2] Ravindra K. Ahuja, Thomas L. Magnanti, and James B. Orlin. Network Flows: Theory, Algorithms and Applications. Prentice-Hall, Inc., 1993.
[3] Albert-László Barabási. Network Science. Cambridge University Press, 2016.
[4] Thomas Bläsius, Tobias Friedrich, Maximilian Katzmann, Ulrich Meyer, Manuel Penschuck, and Christopher Weyand. Efficiently Generating Geometric Inhomogeneous and Hyperbolic Random Graphs. In ESA 2019, volume 144 of Leibniz International Proceedings in Informatics (LIPIcs), pages 21:1–21:14. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2019. doi:10.4230/LIPIcs.ESA.2019.21.
[5] Thomas Bläsius, Cedric Freiberger, Tobias Friedrich, Maximilian Katzmann, Felix Montenegro-Retana, and Marianne Thieffry. Efficient Shortest Paths in Scale-Free Networks with Underlying Hyperbolic Geometry. In ICALP 2018, volume 107 of Leibniz International Proceedings in Informatics (LIPIcs), pages 20:1–20:14. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2018. doi:10.4230/LIPIcs.ICALP.2018.20.
[6] Michele Borassi and Emanuele Natale. KADABRA is an ADaptive Algorithm for Betweenness via Random Approximation. In ESA 2016, volume 57 of Leibniz International Proceedings in Informatics (LIPIcs), pages 20:1–20:18. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2016. doi:10.4230/LIPIcs.ESA.2016.20.
[7] Y. Boykov and V. Kolmogorov. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(9):1124–1137, 2004. doi:10.1109/TPAMI.2004.60.
[8] Karl Bringmann, Ralph Keusch, and Johannes Lengler. Geometric inhomogeneous random graphs. Theoretical Computer Science, 760:35–54, 2019. doi:10.1016/j.tcs.2018.08.014.
[9] Bala G. Chandran and Dorit S. Hochbaum. A computational study of the pseudoflow and push-relabel algorithms for the maximum flow problem. Operations Research, 57(2):358–376, 2009.
[10] B. V. Cherkassky and A. V. Goldberg. On implementing the push-relabel method for the maximum flow problem. Algorithmica, 19(4):390–410, 1997. doi:10.1007/pl00009180.
[11] U. Derigs and W. Meier. Implementing Goldberg's max-flow-algorithm: a computational investigation.
Zeitschrift für Operations Research, 33(6):383–403, 1989. doi:10.1007/BF01415937.
[12] Yefim Dinitz. Algorithm for Solution of a Problem of Maximum Flow in Networks with Power Estimation. Soviet Mathematics Doklady, 11:1277–1280, 1970.
[13] Paul Erdős and Alfréd Rényi. On random graphs, I. Publicationes Mathematicae (Debrecen), 6:290–297, 1959.
[14] Shimon Even and R. Endre Tarjan. Network flow and testing graph connectivity. SIAM Journal on Computing, 4(4):507–518, 1975. doi:10.1137/0204043.
[15] Gary William Flake, Robert E. Tarjan, and Kostas Tsioutsiouliklis. Graph clustering and minimum cut trees. Internet Mathematics, 1(4):385–408, 2004. doi:10.1080/15427951.2004.10129093.
[16] L. R. Ford and D. R. Fulkerson. Maximal flow through a network. Canadian Journal of Mathematics, 8:399–404, 1956. doi:10.4153/CJM-1956-045-5.
[17] Andrew V. Goldberg, Sagi Hed, Haim Kaplan, Robert E. Tarjan, and Renato F. Werneck. Maximum Flows by Incremental Breadth-First Search. In ESA 2011, Lecture Notes in Computer Science, pages 457–468. Springer, 2011. doi:10.1007/978-3-642-23719-5_39.
[18] Andrew V. Goldberg and Robert E. Tarjan. A new approach to the maximum-flow problem.
Journal of the ACM, 35(4):921–940, 1988. doi:10.1145/48014.61051.
[19] Andrew V. Goldberg and Robert E. Tarjan. Efficient maximum flow algorithms. Communications of the ACM, 57(8):82–89, 2014. doi:10.1145/2628036.
[20] R. E. Gomory and T. C. Hu. Multi-Terminal Network Flows. Journal of the Society for Industrial and Applied Mathematics, 9(4):551–570, 1961.
[21] Dan Gusfield. Very simple methods for all pairs network flow analysis. SIAM Journal on Computing, 19(1):143–155, 1990. doi:10.1137/0219009.
[22] Felix Halim, Roland H.C. Yap, and Yongzheng Wu. A MapReduce-Based Maximum-Flow Algorithm for Large Small-World Network Graphs. In ICDCS 2011, pages 192–202, 2011. doi:10.1109/ICDCS.2011.62.
[23] Dorit S. Hochbaum. The pseudoflow algorithm: A new algorithm for the maximum-flow problem. Operations Research, 56(4):992–1009, 2008. doi:10.1287/opre.1080.0524.
[24] Alexander V. Karzanov. On finding a maximum flow in a network with special structure and some applications. Matematicheskie Voprosy Upravleniya Proizvodstvom, 5:81–94, 1973.
[25] Dmitri Krioukov, Fragkiskos Papadopoulos, Maksim Kitsak, Amin Vahdat, and Marián Boguñá. Hyperbolic geometry of complex networks. Physical Review E, 82(3), 2010. doi:10.1103/physreve.82.036106.
[26] Jérôme Kunegis. KONECT – The Koblenz Network Collection. In Proc. Int. Conf. on World Wide Web Companion, pages 1343–1350, 2013. URL: http://konect.cc/.
[27] Kevin Lang. Finding good nearly balanced cuts in power law graphs. Technical Report YRL-2004-036, Yahoo! Research Labs, 2004.
[28] Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, 2014.
[29] Jure Leskovec, Kevin J. Lang, Anirban Dasgupta, and Michael W. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters.
Internet Mathematics, 6(1):29–123, 2009. doi:10.1080/15427951.2009.10129177.
[30] Lorenzo Orecchia and Zeyuan Allen Zhu. Flow-based algorithms for local graph clustering. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 2014. doi:10.1137/1.9781611973402.94.
[31] Ryan A. Rossi and Nesreen K. Ahmed. The network data repository with interactive graph analytics and visualization. In AAAI, 2015. URL: http://networkrepository.com.
[32] Satu Elisa Schaeffer. Graph clustering. Computer Science Review, 1(1):27–64, 2007. doi:10.1016/j.cosrev.2007.05.001.
[33] Boris Schäling. The Boost C++ Libraries. Boris Schäling, 2011. URL: https://theboostcpplibraries.com/.
[34] S.-W. Son, H. Jeong, and J. D. Noh. Random field Ising model and community structure in complex networks. The European Physical Journal B, 50(3):431–437, 2006. doi:10.1140/epjb/e2006-00155-4.
[35] Tanmay Verma and Dhruv Batra. MaxFlow revisited: An empirical comparison of maxflow algorithms for dense vision problems. In Proceedings of the British Machine Vision Conference 2012. British Machine Vision Association, 2012. doi:10.5244/c.26.61.
Appendix
A.1 Implementation Details.
Experiments were done on a Dell XPS 15 9570 laptop with an Intel Core i7-8750H CPU.
BK-Algorithm.
As a BK implementation we use the one that was written for the original paper [7], provided on the web page of Vladimir Kolmogorov (http://pub.ist.ac.at/~vnk/software.html). For each s-t flow we add edges with huge capacity between s, t and the virtual terminals. After the flow is computed, we remove these edges again. This O(1) work is included in the time measurements. We apply the reuse-trees feature and mark the changed terminals between flow computations accordingly. Internal memory is allocated on network construction and not per flow. There is a BK implementation available in Boost. We found the original one easier to use, because its interface is tailored towards multiple flow computations and provides easy and efficient access to the found cut.
Push-Relabel.
The original implementation, used for example in [35], is no longer available. We use the C++ version of the original implementation provided in Boost. The Boost version is mostly the same code (up to identical variable names) ported to C++, but it is data structure agnostic. Therefore, we had to reimplement the linearized adjacency list data structure used in the original implementation.
Dinitz and DinitzOPT.
Our implementation is based on a version of Dinitz that is commonly used in programming competitions (https://cp-algorithms.com/graph/dinic.html). We changed the graph representation to a linear adjacency list of outgoing edges. Edges are sorted by originating vertex in linear time. Each node stores a range of edges into this list. This is the same structure used for the Push-Relabel implementation. Performance-wise, the data structure significantly reduces the time to build large networks, but the flow time remains the same. We use an array of size n as a queue, because during BFS each vertex is pushed at most once. We allocate memory for distance labels, counters, and the queue in advance when the network is built instead of per flow. In the unidirectional BFS, one could break when the sink is encountered, but we finish the current layer for the purpose of measuring the search space.
Undirected Networks.
We support flow on undirected networks. A simple way to do this is to represent each undirected edge as two directed edges, which is what we did for Push-Relabel. However, each directed edge already implies two edges in the residual network: one with the given capacity, and a reversed twin edge with no capacity. To avoid storing four times the amount of edges, the twin edge can be used to implement undirected flow. By giving the twin edge the same capacity as its counterpart, the exact same implementation can be used for undirected as well as directed networks.
Non-Integer Capacities.
We use 64-bit floating point numbers instead of integers to represent flow values and capacities, because some applications use non-integer capacities. The same implementation can be used, but it requires more memory and additional checks to handle floating point imprecision. We applied this to Dinitz, PR, and BK and observed a performance drop of approximately 10% for all algorithms. Note that the range in which 64-bit floats exactly represent integral numbers even exceeds the range of 32-bit integers. Precision issues are caused by the infinite-capacity edges introduced for BK. To resolve this, the representation of infinity on these edges must be chosen according to the range of capacities.
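The twin-edge representation for undirected networks described above, together with the residual-capacity shortcut for incoming edges, can be sketched as follows (illustrative names and layout, not our exact code; twins are assumed to be stored at adjacent indices).

```cpp
#include <vector>

// Hypothetical edge record; edge e and its twin e ^ 1 are stored adjacently.
struct Edge { int to; double cap, flow; };
std::vector<Edge> g;

void addUndirectedEdge(int u, int v, double c) {
    // Both directions get the full capacity; flow consistency keeps
    // g[e].flow == -g[e ^ 1].flow at all times.
    g.push_back({v, c, 0.0});
    g.push_back({u, c, 0.0});
}

// Residual capacity of edge e itself.
double residual(int e) { return g[e].cap - g[e].flow; }

// Residual capacity of the twin of e without loading g[e ^ 1]:
// in undirected networks cap(twin) == cap(e) and flow(twin) == -flow(e),
// so residual(twin) == cap(e) + flow(e). This saves a memory lookup in
// the hot code of the backward search.
double residualOfTwin(int e) { return g[e].cap + g[e].flow; }
```

For directed networks the same shortcut would require caching the twin's capacity in each edge record, as noted in Section 4.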
A.2 Data.
We obtained the datasets from the University of Koblenz (KONECT) [26], the Network Repository website [31], as well as the Stanford Network Analysis Project (SNAP) [28]. Furthermore, we used the GIRG generator by Bläsius et al. [4], mostly with default parameters. We implemented the ER model and the layered network construction from Ahuja et al. [1]. The parameters for ER are n = 100000 and p = 0.02. The parameters for the layered network are taken from the largest instance in their paper (W=71, L=141, d=10). Lastly, the liver.n6c100 instance is from the University of Western Ontario. It is a regular 3D grid with 170x170x144 nodes, 6 edges per node, capacities up to 100, and a super sink/source. We converted all instances to a text-based edge list with zero-based indices. In Section 4.4 we use the directed DIMACS format instead.
Table 2: Instances used in this paper. The road network was undirected and is converted to the directed DIMACS format. In this case, the number of edges refers to the undirected version.
instance         directed  weighted    nodes    edges  avg. degree  source
fb-pages-tvshow                           4K      17K         8.87  Network Repository
girg10000                                10K      60K        11.99  generated
soc-slashdot                             70K     360K        10.24  Network Repository
girg100000                              100K     600K        12.00  generated
soc-flickr                              514K     3.2M        12.42  Network Repository
visualize-us        ✓         ✓            …        …            …  …
…                                        10K     100K         9.96  generated
roadNet-PA         (✓)                  1.1M     1.5M         2.83  U. Stanford
liver.n6c100        ✓         ✓            …        …            …  …