Memory Efficient Max Flow for Multi-label Submodular MRFs
Thalaiyasingam Ajanthan, Student Member, IEEE, Richard Hartley, Fellow, IEEE, and Mathieu Salzmann, Member, IEEE
Abstract — Multi-label submodular Markov Random Fields (MRFs) have been shown to be solvable using max-flow based on an encoding of the labels proposed by Ishikawa, in which each variable X_i is represented by ℓ nodes (where ℓ is the number of labels) arranged in a column. However, this method in general requires ℓ² edges for each pair of neighbouring variables. This makes it inapplicable to realistic problems with many variables and labels, due to excessive memory requirements. In this paper, we introduce a variant of the max-flow algorithm that requires much less storage. Consequently, our algorithm makes it possible to optimally solve multi-label submodular problems involving large numbers of variables and labels on a standard computer.

Index Terms — Max-flow, Multi-label submodular, Memory efficiency, Flow encoding, Graphical models.
1 INTRODUCTION
Ishikawa [1] introduced a max-flow-based method to globally minimize the energy of multi-label MRFs with convex edge terms. In [2], this method was extended to energy functions satisfying the multi-label submodularity condition, analogous to the submodularity condition for MRFs with binary labels. In the general case, however, this method requires ℓ² directed edges for each pair of neighbouring variables. For instance, for a 1000 × 1000, 4-connected image with 256 labels, it would require approximately 2 × 1000 × 1000 × 256 × 256 × 4 bytes ≈ 500 GB of memory to store the edges (assuming 4 bytes per edge). Clearly, this is beyond the storage capacity of most computers.

In this paper, we introduce a variant of the max-flow algorithm that requires storing only two ℓ-dimensional vectors per variable pair instead of the ℓ² edge capacities of the standard max-flow algorithm. In the example discussed above, our algorithm would therefore use only 4 GB of memory for the edges. As a result, our approach lets us optimally solve much larger problems.

More specifically, in contrast to the usual augmenting path algorithm [3], we do not store the residual edge capacities at each iteration. Instead, our algorithm records two ℓ-dimensional flow-related quantities for every pair of neighbouring variables. We show that, at any stage of the algorithm, the residual edge capacities can be computed from these flow-related quantities and the initial edge capacities. This, of course, assumes that the initial capacities can be computed by some memory-efficient routine, which is almost always the case in computer vision.

• T. Ajanthan and R. Hartley are with the College of Engineering and Computer Science, Australian National University, Canberra and Data61, CSIRO, Canberra. E-mail: [email protected]
• M. Salzmann is with the Computer Vision Laboratory, École Polytechnique Fédérale de Lausanne.
• Data61 (formerly NICTA) is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the Australian Research Council (ARC) through the ICT Centre of Excellence program.
The optimality of Ishikawa's formalism made it a method of choice as a subroutine in many approximate energy minimization algorithms, such as multi-label moves [4], [5] and IRGC [6]. Since our approach can simply replace the standard max-flow algorithm [7] in Ishikawa-type graphs, it also allows us to minimize the energy of much larger non-submodular MRFs in such approximate techniques. Furthermore, due to the similarity to standard max-flow, our algorithm can easily be extended to handle dynamic MRFs [8] and can also be accelerated using the parallel max-flow technique [9].

We demonstrate the effectiveness of our algorithm on the problems of stereo correspondence estimation and image inpainting. Our experimental evaluation shows that our method can solve much larger problems than standard max-flow on a standard computer and is an order of magnitude faster than state-of-the-art message-passing algorithms [10], [11], [12]. Our code is available at https://github.com/tajanthan/memf.

A preliminary version of this paper appeared in [13]. This extended version contains a polynomial time version of the MEMF algorithm, a discussion of the equivalence with min-sum message passing, and an experiment to evaluate the empirical time complexity.
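The memory figures quoted above can be checked with a back-of-envelope calculation. The sketch below is ours, not the paper's; the 1000 × 1000 grid size is an assumption consistent with the ~4 GB figure for our encoding.

```python
# Back-of-envelope storage comparison (a sketch; the 1000x1000 grid size
# is an assumption consistent with the ~4 GB figure quoted above).
height = width = 1000
num_labels = 256          # l
bytes_per_value = 4

# A 4-connected grid has height*(width-1) + (height-1)*width variable pairs.
pairs = height * (width - 1) + (height - 1) * width

# Standard max-flow on the Ishikawa graph: l^2 residual capacities per pair.
standard_bytes = pairs * num_labels ** 2 * bytes_per_value

# MEMF: two l-dimensional vectors (exit-flows) per pair.
memf_bytes = pairs * 2 * num_labels * bytes_per_value

print(f"standard: {standard_bytes / 1e9:.0f} GB, MEMF: {memf_bytes / 1e9:.1f} GB")
```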
2 PRELIMINARIES
Let X_i be a random variable taking label x_i ∈ L. A pairwise MRF defined over a set of such random variables can be represented by an energy of the form

E(x) = Σ_{i∈V} θ_i(x_i) + Σ_{(i,j)∈E} θ_{ij}(x_i, x_j) ,   (1)

where θ_i and θ_{ij} denote the unary potentials (i.e., data costs) and pairwise potentials (i.e., interaction costs), respectively. Here, V is the set of vertices, e.g., corresponding to pixels or superpixels in an image, and E is the set of edges in the MRF, e.g., encoding a 4-connected or 8-connected grid over the image pixels.

Fig. 1:
Example of an Ishikawa graph. The graph incorporates edges with infinite capacity from U_{i:λ} to U_{i:λ+1}, not shown in the graph. Here the cut corresponds to a labeling over the label set L = {0, 1, 2, 3}.

In this work, we consider a pairwise MRF with an ordered label set L = {0, 1, ..., ℓ−1}, and we assume that the pairwise terms satisfy the multi-label submodularity condition [2]:

θ_{ij}(λ′, µ) + θ_{ij}(λ, µ′) − θ_{ij}(λ, µ) − θ_{ij}(λ′, µ′) ≥ 0 ,   (2)

for all λ, λ′, µ, µ′ ∈ L, where λ < λ′ and µ < µ′. Furthermore, we assume that the pairwise potentials can be computed either by some routine or can be stored in an efficient manner. In other words, we assume that we do not need to store each individual pairwise term. Note that, in computer vision, this comes at virtually no loss of generality.

Ishikawa [1] introduced a method to represent the multi-label energy function (1) in a graph. The basic idea behind the Ishikawa construction is to encode the label X_i = x_i of a vertex i ∈ V using binary-valued random variables U_{i:λ}, one for each label λ ∈ L. In particular, the encoding is defined as u_{i:λ} = 1 if and only if x_i ≥ λ, and 0 otherwise. The Ishikawa graph is then an st-graph Ĝ = (V̂ ∪ {0, 1}, Ê), where the set of nodes and the set of edges are defined as follows:

V̂ = {U_{i:λ} | i ∈ V, λ ∈ {1, ..., ℓ−1}} ,   (3)
Ê = Ê_v ∪ Ê_c ,
Ê_v = {(U_{i:λ}, U_{i:λ±1}) | i ∈ V, λ ∈ {1, ..., ℓ−1}} ,
Ê_c = {(U_{i:λ}, U_{j:µ}), (U_{j:µ}, U_{i:λ}) | (i,j) ∈ E, U_{i:λ}, U_{j:µ} ∈ V̂} ,

where Ê_v is the set of vertical edges and Ê_c is the set of cross edges. Note that the nodes U_{i:ℓ} and U_{i:0} are identified as node 0 and node 1, respectively. We denote the Ishikawa edges by e_{ij:λµ} ∈ Ê (this contains edges in both directions) and their capacities by φ_{ij:λµ}.
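The encoding just defined is easy to sketch: a label x_i ∈ {0, ..., ℓ−1} maps to a monotone binary column, and is recovered by counting the 1s. The helper names below are ours, not the paper's.

```python
# Illustrative sketch of Ishikawa's label encoding (helper names are ours).
def encode(x_i, num_labels):
    """Binary column [u_{i:1}, ..., u_{i:l-1}] with u = 1 iff x_i >= lam."""
    return [1 if x_i >= lam else 0 for lam in range(1, num_labels)]

def decode(u):
    """The column is monotone non-increasing, so the label is the count of 1s."""
    return sum(u)

assert encode(2, 4) == [1, 1, 0]
assert all(decode(encode(x, 4)) == x for x in range(4))
```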
We also denote by e_{i:λ} the downward edge (U_{i:λ+1}, U_{i:λ}). An example of an Ishikawa graph is shown in Fig. 1.

In an st-graph, a labeling x is represented by a "cut" in the graph (a "cut" partitions the nodes in the graph into two disjoint subsets V̂_0 and V̂_1, with 0 ∈ V̂_0 and 1 ∈ V̂_1). Then, the value of the energy function E(x) is equal to the sum of the capacities on the edges from V̂_0 to V̂_1. In an Ishikawa graph, if the downward edge e_{i:λ} is in the "cut", then vertex i takes label λ. In MRF energy minimization, each vertex i takes exactly one label x_i, which means that exactly one edge e_{i:λ} must be in the min-cut of the Ishikawa graph. This is ensured by having infinite capacity for each upward edge e_{ii:λλ+1}, i.e., φ_{ii:λλ+1} = ∞ for all i ∈ V and λ ∈ L. Note that, by construction of the Ishikawa graph, the capacities φ and the energy parameters θ are related according to the following formula:

θ_i(λ) = φ_{ii:λ+1λ} = φ_{i:λ} ,   (4)
θ_{ij}(λ, µ) = Σ_{λ′>λ, µ′≤µ} φ_{ij:λ′µ′} + Σ_{λ′≤λ, µ′>µ} φ_{ji:µ′λ′} .

Finding the minimum energy labeling is a min-cut problem, which can be solved optimally using the max-flow algorithm [3] when the edge capacities are non-negative. As shown in [2], a multi-label submodular energy function can be represented by an Ishikawa graph with non-negative edge capacities φ and can therefore be minimized optimally by max-flow.

The most popular max-flow algorithm in computer vision [7] is an augmenting path algorithm that finds a path from node 0 to node 1 through positive edges (called an augmenting path) and then pushes the maximum flow without exceeding the edge capacities (called augmentation). The augmentation operation changes the edge capacities in the graph, and therefore, the residual graph needs to be stored. That is, when applied to the Ishikawa graph, the max-flow algorithm stores ℓ² values per pair of neighbouring variables. For large numbers of labels and variables, the memory requirement is high and, in many practical problems, exceeds the capacity of most computers.

Let us assume that the max-flow algorithm is applied to the Ishikawa graph.

1. Some authors denote the nodes 0 and 1 by s and t.
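The pairwise part of relation (4) can be checked numerically. The sketch below uses dense matrices for the cross capacities of a single pair (i, j), a layout of our choosing for illustration only.

```python
# Sketch of the pairwise part of relation (4): theta_ij(lam, mu) as a double
# sum of cross capacities (dense matrices indexed [lam][mu]; layout is ours).
def theta_pairwise(phi_ij, phi_ji, lam, mu, num_labels):
    total = 0
    for lp in range(lam + 1, num_labels):      # lam' > lam
        for mp in range(mu + 1):               # mu' <= mu
            total += phi_ij[lp][mp]
    for lp in range(lam + 1):                  # lam' <= lam
        for mp in range(mu + 1, num_labels):   # mu' > mu
            total += phi_ji[mp][lp]
    return total
```

With non-negative capacities φ, any θ obtained this way satisfies the multi-label submodularity condition (2), which is the direction used by [2].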
As the algorithm proceeds, the capacities on the edges in the graph change in response to the flow. Here, instead of storing the residual graph, we propose recording the flow that has been applied to the graph. However, since storing the flow would also require ℓ² values per variable pair, we propose recording two ℓ-dimensional quantities related to the flow between pairs of variables. More precisely, for each directed edge (i,j) ∈ E⁺, we record the sum of outgoing flows from each node U_{i:λ} to the nodes U_{j:µ} for all µ ∈ {1, ..., ℓ−1}. We call this quantity an exit-flow, denoted by Σ_{ij:λ} (defined below in Eq. 6). We show that these exit-flows allow us to reconstruct a permissible flow (defined below in Def. 3.3), which in turn lets us compute the residual edge capacities from the initial ones. Importantly, while flow reconstruction is not unique, we show that all such reconstructions are equivalent up to a null flow (Def. 3.4), which does not affect the energy function. Note that this idea can be applied to any augmenting path algorithm, as long as the residual graph can be rapidly constructed.

For increased efficiency, we then show how finding an augmenting path can be achieved in a simplified Ishikawa graph (called block-graph) that amalgamates the nodes in each column into blocks. We then perform augmentation, which translates to updating our exit-flows, in this block-graph. As a side effect, since an augmenting path in our block-graph corresponds to a collection of augmenting paths in the Ishikawa graph, our algorithm converges in fewer iterations than the standard max-flow implementation of [7].

2. E⁺ denotes the set of directed edges between the vertices in the MRF, i.e., if (i,j) ∈ E then (i,j) ∈ E⁺ and (j,i) ∈ E⁺.

3 MEMORY EFFICIENT FLOW ENCODING
Before we introduce our memory efficient max flow algorithm, let us describe how the cumulative flow can be stored in a memory efficient manner. This technique can be used in any augmenting path flow algorithm, by reconstructing the residual edge capacities whenever needed.

Let us assume that the max-flow algorithm is applied to the Ishikawa graph. At some point in the algorithm, flow has passed along many of the edges of the graph.
Definition 3.1.
A flow is a mapping ψ : Ê → ℝ, denoted by ψ_{ij:λµ} for the edges e_{ij:λµ}, that satisfies the anti-symmetry condition ψ_{ij:λµ} = −ψ_{ji:µλ} for all e_{ij:λµ} ∈ Ê.

A flow is called conservative if the total flow into a node is zero for all nodes, except for the source and the terminal, i.e.,

Σ_{j,µ | e_{ji:µλ} ∈ Ê} ψ_{ji:µλ} = 0   ∀ U_{i:λ} ∈ V̂ .   (5)

Given ψ, the residual capacities of the Ishikawa graph are updated as φ̄ = φ − ψ, where φ represents the initial edge capacities. Furthermore, we call the flow restricted to each column the column-flows, which we denote by ψ_{i:λ}; i ∈ V, λ ∈ L.

At first sight, it might seem that, to apply the max-flow algorithm, it is necessary to keep track of all the values ψ_{ij:λµ}, which would require the same order of storage as recording all the edge capacities. Below, however, we show that it is necessary to store only O(ℓ) values for each (i,j) ∈ E, instead of O(ℓ²).

To this end, the flow values that we store in our algorithm, namely source-flows and exit-flows, are defined below.

Definition 3.2.
1) For each i ∈ V, the flow out from the source node, ψ_{i:ℓ−1}, is called a source-flow.
2) For each (i,j) ∈ E⁺ and λ ∈ {1, ..., ℓ−1}, we define an exit-flow as

Σ_{ij:λ} = Σ_µ ψ_{ij:λµ} .   (6)

We will show that these source-flows and exit-flows permit the flow ψ to be reconstructed up to equivalence. Now, let us define some additional properties of flow, which will be useful in our exposition.

Definition 3.3.
A flow ψ is called permissible if φ_{ij:λµ} − ψ_{ij:λµ} ≥ 0 for all e_{ij:λµ} ∈ Ê.
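For one variable pair (i, j), the exit-flows of Definition 3.2 and the anti-symmetry of Definition 3.1 can be illustrated with a dense cross-flow matrix, a layout the algorithm itself never stores.

```python
# Sketch of exit-flows (Definition 3.2): Sigma_{ij:lam} = sum_mu psi_{ij:lam,mu}.
# psi[lam][mu] holds the cross-flow of one pair (i, j) -- purely illustrative,
# since the point of the paper is to avoid storing psi itself.
def exit_flows(psi):
    return [sum(row) for row in psi]

# By anti-symmetry (Definition 3.1), psi_{ji:mu,lam} = -psi_{ij:lam,mu}, so
# the opposite-direction exit-flows are the negated column sums.
def reverse_exit_flows(psi):
    return [-sum(psi[lam][mu] for lam in range(len(psi)))
            for mu in range(len(psi[0]))]

psi = [[2, 0, 0],
       [1, -1, 0],
       [0, 0, 3]]
assert exit_flows(psi) == [2, 0, 3]
assert reverse_exit_flows(psi) == [-3, 1, -3]
```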
3. A conservative flow is often referred to as a flow in the literature.

(a) ψ  (b) ψ′  (c) ψ ≡ ψ′

Fig. 2:
An example of two equivalent flow representations with the same exit-flows. Note that each red arrow represents the value ψ_{ij:λµ}; the opposite arrows ψ_{ji:µλ} are not shown. Furthermore, the exit-flows Σ are shown next to the nodes, and the initial edges φ are not shown. In (c), the flow ψ′ is obtained from ψ by passing flow around a loop.
A flow ψ is called null if the total flow into a node is zero for all nodes, including the source and the terminal, i.e., it satisfies Eq. 5 for all U_{i:λ} ∈ V̂ ∪ {0, 1}.

Note that a null flow does not change the energy function represented by the st-graph and is identical to passing flow around loops. Also, if ψ is a null flow then so is −ψ. Furthermore, note that the energy function encoded by an st-graph is a quadratic pseudo-boolean function [14], and a reparametrization of such a function is identical to a null flow in the corresponding st-graph.

Lemma 3.1.
Two sets of capacities φ and φ′ represent the same energy function exactly (not up to a constant), written as E_φ ≡ E_{φ′}, if and only if φ′ − φ is a null flow.

Proof. This lemma is a restatement of the reparametrization lemma of [10], [15] in the context of st-graphs.

Let φ̄ and φ̄′ be two sets of residual capacities obtained from an initial set of capacities φ by passing two flows ψ and ψ′, i.e., φ̄ = φ − ψ and φ̄′ = φ − ψ′. If φ̄ and φ̄′ are equivalent, then, by Lemma 3.1, (φ − ψ) − (φ − ψ′) = ψ′ − ψ is a null flow. Hence ψ′ can be obtained from ψ by passing flow around loops in the graph. See Fig. 2.

We can now state our main theorem.

Theorem 3.1.
Let φ be the initial capacities of an Ishikawa graph, and let Σ be a set of exit-flows. Suppose that ψ and ψ′ are two conservative flows compatible with Σ, meaning that (6) holds for both ψ and ψ′, and that ψ and ψ′ have identical source-flows. Then E_{φ−ψ} ≡ E_{φ−ψ′}.

The idea is then as follows. If a permissible conservative flow ψ is obtained during an augmenting path flow algorithm, but only the exit-flows Σ_{ij:λ} are retained for each (i,j) ∈ E⁺ and label λ, then one wishes, when required, to reconstruct the flow ψ on a given edge (i,j) ∈ E. Although the reconstructed flow ψ′ may not be identical to the flow ψ, the two will result in equivalent energy functions (not just equal up to a constant, but exactly equal for all assignments). In the augmenting path algorithm, the current flow values are only needed temporarily, one edge at a time, to find a new augmenting path, and hence do not need to be stored, as long as they can be rapidly computed.

Now we prove Theorem 3.1.

Proof.
First we will prove that ψ and ψ′ have identical column-flows. For a conservative flow,

ψ_{i:λ} − ψ_{i:λ−1} + Σ_{(i,j)∈E⁺} Σ_{ij:λ} = 0 ,   (7)

Fig. 3:
Given φ and Σ (left), flow reconstruction is formulated as a max-flow problem (right). Here the nodes with positive exit-flows are connected to the source (0) and those with negative exit-flows are connected to the terminal (1).

for all i ∈ V and λ ∈ {1, ..., ℓ−1}. Since ψ and ψ′ are compatible with Σ and have identical source-flows, ψ_{i:λ} = ψ′_{i:λ} for all i ∈ V and λ ∈ L. Hence they have identical column-flows.

Now we will prove the equivalence. Given a flow ψ, let us denote its restriction to the edges e_{ij:λµ} for all λ, µ ∈ {1, ..., ℓ−1} for some (i,j) ∈ E by ψ_{ij}, i.e., its restriction to the cross edges only. Since both ψ_{ij} and ψ′_{ij} satisfy Eq. 6, ψ′_{ij} − ψ_{ij} is a null flow. Furthermore, since both ψ and ψ′ have identical column-flows, ψ′ − ψ = (φ − ψ) − (φ − ψ′) is a null flow and, by Lemma 3.1, E_{φ−ψ} ≡ E_{φ−ψ′}.

Note that, from Eq. 7, it is clear that, given the source-flows ψ_{i:ℓ−1}; i ∈ V, the column-flows ψ_{i:λ}; i ∈ V, λ ∈ L can be computed in a top-down fashion. Now we turn to the problem of finding the flows along the cross-edges e_{ij:λµ}. Given the set of exit-flows Σ, the objective is to find a permissible flow ψ′ satisfying Eq. 6. Note that there exists a permissible conservative flow ψ compatible with Σ, and hence we find ψ′ such that ψ′ − ψ is a null flow. We do this by considering one edge (i,j) ∈ E at a time and reconstructing the flow by formulating a small max-flow problem.

Considering all the nodes U_{i:λ} and U_{j:µ} for a given pair (i,j), we join them with edges with initial capacities φ_{ij:λµ}. Nodes with positive exit-flow Σ_{ij:λ} are joined to the source with edges of capacities |Σ_{ij:λ}|. Similarly, those with negative exit-flow are joined to the terminal. See Fig. 3. Note that, in this network, the edges from the source can be thought of as "supply" and the edges to the terminal can be thought of as "demand".
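The small max-flow problem of Fig. 3 can be prototyped directly. The sketch below is a hypothetical minimal implementation (not the paper's) using plain Edmonds-Karp, simplified to the case where flow crosses only from column i to column j: `supply` holds the positive exit-flows on the U_{i:λ} side, `demand` the magnitudes of the negative exit-flows on the U_{j:µ} side, and the totals are assumed equal.

```python
from collections import deque

def reconstruct_residuals(phi_ij, supply, demand):
    """Recover non-negative residual cross capacities phi_ij - psi' for one
    pair (i, j) by solving the small max-flow problem of Fig. 3 with
    Edmonds-Karp (a simplified sketch, not the paper's implementation)."""
    n = len(phi_ij)                  # left nodes 0..n-1, right nodes n..2n-1
    S, T = 2 * n, 2 * n + 1          # source (0) and terminal (1) of Fig. 3
    V = 2 * n + 2
    cap = [[0] * V for _ in range(V)]
    for lam in range(n):
        cap[S][lam] = supply[lam]        # "supply" edges from the source
        cap[n + lam][T] = demand[lam]    # "demand" edges to the terminal
        for mu in range(n):
            cap[lam][n + mu] = phi_ij[lam][mu]

    def bfs_path():
        parent = [-1] * V
        parent[S] = S
        queue = deque([S])
        while queue:
            u = queue.popleft()
            for v in range(V):
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    if v == T:
                        return parent
                    queue.append(v)
        return None

    while (parent := bfs_path()) is not None:
        bottleneck, v = float("inf"), T
        while v != S:                    # bottleneck of the augmenting path
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = T
        while v != S:                    # update the residual capacities
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck
            v = parent[v]

    # Residual cross capacities phi_ij - psi'_ij, non-negative by construction.
    return [[cap[lam][n + mu] for mu in range(n)] for lam in range(n)]
```

The reverse arcs cap[n+µ][λ] then hold one reconstructed flow ψ′, which by Theorem 3.1 is equivalent to the true flow up to a null flow.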
Since the total supply equals the total demand in this network and there exists a permissible flow ψ_{ij} compatible with Σ (i.e., satisfying the supply-demand equality), the maximum flow solution of this network, ψ′_{ij}, is compatible with Σ, i.e., satisfies Eq. 6. In fact, we are interested in the non-negative residual capacities φ′_{ij} = φ_{ij} − ψ′_{ij}, which are readily available in this network. This problem can be solved using a greedy augmenting path algorithm. While this graph has O(ℓ) nodes and O(ℓ²) edges, this remains perfectly tractable, since we only consider one edge (i,j) at a time. Therefore, ultimately, flow reconstruction can be done efficiently.

At this point, given the initial capacities φ, the source-flows ψ_{i:ℓ−1}; i ∈ V, and the set of exit-flows Σ, we have shown how to reconstruct the non-negative residual edge capacities φ′. This requires O(|V| + |E|ℓ) values to be stored.

4 POLYNOMIAL TIME MEMORY EFFICIENT MAX FLOW
We now introduce our polynomial time memory efficient max flow algorithm, which minimizes multi-label submodular MRF energies with pairwise interactions. Our algorithm follows a similar procedure to the standard Edmonds-Karp algorithm [16], in that it iteratively finds the shortest augmenting path and then pushes the maximum flow through it without exceeding the edge capacities. However, instead of storing the residual graph, we store exit-flows as proposed in Section 3, which, at any stage of the algorithm, allow us to compute the residual graph. Below, we discuss how one can find an augmenting path and update the exit-flows, i.e., perform augmentation, without storing the full Ishikawa graph.
Our algorithm finds an augmenting path in a subgraph of the Ishikawa graph, called the lower-graph. In particular, the lower-graph contains only the subset of Ishikawa edges that satisfy the lowest-cross-edge property.
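With dense capacity matrices (an illustrative layout of our choosing), the lowest-cross-edge of each node can be read off in a single scan:

```python
# Sketch of the lowest-cross-edge selection: for each node U_{i:lam}, keep
# the edge e_{ij:lam,mu} with the smallest mu such that phi_ij[lam][mu] > 0
# (None if the node has no positive cross edge towards column j).
def lowest_cross_edges(phi_ij):
    return [next((mu for mu, c in enumerate(row) if c > 0), None)
            for row in phi_ij]

phi_ij = [[0, 3, 1],
          [0, 0, 2],
          [0, 0, 0]]
assert lowest_cross_edges(phi_ij) == [1, 2, None]
```

Only these O(ℓ) entries per directed variable pair are kept in the lower-graph, instead of the full O(ℓ²) capacity matrix.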
Definition 4.1.
Consider a directed edge (i,j) ∈ E⁺. For each node U_{i:λ}, the lowest-cross-edge is defined as the edge e_{ij:λµ} where µ is the smallest value such that φ_{ij:λµ} > 0.

More specifically, in addition to the vertical edges Ê_v, the lower-graph contains the lowest-cross-edges. Therefore, we only store O(ℓ) edges per variable pair (i,j). Now, the relationship between augmenting paths in the original Ishikawa graph and the lower-graph can be characterized by the following theorem.

Theorem 4.1.
Given the Ishikawa graph, there is an augmenting path in the lower-graph if and only if there exists an augmenting path in the Ishikawa graph.

Proof.
Since the lower-graph is a subgraph of the Ishikawa graph, if there is an augmenting path in the lower-graph, then there exists an augmenting path in the Ishikawa graph.

We will now prove the converse. Consider a directed edge (i,j) ∈ E⁺. Let e_{ij:λµ} and e_{ij:λµ′} be two positive capacity edges from U_{i:λ}, with e_{ij:λµ′} the lowest-cross-edge. Then, due to the upward infinite capacity edges from U_{j:µ′} to U_{j:µ}, there is a positive capacity path from U_{i:λ} to U_{j:µ} through the lowest-cross-edge e_{ij:λµ′}. This proves the theorem.

This enables us to find all the augmenting paths in the Ishikawa graph by searching in a smaller graph that has O(ℓ) edges per variable pair (i,j).

Note that, as mentioned earlier, we find the shortest augmenting path in this lower-graph. However, by contrast to the Edmonds-Karp algorithm [16], the path distance is computed considering zero distance for the infinite capacity edges and unit distance for the other edges, instead of unit distance for all the edges. The intuition for this modification is that the infinite capacity edges will never become saturated (or eliminated from the graph) during the entire course of the algorithm. Note that, with this definition of path distance, the augmenting paths in both the lower-graph and the Ishikawa graph have the same length. This will enable us to prove the polynomial time bound of our algorithm in a similar manner to the standard Edmonds-Karp algorithm. Note that, even in this case, the shortest augmenting path can be found using a Breadth First Search (BFS) scheme.

Algorithm 1
Memory Efficient Max Flow (MEMF) - Polynomial Time Version
Require: φ                                ▷ Initial Ishikawa capacities
  Σ ← 0                                   ▷ Initialize exit-flows
  φ̄ ← lower-graph(φ)                      ▷ Store the lowest-cross-edges
  repeat
    p ← shortest augmenting path(φ̄)       ▷ Sec. 4.1
    (φ̄, Σ) ← augment(p, φ̄)                ▷ Sec. 4.2
    for each edge e_{ij:λµ} that becomes saturated do
      φ_{ij} ← compute edges(φ, Σ, i, j)  ▷ Sec. 3.1
      φ̄_{ij} ← lower-graph(φ_{ij}, i, j)  ▷ Sec. 4.1
  until no augmenting paths possible
  return get labelling(φ̄)                 ▷ Find the cut using BFS
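The shortest augmenting path subroutine uses the modified path distance described above (zero for infinite-capacity edges, one otherwise). One standard way to realize such a search is a 0-1 BFS with a deque; the sketch below assumes that choice and an ad-hoc adjacency-list layout, neither of which is prescribed by the paper.

```python
from collections import deque

INF = float("inf")

def zero_one_bfs(adj, source):
    """Distances under MEMF's path length: an infinite-capacity edge costs 0,
    any other positive-capacity edge costs 1. adj[u] = list of (v, capacity).
    (A sketch; the paper only states that a BFS scheme suffices.)"""
    dist = {source: 0}
    dq = deque([source])
    while dq:
        u = dq.popleft()
        for v, c in adj[u]:
            if c <= 0:
                continue                 # saturated edges are unusable
            w = 0 if c == INF else 1
            if dist[u] + w < dist.get(v, INF):
                dist[v] = dist[u] + w
                if w == 0:
                    dq.appendleft(v)     # zero-cost edges go to the front
                else:
                    dq.append(v)
    return dist

adj = {0: [(1, INF), (2, 3)], 1: [(3, 2)], 2: [(3, INF)], 3: []}
assert zero_one_bfs(adj, 0) == {0: 0, 1: 0, 2: 1, 3: 1}
```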
Now, given an augmenting path p, we want to push the maximum permissible flow through it. The edges in the augmenting path p are updated in a similar manner as in the usual max-flow algorithm. In addition to that, for each cross edge e_{ij:λµ} ∈ Ê_c that is in the augmenting path, the exit-flows are updated as follows:

Σ_{ij:λ} = Σ_{ij:λ} + α ,   (8)
Σ_{ji:µ} = Σ_{ji:µ} − α ,

where α is the maximum possible flow along the path p.

After the flow augmentation, the lower-graph needs to be updated to maintain the lowest-cross-edge property. Note that the lowest-cross-edge property may be violated for the following reasons:
1) A new lowest-cross-edge e_{ij:λµ} is created due to a flow along the edge e_{ji:µλ}.
2) A new lowest-cross-edge e_{ij:λµ} is created due to a saturating flow along e_{ij:λµ′} for some µ′ < µ, i.e., the edge e_{ij:λµ′} disappears from the Ishikawa graph.

Note that, during an augmentation, if a new lowest-cross-edge is created due to a flow in the opposite direction (case 1 above), then the new lowest-cross-edge is known and the lower-graph can be updated directly, i.e., the new lowest-cross-edge can be stored.

On the other hand, if a cross edge becomes saturated (case 2), then we need to run the flow reconstruction algorithm to find the new lowest-cross-edge and update the lower-graph. This can be done in a memory efficient manner, since it only involves one edge (i,j) ∈ E at a time.

Our polynomial time memory efficient max flow is summarized in Algorithm 1. Let us briefly explain the subroutines below.

lower-graph: Given the initial Ishikawa edge capacities φ, this subroutine constructs the lower-graph (with edge capacities φ̄) by retaining the lowest-cross-edges from each node U_{i:λ} ∈ V̂, for each directed edge (i,j) ∈ E⁺ (see Sec. 4.1). If the input to this subroutine is the Ishikawa capacities φ_{ij} corresponding to the edge (i,j) ∈ E, then it retains the lowest-cross-edges φ̄_{ij}.
shortest augmenting path: Given the lower-graph parameters φ̄, this subroutine finds the shortest augmenting path p using BFS, as discussed in Section 4.1.

augment: Given the path p, this subroutine finds the maximum possible flow through the path and updates the lower-graph and the set of exit-flows, as discussed in Section 4.2. In addition, if a new lowest-cross-edge is created due to a flow in the opposite direction (case 1 in Sec. 4.2), then it also updates the lower-graph capacities φ̄.

compute edges: Given the initial Ishikawa edge capacities φ and the set of exit-flows Σ, this subroutine computes the non-negative residual Ishikawa capacities φ_{ij} corresponding to the given edge (i,j). This is accomplished by solving a small max-flow problem (see Sec. 3.1).

get labelling: This subroutine finds the partition of the lower-graph by running BFS.

As discussed above, the exit-flows Σ require O(ℓ) storage for each edge (i,j) ∈ E. In addition, the lower-graph can have at most O(|V|ℓ) nodes and O(|E|ℓ) edges. Furthermore, recall that we assume that the initial Ishikawa edge capacities φ can be stored efficiently. Therefore, ultimately, the space complexity of our algorithm is O((|V| + |E|)ℓ) = O(|E|ℓ). Let us now prove the polynomial time bound of our algorithm.

We follow the time complexity analysis of the standard Edmonds-Karp algorithm [16] to derive a polynomial time bound on our algorithm. In particular, the analysis first proves that the shortest path distance from the source (node 0) to any node is monotonically increasing with each flow augmentation. Then, it derives a bound on the number of augmentations. In fact, the number of augmentations of our MEMF algorithm has the same bound as the Edmonds-Karp algorithm.
Theorem 4.2.
If the MEMF algorithm is run on the Ishikawa graph Ĝ = (V̂ ∪ {0, 1}, Ê) with source 0 and terminal 1, then the total number of augmentations performed by the algorithm is O(|V̂||Ê|).

Proof. The proof follows the steps of the standard proof of the Edmonds-Karp algorithm. See Appendix A for details.

Let us analyze the time complexity of each subroutine below. Note that both the subroutines shortest augmenting path and augment run in O(|E|ℓ) time, since, in the worst case, both subroutines need to check each edge in the lower-graph. However, compute edges requires running the flow reconstruction algorithm, which takes O(ℓ³) time for each variable pair (i,j) (assuming a small max-flow problem with ℓ nodes and ℓ² edges, solved using the most efficient algorithm [17], see Sec. 3.1). Also, lower-graph requires O(ℓ²) time for each variable pair (i,j), since it needs to check each of the Ishikawa edges. Hence, the worst case running time of each iteration (i.e., augmentation step) is O(|E|ℓ + K(ℓ³ + ℓ²)) = O(|E|ℓ + |E|ℓ³) = O(|E|ℓ³), where K is the maximum number of flow-reconstructions (i.e., saturated cross edges) at an augmentation step. Since the number of augmentations is bounded by O(|V̂||Ê|), the worst case running time of the entire execution of the MEMF algorithm is O(|V|ℓ · |E|ℓ² · |E|ℓ³) = O(|V||E|²ℓ⁶). This is O(ℓ) slower than the standard Edmonds-Karp algorithm on the Ishikawa graph. Note, however, that MEMF requires O(ℓ) less memory.

Fig. 4:
To find an augmenting path in a memory efficient manner, we propose a simplified representation of the Ishikawa graph in terms of blocks corresponding to consecutive non-zero edges in each column i.

5 EFFICIENT ALGORITHM
In the previous section, we have provided a general purpose polynomial time max-flow algorithm that is also memory efficient. However, for computer vision applications, the BK algorithm [7] has been shown to be significantly faster than standard max-flow implementations, even though it lacks the polynomial time guarantee. The basic idea is to maintain a search tree throughout the algorithm instead of building the search tree from scratch at each iteration.

Motivated by this, we also propose doing search-tree-recycling similar to the BK algorithm. Since we lose the polynomial time guarantee, for increased efficiency, we further simplify the Ishikawa graph. In particular, we find an augmenting path in a block-graph, which amalgamates the nodes in each column into blocks. Since an augmenting path in our block-graph corresponds to a collection of augmenting paths in the Ishikawa graph, our algorithm converges in fewer iterations than the BK algorithm.
As mentioned above, we find an augmenting path in a block-graph, whose construction is detailed below.

Given the parameters φ, we rely on the fact that there exists a label λ such that φ_{i:λ} = 0 for each i ∈ V. In fact, it is easy to see that in each column i, if all φ_{i:λ} are positive, then there exists a trivial augmenting path from U_{i:ℓ} to U_{i:0}, and the minimum along the column can be subtracted from each φ_{i:λ}. Now, at each column i, we partition the nodes U_{i:λ} for all λ ∈ {1, ..., ℓ−1} into a set of blocks, such that the nodes in a block are connected by positive edges φ_{i:λ}. Let us denote these blocks by B_{i:γ}, where γ is indexed from bottom to top starting from 0. Note that there is no edge between B_{i:γ} and B_{i:γ±1}. As depicted by Fig. 4, our block-graph then contains only the blocks and the edges between the blocks.

The edges in the block-graph are obtained as follows. Let us consider a directed edge (i,j) ∈ E⁺. We add an edge B_{i:γ} → B_{j:δ}, where δ is the smallest value such that φ_{ij:λµ} > 0 for some U_{i:λ} ∈ B_{i:γ} and U_{j:µ} ∈ B_{j:δ}. While doing this,
4. We called this a simplified graph in [13].

we also enforce that there is no edge B_{i:γ′} → B_{j:δ′} such that γ′ > γ and δ′ < δ. The reasoning behind this is that, because of the upward infinite-capacity edges between the nodes U_{i:λ} and U_{i:λ+1}, we have the following:
1) If a node U_{j:µ} can be reached from U_{i:λ} through positive edges, then the nodes U_{j:µ′}, for all µ′ ≥ µ, can also be reached.
2) If a node U_{j:µ} can be reached from U_{i:λ} through positive edges, then it can also be reached from the nodes U_{i:λ′}, for all λ′ ≤ λ.

Hence, an edge B_{i:γ} → B_{j:δ} indicates the fact that there is some positive flow possible from any node U_{i:λ} ∈ B_{i:γ′}, for all γ′ ≤ γ, to any node U_{j:µ} ∈ B_{j:δ′}, for all δ′ ≥ δ. In other words, the set of edges obtained by this procedure is sufficient.

Now, the relationship between augmenting paths in the original Ishikawa graph and in our block-graph can be characterized by the following theorem.

Theorem 5.1.
Given the set of Ishikawa graph parameters φ, there is an augmenting path in the block-graph if and only if there exists an augmenting path in the Ishikawa graph.

Proof. The basic idea of this proof is the same as the proof of Theorem 4.1. See Appendix B.

Note that the block-graph can only be used to find an augmenting path; the quantity of the maximum permissible flow cannot be determined in this graph. Therefore, the capacity of an edge B_{i:γ} → B_{j:δ} is not important, but it is important to have these edges. Note also that the block-graph is constructed incrementally for each edge (i,j) ∈ E. Hence, it only requires us to store the Ishikawa graph parameters φ_{ij} corresponding to the edge (i,j). Furthermore, since the block-graph G_b is sparse, an augmenting path can be found fairly quickly.

Furthermore, similar to the BK algorithm, we find an augmenting path P_b using BFS and maintain the search tree throughout the algorithm, by repairing it whenever the block-graph is updated. However, since the block-graph needs to be reconstructed after each augmentation, for simplicity, we maintain a single tree. More specifically, we grow the search tree from the source (node 0), in a breadth-first manner, and if the sink (node 1) is reached, then the augmenting path P_b is found.

Now, given an augmenting path P_b in the block-graph, we want to push the maximum permissible flow through it. More specifically, since P_b corresponds to a set of augmenting paths {p_b} in the Ishikawa graph, we will push the maximum flow through each path p_b, until no such path exists. This could be achieved by constructing the subgraph Ĝ_p of the Ishikawa graph corresponding to the augmenting path P_b, and then finding each of the augmenting paths p_b by searching in Ĝ_p. This would require us to either store Ĝ_p (not memory efficient) or call the flow reconstruction algorithm too many times.
5. The BK algorithm maintains two trees, a source-tree and a sink-tree, but we only maintain the source-tree.
Fig. 5:
An example flow-loop ˜m(1, ·, α_ij) in the block-graph (left) is equivalent to the summation of two flow-loops m(3, ·, α₁) and m(4, ·, α₂) in the Ishikawa graph (right), with α_ij = α₁ + α₂.

Instead, we propose breaking down the augmentation operation in the block-graph into a sequence of flow-loops and a subtraction along a column. The maximum flow through the path can then be pushed in a greedy manner, by pushing the maximum flow through each flow-loop. Before describing this procedure in detail, we introduce the following definitions.
Definition 5.1.
A flow-loop m(λ, µ, α) in the Ishikawa graph is defined as the following sequence of operations: first, a value α is pushed down the left column from U_{i:ℓ} to U_{i:λ}, then across from U_{i:λ} to U_{j:µ}, and finally up the right column from U_{j:µ} to U_{j:ℓ}. Thus, applying the flow-loop m(λ, µ, α) corresponds to replacing φ by φ + ∆, where

∆_{i:λ′} = −α ∀λ′ ≥ λ,   ∆_{ij:λµ} = −α,
∆_{ji:µλ} = α,   ∆_{j:µ′} = α ∀µ′ ≥ µ.

Definition 5.2.
A flow-loop ˜m(γ, δ, α) in the block-graph G_b is defined by the following sequence of operations: first, a value α is pushed down the left column from U_{i:ℓ} to B_{i:γ}, then across from B_{i:γ} to B_{j:δ}, and finally up the right column from B_{j:δ} to U_{j:ℓ}.

Note that, for a flow-loop ˜m(γ, δ, α) to be permissible, the block B_{i:γ} must contain the node U_{i:ℓ−1}. Note also that the flow-loop ˜m(γ, δ, α) can be thought of as a summation of flow-loops m(λ, µ, α′), where U_{i:λ} ∈ B_{i:γ} and U_{j:µ} ∈ B_{j:δ′}, for all δ′ ≥ δ (see Fig. 5).

Given these definitions, one can easily see that the augmentation operation along the path P_b can be broken down into a sequence of flow-loops ˜m(γ, δ, α) and a subtraction along the last column k, as illustrated in Fig. 6. Now, we push the maximum permissible flow through P_b using the following greedy approach.

For each edge B_{i:γ} → B_{j:δ} that is part of the path P_b, we apply a flow-loop ˜m(γ, δ, α_ij), where α_ij is the maximum permissible flow through the edge B_{i:γ} → B_{j:δ}. In fact, applying this flow-loop translates to reconstructing the Ishikawa edge capacities φ_ij corresponding to edge (i, j) and then applying flow-loops m(λ, µ, α′) for all λ ≥ ˇλ and

Fig. 6: An augmentation operation is broken down into a sequence of flow-loops ˜m(γ, δ, α), and a subtraction along the column k. The augmenting path P_b is highlighted in red.

Algorithm 2
Memory Efficient Max Flow (MEMF) - Efficient Version
Require: φ ▷ Initial Ishikawa capacities
Σ ← 0, T ← ∅ ▷ Initialize exit-flows and search tree
G_b ← block-graph(φ) ▷ Initial block-graph
repeat
  (T, P_b) ← augmenting-path(G_b, T) ▷ Sec. 5.1
  Σ ← augment(P_b, φ, Σ) ▷ Sec. 5.2
  for each edge (i, j) ∈ E affected by the augmentation do
    φ_ij ← compute-edges(φ, Σ, i, j) ▷ Sec. 3.1
    G_b^{ij} ← block-graph(φ_ij, i, j) ▷ Sec. 5.1
  T ← repair-tree(T, G_b) ▷ Repair search tree
until no augmenting paths possible
return get-labelling(T) ▷ Read from search tree

µ ≥ ˇµ, starting from ˇλ and ˇµ, until no permissible flow-loop m(λ, µ, α′) exists, with ˇλ and ˇµ the smallest values such that U_{i:λ} ∈ B_{i:γ} and U_{j:µ} ∈ B_{j:δ}. Finally, in the last column k, all the values φ_{k:λ} are positive, and the minimum along column k is subtracted from each φ_{k:λ}. It is easy to see that this approach pushes the maximum permissible flow through the path P_b.

Since, for each edge (i, j), we do not store all the ℓ² capacities, but only the 2ℓ exit-flows Σ, augmentation must then also update these values. Fortunately, there is a direct relation between the flow-loops and Σ. To see this, let us consider the example flow-loop ˜m(1, ·, α_ij) shown in Fig. 5. Applying this flow-loop updates the corresponding exit-flows as

Σ_{ij:3} = Σ_{ij:3} + α₁,   Σ_{ji:1} = Σ_{ji:1} − α₁,   (9)
Σ_{ij:4} = Σ_{ij:4} + α₂,   Σ_{ji:4} = Σ_{ji:4} − α₂.

Similar updates can be done for all flow-loops in our procedure. Note that the edge B_{i:γ} → B_{j:δ} represents a collection of possible paths from all the nodes U_{i:λ} ∈ B_{i:γ} to all the nodes U_{j:µ} ∈ B_{j:δ′}, for all δ′ ≥ δ. Therefore, unlike in the full Ishikawa graph, after applying a flow-loop, the portion of the graph G_b^{ij} corresponding to edge (i, j) ∈ E needs to be reconstructed. This, however, can be done in a memory-efficient manner, since it only involves one edge (i, j) at a time.
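For illustration, applying a flow-loop m(λ, µ, α) of Definition 5.1 and recording it in the exit-flows can be sketched as below. The data layout (lists for the column capacities, dictionaries keyed by label pairs for the cross edges) and the function name are assumptions made for this sketch, not the paper's implementation.

```python
def apply_flow_loop(phi_i, phi_j, phi_ij, phi_ji, sigma_ij, sigma_ji,
                    lam, mu, alpha):
    """Apply the flow-loop m(lam, mu, alpha) of Definition 5.1 to the
    Ishikawa capacities and record it in the exit-flows.
    phi_i[k] holds phi_{i:k}; phi_ij[(a, b)] holds phi_{ij:ab}."""
    ell = len(phi_i)
    # Push alpha down the left column: phi_{i:lam'} -= alpha for lam' >= lam.
    for k in range(lam, ell):
        phi_i[k] -= alpha
    # Across the cross edge, and back along its reverse edge.
    phi_ij[(lam, mu)] -= alpha
    phi_ji[(mu, lam)] += alpha
    # Up the right column: phi_{j:mu'} += alpha for mu' >= mu.
    for k in range(mu, ell):
        phi_j[k] += alpha
    # Record the flow in the exit-flows, following the pattern of Eq. (9).
    sigma_ij[lam] += alpha
    sigma_ji[mu] -= alpha
```

Each flow-loop thus touches only one pair of columns, which is what allows the per-edge capacities to be discarded and later reconstructed from Σ.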
Our memory efficient max-flow (MEMF) method is summarized in Algorithm 2. Let us briefly explain the subroutines below.

block-graph: Given the initial Ishikawa parameters φ, this subroutine constructs the block-graph by amalgamating nodes into blocks, as described in Section 5.1. If the input to the subroutine is the Ishikawa capacities φ_ij corresponding to the edge (i, j) ∈ E, then it constructs the block-graph portion G_b^{ij}.

augmenting-path: Given the block-graph G_b and the search tree T, this subroutine finds an augmenting path P_b by growing the search tree, as discussed in Section 5.1.

augment: Given the path P_b, this subroutine pushes the maximum permissible flow through it by applying flow-loops ˜m(γ, δ, α) and then subtracting the minimum from the last column, as discussed in Section 5.2.

compute-edges: This is the same subroutine as in Algorithm 1 (see Sec. 4.3).

repair-tree: This subroutine is similar to the adoption stage of the BK algorithm. Given the reconstructed block-graph, the search tree T is repaired by checking for a valid parent for each orphan node. See Section 3.2.3 in [7] for more detail.

get-labelling: This subroutine directly reads the optimal labelling from the search tree T.

As discussed in Section 4.3, the space complexity of our algorithm is O(|E|ℓ). For the rest of the paper, this efficient version of the algorithm is referred to as MEMF.

6 EQUIVALENCE WITH MESSAGE PASSING
In this section, we give a more insightful interpretation of our max-flow algorithm, by showing its equivalence with the min-sum message passing algorithm. In particular, we first characterize the notion of an augmenting path in the message passing context. Then, we show that the max-flow algorithm is, in spirit, equivalent to min-sum message passing for multi-label submodular MRFs. Finally, we observe the relationship between the set of exit-flows in our algorithm and the set of messages in the message passing algorithm. To this end, let us first define the multi-label graph, which will be used to explain the equivalence.
An alternative way of representing the multi-label energy function (1) is by defining indicator variables x_{i:λ} ∈ {0, 1}, where x_{i:λ} = 1 if and only if x_i = λ. For a given i, exactly one of the x_{i:λ}, λ ∈ L, can have value 1. In terms of the indicator variables, the energy function (1) may be written as

E_θ(x) = Σ_{i∈V} Σ_{λ∈L} θ_{i:λ} x_{i:λ} + Σ_{(i,j)∈E} Σ_{λ,µ∈L} θ_{ij:λµ} x_{i:λ} x_{j:µ},   (10)

where the values θ are a particular set of parameters determining the energy function. One may define a graph, called a multi-label graph, with nodes denoted by X_{i:λ}, i ∈ V, λ ∈ L, as shown in Fig. 7. This graph represents the energy function. Given a labelling x, the value of the energy function is obtained by summing the weights on all nodes

Fig. 7: Example of a multi-label graph. Here, the nodes represent the unary potentials θ_{i:λ} and the edges represent the pairwise potentials θ_{ij:λµ}.

with x_{i:λ} = 1 (in other words, x_i = λ), plus the weights θ_{ij:λµ} such that x_{i:λ} = 1 and x_{j:µ} = 1.

Furthermore, given the Ishikawa edge capacities φ, the parameters θ can be calculated using Eq. 4. Note that, in this case, E_θ(x) = E_φ(x) for all labellings x.

The energy function (10) can be written in different ways as a sum of unary and pairwise terms. In particular, there may exist a different set of parameters θ′ such that E_θ(x) = E_θ′(x) for all x, denoted as E_θ ≡ E_θ′. The conditions under which E_θ ≡ E_θ′ are well known [10], [15].

Lemma 6.1.
Two energy functions E_θ and E_θ′ are equivalent if and only if there exist values m_{ij:λ} and m_{ji:µ} for (i, j) ∈ E and λ, µ ∈ L such that

θ′_{ij:λµ} = θ_{ij:λµ} − m_{ij:λ} − m_{ji:µ},
θ′_{i:λ} = θ_{i:λ} + Σ_{(k,i)∈E} m_{ik:λ}.

The values m_{ij:λ} constitute a message m_ij passed from the edge (i, j) to the node i; it may be thought of as a message vector (indexed by λ). A message m_ij causes the values m_{ij:λ} to be swept out of all the edges θ_{ij:λµ} and added onto the nodes θ_{i:λ}. Messages are passed in both directions from an edge (i, j).

Note that reparametrization provides an alternative way of implementing the min-sum message passing algorithm. In particular, the objective of min-sum message passing is to reparametrize the energy function so that the constant term θ_c is maximized, while keeping the parameters θ non-negative, where the constant term is defined as follows:

θ_c = Σ_{i∈V} min_{λ∈L} θ_{i:λ}.   (11)

For more details, see [10], [18].

We will now give an equivalent notion of an augmenting path in the context of the multi-label graph. To this end, we will first understand the motivation behind finding an augmenting path in the Ishikawa graph.

Let us consider an augmenting path in the Ishikawa graph. If it is a trivial augmenting path, i.e., an augmenting path along a column from node U_{i:ℓ} to U_{i:0}, then pushing the maximum flow along the path translates into subtracting the minimum value from each φ_{i:λ}, λ ∈ L. In the multi-label graph, this is trivially equivalent to subtracting the minimum from each θ_{i:λ}, λ ∈ L.

Let us now consider a more interesting augmenting path in the Ishikawa graph, which contains at least one cross edge e_{ij:λµ} ∈ Ê_c. Similar to the discussion in Section 5.2, one can easily see that the augmentation operation can be broken down into a sequence of flow-loops m(λ, µ, α) (see Def. 5.1) and a subtraction along a column.
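Before moving on, the reparametrization of Lemma 6.1 can be made concrete with a minimal two-variable sketch: sweeping message vectors out of the pairwise terms and onto the unaries leaves the energy of every labelling unchanged. All names here are illustrative, not from the paper.

```python
def reparametrize(theta_i, theta_j, theta_ij, m_ij, m_ji):
    """Reparametrize one edge (i, j) with message vectors m_ij, m_ji
    (Lemma 6.1): m_ij[l] + m_ji[u] is swept out of every pairwise term
    theta_ij[l][u] and the messages are added onto the unary terms."""
    num_labels = len(theta_i)
    theta_ij2 = [[theta_ij[l][u] - m_ij[l] - m_ji[u] for u in range(num_labels)]
                 for l in range(num_labels)]
    theta_i2 = [theta_i[l] + m_ij[l] for l in range(num_labels)]
    theta_j2 = [theta_j[u] + m_ji[u] for u in range(num_labels)]
    return theta_i2, theta_j2, theta_ij2

def energy(theta_i, theta_j, theta_ij, xi, xj):
    """Energy of a two-variable instance of Eq. (10)."""
    return theta_i[xi] + theta_j[xj] + theta_ij[xi][xj]
```

For any labelling (xi, xj), the terms m_ij[xi] and m_ji[xj] cancel between the unary and pairwise parts, so E_θ ≡ E_θ′ as the lemma states.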
This intuitively suggests that an augmenting path in the Ishikawa graph can be translated into a trivial augmenting path by passing flow around loops, i.e., they differ by a null flow. Therefore, the motivation for finding an augmenting path is to pass flow around loops so as to obtain a trivial augmenting path.

Note that the notion of a trivial augmenting path in the multi-label graph is

θ_{i:λ} > 0 ∀λ ∈ L,   (12)

for some i ∈ V. Furthermore, by Lemma 3.1 and Lemma 6.1, a flow-loop (or null flow) corresponds to a reparametrization of the multi-label energy function. Hence, the notion of an augmenting path in the multi-label graph can be characterized as finding a set of reparametrizations that makes θ_{i:λ} positive for all λ ∈ L for some i ∈ V.

Note that, as we have discussed above, an augmenting path in the Ishikawa graph can be translated into a trivial augmenting path by passing flow around loops. Furthermore, pushing the maximum flow through a trivial augmenting path is simply a subtraction of the minimum value min_{λ∈L} φ_{i:λ} from each φ_{i:λ}. In fact, one can accumulate the total flow passed from source to sink, which is exactly the constant term defined in Eq. 11. Hence, max-flow tries to maximize θ_c by passing flow around loops (i.e., reparametrizing), while keeping the edge capacities φ non-negative. This is, in spirit, equivalent to the min-sum message passing algorithm. Note that the optimality of min-sum message passing for the case of multi-label submodular MRFs is observed in [15], [19].

As mentioned above, from Lemma 3.1 and Lemma 6.1, it is clear that a flow-loop corresponds to a reparametrization of the multi-label energy function. In this section, we will find the equivalent reparametrization of a flow-loop m(λ, µ, α). This will later allow us to understand the relationship between the set of exit-flows and the set of messages. Let us now state and prove our theorem.

Fig. 8: A flow-loop m(2, ·, α) in the Ishikawa graph (left) and its equivalent reparametrization in the multi-label graph (right). Note that the exit-flow vectors (Σ_ij, Σ_ji) and the corresponding message vectors (m_ij, m_ji) are shown next to the nodes.
Applying a flow-loop m(λ, µ, α) in the Ishikawa graph is equivalent to a reparametrization in the multi-label graph, with messages

m_{ij:λ′} = −α ∀λ′ ≥ λ,   (13)
m_{ji:µ′} = α ∀µ′ ≥ µ.

Proof.
Let the Ishikawa parameters be φ and the multi-label graph parameters be θ, and assume that the flow-loop is applied between columns i and j. Also, let the parameters after the flow be φ′ and θ′, respectively. Since θ can be calculated from φ using Eq. 4, E_θ ≡ E_φ. Similarly, E_φ′ ≡ E_θ′. Also, from Lemma 3.1, E_φ ≡ E_φ′. Hence, E_θ ≡ E_φ ≡ E_φ′ ≡ E_θ′. Now, from Definition 5.1,

φ′_{i:λ′} = φ_{i:λ′} − α ∀λ′ ≥ λ,   (14)
φ′_{j:µ′} = φ_{j:µ′} + α ∀µ′ ≥ µ,
φ′_{ij:λµ} = φ_{ij:λµ} − α,
φ′_{ji:µλ} = φ_{ji:µλ} + α.

Substituting in Eq. 4,

θ′_{i:λ′} = θ_{i:λ′} − α ∀λ′ ≥ λ,   (15)
θ′_{j:µ′} = θ_{j:µ′} + α ∀µ′ ≥ µ,
θ′_{ij:λ′µ′} = θ_{ij:λ′µ′} − α ∀λ′ < λ, µ′ ≥ µ,
θ′_{ij:λ′µ′} = θ_{ij:λ′µ′} + α ∀λ′ ≥ λ, µ′ < µ.

Now, since E_θ ≡ E_θ′, by Lemma 6.1, there exist messages m_{ij:λ} and m_{ji:µ} such that

θ′_{ij:λµ} = θ_{ij:λµ} − m_{ij:λ} − m_{ji:µ},   (16)
θ′_{i:λ} = θ_{i:λ} + Σ_{(k,i)∈E} m_{ik:λ}.

With a little bit of calculation, one can see that the messages take the following form:

m_{ij:λ′} = −α ∀λ′ ≥ λ,   (17)
m_{ji:µ′} = α ∀µ′ ≥ µ.

Note that, for a permissible flow-loop m(λ, µ, α), the parameters φ′ and θ′ are non-negative.

This equivalence is shown in Fig. 8 for an example flow-loop m(2, ·, α). Note that, as shown in the figure, the flow α through an edge φ_{ij:λµ} may be recorded in the set of exit-flows Σ. Furthermore, as shown in the figure, the relationship between the set of exit-flows and the set of messages can be written as

Σ_{ij:λ} = m_{ij:λ−1} − m_{ij:λ} ∀λ ∈ {0, ···, ℓ−1},   (18)
Σ_{ji:µ} = m_{ji:µ−1} − m_{ji:µ} ∀µ ∈ {0, ···, ℓ−1}.

7 RELATED WORK
The approaches that have been proposed to minimize multi-label submodular MRFs can be roughly grouped into two categories: those based on max-flow and those based on an LP relaxation of the problem. Below, we briefly review representative techniques in each category.
The most popular method to minimize a multi-label submodular MRF energy is to construct the Ishikawa graph [1] and then apply a max-flow algorithm to find the min-cut solution. Broadly speaking, there are three different kinds of max-flow algorithms: those relying on finding augmenting paths [3], the push-relabel approach [20] and the pseudo-flow techniques [21]. Even though numerous implementations are available, the BK method [7] is arguably the fastest implementation for 2D and sparse 3D graphs. Recently, for dense problems, the IBFS algorithm [22] was shown to outperform the BK method in a number of experiments [23]. All the above-mentioned algorithms, however, require the same order of storage as the Ishikawa graph and hence scale poorly. Two approaches have nonetheless been studied to scale the max-flow algorithms. The first one explicitly relies on the N-D grid structure of the problem at hand [24], [25]. The second one makes use of distributed computing [9], [26], [27]. Unfortunately, both these approaches require additional resources (disk space or clusters) to run max-flow on an Ishikawa graph. By contrast, our algorithm lets us efficiently minimize the energy of much larger Ishikawa-type graphs on a standard computer. Furthermore, using the method of [9], it can also be parallelized.
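For reference, the augmenting-path family mentioned above can be sketched by the classical Edmonds-Karp scheme (shortest augmenting paths found by BFS). This is a generic textbook sketch over a dict-of-dicts residual graph, not the BK, IBFS or MEMF algorithm.

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp: repeatedly find a shortest augmenting path by BFS
    and saturate it. `cap` maps u -> {v: residual capacity}; returns the
    total flow pushed from s to t."""
    total = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total            # no augmenting path left: flow is maximum
        # Bottleneck (minimum residual capacity) along the path.
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            u = parent[v]
            bottleneck = min(bottleneck, cap[u][v])
            v = u
        # Augment: update forward and reverse residual capacities.
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap.setdefault(v, {})[u] = cap.get(v, {}).get(u, 0) + bottleneck
            v = u
        total += bottleneck
```

The memory bottleneck discussed in this paper is visible here: `cap` stores every residual edge explicitly, which for an Ishikawa graph means O(ℓ²) entries per variable pair.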
One memory-efficient way to minimize a multi-label submodular MRF energy consists of formulating the problem as a linear program and then maximizing the dual using message-passing techniques [18]. Many such algorithms have been studied [10], [11], [12], [15]. Even though these algorithms are good at approximating the optimal solution (and are also theoretically optimal for multi-label submodular MRFs [19]), as evidenced by the comparison of [28] and by our experiments, they usually take much longer to converge to the optimal solution than max-flow-based techniques.
8 EXPERIMENTS
We evaluated our algorithm on the problems of stereo correspondence estimation and image inpainting. For stereo correspondence estimation, we employed six instances from the Middlebury dataset [29], [30]: Tsukuba, Venus, Sawtooth, Map, Cones and Teddy, and one instance from the KITTI dataset [31] (see Fig. 9). For Tsukuba and Venus, we used the unary potentials of [32], and for all other stereo

Fig. 9:
Left and right images of the stereo instance from the KITTI dataset, with the number of labels set to 40. This image pair was chosen arbitrarily as a representative of the dataset.
Fig. 10:
Lengths of augmenting paths found by our algorithm for the Tsukuba stereo instance (see Sec. 8.1). Each bar indicates the proportion of paths of a certain length (median = 5, mean = 14, maximum = 218). The red arrow indicates the median length.

cases, those of [33]. For inpainting, we used the Penguin and House images employed in [32], and we used the same unary potentials as in [32]. In all the above cases, we used pairwise potentials that can be expressed as

θ_ij(x_i, x_j) = w_ij θ(|x_i − x_j|),   (19)

where, unless stated otherwise, the regularizer θ(|x_i − x_j|) is the quadratic function. Furthermore, we employed a 4-connected neighbourhood structure in all our experiments.

We compare our results with two max-flow implementations: the BK algorithm [7] and Excesses Incremental Breadth First Search (EIBFS) [34] (which we ran on the Ishikawa graph), and three LP relaxation-based algorithms: Tree Reweighted Message Passing (TRWS) [10], Subgradient-based Dual Decomposition (DDSG) [11] and the Adaptive Diminishing Smoothing algorithm (ADSal) [12]. For DDSG and ADSal, we used the OpenGM [35] implementations. For the other algorithms, we employed the respective authors' implementations.

In practice, we only ran the BK algorithm and EIBFS if the graph could be stored in RAM. Otherwise, we provide an estimate of their memory requirement. For the LP relaxation-based methods, unless they converged, we ran the algorithms either for 10000 iterations or for 50000 seconds, whichever occurred first. Note that the running times reported for our algorithm include graph construction. All our experiments were conducted on a 3.4 GHz i7-4770 CPU with 16 GB RAM.

The memory consumption and running times of the algorithms are provided in Table 1. Altogether, our algorithm lets us solve much larger problems than the BK algorithm and EIBFS, and is an order of magnitude faster than state-of-the-art message-passing algorithms.
TABLE 1: Memory consumption and runtime comparison with state-of-the-art baselines for the quadratic regularizer (see para. 2 of Sec. 8 for details on the algorithms). The columns report memory [MB] and time [s] for BK, EIBFS, DDSG, ADSal, TRWS and MEMF on the stereo and inpainting instances. A "*" indicates a memory estimate, and ">" indicates that the algorithm did not converge to the optimum within the specified time. Note that our algorithm has a memory consumption O(ℓ) times lower than the max-flow-based methods and is an order of magnitude faster than message-passing algorithms. Compared to EIBFS, our algorithm is only 4 – 7 times slower, but requires 12 – 23 times less memory, which makes it applicable to more realistic problems. In all stereo problems, TRWS cached the pairwise potentials in an array for faster retrieval, but in the case of inpainting, this was not possible due to the excessive memory requirement.

In this section, we empirically analyze various properties of our algorithm. First, note that, at each iteration, i.e., at each augmentation step, our algorithm performs more computation than standard max-flow. Therefore, we would like our algorithm to find short augmenting paths and to converge in fewer iterations than standard max-flow. Below, we analyze these two properties empirically.

In Fig. 10, we show the distribution of the lengths of the augmenting paths found by our algorithm for the Tsukuba stereo instance. Note that the median length is only 5. As a matter of fact, the maximum length observed over all our experiments was 1073 for the KITTI data. Nevertheless, even in that image, the median length was only 15. Note that, since our algorithm finds augmenting paths in the block-graph, the path lengths are not directly comparable to those found by other max-flow-based methods. In terms of the number of augmentations, we found that our algorithm only required between 35% and 50% of the total number of augmentations of the BK algorithm.

Next, we fixed the number of labels but varied the image size and compared the running times of the max-flow algorithms, for the Tsukuba and Penguin instances, in Fig. 11a, 11b. Similarly, we fixed the image size but varied the number of labels and report the running times in Fig. 11c, 11d.
By doing this, we try to estimate the empirical time complexity of our algorithm. Note that, similar to the other max-flow algorithms, MEMF exhibited near-linear performance with respect to the image size and near-cubic performance with respect to the number of labels in these experiments.

Finally, we report the percentage of time taken by each subroutine of our algorithm, for the Tsukuba and Penguin instances, in Fig. 12. Note that the individual time complexities of the subroutines compute-edges and block-graph are O(ℓ²) each. Therefore, they become dominant when the number of labels is large, and hence the corresponding percentages of time are high, particularly for Penguin.

Since our algorithm can simply replace standard max-flow in Ishikawa-type graphs, we replaced the BK method with our MEMF procedure in the IRGC algorithm [6], which minimizes MRFs with some non-convex pairwise potentials (or regularizers) by iteratively building and solving an Ishikawa graph. This lets us tackle much larger non-submodular problems. In particular, we computed inpainting results on Penguin by using all labels, as opposed to the down-sampled label sets used in [6]. The results of the IRGC+expansion algorithm, with the BK method and with MEMF, are summarized in Table 2.

TABLE 2: Memory consumption and runtime comparison of IRGC+expansion with either the BK method or our MEMF algorithm as subroutine (see Sec. 8.2). Here, "Penguin-128/10" corresponds to the Penguin problem with 128 labels and the truncated quadratic function with truncation value 10 as regularizer. A "*" indicates a memory estimate. Compared to the BK method, MEMF is only 4 – 11 times slower but requires 13 – 18 times less memory, which makes it applicable to much larger MRFs.
Since robust regularizers are highly effective in computer vision, we tested our algorithm by choosing the Huber loss function [36] as the regularizer,

θ(|x_i − x_j|) = |x_i − x_j|² if |x_i − x_j| ≤ δ,
θ(|x_i − x_j|) = δ(2|x_i − x_j| − δ) otherwise,   (20)

where δ is the Huber value. The results are summarized in Table 3. In this experiment, the Huber value was set to 4 for Tsukuba, Venus and Sawtooth, 6 for Map, 20 for Cones and Teddy, 10 for KITTI, and 25 for Penguin and House. Note that the Ishikawa graph for a Huber regularizer is significantly smaller, i.e., the number of edges per variable pair is O(δℓ) instead of O(ℓ²). Even in this case, our algorithm lets us solve much larger problems than the BK algorithm and EIBFS, and is an order of magnitude faster than state-of-the-art message-passing algorithms.

We parallelized our algorithm based on the dual-decomposition technique of [9] and evaluated it on the same six stereo instances from the Middlebury dataset [29], [30]. The relative times t_m/t_s, where t_m stands for the multi-thread time and t_s for the single-thread one, are shown
Fig. 11: Running time plots (in logarithmic scale) obtained by changing the image size (a, b) and by changing the number of labels (c, d), for Tsukuba and Penguin (see Sec. 8.1). The dashed lines provide the reference slopes. Note that all algorithms exhibited near-linear performance with respect to the number of pixels and near-cubic performance with respect to the number of labels, but MEMF required O(ℓ) times less memory. The plots of the BK algorithm and EIBFS are not complete, since we could not run them due to their excessive memory requirement.
TABLE 3: Memory consumption and runtime comparison with state-of-the-art baselines for the Huber regularizer (see Sec. 8.3). The columns report memory [MB] and time [s] for BK, EIBFS, TRWS and MEMF on the stereo and inpainting instances. A "*" indicates a memory estimate, and ">" indicates that the algorithm did not converge to the optimum within the specified time. Note that our algorithm has a much lower memory consumption than the max-flow-based methods and is an order of magnitude faster than message-passing algorithms. Compared to EIBFS, our algorithm is 7 – 11 times slower, but requires 7 – 10 times less memory, which makes it applicable to more realistic problems. In all stereo problems, TRWS cached the pairwise potentials in an array for faster retrieval, but in the case of inpainting, this was not possible due to the excessive memory requirement.
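The Huber-type regularizer of Eq. (20) can be sketched as follows. The exact form of the linear branch is partly reconstructed here (the standard continuous quadratic-then-linear Huber form is assumed), so treat it as an assumption rather than the paper's precise definition.

```python
def huber_regularizer(d, delta):
    """Huber-type regularizer: quadratic for small label differences
    d = |x_i - x_j|, linear beyond the Huber value delta (assumed form)."""
    d = abs(d)
    return d * d if d <= delta else delta * (2 * d - delta)
```

Both branches agree at d = δ (value δ²), so the regularizer is continuous; the linear growth beyond δ is what keeps the corresponding Ishikawa graph sparse, with O(δℓ) edges per variable pair.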
Fig. 12:
Percentage of time taken by each subroutine (augmenting-path, augment, compute-edges, block-graph and repair-tree), for Tsukuba (16 labels) and Penguin (256 labels) (see Sec. 8.1). Note that, in Penguin, due to the large number of labels, the percentages of time spent on compute-edges and block-graph are high.

in Fig. 13 for two and four threads. In this experiment, for all problems, the image grid was split vertically into two and four equally-sized blocks, respectively. Note that this splitting strategy is fairly arbitrary, and may affect the performance of the multi-threaded algorithm. In fact, finding better splits may itself be a possible future direction.
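One simple way to produce the equally-sized vertical blocks used in the multi-threaded experiments can be sketched as below. The paper does not specify how a remainder of columns is handled, so assigning it to the last block is an assumption of this sketch.

```python
def vertical_splits(width, num_blocks):
    """Split the image columns [0, width) into num_blocks vertical blocks
    of (nearly) equal size, returned as (start, end) column ranges."""
    size = width // num_blocks
    blocks = [(b * size, (b + 1) * size) for b in range(num_blocks)]
    # Any remaining columns are absorbed by the last block (assumption).
    blocks[-1] = (blocks[-1][0], width)
    return blocks
```

In a dual-decomposition setting, each block would then be solved in its own thread, with the variables on block boundaries coordinated through dual variables as in [9].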
9 CONCLUSION
We have introduced a variant of the max-flow algorithm that can minimize multi-label submodular MRF energies optimally, while requiring much less storage. Furthermore, our experiments have shown that our algorithm is an order of magnitude faster than state-of-the-art message-passing methods. We therefore believe that our algorithm constitutes the method of choice to minimize Ishikawa-type graphs when the complete graph cannot be stored in memory.

Fig. 13:
Our algorithm can be accelerated using the parallel max-flow technique (see Sec. 8.4). The relative times ranged from 0.56 to 0.99 with 2 threads and from 0.39 to 0.83 with 4 threads. In Teddy, in the case of 2 threads, the multi-threaded algorithm performs almost the same as the single-threaded algorithm, which may be due to bad splits.

APPENDIX A: TIME COMPLEXITY ANALYSIS OF THE POLYNOMIAL-TIME VERSION OF MEMF
Let us denote the Ishikawa graph by Ĝ = (V̂ ∪ {0, 1}, Ê) and the lower-graph by Ĝ^s = (V̂ ∪ {0, 1}, Ê^s). In this section, we denote the nodes by u, v, etc. The notation u₁ ≤ u₂ means that the nodes u₁ and u₂ are in the same column, with u₁ below u₂. Let Ĝ^s_f denote the residual graph of the lower-graph after the flow f, and similarly let Ê^s_f denote the set of non-zero residual edges. Let d_f(u, v) denote the shortest-path distance from u to v calculated by MEMF.
6. The superscript s is used to restate the fact that the lower-graph is a subgraph of the Ishikawa graph.
If the MEMF algorithm is run on the Ishikawa graph Ĝ = (V̂ ∪ {0, 1}, Ê) with source 0 and terminal 1, then for any node v ∈ V̂, the shortest-path distance d_f(0, v) in the residual lower-graph Ĝ^s_f increases monotonically with each flow augmentation.

Proof. We will suppose that for some node v ∈ V̂, there is a flow augmentation that causes the shortest-path distance from 0 to v to decrease, and then we will derive a contradiction. Let f be the flow just before the first augmentation that decreases some shortest-path distance, and let f′ be the flow just afterward. Let v be the node with the minimum d_{f′}(0, v) whose distance was decreased by the augmentation, so that d_{f′}(0, v) < d_f(0, v). Let p = 0 ⇝ u → v be a shortest path from 0 to v in Ĝ^s_{f′}, so that (u, v) ∈ Ê^s_{f′} and

d_{f′}(0, v) = d_{f′}(0, u) if u < v (infinite edge),
d_{f′}(0, v) = d_{f′}(0, u) + 1 otherwise.   (21)

Because of how we chose v, we know that the distance of node u from the source did not decrease, i.e.,

d_{f′}(0, u) ≥ d_f(0, u).   (22)

We claim that (u, v) ∉ Ê^s_f. Why? If we had (u, v) ∈ Ê^s_f, then we would also have

d_f(0, v) ≤ d_f(0, u) + 1 ≤ d_{f′}(0, u) + 1 = d_{f′}(0, v),   (23)

which contradicts our assumption that d_{f′}(0, v) < d_f(0, v). The above argument follows even if (u, v) is an infinite-capacity edge. Hence (u, v) ∉ Ê^s_f.

How can we have (u, v) ∉ Ê^s_f and (u, v) ∈ Ê^s_{f′}? Note that, in this case, (u, v) cannot be an infinite-capacity edge. There can be two reasons:
1) A new lowest edge (u, v) is created due to the flow from v to u. That means the augmentation must have increased the flow from v to u. The MEMF algorithm always augments flow along shortest paths, and therefore the shortest path from 0 to u in Ĝ^s_f has (v, u) as its last edge.
Therefore,

d_f(0, v) = d_f(0, u) − 1 ≤ d_{f′}(0, u) − 1 = d_{f′}(0, v) − 2,   (24)

which contradicts our assumption that d_{f′}(0, v) < d_f(0, v).

Theorem. If the MEMF algorithm is run on the Ishikawa graph Ĝ = (V̂ ∪ {0, 1}, Ê) with source 0 and sink 1, then the total number of augmentations performed by the algorithm is O(|V̂||Ê|).

Proof. We say that an edge (u, v) in a residual lower-graph Ĝ^s_f is critical on an augmenting path p if the residual capacity of p is the residual capacity of (u, v), i.e., if c_f(p) = c_f(u, v). After we have augmented flow along an augmenting path, any critical edge on the path disappears from the residual graph. Moreover, at least one edge on any augmenting path must be critical. We will show that each of the |Ê| edges can become critical at most |V̂|/2 times. Furthermore, note that an infinite-capacity edge cannot be critical at any point of the algorithm.

Let u and v be nodes in V̂ ∪ {0, 1} that are connected by an edge in Ê^s. Since augmenting paths are shortest paths, when (u, v) is critical for the first time, we have

d_f(0, v) = d_f(0, u) + 1.   (26)

Once the flow is augmented, the edge (u, v) disappears from the residual graph. Since we maintain the lowest-cross-edge property, there cannot be an edge (u, v₁) in Ĝ^s_f for some v₁ < v. Therefore, the edge (u, v) cannot reappear later on another augmenting path until after the flow from u to v₁, for some v₁ ≤ v, is decreased, which occurs only if (v₁, u) appears on an augmenting path. If f′ is the flow when this event occurs, then we have

d_{f′}(0, u) = d_{f′}(0, v₁) + 1.   (27)

Since d_{f′}(0, v) ≤ d_{f′}(0, v₁), due to the upward infinite-capacity edges, and d_f(0, v) ≤ d_{f′}(0, v) by Lemma A.1, we have

d_{f′}(0, u) = d_{f′}(0, v₁) + 1 ≥ d_{f′}(0, v) + 1 ≥ d_f(0, v) + 1 = d_f(0, u) + 2.   (28)
Consequently, from the time $(u, v)$ becomes critical to the time when it next becomes critical, the distance of $u$ from the source increases by at least 2. The distance of $u$ from the source is initially at least 0. The intermediate nodes on a shortest path from $0$ to $u$ cannot contain $0$, $u$ or $1$ (since $(u, v)$ on an augmenting path implies that $u \ne 1$). Therefore, until $u$ becomes unreachable from the source, if ever, its distance is at most $|\hat{V}|$. Thus, after the first time that $(u, v)$ becomes critical, it can become critical at most $|\hat{V}|/2$ times more, for a total of $|\hat{V}|/2 + 1$ times. Since there are $O(|\hat{E}|)$ pairs of nodes that can have an edge between them in a residual graph, the total number of critical edges during the entire execution of the MEMF algorithm is $O(|\hat{V}||\hat{E}|)$. Each augmenting path has at least one critical edge, and hence the theorem follows.

APPENDIX B
PROOF OF THEOREM

Theorem. Given the set of Ishikawa graph parameters $\phi$, there is an augmenting path in the block-graph if and only if there exists an augmenting path in the Ishikawa graph.

Proof. First, we will prove that, if there is an augmenting path in the block-graph, then there exists an augmenting path in the Ishikawa graph. It is clear that an augmenting path in the block-graph contains an edge from node 0 to a block, then a sequence of edges $B_{i:\gamma} \to B_{j:\delta}$, and finally an edge from a block to node 1. Note that an edge from node 0 to a block $B_{i:\gamma}$ corresponds to a positive edge $e_{i:\ell-1}$ in the Ishikawa graph; similarly, an edge from a block $B_{j:\delta}$ to node 1 corresponds to a positive edge $e_{j:0}$. Now, consider an edge $B_{i:\gamma} \to B_{j:\delta}$ in the augmenting path. Corresponding to this, there exists a positive edge $e_{ij:\lambda\mu}$ such that $U_{i:\lambda} \in B_{i:\gamma'}$ for some $\gamma' \ge \gamma$ and $U_{j:\mu} \in B_{j:\delta}$ in the Ishikawa graph.
Also, along the column $i$, there are upward infinite capacity edges, and the nodes corresponding to a block are also connected with positive bidirectional edges. Hence, there exists an augmenting path in the Ishikawa graph corresponding to the augmenting path in the block-graph.

Now, we will prove the converse. Consider an augmenting path in the Ishikawa graph. The path may contain a sequence of positive edges $e_{i:\lambda}$, $e_{ij:\lambda\mu}$ and infinite capacity edges $e_{ii:\lambda,\lambda+1}$. Note that, by construction, the $e_{i:\lambda}$ edges either will be in the same block $B_{i:\gamma}$ in the block-graph, or will be between a block and node 0 or node 1. Furthermore, the infinite capacity edges either will be in the same block, or there will be an edge $B_{i:\gamma} \to B_{j:\delta}$ in the block-graph to represent them. Finally, if $e_{ij:\lambda\mu}$ is a positive edge, then, by construction of the block-graph, there exists an edge $B_{i:\gamma} \to B_{j:\delta'}$ where $U_{i:\lambda} \in B_{i:\gamma}$ and $U_{j:\mu} \in B_{j:\delta}$ with $\delta' \le \delta$. Hence, if there is an augmenting path in the Ishikawa graph, then there exists an augmenting path in the block-graph.

REFERENCES

[1] H. Ishikawa, "Exact optimization for Markov random fields with convex priors," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 25, no. 10, pp. 1333–1336, 2003.
[2] D. Schlesinger and B. Flach, Transforming an arbitrary minsum problem into a binary one. TU, Fak. Informatik, 2006.
[3] L. Ford and D. R. Fulkerson, Flows in Networks. Princeton University Press, 1962.
[4] P. Torr and M. Kumar, "Improved moves for truncated convex models," in Advances in Neural Information Processing Systems, 2009, pp. 889–896.
[5] O. Veksler, "Multi-label moves for MRFs with truncated convex priors," International Journal of Computer Vision, vol. 98, no. 1, pp. 1–14, 2012.
[6] T. Ajanthan, R. Hartley, M. Salzmann, and H. Li, "Iteratively reweighted graph cut for multi-label MRFs with non-convex priors," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
[7] Y. Boykov and V. Kolmogorov, "An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 26, no. 9, pp. 1124–1137, 2004.
[8] P. Kohli and P. H. Torr, "Efficiently solving dynamic Markov random fields using graph cuts," in Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on, vol. 2. IEEE, 2005, pp. 922–929.
[9] P. Strandmark and F. Kahl, "Parallel and distributed graph cuts by dual decomposition," in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010, pp. 2085–2092.
[10] V. Kolmogorov, "Convergent tree-reweighted message passing for energy minimization," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 28, no. 10, pp. 1568–1583, 2006.
[11] N. Komodakis, N. Paragios, and G. Tziritas, "MRF energy minimization and beyond via dual decomposition," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 33, no. 3, pp. 531–552, 2011.
[12] B. Savchynskyy, S. Schmidt, J. H. Kappes, and C. Schnörr, "Efficient MRF energy minimization via adaptive diminishing smoothing," in Uncertainty in Artificial Intelligence, 2012, pp. 746–755.
[13] T. Ajanthan, R. Hartley, and M. Salzmann, "Memory efficient max flow for multi-label submodular MRFs," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[14] E. Boros and P. L. Hammer, "Pseudo-Boolean optimization," Discrete Applied Mathematics, vol. 123, no. 1, pp. 155–225, 2002.
[15] T. Werner, "A linear programming approach to max-sum problem: A review," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 29, no. 7, pp. 1165–1179, 2007.
[16] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms. MIT Press, Cambridge, 2001.
[17] J. B. Orlin, "Max flows in O(nm) time, or better," in Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing. ACM, 2013, pp. 765–774.
[18] M. J. Wainwright, T. S. Jaakkola, and A. S. Willsky, "MAP estimation via agreement on trees: message-passing and linear programming," Information Theory, IEEE Transactions on, vol. 51, no. 11, pp. 3697–3717, 2005.
[19] V. Kolmogorov and M. Wainwright, "On the optimality of tree-reweighted max-product message-passing," Uncertainty in Artificial Intelligence, 2005.
[20] A. V. Goldberg and R. E. Tarjan, "A new approach to the maximum-flow problem," Journal of the ACM (JACM), vol. 35, no. 4, pp. 921–940, 1988.
[21] B. G. Chandran and D. S. Hochbaum, "A computational study of the pseudoflow and push-relabel algorithms for the maximum flow problem," Operations Research, vol. 57, no. 2, pp. 358–376, 2009.
[22] A. V. Goldberg, S. Hed, H. Kaplan, R. E. Tarjan, and R. F. Werneck, "Maximum flows by incremental breadth-first search," in Algorithms–ESA 2011. Springer, 2011, pp. 457–468.
[23] T. Verma and D. Batra, "MaxFlow revisited: An empirical comparison of maxflow algorithms for dense vision problems," in BMVC, 2012, pp. 1–12.
[24] A. Delong and Y. Boykov, "A scalable graph-cut algorithm for N-D grids," in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008, pp. 1–8.
[25] O. Jamriška, D. Sýkora, and A. Hornung, "Cache-efficient graph cuts on structured grids," in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012, pp. 3673–3680.
[26] A. Shekhovtsov and V. Hlaváč, "A distributed mincut/maxflow algorithm combining path augmentation and push-relabel," International Journal of Computer Vision, vol. 104, no. 3, pp. 315–342, 2013.
[27] V. Vineet and P. Narayanan, "CUDA cuts: Fast graph cuts on the GPU," in Computer Vision and Pattern Recognition Workshops, 2008. CVPRW'08. IEEE Computer Society Conference on. IEEE, 2008, pp. 1–8.
[28] J. H. Kappes, B. Andres, F. A. Hamprecht, C. Schnörr, S. Nowozin, D. Batra, S. Kim, B. X. Kausler, T. Kröger, J. Lellmann, N. Komodakis, B. Savchynskyy, and C. Rother, "A comparative study of modern inference techniques for structured discrete energy minimization problems," International Journal of Computer Vision, pp. 1–30, 2015. [Online]. Available: http://dx.doi.org/10.1007/s11263-015-0809-x
[29] D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms," International Journal of Computer Vision, vol. 47, no. 1-3, pp. 7–42, 2002.
[30] ——, "High-accuracy stereo depth maps using structured light," in Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on, vol. 1. IEEE, 2003, pp. I–195.
[31] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, "Vision meets robotics: The KITTI dataset," The International Journal of Robotics Research, p. 0278364913491297, 2013.
[32] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, and C. Rother, "A comparative study of energy minimization methods for Markov random fields with smoothness-based priors," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 30, no. 6, pp. 1068–1080, 2008.
[33] S. Birchfield and C. Tomasi, "A pixel dissimilarity measure that is insensitive to image sampling," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 20, no. 4, pp. 401–406, 1998.
[34] A. V. Goldberg, S. Hed, H. Kaplan, P. Kohli, R. E. Tarjan, and R. F. Werneck, "Faster and more dynamic maximum flow by incremental breadth-first search," in Algorithms–ESA 2015. Springer, 2015, pp. 619–630.
[35] B. Andres, T. Beier, and J. H. Kappes, "OpenGM: A C++ library for discrete graphical models," arXiv preprint arXiv:1206.0111, 2012.
[36] P. J. Huber, "Robust estimation of a location parameter," The Annals of Mathematical Statistics, vol. 35, no. 1, pp. 73–101, 1964.
Thalaiyasingam Ajanthan obtained his BSc degree in Electronics and Telecommunication Engineering from the University of Moratuwa, Sri Lanka, in 2013. He is now a PhD student at the College of Engineering and Computer Science, Australian National University (ANU). He is also a member of the Analytics Group at Data61, CSIRO. He is a student member of IEEE.

Richard Hartley is a member of the Computer Vision Group in the Research School of Engineering at ANU, where he has been since January 2001. He is also a member of the Analytics Group in Data61, CSIRO. He worked at the GE Research and Development Center from 1985 to 2001, working first in VLSI design and later in computer vision. He became involved with image understanding and scene reconstruction while working with GE's Simulation and Control Systems Division. He is an author (with A. Zisserman) of the book Multiple View Geometry in Computer Vision.