Algorithms for Lipschitz Learning on Graphs
Rasmus Kyng, Yale University, [email protected]
Anup Rao, Yale University, [email protected]
Sushant Sachdeva, Yale University, [email protected]
Daniel A. Spielman, Yale University, [email protected]
July 1, 2015
Abstract
We develop fast algorithms for solving regression problems on graphs where one is given the value of a function at some vertices, and must find its smoothest possible extension to all vertices. The extension we compute is the absolutely minimal Lipschitz extension, and is the limit for large p of p-Laplacian regularization. We present an algorithm that computes a minimal Lipschitz extension in expected linear time, and an algorithm that computes an absolutely minimal Lipschitz extension in expected time Õ(mn). The latter algorithm has variants that seem to run much faster in practice. These extensions are particularly amenable to regularization: we can perform l₀-regularization on the given values in polynomial time and l₁-regularization on the initial function values and on graph edge weights in time Õ(m^{3/2}). Our definitions and algorithms naturally extend to directed graphs.

We consider a problem in which we are given a weighted undirected graph G = (V, E, ℓ) and values v₀ : T → R on a subset T of its vertices. We view the weights ℓ as indicating the lengths of edges, with shorter length indicating greater similarity. Our goal is to assign values to every vertex v ∈ V \ T so that the values assigned are as smooth as possible across edges. A minimal Lipschitz extension of v₀ is a vector v that minimizes

max_{(x,y) ∈ E} (ℓ(x, y))^{-1} |v(x) − v(y)|,   (1)

subject to v(x) = v₀(x) for all x ∈ T. We call such a vector an inf-minimizer. Inf-minimizers are not unique. So, among inf-minimizers we seek vectors that minimize the second-largest absolute value of (ℓ(x, y))^{-1} |v(x) − v(y)| across edges, and then the third-largest given that, and so on. We call such a vector v a lex-minimizer.
It is also known as an absolutely minimal Lipschitz extension of v₀. These are the limit of the solutions of p-Laplacian minimization problems for large p, namely the vectors that solve

min_{v ∈ R^n : v|_T = v₀|_T}  Σ_{(x,y) ∈ E} (ℓ(x, y))^{-p} |v(x) − v(y)|^p.   (2)

The use of p = 2 was suggested in the foundational paper of Zhu et al. (2003), and is particularly nice because it can be obtained by solving a system of linear equations in a symmetric diagonally dominant matrix, which can be done quickly. The use of other values of p has been discussed by Alamgir and Luxburg (2011), and by Bridle and Zhu (2013), but it is much more complicated to compute. The fastest algorithms we know for this problem require convex programming, and then require very high accuracy to obtain the values at most vertices. By taking the limit as p goes to infinity, we recover the lex-minimizer, which we will show can be computed quickly.

The lex-minimization problem has a remarkable amount of structure. For example, in uniformly weighted graphs the value of the lex-minimizer at every vertex not in T is equal to the average of the minimum and maximum of the values at its neighbors. This is analogous to the property of the 2-Laplacian minimizer that the value at every vertex not in T equals the average of the values at its neighbors. We first present several important structural properties of lex-minimizers in Section 3. As we shall point out, some of these were known from previous work, sometimes in restricted settings. We state them generally and prove them for completeness. We also prove that the lex-minimizer is as stable as possible under perturbations of v₀ (Section 3.1). The structure of the lex-minimization problem has led us to develop elegant algorithms for its solution.

∗ This research was partially supported by AFOSR Award FA9550-12-1-0175, NSF grant CCF-1111257, a Simons Investigator Award to Daniel Spielman, and a MacArthur Fellowship.
† Code used in this work is available at https://github.com/danspielman/YINSlex
Both the algorithms and their analyses could be taught to undergraduates. We believe that these algorithms could be used in place of 2-Laplacian minimization in many applications. We present algorithms for the following problems. Throughout, m = |E| and n = |V|.

Inf-minimization:
An algorithm that runs in expected time O(m + n log n) (Section 4.3).

Lex-minimization:
An algorithm that runs in expected time O(n(m + n log n)) (Section 4), along with a variant that runs quickly in practice (Section 4.4).

l₁-regularization of edge lengths for inf-minimization: The problem of minimizing (1) given a limited budget with which one can increase edge lengths is a linear programming problem. We show how to solve it in time Õ(m^{3/2}) with an interior point method by using fast Laplacian solvers (Section 8). The same algorithm can accommodate l₁-regularization of the values given in v₀.

l₀-regularization of vertex values for inf-minimization: We give a polynomial time algorithm for l₀-regularization of the values at vertices. That is, we minimize (1) given a budget on the number of vertices that can be proclaimed outliers and removed from T (Section 7.1). We solve this problem by reducing it to the problem of computing minimum vertex covers on transitively closed directed acyclic graphs, a special case of minimum vertex cover that can be solved in polynomial time.

After any regularization for inf-minimization, we suggest computing the lex-minimizer. We find the result for l₀-regularization of vertex values to be particularly surprising, especially because we prove that the analogous problem for 2-Laplacian minimization is NP-Hard (Section 7.2).

All of our algorithms extend naturally to directed graphs (Section 5). This is in contrast with the problem of minimizing 2-Laplacians on directed graphs, which corresponds to computing electrical flows in networks of resistors and diodes, for which fast algorithms are not presently known.

We present a few experiments on examples demonstrating that the lex-minimizer can overcome known deficiencies of the 2-Laplacian minimizer (Section 1.2, Figures 1, 2), as well as a demonstration of the performance of the directed analog of our algorithms on the WebSpam dataset of Castillo et al. (2006) (Section 6).
In the WebSpam problem we use the link structure of a collection of web sites to flag some sites as spam, given a small number of labeled sites known to be spam or normal.

We first encountered the idea of using the minimizer of the 2-Laplacian given by (2) for regression and classification on graphs in the work of Zhu et al. (2003) and Belkin et al. (2004) on semi-supervised learning. These works transformed learning problems on sets of vectors into problems on graphs by identifying vectors with vertices and constructing graphs with edges between nearby vectors. One shortcoming of this approach (see Nadler et al. (2009),

Figure 1: Lex vs 2-Laplacian on 1D Gaussian clusters (inferred voltage against vertex position on the real line, for the lex-minimizer, the 2-Laplacian minimizer, and the labels).
Figure 2: kNN graphs on samples from 4D cube (mean l₁ error against number of vertices, with 50, 100, 500, and 1000 labeled points, for lex and l₂).
Alamgir and Luxburg (2011), Bridle and Zhu (2013)) is that if the number of vectors grows while the number of labeled vectors remains fixed, then almost all the values of the 2-Laplacian minimizer converge to the mean of the labels on most natural examples. For example, Nadler et al. (2009) consider sampling points from two Gaussian distributions centered at 0 and 4 on the real line. They place edges between every pair of points (x, y) with length exp(|x − y|/σ) for a small σ, and provide only the labels v₀(0) = −1 and v₀(4) = 1. Figure 1 shows the values of the 2-Laplacian minimizer in red, which are all approximately zero. In contrast, the values of the lex-minimizer, shown in blue, are smoothly distributed between the labeled points.

The "manifold hypothesis" (see Chapelle et al. (2010), Ma and Fu (2011)) holds that much natural data lies near a low-dimensional manifold and that natural functions we would like to learn on this data are smooth functions on the manifold. Under this assumption, one should expect lex-minimizers to interpolate well. In contrast, the 2-Laplacian minimizers degrade (dotted lines) if the number of labeled points remains fixed while the total number of points grows. In Figure 2, we demonstrate this by sampling many points uniformly from the unit cube in 4 dimensions, forming their 8-nearest neighbor graph, and considering the problem of regressing the first coordinate. We performed experiments varying the number of labeled points in {50, 100, 500, 1000}. Each data point is the mean l₁ error over 100 experiments. The plots for root mean squared error are similar. The standard deviations of the estimates of the mean are within one pixel, and so are not displayed.
The performance of the lex-minimizer (solid lines) does not degrade as the number of unlabeled points grows.

Analogous to our inf-minimizers, minimal Lipschitz extensions of functions in Euclidean space and over more general metric spaces have been studied extensively in mathematics (Kirszbraun (1934), McShane (1934), Whitney (1934)). von Luxburg and Bousquet (2003) employ Lipschitz extensions on metric spaces for classification and relate these to Support Vector Machines. Their work inspired improvements in classification and regression in metric spaces with low doubling dimension (Gottlieb et al. (2013a), Gottlieb et al. (2013b)). Theoretically fast, although not actually practical, algorithms have been given for constructing minimal Lipschitz extensions of functions on low-dimensional Euclidean spaces (Fefferman (2009a), Fefferman and Klartag (2009), Fefferman (2009b)). Sinop and Grady (2007) suggest using inf-minimizers for binary classification problems on graphs. For this special case, where all of the given values are either 0 or 1, they present an O(m + n log n) time algorithm for computing an inf-minimizer. The case of general given values, which we solve in this paper, is much more complicated. To compensate for the non-uniqueness of inf-minimizers, they suggest choosing the inf-minimizer that minimizes (2) with p = 2. We believe that the lex-minimizer is a more natural choice.

The analog of our lex-minimizer over continuous spaces is called the absolutely minimal Lipschitz extension (AMLE). Starting with the work of Aronsson (1967), there have been several characterizations and proofs of the existence and uniqueness of the AMLE (Jensen (1993), Crandall et al. (2001), Barles and Busca (2001), Aronsson et al. (2004)). Many of these results were later extended to general metric spaces, including graphs (Milman (1999), Peres et al. (2011), Naor and Sheffield (2010), Sheffield and Smart (2010)).
However, to the best of our knowledge, fast algorithms for computing lex-minimizers on graphs were not known. For the special case of undirected, unweighted graphs, Lazarus et al. (1999) presented both a polynomial-time algorithm and an iterative method. Oberman (2011) suggested computing the AMLE in Euclidean space by first discretizing the problem and then solving the corresponding graph problem by an iterative method. However, no run-time guarantees were obtained for either iterative method.

Lexicographic Ordering.
Given a vector r ∈ R^m, let π_r denote a permutation that sorts r in non-increasing order by absolute value, i.e., for all i ∈ [m − 1], |r(π_r(i))| ≥ |r(π_r(i + 1))|. Given two vectors r, s ∈ R^m, we write r ⪯ s to indicate that r is smaller than s in the lexicographic ordering on sorted absolute values, i.e.,

∃ j ∈ [m] such that |r(π_r(j))| < |s(π_s(j))| and ∀ i ∈ [j − 1], |r(π_r(i))| = |s(π_s(i))|,
or ∀ i ∈ [m], |r(π_r(i))| = |s(π_s(i))|.

Note that it is possible that r ⪯ s and s ⪯ r while r ≠ s. It is a total relation: for every r and s, at least one of r ⪯ s or s ⪯ r is true.

Graphs and Matrices.
We will work with weighted graphs. Unless explicitly stated, we will assume that they are undirected. For a graph G, we let V_G be its set of vertices, E_G be its set of edges, and ℓ_G : E_G → R⁺ be the assignment of positive lengths to the edges. We let |V_G| = n and |E_G| = m. We assume ℓ_G is symmetric, i.e., ℓ_G(x, y) = ℓ_G(y, x). When G is clear from the context, we drop the subscript.

A path P in G is an ordered sequence of (not necessarily distinct) vertices P = (x_0, x_1, ..., x_k) such that (x_{i−1}, x_i) ∈ E for i ∈ [k]. The endpoints of P are denoted by ∂_0 P = x_0 and ∂_1 P = x_k. The set of interior vertices of P is defined to be int(P) = {x_i : 0 < i < k}. For 0 ≤ i < j ≤ k, we use the notation P[x_i : x_j] to denote the subpath (x_i, ..., x_j). The length of P is ℓ(P) = Σ_{i=1}^{k} ℓ(x_{i−1}, x_i).

A function v : V → R ∪ {∗} is called a voltage assignment (to G). A vertex x ∈ V is a terminal with respect to v iff v(x) ≠ ∗. The other vertices, for which v(x) = ∗, are non-terminals. We let T(v) denote the set of terminals with respect to v. If T(v) = V, we call v a complete voltage assignment (to G). We say that an assignment v₁ : V → R ∪ {∗} extends v₀ if v₁(x) = v₀(x) for all x such that v₀(x) ≠ ∗.

Given an assignment v : V → R ∪ {∗}, and two terminals x, y ∈ T(v) for which (x, y) ∈ E, we define the gradient on (x, y) due to v to be grad_G[v](x, y) = (v(x) − v(y)) / ℓ(x, y). It may be useful to view grad_G[v](x, y) as the current in the edge (x, y) induced by the voltages v. When v is a complete voltage assignment, we interpret grad_G[v] as a vector in R^m, with one entry for each edge. However, for convenience, we define grad_G[v](x, y) = −grad_G[v](y, x). When G is clear from the context, we drop the subscript. A graph G along with a voltage assignment v₀ to G is called a partially-labeled graph, denoted (G, v₀).
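The lexicographic ordering ⪯ on sorted absolute values defined earlier is easy to test directly. A minimal Python sketch (the helper names are ours, not from the paper):

```python
def sorted_abs(r):
    """Absolute values of r, sorted in non-increasing order,
    i.e. the sequence (|r(pi_r(1))|, |r(pi_r(2))|, ...)."""
    return sorted((abs(x) for x in r), reverse=True)

def lex_leq(r, s):
    """True iff r is at most s in the lexicographic ordering on
    sorted absolute values."""
    for a, b in zip(sorted_abs(r), sorted_abs(s)):
        if a != b:
            return a < b
    return True  # identical sorted absolute values: both directions hold
```

Note that lex_leq([1, -2], [-2, 1]) and its reverse both hold even though the vectors differ, matching the remark that ⪯ is a total relation but not antisymmetric.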
We say that a partially-labeled graph (G, v₀) is a well-posed instance if for every maximal connected component H of G, we have T(v₀) ∩ V_H ≠ ∅. A path P in a partially-labeled graph (G, v₀) is called a terminal path if both endpoints are terminals. We define ∇P(v₀) to be its gradient:

∇P(v₀) = (v₀(∂_0 P) − v₀(∂_1 P)) / ℓ(P).

If P contains no terminal-terminal edges (and hence contains at least one non-terminal), it is a free terminal path.

Lex-Minimization.
An instance of the LEX-MINIMIZATION problem is described by a partially-labeled graph (G, v₀). The objective is to compute a complete voltage assignment v : V_G → R extending v₀ that lex-minimizes grad[v].

Definition 2.1 (Lex-minimizer)
Given a partially-labeled graph (G, v₀), we define lex_G[v₀] to be a complete voltage assignment to V that extends v₀, and such that for every other complete assignment v′ : V_G → R that extends v₀, we have grad_G[lex_G[v₀]] ⪯ grad_G[v′]. That is, lex_G[v₀] achieves a lexicographically-minimal gradient assignment to the edges. We call lex_G[v₀] the lex-minimizer for (G, v₀). Note that if T(v₀) = V_G, then trivially lex_G[v₀] = v₀.

Basic Properties of Lex-Minimizers
Lazarus et al. (1999) established that lex-minimizers in unweighted and undirected graphs exist, are unique, and may be computed by an elementary meta-algorithm. We state and prove these facts for undirected weighted graphs, and defer the discussion of the directed case to Section 5. We also state, for directed and weighted graphs, characterizations of lex-minimizers that were established by Peres et al. (2011), Naor and Sheffield (2010) and Sheffield and Smart (2010) for unweighted graphs. These results are essential for the analyses of our algorithms. We defer most proofs to Appendix A.
Definition 3.1 A steepest fixable path in an instance (G, v₀) is a free terminal path P that has the largest gradient ∇P(v₀) amongst such paths.

Observe that a steepest fixable path with ∇P(v₀) ≠ 0 must be a simple path.

Definition 3.2
Given a steepest fixable path P in an instance (G, v₀), we define fix_G[v₀, P] : V_G → R ∪ {∗} to be the voltage assignment defined as follows:

fix_G[v₀, P](x) = v₀(∂_0 P) − ∇P(v₀) · ℓ_G(P[∂_0 P : x]) if x ∈ int(P) \ T(v₀), and fix_G[v₀, P](x) = v₀(x) otherwise.

We say that the vertices x ∈ int(P) are fixed by the operation fix[v₀, P]. If we define v₁ = fix_G[v₀, P], where P = (x_0, ..., x_r) is the steepest fixable path in (G, v₀), then it is easy to argue that for every i ∈ [r], we have grad[v₁](x_{i−1}, x_i) = ∇P (see Lemma A.5). The meta-algorithm META-LEX, spelled out as Algorithm 1, entails repeatedly fixing steepest fixable paths. While it is possible to have multiple steepest fixable paths, the result of fixing all of them does not depend on the order in which they are fixed.

Theorem 3.3
Given a well-posed instance (G, v₀), the meta-algorithm META-LEX, which repeatedly fixes steepest fixable paths, produces the unique lex-minimizer extending v₀.

Corollary 3.4
Given a well-posed instance (G, v₀) such that T(v₀) ≠ V_G, let P be a steepest fixable path in (G, v₀). Then (G, fix[v₀, P]) is also a well-posed instance, and lex_G[fix[v₀, P]] = lex_G[v₀]. Since a lex-minimal element must be an inf-minimizer, we also obtain the following lemma, which can also be proved using LP duality.
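The meta-algorithm META-LEX (Theorem 3.3) can be made concrete with a brute-force sketch for tiny graphs. This is our own illustration, not the paper's Algorithm 1: it enumerates simple free terminal paths exhaustively (exponential in general) and assumes every non-terminal lies on a simple terminal path.

```python
def meta_lex(adj, v0):
    """Repeatedly fix a steepest free terminal path, as in META-LEX.
    adj: {x: [(y, length), ...]} for an undirected graph (list both
    directions); v0: {terminal: value}. Returns the full assignment."""
    v = dict(v0)
    while len(v) < len(adj):
        best = None  # (gradient, vertex list, edge lengths)
        stack = [([t], []) for t in v]
        while stack:
            path, lens = stack.pop()
            x = path[-1]
            for y, l in adj[x]:
                if y in path or (x in v and y in v):
                    continue  # keep the path simple and free
                p, ls = path + [y], lens + [l]
                if y in v:  # completed a free terminal path
                    g = (v[p[0]] - v[y]) / sum(ls)
                    if best is None or abs(g) > abs(best[0]):
                        best = (g, p, ls)
                stack.append((p, ls))
        g, p, ls = best
        dist = 0.0
        for y, l in zip(p[1:], ls):
            dist += l
            if y not in v:  # fix interior non-terminals along the path
                v[y] = v[p[0]] - g * dist
    return v
```

On a unit-length path with endpoint labels 0 and 3, the first (and only) steepest fixable path is the whole path, and fixing it yields the linear interpolation, as the fix formula prescribes.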
Lemma 3.5
Suppose we have a well-posed instance (G, v₀). Then there exists a complete voltage assignment v extending v₀ such that ‖grad[v]‖_∞ ≤ α iff every terminal path P in (G, v₀) satisfies ∇P(v₀) ≤ α.

The following theorem states that lex_G[v₀] is monotonic with respect to v₀ and that it respects scaling and translation of v₀.

Theorem 3.6
Let (G, v₀) be a well-posed instance with T := T(v₀) as the set of terminals. Then the following statements hold.

1. For any c, d ∈ R, let v₁ be a partial assignment with terminals T(v₁) = T and v₁(t) = c·v₀(t) + d for all t ∈ T. Then lex_G[v₁](i) = c · lex_G[v₀](i) + d for all i ∈ V_G.

2. Let v₁ be a partial assignment with terminals T(v₁) = T. Suppose further that v₁(t) ≥ v₀(t) for all t ∈ T. Then lex_G[v₁](i) ≥ lex_G[v₀](i) for all i ∈ V_G.

As a corollary, the above theorem gives a nice stability property that lex-minimal elements satisfy.
Corollary 3.7
Given well-posed instances (G, v₀), (G, v₁) such that T := T(v₀) = T(v₁), let ε := max_{t ∈ T} |v₀(t) − v₁(t)|. Then |lex_G[v₀](i) − lex_G[v₁](i)| ≤ ε for all i ∈ V_G.

3.2 Alternate Characterizations

There are at least two other seemingly disparate definitions that are equivalent to lex-minimal voltages.

l_p-norm Minimizers. As mentioned in the introduction, for a well-posed instance (G, v₀) the lex-minimizer is also the limit of l_p-minimizers. This follows from existing results about the limit of l_p-minimizers (Egger and Huotari (1990)) in affine spaces, since {grad[v] | v is complete, v extends v₀} forms an affine subspace of R^m. Thus, we have the following theorem:
Theorem 3.8 (Limit of l_p-minimizers, follows from Egger and Huotari (1990)) For any p ∈ (1, ∞), given a well-posed instance (G, v₀), define v_p to be the unique complete voltage assignment extending v₀ and minimizing ‖grad[v]‖_p, i.e.,

v_p = arg min {‖grad[v]‖_p : v is complete and v extends v₀}.

Then lim_{p→∞} v_p = lex_G[v₀].

Max-Min Gradient Averaging.
Consider a well-posed instance (G, v₀), and a complete voltage assignment v extending v₀. If G is such that ℓ(e) = 1 for all e ∈ E_G, it is easy to see that lex = lex_G[v₀] satisfies the following simple condition for all x ∈ V_G \ T(v₀):

lex(x) = (1/2) ( max_{(x,y) ∈ E_G} lex(y) + min_{(x,z) ∈ E_G} lex(z) ).

This condition should be contrasted with the optimality condition for l₂-regularization on these instances, which gives that for all non-terminals x, the optimal voltage v satisfies

v(x) = (1/deg(x)) Σ_{y : (x,y) ∈ E_G} v(y).

To prove the above claim, consider locally changing lex at x and observe that the gradients of edges not incident to x remain unchanged, and at least one of the edges incident to x will have a strictly larger gradient, contradicting lex-minimality. For general graphs, this condition of local optimality can still be characterized by a simple max-min gradient averaging property, as described below.

Definition 3.9 (Max-Min Gradient Averaging)
Given a well-posed instance (G, v₀), and a complete voltage assignment v extending v₀, we say that v satisfies the max-min gradient averaging property (w.r.t. (G, v₀)) if for every x ∈ V_G \ T(v₀), we have

max_{y : (x,y) ∈ E_G} grad[v](x, y) = − min_{y : (x,y) ∈ E_G} grad[v](x, y).

As stated in the theorem below, lex_G[v₀] is the unique assignment satisfying the max-min gradient averaging property. Sheffield and Smart (2010) proved a variant of this statement for weighted graphs. For completeness, we present a proof in the appendix.

Theorem 3.10
Given a well-posed instance (G, v₀), lex_G[v₀] satisfies the max-min gradient averaging property. Moreover, it is the unique complete voltage assignment extending v₀ that satisfies this property w.r.t. (G, v₀).

An advantage of this characterization is that it can be verified quickly. This is particularly useful for implementations of algorithms that compute the lex-minimizer.
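For unit-length graphs, the averaging condition above also suggests a simple fixed-point iteration, in the spirit of the iterative methods of Lazarus et al. (1999) and Oberman (2011). The sketch below is our own illustration, not one of the paper's algorithms; the sweep count is an arbitrary choice, and no run-time guarantee is claimed.

```python
def lex_by_averaging(adj, v0, sweeps=1000):
    """Repeatedly set each non-terminal of a unit-length graph to the
    average of the max and min of its neighbours' values.
    adj: {x: [neighbours]}, v0: {terminal: value}. Non-terminals start
    at 0.0 (an arbitrary initialization)."""
    v = {x: v0.get(x, 0.0) for x in adj}
    for _ in range(sweeps):
        for x in adj:
            if x not in v0:
                nb = [v[y] for y in adj[x]]
                v[x] = 0.5 * (max(nb) + min(nb))
    return v
```

On a unit-length path with endpoint labels 0 and 3, the iteration settles on the linear interpolation (values 1 and 2 at the interior vertices), which is the unique assignment satisfying max-min gradient averaging there.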
We now sketch the ideas behind our algorithms and give precise statements of our results. A full description of all the algorithms is included in the appendix. We define the pressure of a vertex to be the gradient of the steepest terminal path through it:

pressure[v₀](x) = max {∇P(v₀) | P is a terminal path in (G, v₀) and x ∈ P}.

Given α > 0, in order to identify vertices with pressure exceeding α, we compute vectors vLow[α](x) and vHigh[α](x), defined as follows in terms of dist, the metric on V induced by ℓ:

vLow[α](x) = min_{t ∈ T(v₀)} {v₀(t) + α · dist(x, t)},
vHigh[α](x) = max_{t ∈ T(v₀)} {v₀(t) − α · dist(t, x)}.

We first consider the problem of computing the lex-minimizer on a star graph in which every vertex but the center is a terminal. This special case is a subroutine in the general algorithm, and also motivates some of our techniques. Let x be the center vertex, T be the set of terminals, and all edges be of the form (x, t) with t ∈ T. The initial voltage assignment is given by v₀ : T → R, and we abbreviate dist(x, t) by d(t) = ℓ(x, t). From Corollary 3.4 we know that we can determine the value of the lex-minimizer at x by finding a steepest fixable path. By definition, we need to find t₁, t₂ ∈ T that maximize the gradient of the path from t₁ to t₂,

∇(t₁, t₂) = (v₀(t₁) − v₀(t₂)) / (d(t₁) + d(t₂)).

As observed above, this is equivalent to finding a terminal with the highest pressure. We now present a simple randomized algorithm for this problem that runs in expected linear time. Given a terminal t₁, we can compute its pressure α₁, along with the terminal t₂ such that |∇(t₁, t₂)| = α₁, in time O(|T|) by scanning over the terminals in T. Consider doing this for a random terminal t₁. We will show that in linear time one can then find the subset of terminals T′ ⊆ T whose pressure is greater than α₁. Assuming this, we complete the analysis of the algorithm. If T′ = ∅, then t₁ is a vertex with highest pressure.
Hence the path from t₁ to t₂ is a steepest fixable path, and we return (t₁, t₂). If T′ ≠ ∅, the terminal with the highest pressure must be in T′, and we recurse by picking a new random t₁ ∈ T′. As the size of T′ will halve in expectation at each iteration, the expected running time of the algorithm on the star is O(|T|).

To determine which terminals have pressure exceeding α₁, we observe that the condition

∃ t₂ : α₁ < ∇(t₁, t₂) = (v₀(t₁) − v₀(t₂)) / (d(t₁) + d(t₂))

is equivalent to ∃ t₂ : v₀(t₂) + α₁ d(t₂) < v₀(t₁) − α₁ d(t₁). This, in turn, is equivalent to vLow[α₁](x) < v₀(t₁) − α₁ d(t₁). We can compute vLow[α₁](x) in deterministic O(|T|) time. Similarly, we can check if ∃ t₂ : α₁ < ∇(t₂, t₁) by checking if vHigh[α₁](x) > v₀(t₁) + α₁ d(t₁). Thus, in linear time, we can compute the set T′ of terminals with pressure exceeding α₁. The above algorithm is described in Algorithm 10.

Theorem 4.1
Given a set of terminals T, initial voltages v₀ : T → R, and distances d : T → R⁺, STARSTEEPESTPATH(T, v₀, d) returns a pair (t₁, t₂) maximizing |v₀(t₁) − v₀(t₂)| / (d(t₁) + d(t₂)), and runs in expected time O(|T|).
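A Python sketch of this randomized subroutine (our own rendering; the paper's version is Algorithm 10):

```python
import random

def star_steepest_path(T, v0, d):
    """Steepest terminal pair on a star whose centre is the only
    non-terminal. T: list of terminals, v0[t]: terminal values,
    d[t]: edge lengths. Returns (t1, t2) maximizing
    |v0[t1]-v0[t2]| / (d[t1]+d[t2]) in expected O(|T|) time."""
    while True:
        t1 = random.choice(T)
        # pressure of t1 and its steepest partner, by one linear scan
        t2 = max(T, key=lambda t: abs(v0[t] - v0[t1]) / (d[t] + d[t1]))
        alpha = abs(v0[t1] - v0[t2]) / (d[t1] + d[t2])
        # vLow/vHigh at the centre, also in linear time
        vlow = min(v0[t] + alpha * d[t] for t in T)
        vhigh = max(v0[t] - alpha * d[t] for t in T)
        # terminals whose pressure strictly exceeds alpha
        Tp = [t for t in T
              if v0[t] - alpha * d[t] > vlow or v0[t] + alpha * d[t] < vhigh]
        if not Tp:
            return t1, t2
        T = Tp  # halves in expectation at each iteration
```

The steepest pair always survives the filtering step, since both of its endpoints have pressure at least as large as any candidate α, so the recursion cannot discard the optimum.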
ETA -L EX will compute lex-minimizers given an algorithm for finding a steepest fixablepath in ( G, v ) . Recall that finding a steepest fixable path is equivalent to finding a path with gradient equal to thehighest pressure amongst all vertices. In this section, we show how to do this in expected time O ( m + n log n ) .We describe an algorithm V ERTEX S TEEPEST P ATH that finds a terminal path P through any vertex x such that ∇ P ( v ) = pressure [ v ]( x ) in expected O ( m + n log n ) time. Using Dijkstra’s algorithm, we compute dist ( x, t ) forall t ∈ T. If x ∈ T ( v ) , then there must be a terminal path P that starts at x that has ∇ P ( v ) = pressure [ v ]( x ) . Tocompute such a P we examine all t ∈ T ( v ) in O ( | T | ) time to find the t that maximizes |∇ ( x, t ) | = | v ( x ) − v ( t ) | dist ( x,t ) , andthen return a shortest path between x and that t. If x / ∈ T ( v ) , then the steepest path through x between terminals t and t must consist of shortest paths between x and t and between x and t . Thus, we can reduce the problem to that of finding the steepest path in a star graphwhere x is the only non-terminal and is connected to each terminal t by an edge of length dist ( x, t ) . By Theorem 4.1,we can find this steepest path in O ( | T | ) expected time. The above algorithm is formally described as Algorithm 9. Theorem 4.2
Given a well-posed instance (G, v₀) and a vertex x ∈ V_G, VERTEXSTEEPESTPATH(G, v₀, x) returns a terminal path P through x such that ∇P(v₀) = pressure[v₀](x), in O(m + n log n) expected time.
As in the algorithm for the star graph, we need to identify the vertices whose pressure exceeds a given α. For a fixed α, we can compute vLow[α](x) and vHigh[α](x) for all x ∈ V_G using a simple modification of Dijkstra's algorithm in O(m + n log n) time. We describe the algorithms COMPVHIGH and COMPVLOW for these tasks in Algorithms 3 and 4. The following lemma encapsulates the usefulness of vLow and vHigh.

Lemma 4.3
For every x ∈ V_G, pressure[v₀](x) > α iff vHigh[α](x) > vLow[α](x).

It immediately follows that the algorithm COMPHIGHPRESSGRAPH(G, v₀, α), described in Algorithm 6, computes the vertex-induced subgraph on the vertex set {x ∈ V_G | pressure[v₀](x) > α}.

We can combine these algorithms into an algorithm STEEPESTPATH that finds the steepest fixable path in (G, v₀) in O(m + n log n) expected time. We may assume that there are no terminal-terminal edges in G. We sample an edge (x₁, x₂) uniformly at random from E_G, and a vertex x₃ uniformly at random from V_G. For i = 1, 2, 3, we compute the steepest terminal path P_i containing x_i. By Theorem 4.2, this can be done in O(m + n log n) expected time. Let α be the largest gradient max_i ∇P_i. As mentioned above, we can identify G′, the induced subgraph on vertices x with pressure exceeding α, in O(m + n log n) time. If G′ is empty, we know that the path P_i with largest gradient is a steepest fixable path. If not, a steepest fixable path in (G, v₀) must lie in G′, and hence we can recurse on G′. Since we picked a uniformly random edge and a uniformly random vertex, the expected size of G′ is at most half that of G. Thus, we obtain an expected running time of O(m + n log n). This algorithm is described in detail in Algorithm 7.
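The vLow/vHigh computations these routines rely on are single multi-source Dijkstra runs; a sketch in Python (our own rendering, standing in for Algorithms 3 and 4):

```python
import heapq

def comp_vlow(adj, v0, alpha):
    """vLow[alpha](x) = min over terminals t of v0(t) + alpha*dist(x, t),
    for every vertex x: one Dijkstra run seeded with all terminals at
    once, using v0(t) as the initial keys.
    adj: {x: [(y, length), ...]} (symmetric), v0: {terminal: value}."""
    vlow, heap = {}, [(val, t) for t, val in v0.items()]
    heapq.heapify(heap)
    while heap:
        key, x = heapq.heappop(heap)
        if x in vlow:
            continue  # already settled with a smaller key
        vlow[x] = key
        for y, l in adj[x]:
            if y not in vlow:
                heapq.heappush(heap, (key + alpha * l, y))
    return vlow

def comp_vhigh(adj, v0, alpha):
    """vHigh[alpha](x) = max_t v0(t) - alpha*dist(t, x); by symmetry,
    this is comp_vlow run on the negated terminal values."""
    neg = comp_vlow(adj, {t: -val for t, val in v0.items()}, alpha)
    return {x: -k for x, k in neg.items()}
```

Lemma 4.3 can then be checked directly: a vertex x has pressure exceeding α exactly when comp_vhigh(adj, v0, alpha)[x] > comp_vlow(adj, v0, alpha)[x].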
Theorem 4.4
Given a well-posed instance (G, v₀) with E_G ∩ (T(v₀) × T(v₀)) = ∅, STEEPESTPATH(G, v₀) returns a steepest fixable path in (G, v₀), and runs in O(m + n log n) expected time.

By using STEEPESTPATH in META-LEX, we get the algorithm COMPLEXMIN, shown in Algorithm 1. From Theorem 3.3 and Theorem 4.4, we immediately get the following corollary.

Corollary 4.5
Given a well-posed instance (G, v₀) as input, algorithm COMPLEXMIN computes a lex-minimizing assignment that extends v₀ in O(n(m + n log n)) expected time.

Given the algorithms in the previous section, it is straightforward to construct an infinity minimizer. Let α⋆ be the gradient of the steepest terminal path. From Lemma 3.5, we know that the norm of the inf-minimizer is α⋆. Considering all trivial terminal paths (terminal-terminal edges), and using STEEPESTPATH, we can compute α⋆ in randomized O(m + n log n) time. It is well known (McShane (1934); Whitney (1934)) that v = vLow[α⋆] and v̄ = vHigh[α⋆] are inf-minimizers. It is also known that (v + v̄)/2 is the inf-minimizer that minimizes the maximum ℓ∞-norm distance to all inf-minimizers. In the case of path graphs, this was observed by Gaffney and Powell (1976) and independently by Micchelli et al. (1976). For completeness, the algorithm is presented as Algorithm 5, and we have the following result.

Theorem 4.6
Given a well-posed instance (G, v₀), COMPINFMIN(G, v₀) returns a complete voltage assignment v for G extending v₀ that minimizes ‖grad[v]‖∞, and runs in randomized O(m + n log n) time.

The lex-minimizer has additional structure that allows one to compute it by more efficient algorithms. One observation that leads to a faster implementation is that fixing a steepest fixable path does not increase the pressure at vertices, provided that one appropriately ignores terminal-terminal edges. Thus, if G(α) is a subgraph that we have identified as having pressure greater than α, we can iteratively fix all steepest fixable paths P in G(α) with ∇P > α.
Another simple observation is that if G(α) is disconnected, we can simply recurse on each of the connected components. A complete description of the algorithm COMPFASTLEXMIN based on these ideas is given in Algorithm 11. The algorithm provably computes lex_G(v₀), and it is possible to implement it so that the space requirement is only O(m + n). Although we are unable to prove theoretical bounds on the running time that are better than O(n(m + n log n)), it runs extremely quickly in practice. We used it to perform the experiments in this paper. For Delaunay graphs with around half a million vertices and around 2 million edges, it takes a couple of minutes on a 2009 MacBook Pro. Similar times are observed for other model graphs of this size, such as random regular graphs, and for real-world networks. An implementation of this algorithm may be found at https://github.com/danspielman/YINSlex.

Our definitions and algorithms, including those for regularization, extend to directed graphs with only small modifications. We view directed edges as diodes and only consider potential differences in the direction of the edge. For a complete voltage assignment v on the vertices of a directed graph G, we define the directed gradient on (x, y) due to v to be

grad⁺_G[v](x, y) = max { (v(x) − v(y)) / ℓ(x, y), 0 }.

Given a partially-labelled directed graph (G, v₀), we say that a complete voltage assignment v is a lex-minimizer if it extends v₀ and for every other complete voltage assignment v′ that extends v₀ we have grad⁺_G[v] ⪯ grad⁺_G[v′]. We say that a partially-labelled directed graph (G, v₀) is a well-posed directed instance if every free vertex appears in a directed path between two terminals. The main difference between the directed and undirected cases is that the directed lex-minimizer is not necessarily unique. To maintain clarity of exposition, we chose to focus on undirected graphs so far.
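In code, the only change the directed case requires is clamping gradients at zero, so that potential drops against an edge's direction are ignored. A minimal sketch (our own helper, following the definition of grad⁺ above):

```python
def grad_plus(v, x, y, length):
    """Directed gradient on edge (x, y): the potential drop per unit
    length in the edge direction, clamped at 0 (edges act as diodes)."""
    return max((v[x] - v[y]) / length, 0.0)
```

For example, with v = {'x': 2.0, 'y': 1.0}, grad_plus(v, 'x', 'y', 1.0) is 1.0 while grad_plus(v, 'y', 'x', 1.0) is 0.0: the reverse edge carries no gradient.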
For directed graphs, we have the following corresponding structural results.

Theorem 5.1 Given a well-posed instance (G, v_0) on a directed graph G, there exists a lex-minimizer, and the set of all lex-minimizers is a convex set. Moreover, for every two lex-minimizers v and v', we have grad+_G[v] = grad+_G[v'].

However, note that in the case of directed graphs, the lex-minimizer need not be unique. We still have a weaker version of Theorem 3.3 for directed graphs.
Theorem 5.2
Given a well-posed instance (G, v_0) on a directed graph G, let v_1 be the partial voltage assignment extending v_0 obtained by repeatedly fixing steepest fixable (directed) paths P with ∇P > 0. Then, any lex-minimizer of (G, v_0) must extend v_1. Moreover, for every edge e ∈ E_G \ (T(v_1) × T(v_1)), any lex-minimizer v of (G, v_0) must satisfy grad+[v](e) = 0.

When the value of the lex-minimizer at a vertex is not uniquely determined, it is constrained to an interval. In our experiments, we adopt the convention that when the voltage at a vertex is constrained to an interval (−∞, a] or [a, ∞), we assign the value a to that vertex. When it is constrained to a finite interval, we assign the voltage in the interval closest to the median of the original voltages.

We demonstrate the performance of our lex-minimization algorithms on directed graphs by using them to detect spam web pages, as in Zhou et al. (2007). We use the dataset webspam-uk2006-2.0, described in Castillo et al. (2006). This collection includes 11,402 hosts, of which 7,473 (65.5%) are labeled, either as spam or as normal. Each host corresponds to the collection of web pages it serves. Of the hosts, 1,924 are labeled spam (25.7% of all labels). We consider the problem of flagging some hosts as spam, given only a small fraction of the labels for training. We assign one voltage value to the spam hosts and another to the normal ones. We then compute a lex-minimizer and examine the effect of flagging as spam all hosts with a value greater than some threshold.

Following Zhou et al. (2007), we create edges between hosts with lengths equal to the reciprocal of the number of links from one to the other. We run our experiments only on the largest strongly connected component of the graph, which contains 7,945 hosts, of which 5,552 are labeled. 16% of the nodes in this subgraph are labeled spam.
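The edge-length convention of the experiment can be sketched as follows; the dictionary representation and function name are ours:

```python
def build_host_graph(link_counts):
    # link_counts: {(u, w): number of links from host u to host w}.
    # Edge length is the reciprocal of the link count, so heavily linked
    # host pairs are considered more similar (shorter edge).
    return {(u, w): 1.0 / c for (u, w), c in link_counts.items() if c > 0}
```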
To create training and test data, for a given value p, we select a random subset of p% of the spam labels and a random subset of p% of the normal labels to use for training. The remaining labels are used for testing. We report results for p = 5 and p = 20.

Again following Zhou et al. (2007), we plot the precision and recall of different choices of threshold for flagging pages as spam. Recall is the fraction of spam pages our algorithm flags as spam, and precision is the fraction of pages our algorithm flags as spam that actually are spam. Amongst the algorithms studied by Zhou et al. (2007), the top performer was their algorithm based on sampling according to a random walk that follows in-links from other hosts. We compare their algorithm with the classification we get by directing edges in the opposite direction of the links. This has the effect that a link to a spam host is evidence of spamminess, and a link from a normal host is evidence of normality.

Results are shown in Figure 3. While we are not able to reliably flag all spam hosts, we see that in the range of 10-50% recall, we are able to flag spam with precision above 82%. We see that the performance of directed lex-minimization does not degrade rapidly when moving from the "large training set" regime of p = 20 to the "small training set" regime of p = 5.
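The threshold sweep behind each precision-recall point can be sketched as follows (hypothetical names, not the paper's code):

```python
def precision_recall(scores, is_spam, threshold):
    # Flag every host whose computed voltage exceeds the threshold.
    flagged = {h for h, val in scores.items() if val > threshold}
    spam = {h for h, lab in is_spam.items() if lab}
    tp = len(flagged & spam)
    precision = tp / len(flagged) if flagged else 1.0
    recall = tp / len(spam) if spam else 1.0
    return precision, recall
```

Sweeping the threshold over the range of computed voltages traces out one precision-recall curve.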
Figure 3: Recall and precision in the web spam classification experiment. The two panels correspond to using 5% and 20% of the labels for training. Each data point shown was computed as an average over 100 runs. The largest standard deviation of the mean precision across the plotted recall values was less than 1.3%. The algorithm of Zhou et al. (2007) appears as RandWalk. Our directed lex-minimization algorithm appears as DirectedLex.

For comparison, in Appendix C, we show the performance of our algorithm and that of Zhou et al. (2007), both with link directions reversed, as well as the performance of undirected lex-minimization and Laplacian inference, all of which are significantly worse.

l0-Regularization of Vertex Values

We now explain how we can accommodate noise both in the given voltages and in the given lengths of edges. We can find the minimum number of labels to ignore, or the minimum increase in edge lengths needed, so that there exists an extension whose gradients have l∞-norm lower than a given target. After determining which labels to ignore, or the needed increment in edge lengths, we recommend computing a lex-minimizer. The algorithms we present in this section are essentially the same for directed and undirected graphs.

l0-Vertex Regularization for Inf-minimization

The l0-regularization of vertex labels can be viewed as a problem of outlier removal: the vector we compute is allowed to disagree with v_0 on up to k terminals. Given a voltage assignment v and a subset T ⊆ V of the vertices, by v(T) we mean the vector obtained by restricting v to T. We define the l0-Vertex Regularization for l∞ problem to be

min_{v ∈ R^n} ‖grad_G[v]‖∞ subject to ‖v(T) − v_0(T)‖_0 ≤ k, (3)

where v_0(T) is the vector of values of v_0 on the terminals T. In Appendix D, we describe an approximation algorithm Approx-Outlier that approximately solves program (3). The precise statement that we prove in Appendix D is given in the following theorem.

Theorem 7.1 (Approximate l0-vertex regularization) The algorithm Approx-Outlier takes a positive integer k and a partially-labeled graph (G, v_0), and outputs an assignment v with ‖v(T) − v_0(T)‖_0 ≤ k and ‖grad_G[v]‖∞ ≤ α*, where α* is the optimum value of program (3).
The algorithm runs in time O(k(m + n log n)).

In Appendix D, we also describe an algorithm Outlier that exactly solves program (3) in polynomial time, and we prove its correctness.
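As a preview of the machinery discussed below: the exact algorithm ultimately rests on minimum vertex cover in a transitively closed DAG, solvable via maximum bipartite matching (the Konig-Cover step of Theorem 7.3). A self-contained sketch under our own naming, using Kuhn's augmenting paths and König's cover construction:

```python
def min_vertex_cover_tc_dag(vertices, edges):
    """Minimum vertex cover of a transitively closed DAG: build a bipartite
    graph with a left and right copy of each vertex, find a maximum matching,
    take König's bipartite cover, and note that the vertices untouched on
    both sides form a maximum antichain, whose complement covers the DAG."""
    adj = {u: [] for u in vertices}
    for u, w in edges:
        adj[u].append(w)                  # bipartite edge: u_left -- w_right

    match_r = {}                          # right vertex -> matched left partner
    def try_augment(u, seen):             # Kuhn's augmenting-path step
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                if w not in match_r or try_augment(match_r[w], seen):
                    match_r[w] = u
                    return True
        return False

    for u in vertices:
        try_augment(u, set())

    # König: alternating reachability from unmatched left vertices.
    matched_l = set(match_r.values())
    z_l = {u for u in vertices if u not in matched_l}
    z_r, frontier = set(), list(z_l)
    while frontier:
        u = frontier.pop()
        for w in adj[u]:
            if w not in z_r:
                z_r.add(w)
                mate = match_r.get(w)
                if mate is not None and mate not in z_l:
                    z_l.add(mate)
                    frontier.append(mate)
    cover_l = set(vertices) - z_l         # bipartite min cover is
    cover_r = z_r                         # (L \ Z) union (R intersect Z)
    antichain = {v for v in vertices if v not in cover_l and v not in cover_r}
    return set(vertices) - antichain
```

For every DAG edge (u, w), either u's left copy or w's right copy lies in the bipartite cover, so one endpoint lands outside the antichain; hence the returned set covers all edges.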
Theorem 7.2 (Exact l0-vertex regularization) The algorithm Outlier takes a positive integer k and a partially-labeled graph (G, v_0) and solves program (3) exactly. The algorithm runs in polynomial time.

We give a proof of Theorem 7.2 in Appendix D. To do this, we reduce program (3) to the problem of minimizing the required l0-budget needed to achieve a fixed gradient α, using a binary search over a set of O(n^2) gradients. This latter problem we reduce in polynomial time to Minimum Vertex Cover (VC) on a transitively closed directed acyclic graph (a TC-DAG). VC on a TC-DAG can be solved exactly in polynomial time by a reduction to the Maximum Bipartite Matching problem (Fulkerson (1956)). The problem was phrased by Fulkerson as one of finding a maximum antichain of a finite poset. Any transitively closed DAG corresponds directly to the comparability graph of a poset. A maximum antichain of a poset is a maximum independent set of the comparability graph of the poset, and hence its complement is a minimum vertex cover of the comparability graph. We refer to the algorithm developed by Fulkerson as Konig-Cover.

Theorem 7.3
The algorithm Konig-Cover computes a minimum vertex cover for any transitively closed DAG G in polynomial time.

l0-Regularization for l2

The result that l0-regularized inf-minimization can be solved exactly in polynomial time is surprising, especially because the analogous problem for 2-Laplacian minimization turns out to be NP-hard. We define the l0-vertex regularization for l2 problem for a partially-labeled graph (G, v_0) and an integer k by

min_{v ∈ R^n : ‖v(T) − v_0(T)‖_0 ≤ k} v^T L v,

where L is the Laplacian of G.

Theorem 7.4 l0-vertex regularization for l2 is NP-hard.

In Appendix E, we prove Theorem 7.4 by giving a polynomial-time (Karp) reduction from the NP-hard minimum bisection problem to l0-vertex regularization for l2.

l1-Edge and Vertex Regularization of Inf-minimizers

Consider a partially-labeled graph (G, v_0) and an α > 0. The set of voltage assignments

{ v : v extends v_0 and ‖grad_G[v]‖∞ ≤ α }

is convex. Going further, let us consider the edge lengths in a graph to be specified by a vector ℓ ∈ R^E. Now the set of voltages v and lengths ℓ that achieve ‖grad_{G(ℓ)}[v]‖∞ ≤ α is jointly convex in v and ℓ. To see this, observe that

‖grad_{G(ℓ)}[v]‖∞ ≤ α ⇔ ∀(x, y) ∈ E : −α·ℓ(x, y) ≤ v(x) − v(y) ≤ α·ℓ(x, y). (4)

Furthermore, the condition "v extends v_0" is a linear constraint on v, which we express as v(T) = v_0(T). From the above, it is clear that the gradient condition corresponds to a convex set, as it is an intersection of half-spaces. These half-spaces are given by O(m) linear inequalities. We can leverage this to phrase many regularized variants of inf-minimization as convex programs, and in some cases linear programs. For example, we may consider a variant of inf-minimization combined with an l1-budget for changing the lengths of edges and the values on terminals.
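For moderate problem sizes, the l1-budget variant just described can be handed directly to an off-the-shelf LP solver rather than a custom interior point method. A sketch using SciPy's linprog; the encoding, function name, and the auxiliary terminal-deviation variables t are our choices:

```python
import numpy as np
from scipy.optimize import linprog

def l1_regularized_infmin(n, edges, terminals, alpha, gamma):
    """Variables: voltages v (n of them), edge slacks s (one per edge),
    and terminal deviations t (one per terminal). Minimize
    sum(s) + gamma * sum(t) subject to |v(x) - v(y)| <= alpha * (l_e + s_e),
    the linearized form of the gradient condition (4)."""
    m, tlist = len(edges), sorted(terminals)
    k = len(tlist)
    c = np.concatenate([np.zeros(n), np.ones(m), gamma * np.ones(k)])
    A, b = [], []
    for e, (x, y, l) in enumerate(edges):
        for sign in (1.0, -1.0):          # both sides of the gradient bound
            row = np.zeros(n + m + k)
            row[x], row[y] = sign, -sign
            row[n + e] = -alpha
            A.append(row)
            b.append(alpha * l)
    for i, tau in enumerate(tlist):       # t_i >= |v(tau) - v0(tau)|
        for sign in (1.0, -1.0):
            row = np.zeros(n + m + k)
            row[tau] = sign
            row[n + m + i] = -1.0
            A.append(row)
            b.append(sign * terminals[tau])
    bounds = [(None, None)] * n + [(0, None)] * (m + k)
    res = linprog(c, A_ub=np.array(A), b_ub=np.array(b), bounds=bounds,
                  method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n], res.x[n:n + m], res.x[n + m:]
```

On a path 0-1-2 with unit lengths, terminals v(0)=0 and v(2)=2, and target α = 1, the budget is zero and the solver recovers the exact extension v = (0, 1, 2).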
Given a parameter γ > 0 that specifies the relative cost of regularizing terminals versus regularizing edges, the problem is as follows:

argmin_{v ∈ R^n, s ∈ R^m, s ≥ 0} ‖s‖_1 + γ · ‖v(T) − v_0(T)‖_1 subject to ‖grad_{G(ℓ+s)}[v]‖∞ ≤ α. (5)

From our observation (4), it follows that problem (5) may be expressed as a linear program with O(n + m) variables and O(m) constraints. We can use ideas from Daitch and Spielman (2008) to solve the resulting linear program in time Õ(m^{3/2}) by an interior point method with a special-purpose linear equation solver. The reason is that the linear equations the IPM must solve at each iteration may be reduced to linear equations in symmetric, diagonally dominant matrices, and these may be solved in nearly-linear time (Cohen et al. (2014)).

Conclusion.
We propose the use of inf- and lex-minimizers for regression on graphs. We present simple algorithms for computing them that are provably fast and correct, and that can be implemented efficiently. We also present a framework and polynomial-time algorithms for regularization in this setting. The initial experiments reported in this paper indicate that these algorithms give good results on real and synthetic datasets, and that they compare favorably to other algorithms, particularly in the regime of tiny labeled sets. We are testing these algorithms on several other graph learning questions, and plan to report on them in a forthcoming experimental paper. We believe that inf- and lex-minimizers, and the associated ideas presented in this paper, should be useful primitives that can be profitably combined with other approaches to learning on graphs.
Acknowledgements
We thank anonymous reviewers for helpful comments. We thank Santosh Vempala and Bartosz Walczak for pointingout that it was already known how to compute a minimum vertex cover of a transitively closed DAG in polynomialtime.
References
Morteza Alamgir and Ulrike V. Luxburg. Phase transition in the family of p-resistances.In
Advances in Neural Information Processing Systems 24, pages 379–387. 2011. URL http://books.nips.cc/papers/files/nips24/NIPS2011_0278.pdf. Gunnar Aronsson. Extension of functions satisfying Lipschitz conditions.
Arkiv för Matematik, 6(6):551–561, 1967. ISSN 0004-2080. doi: 10.1007/BF02591928. URL http://dx.doi.org/10.1007/BF02591928. Gunnar Aronsson, Michael G. Crandall, and Petri Juutinen. A tour of the theory of absolutely minimizing functions.
Bull. Amer. Math. Soc. (N.S.) , 41(4):439–505, 2004. ISSN 0273-0979. doi: 10.1090/S0273-0979-04-01035-3.URL http://dx.doi.org/10.1090/S0273-0979-04-01035-3 .Guy Barles and J´erˆome Busca. Existence and comparison results for fully nonlinear degenerate elliptic equationswithout zeroth-order term.
Comm. Partial Differential Equations , 26:2323–2337, 2001.Mikhail Belkin, Irina Matveeva, and Partha Niyogi. Regularization and semi-supervised learning on largegraphs. In
Learning Theory , volume 3120 of
Lecture Notes in Computer Science, pages 624–638. Springer Berlin Heidelberg, 2004. ISBN 978-3-540-22282-8. doi: 10.1007/978-3-540-27819-1_43. URL http://dx.doi.org/10.1007/978-3-540-27819-1_43. Nick Bridle and Xiaojin Zhu. p-voltages: Laplacian regularization for semi-supervised learning on high-dimensional data. In Eleventh Workshop on Mining and Learning with Graphs (MLG2013), 2013. Carlos Castillo, Debora Donato, Luca Becchetti, Paolo Boldi, Stefano Leonardi, Massimo Santini, and Sebastiano Vigna. A reference collection for web spam.
SIGIR Forum, 40(2):11–24, December 2006. ISSN 0163-5840. doi: 10.1145/1189702.1189703. URL http://doi.acm.org/10.1145/1189702.1189703. Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien.
Semi-Supervised Learning. The MIT Press, 1st edition, 2010. ISBN 0262514125, 9780262514125. Michael B. Cohen, Rasmus Kyng, Gary L. Miller, Jakub W. Pachocki, Richard Peng, Anup B. Rao, and Shen Chen Xu. Solving SDD linear systems in nearly m log^{1/2} n time. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing, pages 343–352. ACM, 2014. M. G. Crandall, L. C. Evans, and R. F. Gariepy. Optimal Lipschitz extensions and the infinity Laplacian.
Calculus of Variations and Partial Differential Equations, 13(2):123–139, 2001. ISSN 0944-2669. doi: 10.1007/s005260000065. URL http://dx.doi.org/10.1007/s005260000065. Samuel I. Daitch and Daniel A. Spielman. Faster approximate lossy generalized flow via interior point algorithms. In
Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, STOC '08, pages 451–460, New York, NY, USA, 2008. ACM. ISBN 978-1-60558-047-0. doi: 10.1145/1374376.1374441. URL http://doi.acm.org/10.1145/1374376.1374441. Alan Egger and Robert Huotari. Rate of convergence of the discrete Polya algorithm.
Journal of Approximation Theory, 60(1):24–30, 1990. ISSN 0021-9045. doi: 10.1016/0021-9045(90)90070-7. Charles Fefferman. Whitney's extension problems and interpolation of data.
Bull. Amer. Math. Soc. (N.S.), 46(2):207–220, 2009a. ISSN 0273-0979. doi: 10.1090/S0273-0979-08-01240-8. URL http://dx.doi.org/10.1090/S0273-0979-08-01240-8. Charles Fefferman. Fitting a C^m-smooth function to data, III.
Annals of Mathematics, 170(1):427–441, 2009b. ISSN 0003486X. Charles Fefferman and Bo'az Klartag. Fitting a C^m-smooth function to data I.
Annals of Mathematics, 169(1):315–346, 2009. ISSN 0003486X. D. R. Fulkerson. Note on Dilworth's decomposition theorem for partially ordered sets.
Proc. Amer. Math. Soc., 1956. P. W. Gaffney and M. J. D. Powell. Optimal interpolation. In
Numerical Analysis , volume 506 of
Lecture Notes in Mathematics, pages 90–99. Springer Berlin Heidelberg, 1976. ISBN 978-3-540-07610-0. doi: 10.1007/BFb0080117. URL http://dx.doi.org/10.1007/BFb0080117. L.-A. Gottlieb, A. Kontorovich, and R. Krauthgamer. Efficient classification for metric data.
CoRR , abs/1306.2547,2013. URL http://arxiv.org/abs/1306.2547 .L.-A. Gottlieb, A. Kontorovich, and R. Krauthgamer. Efficient regression in metric spaces via approximate lipschitzextension. In
Similarity-Based Pattern Recognition , volume 7953 of
Lecture Notes in Computer Science , pages43–58. Springer Berlin Heidelberg, 2013b. ISBN 978-3-642-39139-2. doi: 10.1007/978-3-642-39140-8 3. URL http://dx.doi.org/10.1007/978-3-642-39140-8_3 .Robert Jensen. Uniqueness of lipschitz extensions: Minimizing the sup norm of the gradient.
Archive for Rational Mechanics and Analysis, 123(1):51–74, 1993. ISSN 0003-9527. doi: 10.1007/BF00386368. URL http://dx.doi.org/10.1007/BF00386368. M. Kirszbraun. Über die zusammenziehende und Lipschitzsche Transformationen.
Fundamenta Mathematicae, 22(1):77–108, 1934. URL http://eudml.org/doc/212681. Andrew J. Lazarus, Daniel E. Loeb, James G. Propp, Walter R. Stromquist, and Daniel H. Ullman. Combinatorial games under auction play.
Games and Economic Behavior, 27(2):229–264, 1999. ISSN 0899-8256. doi: 10.1006/game.1998.0676. Yunqian Ma and Yun Fu.
Manifold Learning Theory and Applications . CRC Press, Inc., Boca Raton, FL, USA, 1stedition, 2011. ISBN 1439871094, 9781439871096.E. J. McShane. Extension of range of functions.
Bull. Amer. Math. Soc. , 40(12):837–842, 12 1934. URL http://projecteuclid.org/euclid.bams/1183497871 .C.A. Micchelli, T.J. Rivlin, and S. Winograd. The optimal recovery of smooth functions.
Numerische Mathematik, 26(2):191–200, 1976. ISSN 0029-599X. doi: 10.1007/BF01395972. URL http://dx.doi.org/10.1007/BF01395972. V. A. Milman. Absolutely minimal extensions of functions on metric spaces. 1999. URL http://iopscience.iop.org/1064-5616/190/6/A05/pdf/MSB_190_6_A05.pdf. Boaz Nadler, Nathan Srebro, and Xueyuan Zhou. Statistical analysis of semi-supervised learning: The limit of infinite unlabelled data. 2009. URL http://ttic.uchicago.edu/~nati/Publications/NSZnips09.pdf. A. Naor and S. Sheffield. Absolutely minimal Lipschitz extension of tree-valued mappings.
CoRR , abs/1005.2535,May 2010. URL http://arxiv.org/abs/1005.2535 .A. M. Oberman. Finite difference methods for the Infinity Laplace and p-Laplace equations.
CoRR, abs/1107.5278, July 2011. URL http://arxiv.org/abs/1107.5278. Yuval Peres, Oded Schramm, Scott Sheffield, and David B. Wilson. Tug-of-war and the infinity Laplacian. In
Selected Works of Oded Schramm , Selected Works in Probability and Statistics, pages 595–638. Springer New York, 2011. ISBN 978-1-4419-9674-9. doi: 10.1007/978-1-4419-9675-6 18. URL http://dx.doi.org/10.1007/978-1-4419-9675-6_18 .S. Sheffield and C. K. Smart. Vector-valued optimal Lipschitz extensions.
CoRR , abs/1006.1741, June 2010. URL http://arxiv.org/abs/1006.1741 .Ali Kemal Sinop and Leo Grady. A seeded image segmentation framework unifying graph cuts and random walkerwhich yields a new algorithm. In
Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on ,pages 1–8. IEEE, 2007.Vijay V. Vazirani.
Approximation Algorithms . Springer-Verlag New York, Inc., New York, NY, USA, 2001. ISBN3-540-65367-8.Ulrike von Luxburg and Olivier Bousquet. Distance-based classification with lipschitz functions. In
Learn-ing Theory and Kernel Machines , volume 2777 of
Lecture Notes in Computer Science , pages 314–328.Springer Berlin Heidelberg, 2003. ISBN 978-3-540-40720-1. doi: 10.1007/978-3-540-45167-9 24. URL http://dx.doi.org/10.1007/978-3-540-45167-9_24 .Hassler Whitney. Analytic extensions of differentiable functions defined in closed sets.
Transactions of the American Mathematical Society, 36(1):63–89, 1934. ISSN 00029947. Dengyong Zhou, Christopher J. C. Burges, and Tao Tao. Transductive link spam detection. In
Proceedingsof the 3rd International Workshop on Adversarial Information Retrieval on the Web , AIRWeb ’07, pages 21–28, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-732-2. doi: 10.1145/1244408.1244413. URL http://doi.acm.org/10.1145/1244408.1244413 .Xiaojin Zhu, Zoubin Ghahramani, and John Lafferty. Semi-supervised learning using gaussian fields and harmonicfunctions. In
ICML, pages 912–919, 2003.
A Basic Properties of Lex-Minimizers
A.1 Meta Algorithm
Algorithm 1: Algorithm Meta-Lex: Given a well-posed instance (G, v_0), outputs lex_G[v_0].

For i = 1, 2, ...:
1. If T(v_{i−1}) = V_G, then return v_{i−1}.
2. E' = E_G \ (T(v_{i−1}) × T(v_{i−1})), G' := (V_G, E').
3. Let P*_i be a steepest fixable path in (G', v_{i−1}). Let α*_i ← ∇P*_i(v_{i−1}). Set v_i ← fix[v_{i−1}, P*_i].

In this subsection, we prove the results that appeared in Section 2. We start with a simple observation.
Proposition A.1
Given a well-posed instance (G, v_0) such that T(v_0) ≠ V, let P be a steepest fixable path in (G, v_0). Then fix[v_0, P] extends v_0, and (G, fix[v_0, P]) is also a well-posed instance. The properties we prove below do not depend on the choice of the steepest fixable path.
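A naive, runnable sketch of Meta-Lex for small graphs follows. Caveat: for simplicity this sketch restricts path interiors to free vertices, a simplification of the steepest-fixable-path search (the paper's paths may pass through terminals as long as terminal-terminal edges are ignored); all names are ours.

```python
import heapq

def steepest_path(adj, term):
    # Among ordered terminal pairs (s, t), find the path of maximum gradient
    # (term[s] - term[t]) / length whose interior avoids terminals;
    # direct terminal-terminal edges are skipped, as in Algorithm 1.
    best = None
    for s in term:
        dist, par = {s: 0.0}, {s: None}
        pq = [(0.0, s)]
        while pq:
            d, x = heapq.heappop(pq)
            if d > dist[x]:
                continue
            if x != s and x in term:
                continue                  # terminals end a path, never cross it
            for y, l in adj[x]:
                if x == s and y in term:
                    continue              # ignore terminal-terminal edges
                if d + l < dist.get(y, float("inf")):
                    dist[y], par[y] = d + l, x
                    heapq.heappush(pq, (d + l, y))
        for t in term:
            if t == s or t not in dist:
                continue
            g = (term[s] - term[t]) / dist[t]
            if best is None or g > best[0]:
                path = [t]
                while path[-1] != s:
                    path.append(par[path[-1]])
                best = (g, path[::-1])
    return best

def meta_lex(adj, v0):
    # Repeatedly fix a steepest fixable path by linear interpolation along
    # its length; quadratic time or worse, intended for tiny graphs only.
    term = dict(v0)
    while len(term) < len(adj):
        g, path = steepest_path(adj, term)
        acc = 0.0
        for a, b in zip(path, path[1:]):
            acc += next(l for y, l in adj[a] if y == b)
            if b not in term:
                term[b] = term[path[0]] - g * acc
    return term
```

On a unit-length path a-b-c-d with v_0(a) = 0 and v_0(d) = 1, the steepest fixable path has gradient 1/3 and fixing it yields the lex-minimizer values 1/3 and 2/3 at the free vertices.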
Proposition A.2
For any well-posed instance (G, v_0) with |V_G| = n, Meta-Lex(G, v_0) terminates in at most n iterations and outputs a complete voltage assignment v that extends v_0.

Proof of Proposition A.2:
By Proposition A.1, at any iteration i, v_{i−1} extends v_0 and (G', v_{i−1}) is a well-posed instance. Meta-Lex outputs v_{i−1} only if T(v_{i−1}) = V, which means v_{i−1} is a complete voltage assignment. For any v_{i−1} that is not complete and any x ∈ V \ T(v_{i−1}), there must be a free terminal path in (G', v_{i−1}) that contains x. Hence, a steepest fixable path P*_i exists in (G', v_{i−1}). Since P*_i is a free terminal path, fix[v_{i−1}, P*_i] fixes the voltage of at least one non-terminal. Thus, Meta-Lex(G, v_0) must terminate in at most n iterations. ✷

For the following lemmas, consider a run of Meta-Lex with a well-posed instance (G, v_0) as input. Let v_out be the complete voltage assignment output by Meta-Lex. Let E_i be the set of edges E' and G_i the graph G' constructed in iteration i of Meta-Lex.

Lemma A.3 For every edge e ∈ E_{i−1} \ E_i, we have |grad[v_out](e)| ≤ α*_i. Moreover, α*_i is non-increasing in i.

Proof of Lemma A.3: Let P*_i = (x_0, ..., x_r) be a steepest fixable path in iteration i (when we deal with the instance (G_{i−1}, v_{i−1})). Consider a terminal path P_{i+1} in (G_i, v_i) such that {∂_0 P_{i+1}, ∂_1 P_{i+1}} ∩ (T(v_i) \ T(v_{i−1})) ≠ ∅. We claim that ∇P_{i+1}(v_i) ≤ α*_i. To the contrary, assume that ∇P_{i+1}(v_i) > α*_i. Consider the case ∂_0 P_{i+1} ∈ T(v_i) \ T(v_{i−1}) and ∂_1 P_{i+1} ∈ T(v_{i−1}). By the definition of v_i, we must have ∂_0 P_{i+1} = x_j for some j ∈ [r−1]. Let P'_{i+1} be the path formed by joining the paths P*_i[x_0 : x_j] and P_{i+1}; P'_{i+1} is a free terminal path in (G_{i−1}, v_{i−1}). We have

v_{i−1}(x_0) − v_{i−1}(∂_1 P_{i+1}) = (v_i(x_0) − v_i(x_j)) + (v_i(∂_0 P_{i+1}) − v_i(∂_1 P_{i+1})) > α*_i · ℓ(P*_i[x_0 : x_j]) + α*_i · ℓ(P_{i+1}) = α*_i · ℓ(P'_{i+1}),

giving ∇P'_{i+1}(v_{i−1}) > α*_i, which is a contradiction, since the steepest fixable path P*_i in (G_{i−1}, v_{i−1}) has gradient α*_i. The other cases can be handled similarly.

Applying the above claim to an edge e ∈ E_{i−1} \ E_i, whose gradient is fixed for the first time in iteration i, we obtain that grad[v_{i+1}](e) ≤ α*_i. Since v_out extends v_{i+1}, we get grad[v_out](e) ≤ α*_i. Applying the claim to the symmetric edge, we obtain −grad[v_out](e) ≤ α*_i, implying |grad[v_out](e)| ≤ α*_i.

Consider any free terminal path P_{i+1} in (G_i, v_i). If P_{i+1} is also a terminal path in (G_{i−1}, v_{i−1}), then it is a free terminal path in (G_{i−1}, v_{i−1}). In addition, since a steepest fixable path P*_i in (G_{i−1}, v_{i−1}) has ∇P*_i = α*_i, we get ∇P_{i+1}(v_i) = ∇P_{i+1}(v_{i−1}) ≤ α*_i. Otherwise, we must have {∂_0 P_{i+1}, ∂_1 P_{i+1}} ∩ (T(v_i) \ T(v_{i−1})) ≠ ∅, and we can deduce ∇P_{i+1}(v_i) ≤ α*_i using the above claim. Thus, all free terminal paths P_{i+1} in (G_i, v_i) satisfy ∇P_{i+1}(v_i) ≤ α*_i. In particular, α*_{i+1} = ∇P*_{i+1}(v_i) ≤ α*_i.
Thus, α*_i is non-increasing in i. ✷

Lemma A.4 For any complete voltage assignment v for G that extends v_0, if v ≠ v_out, we have grad[v] ≻ grad[v_out], and hence grad[v_out] ⪯ grad[v].

Proof of Lemma A.4:
Consider any complete voltage assignment v for G that extends v_0 such that v ≠ v_out. Then there exists a unique i such that v extends v_{i−1} but does not extend v_i. We will argue that grad[v] ≻ grad[v_out], and hence grad[v_out] ⪯ grad[v]. For every edge e ∈ E \ E_{i−1} that has been fixed so far, grad[v](e) = grad[v_{i−1}](e) = grad[v_out](e), and hence we can ignore these edges.

Since v extends v_{i−1} but not v_i, there exists an x ∈ T(v_i) \ T(v_{i−1}) such that v(x) ≠ v_i(x) = v_out(x). Assume v(x) < v_i(x) (the other case is symmetric). If P*_i = (x_0, ..., x_r) is the steepest fixable path with gradient α*_i picked in iteration i, we must have x = x_j for some j ∈ [r−1]. Thus,

Σ_{k=1}^{j} (v(x_{k−1}) − v(x_k)) = v(x_0) − v(x_j) > v_i(x_0) − v_i(x_j) = α*_i · ℓ(P*_i[x_0 : x_j]) = α*_i · Σ_{k=1}^{j} ℓ(x_{k−1}, x_k).

Thus, for some k ∈ [j], we must have grad[v](x_{k−1}, x_k) > α*_i. Since P*_i is a path in G_{i−1}, we have {x_{k−1}, x_k} ⊄ T(v_{i−1}). This gives (x_{k−1}, x_k) ∈ E_{i−1} \ E_i. But then, from Lemma A.3, it follows that for all e ∈ E_{i−1} \ E_i, we have |grad[v_out](e)| ≤ α*_i. Thus, we have grad[v] ≻ grad[v_out]. ✷

Lemma A.5
Let P = (x_0, ..., x_r) be a steepest fixable path that does not have any edges in T(v_0) × T(v_0), and let v_1 = fix_G[v_0, P]. Then for every i ∈ [r], we have grad[v_1](x_{i−1}, x_i) = ∇P.

Proof of Lemma A.5:
Suppose this is not true, and let j ∈ [r] be the minimum index such that grad[v_1](x_{j−1}, x_j) ≠ ∇P. By the definition of v_1, we would necessarily have j < r and x_j ∈ T(v_0). Suppose grad[v_1](x_{j−1}, x_j) < ∇P. We would then have v_1(x_0) − v_1(x_j) < ∇P · ℓ(P[x_0 : x_j]). Since P does not have any edges in T(v_0) × T(v_0), P_1 := (x_j, ..., x_r) would be a free terminal path with ∇P_1 > ∇P. This is a contradiction. The other cases can be ruled out similarly. ✷

Proof of Theorem 3.3: Consider an arbitrary run of Meta-Lex on (G, v_0). Let v_out be the complete voltage assignment output by Meta-Lex. Proposition A.1 implies that v_out extends v_0. Lemma A.4 implies that for any complete voltage assignment v ≠ v_out that extends v_0, we have grad[v_out] ⪯ grad[v]. Thus, v_out is a lex-minimizer. Moreover, the lemma also gives that for any such v, grad[v] ≻ grad[v_out], and hence v_out is the unique lex-minimizer. Thus, v_out is the unique voltage assignment satisfying Def. 2.1, and we denote it by lex_G[v_0]. Since we started with an arbitrary run of Meta-Lex, uniqueness implies that every run of Meta-Lex on (G, v_0) must output lex_G[v_0]. ✷

Proof of Lemma 3.5:
Suppose we have a complete voltage assignment v extending v_0 such that ‖grad[v]‖∞ ≤ α. For any terminal path P = (x_0, ..., x_r), we get

v_0(∂_0 P) − v_0(∂_1 P) = v(∂_0 P) − v(∂_1 P) = Σ_{i=1}^{r} grad[v](x_{i−1}, x_i) · ℓ(x_{i−1}, x_i) ≤ α · Σ_{i=1}^{r} ℓ(x_{i−1}, x_i) = α · ℓ(P),

giving ∇P(v_0) ≤ α.

On the other hand, suppose every terminal path P in (G, v_0) satisfies ∇P(v_0) ≤ α. Consider v = lex_G[v_0]. We know that v extends v_0. Every edge e ∈ E_G ∩ (T(v_0) × T(v_0)) is a (trivial) terminal path in (G, v_0), and hence satisfies grad[v](e) = grad[v_0](e) = ∇e(v_0) ≤ α. Considering the reverse edge, we also obtain −grad[v](e) ≤ α. Thus, |grad[v](e)| ≤ α. Moreover, using Lemma A.3, we know that for every edge e ∈ E_G \ (T(v_0) × T(v_0)), we have |grad[v](e)| ≤ α*_1 = ∇P*_1 ≤ α, since P*_1 is a terminal path in (G, v_0). Thus, for every e ∈ E_G, |grad[v](e)| ≤ α, and hence ‖grad[v]‖∞ ≤ α. ✷

A.2 Stability
In this subsection, we sketch a proof of the monotonicity of lex-minimizers and show how it implies the stability property claimed earlier. For any well-posed (G, v_0), there could be several possible executions of Meta-Lex, each characterized by its sequence of paths P*_i. We can apply Theorem 3.3 to deduce the following structural result about the lex-minimizer.

Corollary A.6
For any well-posed instance (G, v_0), consider a sequence of paths (P_1, ..., P_r) and voltage assignments (v_1, ..., v_r), for some positive integer r, such that:
1. P_i is a steepest fixable path in (G_{i−1}, v_{i−1}) for i = 1, ..., r.
2. v_i = fix[v_{i−1}, P_i] for i = 1, ..., r.
3. T(v_r) = V_G.
Then we have v_r = lex_G[v_0]. We call such a sequence of paths and voltages a decomposition of lex_G[v_0]. Again, note that lex_G[v_0] can have multiple decompositions. However, any two such decompositions are consistent, in the sense that they produce the same voltage assignment.

Proof of Corollary 3.7:
We first define some operations on partial assignments, which simplify the notation. Let v_1, v_2 be any two partial assignments with the same set of terminals T := T(v_1) = T(v_2), and let c, d ∈ R. By cv_1 + d we mean the partial assignment v_3 with T(v_3) = T satisfying v_3(t) = c·v_1(t) + d for all t ∈ T. By v_1 + v_2 we mean the partial assignment v_3 with T(v_3) = T satisfying v_3(t) = v_1(t) + v_2(t) for all t ∈ T. Also, we say v_1 ≥ v_2 if v_1(t) ≥ v_2(t) for all t ∈ T.

Now we show how Corollary 3.7 follows from Theorem 3.6. Let v := v_1 − v_2, and let ‖v‖∞ = ε for some ε > 0. Then v_2 + ε ≥ v_1 ≥ v_2 − ε. Theorem 3.6 then implies that lex_G[v_2] + ε ≥ lex_G[v_1] ≥ lex_G[v_2] − ε, hence proving the corollary. ✷

Proof sketch of Theorem 3.6:
It is easy to see that the first statement holds. For the second statement, we first observe that if there is a sequence of paths P_1, ..., P_r that is simultaneously a decomposition of both lex[v_1] and lex[v_2], then the claim follows easily. If such a path sequence does not exist, then we consider v_t := v_1 + t(v_2 − v_1). We state here without proof (though the proof is elementary) that we can then split the interval [0, 1] into finitely many subintervals [a_0, a_1], [a_1, a_2], ..., [a_{k−1}, a_k], with a_0 = 0 and a_k = 1, such that for each i there is a path sequence P_1, ..., P_r that is a decomposition of lex[v_t] for all t ∈ [a_i, a_{i+1}]. We then observe that v_1 = v_{a_0} ≤ v_{a_1} ≤ ... ≤ v_{a_k} = v_2. Since for every pair a_i, a_{i+1} there is a path sequence that is simultaneously a decomposition of both lex[v_{a_i}] and lex[v_{a_{i+1}}], we immediately get lex[v_1] = lex[v_{a_0}] ≤ lex[v_{a_1}] ≤ ... ≤ lex[v_{a_k}] = lex[v_2]. ✷

A.3 Alternate Characterizations
Proof of Theorem 3.10:
We know that lex_G[v_0] extends v_0. We first prove that v = lex_G[v_0] satisfies the max-min gradient averaging property. Assume to the contrary: then there exists x ∈ V_G \ T(v_0) such that

max_{y : (x,y) ∈ E_G} grad[v](x, y) ≠ − min_{y : (x,y) ∈ E_G} grad[v](x, y).

Suppose first that max_{y : (x,y) ∈ E_G} grad[v](x, y) > − min_{y : (x,y) ∈ E_G} grad[v](x, y). Then consider v′ extending v_0 that is identical to v except for v′(x) = v(x) − ε, for some ε > 0. For ε small enough, we get that

max_{y : (x,y) ∈ E_G} grad[v′](x, y) < max_{y : (x,y) ∈ E_G} grad[v](x, y) and − min_{y : (x,y) ∈ E_G} grad[v′](x, y) < max_{y : (x,y) ∈ E_G} grad[v](x, y).

The gradients of edges not incident on the vertex x are left unchanged. This implies that grad[v′] precedes grad[v] in the lexicographic order on sorted gradient sequences, contradicting the assumption that v is the lex-minimizer. (The other case is symmetric.)

For the other direction, consider a complete voltage assignment v extending v_0 that satisfies the max-min gradient averaging property w.r.t. (G, v_0). Let

α = max_{(x,y) ∈ E_G, x ∈ V \ T(v_0)} grad[v](x, y) ≥ 0

be the maximum edge gradient, and consider any edge (x_0, x_1) ∈ E_G such that grad[v](x_0, x_1) = α, with x_0 ∈ V \ T(v_0). If α = 0, grad[v] is identically zero, and is trivially the lex-minimal gradient assignment. Thus, both v and lex_G[v_0] are constant on each connected component. Since (G, v_0) is well-posed, there is at least one terminal in each component, and hence v and lex_G[v_0] must be identical.

Now assume α > 0. By the max-min gradient averaging property, there exists x_2 ∈ V_G such that (x_0, x_2) ∈ E_G and

grad[v](x_0, x_2) = min_{y : (x_0,y) ∈ E_G} grad[v](x_0, y) = − max_{y : (x_0,y) ∈ E_G} grad[v](x_0, y) ≤ − grad[v](x_0, x_1) = −α.

Thus, grad[v](x_2, x_0) ≥ α. Since α is the maximum edge gradient, we must have grad[v](x_2, x_0) = α. Moreover, v(x_2) > v(x_0) > v(x_1), and thus x_2 ≠ x_1. We can inductively apply this argument at x_2 until we hit a terminal. Similarly, if x_1 ∉ T(v_0), we can extend the path in the other direction. Consequently, we obtain a path P = (x_j, …, x_2, x_0, x_1, …, x_k) with all vertices distinct, such that x_j, x_k ∈ T(v_0) and every internal vertex of P lies in V \ T(v_0). Moreover, every edge of P has gradient α under v. Thus, P is a free terminal path with ∇_P[v] = α. Moreover, since v is a voltage assignment extending v_0 with ‖grad[v]‖_∞ = α, using Lemma 3.5 we know that every terminal path P′ in (G, v_0) must satisfy ∇_{P′}(v_0) ≤ α. Thus, P is a steepest fixable path in (G, v_0). Thus, letting v_1 = fix[v_0, P] and using Corollary 3.4, we obtain that lex_G[v_1] = lex_G[v_0]. Moreover, since every edge of P has gradient exactly α = ∇_P[v], the values of v on the internal vertices of P agree with those assigned by fix[v_0, P], and hence v extends v_1. We can iterate this argument for r iterations until T(v_r) = V_G, giving v = v_r and v_r = lex_G[v_r] = lex_G[v_0]. (Since we fix at least one new terminal at each iteration, this procedure terminates.) Thus, we get v = lex_G[v_0]. ✷

B Description of the Algorithms
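To make the pseudocode in this appendix concrete, here is a minimal Python sketch of the modified Dijkstra routine (Algorithm 2 below). It uses `heapq` with lazy deletion in place of a Fibonacci heap with decrease-key; the dict-of-dicts graph representation and all names are illustrative, not the paper's code.

```python
import heapq
from itertools import count

def mod_dijkstra(adj, terminals, v0, alpha):
    """Sketch of ModDijkstra: computes, for every vertex x,
    v(x) = min over terminals t of { v0[t] + alpha * dist(x, t) },
    by running Dijkstra from all terminals at once with start keys v0[t].
    `adj` maps each vertex to a dict {neighbor: edge_length}."""
    v, parent = {}, {}
    tie = count()                       # tie-breaker so the heap never compares vertices
    heap = [(v0[t], next(tie), t, None) for t in terminals]
    heapq.heapify(heap)
    while heap:
        key, _, x, p = heapq.heappop(heap)
        if x in v:                      # lazy deletion instead of decrease-key
            continue
        v[x] = key
        parent[x] = p
        for y, length in adj[x].items():
            if y not in v:
                heapq.heappush(heap, (key + alpha * length, next(tie), y, x))
    return v, parent
```

On a path a-b-c with unit lengths, terminals a = 0 and c = 1, and α = 1/2, the non-terminal b receives min(0 + 0.5, 1 + 0.5) = 0.5 with parent a, matching the characterization in Theorem B.1.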
Algorithm 2: ModDijkstra(G, v_0, α): Given a well-posed instance (G, v_0) and a gradient value α ≥ 0, outputs a complete voltage assignment v for G and an array parent : V → V ∪ {null}.

1. for x ∈ V_G:
2.     Add x to a Fibonacci heap, with key(x) = +∞.
3.     finished(x) ← false
4. for x ∈ T(v_0):
5.     Decrease key(x) to v_0(x).
6.     parent(x) ← null
7. while heap is not empty:
8.     x ← pop element with minimum key from heap
9.     v(x) ← key(x)
10.    finished(x) ← true
11.    for y : (x, y) ∈ E_G:
12.        if finished(y) = false and key(y) > v(x) + α · ℓ(x, y):
13.            Decrease key(y) to v(x) + α · ℓ(x, y)
14.            parent(y) ← x
15. return (v, parent)

Theorem B.1 For a well-posed instance (G, v_0) and a gradient value α ≥ 0, let (v, parent) ← ModDijkstra(G, v_0, α). Then v is a complete voltage assignment such that, ∀x ∈ V_G, v(x) = min_{t ∈ T(v_0)} {v_0(t) + α · dist(x, t)}. Moreover, the pointer array parent satisfies: ∀x ∉ T(v_0), parent(x) ≠ null and v(x) = v(parent(x)) + α · ℓ(x, parent(x)).

Algorithm 3: CompVLow(G, v_0, α): Given a well-posed instance (G, v_0) and a gradient value α ≥ 0, outputs vLow, a complete voltage assignment for G, and an array LParent : V → V ∪ {null}.

1. (vLow, LParent) ← ModDijkstra(G, v_0, α)
2. return (vLow, LParent)

Algorithm 4: CompVHigh(G, v_0, α): Given a well-posed instance (G, v_0) and a gradient value α ≥ 0, outputs vHigh, a complete voltage assignment for G, and an array HParent : V → V ∪ {null}.

1. for x ∈ V_G: if x ∈ T(v_0) then v_1(x) ← −v_0(x) else v_1(x) ← v_0(x)
2. (temp, HParent) ← ModDijkstra(G, v_1, α)
3. for x ∈ V_G: vHigh(x) ← −temp(x)
4. return (vHigh, HParent)

Corollary B.2 For a well-posed instance (G, v_0) and a gradient value α ≥ 0, let (vLow[α], LParent) ← CompVLow(G, v_0, α) and (vHigh[α], HParent) ← CompVHigh(G, v_0, α). Then vLow[α], vHigh[α] are complete voltage assignments for G such that, ∀x ∈ V_G,

vLow[α](x) = min_{t ∈ T(v_0)} {v_0(t) + α · dist(x, t)},
vHigh[α](x) = max_{t ∈ T(v_0)} {v_0(t) − α · dist(t, x)}.

Moreover, the pointer arrays LParent, HParent satisfy: ∀x ∉ T(v_0), LParent(x), HParent(x) ≠ null and

vLow[α](x) = vLow[α](LParent(x)) + α · ℓ(x, LParent(x)),
vHigh[α](x) = vHigh[α](HParent(x)) − α · ℓ(x, HParent(x)).

Algorithm 5: CompInfMin(G, v_0): Given a well-posed instance (G, v_0), outputs a complete voltage assignment v for G, extending v_0, that minimizes ‖grad[v]‖_∞.

1. α ← max {|grad[v_0](e)| : e ∈ E_G ∩ (T(v_0) × T(v_0))}
2. E_G ← E_G \ (T(v_0) × T(v_0))
3. P ← SteepestPath(G, v_0)
4. α ← max {α, ∇_P(v_0)}
5. (vLow, LParent) ← CompVLow(G, v_0, α)
6. (vHigh, HParent) ← CompVHigh(G, v_0, α)
7. for x ∈ V_G: if x ∈ T(v_0) then v(x) ← v_0(x) else v(x) ← (1/2) · (vLow(x) + vHigh(x))
8. return v

Algorithm 6: CompHighPressGraph(G, v_0, α): Given a well-posed instance (G, v_0) and a gradient value α ≥ 0, outputs a minimal induced subgraph G′ of G where every vertex has pressure[v_0](·) > α.

1. (vLow, LParent) ← CompVLow(G, v_0, α)
2. (vHigh, HParent) ← CompVHigh(G, v_0, α)
3. V_G′ ← {x ∈ V_G | vHigh(x) > vLow(x)}
4. E_G′ ← {(x, y) ∈ E_G | x, y ∈ V_G′}
5. G′ ← (V_G′, E_G′, ℓ)
6. return G′

Proof of Lemma 4.3: vHigh[α](x) > vLow[α](x) is equivalent to

max_{t ∈ T(v_0)} {v_0(t) − α · dist(t, x)} > min_{t ∈ T(v_0)} {v_0(t) + α · dist(x, t)},

which implies that there exist terminals s, t ∈ T(v_0) such that

v_0(t) − α · dist(t, x) > v_0(s) + α · dist(x, s),

and thus

pressure[v_0](x) ≥ (v_0(t) − v_0(s)) / (dist(t, x) + dist(x, s)) > α.

So the inequality on vHigh and vLow implies that the pressure is strictly greater than α. On the other hand, if pressure[v_0](x) > α, there exist terminals s, t ∈ T(v_0) such that

(v_0(t) − v_0(s)) / (dist(t, x) + dist(x, s)) = pressure[v_0](x) > α.

Hence v_0(t) − α · dist(t, x) > v_0(s) + α · dist(x, s), which implies vHigh[α](x) > vLow[α](x). ✷

Algorithm 7: SteepestPath(G, v_0): Given a well-posed instance (G, v_0), with T(v_0) ≠ V_G, outputs a steepest free terminal path P in (G, v_0).

1. Sample uniformly random e ∈ E_G. Let e = (x_1, x_2).
2. Sample uniformly random x_3 ∈ V_G.
3. for i = 1 to 3: P_i ← VertexSteepestPath(G, v_0, x_i)
4. Let j ∈ argmax_{j ∈ {1,2,3}} ∇_{P_j}(v_0)
5. G′ ← CompHighPressGraph(G, v_0, ∇_{P_j}(v_0))
6. if E_G′ = ∅, then return P_j, else return SteepestPath(G′, v_0|_{V_G′})

Algorithm 8: CompLexMin(G, v_0): Given a well-posed instance (G, v_0), with T(v_0) ≠ V_G, outputs lex_G[v_0].

1. while T(v_0) ≠ V_G:
2.     E_G ← E_G \ (T(v_0) × T(v_0))
3.     P ← SteepestPath(G, v_0)
4.     v_0 ← fix[v_0, P]
5. return v_0

Algorithm 9: VertexSteepestPath(G, v_0, x): Given a well-posed instance (G, v_0) and a vertex x ∈ V_G, outputs a steepest terminal path in (G, v_0) through x.

1. Using Dijkstra's algorithm, compute dist(x, t) for all t ∈ T(v_0).
2. if x ∈ T(v_0):
3.     y ← argmax_{y ∈ T(v_0)} |v_0(x) − v_0(y)| / dist(x, y)
4.     if v_0(x) ≥ v_0(y), then return a shortest path from x to y, else return a shortest path from y to x
5. else:
6.     for t ∈ T(v_0): d(t) ← dist(x, t)
7.     (t_1, t_2) ← StarSteepestPath(T(v_0), v_0|_{T(v_0)}, d)
8.     Let P_1 be a shortest path from t_1 to x, and P_2 a shortest path from x to t_2. P ← (P_1, P_2). return P

Algorithm 10: StarSteepestPath(T, v, d): Returns the steepest path in a star graph, with a single non-terminal connected to the terminals in T, with lengths given by d and voltages given by v.

1. Sample t_1 uniformly at random from T.
2. Compute t_2 ∈ argmax_{t ∈ T} |v(t) − v(t_1)| / (d(t) + d(t_1)).
3. α ← |v(t_2) − v(t_1)| / (d(t_2) + d(t_1))
4. Compute v_low ← min_{t ∈ T} (v(t) + α · d(t)).
5. T_low ← {t ∈ T | v(t) > v_low + α · d(t)}
6. Compute v_high ← max_{t ∈ T} (v(t) − α · d(t)).
7. T_high ← {t ∈ T | v(t) < v_high − α · d(t)}
8. T′ ← T_low ∪ T_high
9. if T′ = ∅, then: if v(t_1) ≥ v(t_2) return (t_1, t_2), else return (t_2, t_1)
10. else return StarSteepestPath(T′, v|_{T′}, d|_{T′})

B.1 Faster Lex-minimization
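As a correctness check for the lex-minimization routines of this appendix, here is a brute-force Python sketch that repeatedly finds a steepest free terminal path by trying all terminal pairs and then fixes it by linear interpolation, a quadratic-time analogue of Algorithm 8. The graph representation and names are mine; the sketch assumes a well-posed instance with no edges directly joining two terminals.

```python
import heapq
from itertools import count

def _dijkstra(adj, source, allowed):
    """Shortest distances/predecessors from `source`, using only `allowed` vertices."""
    dist, prev, tie = {source: 0.0}, {source: None}, count()
    heap, done = [(0.0, next(tie), source)], set()
    while heap:
        d, _, x = heapq.heappop(heap)
        if x in done:
            continue
        done.add(x)
        for y, l in adj[x].items():
            if y in allowed and d + l < dist.get(y, float('inf')):
                dist[y], prev[y] = d + l, x
                heapq.heappush(heap, (d + l, next(tie), y))
    return dist, prev

def naive_lex_min(adj, v0):
    """Brute-force lex-minimization: repeatedly locate the steepest free
    terminal path (interior vertices non-terminal) and fix its interior
    values by linear interpolation in distance along the path."""
    v = dict(v0)                              # currently fixed values
    while len(v) < len(adj):
        best = None                           # (gradient, path, dist-from-s)
        free = set(adj) - set(v)
        for s in v:
            for t in v:
                if t == s:
                    continue
                dist, prev = _dijkstra(adj, s, free | {s, t})
                if t in dist and dist[t] > 0:
                    path = [t]
                    while prev[path[-1]] is not None:
                        path.append(prev[path[-1]])
                    path.reverse()
                    g = abs(v[s] - v[t]) / dist[t]
                    # require an interior vertex, so progress is guaranteed
                    if len(path) > 2 and (best is None or g > best[0]):
                        best = (g, path, dist)
        g, path, dist = best
        s, t = path[0], path[-1]
        for x in path[1:-1]:
            v[x] = v[s] + (v[t] - v[s]) * dist[x] / dist[t]
    return v
```

On a unit-length path a-b-c-d with terminals a = 0 and d = 3, the lex-minimizer is the linear interpolation b = 1, c = 2; on a star, the center is set so that the two steepest incident gradients balance.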
Algorithm 11: CompFastLexMin(G, v_0): Given a well-posed instance (G, v_0), with T(v_0) ≠ V_G, outputs lex_G[v_0].

1. while T(v_0) ≠ V_G:
2.     v_0 ← FixPathsAbovePress(G, v_0, 0)
3. return v_0

Algorithm 12: FixPathsAbovePress(G, v_0, α): Given a well-posed instance (G, v_0), with T(v_0) ≠ V_G, and a gradient value α, iteratively fixes all paths with gradient > α.

1. while T(v_0) ≠ V_G:
2.     E_G ← E_G \ (T(v_0) × T(v_0))
3.     Sample uniformly random e ∈ E_G. Let e = (x_1, x_2).
4.     Sample uniformly random x_3 ∈ V_G.
5.     for i = 1 to 3: P_i ← VertexSteepestPath(G, v_0, x_i)
6.     Let j ∈ argmax_{j ∈ {1,2,3}} ∇_{P_j}(v_0)
7.     G′ ← CompHighPressGraph(G, v_0, ∇_{P_j}(v_0))
8.     if E_G′ = ∅, then v_0 ← fix[v_0, P_j]
9.     else:
10.        Let G′_i, i = 1, …, r, be the connected components of G′.
11.        for i = 1, …, r:
12.            v_i ← FixPathsAbovePress(G′_i, v_0|_{V_{G′_i}}, ∇_{P_j}(v_0))
13.            for x ∈ V_{G′_i}, set v_0(x) ← v_i(x)
14.    if α > 0, then G ← CompHighPressGraph(G, v_0, α)
15. return v_0

C Experiments on WebSpam: Testing More Algorithms
For completeness, in this appendix we show how a number of algorithms perform on the web spam experiment of Section 6. We consider the following algorithms:

• RandWalk along in-links. For a detailed description see Zhou et al. (2007). This algorithm essentially performs a Personalized PageRank random walk from each vertex x and computes a spam-value for the vertex x by taking a weighted average of the labels of the vertices where the random walk from x terminates. Also shown in Section 6.
• DirectedLex, with edges in the opposite directions of links. This has the effect that a link to a spam host is evidence of spam, and a link from a normal host is evidence of normality. Also shown in Section 6.
• RandWalk along out-links.
• DirectedLex, with edges in the directions of links. This has the effect that a link from a spam host is evidence of spam, and a link to a normal host is evidence of normality.
• UndirectedLex: lex-minimization with links treated as undirected edges.
• Laplacian: ℓ_2-regression with links treated as undirected edges.
• Directed Nearest Neighbor (Directed 1NN): uses shortest distance along paths following out-links. The spam-ratio is defined as the distance from normal hosts divided by the distance to spam hosts. Sites are flagged as spam when the spam-ratio exceeds some threshold. We also tried following paths along in-links instead, but that gave much worse results.

We use the experimental setup described in Section 6. Results are shown in Figure 4. The alternative convention for DirectedLex orients edges in the directions of links. This takes a link from a spam host to be evidence of spam, and a link to a normal host to be evidence of normality. This approach performs significantly worse than our preferred convention, as one would intuitively expect. The UndirectedLex and Laplacian approaches also perform significantly worse. Directed Nearest Neighbor performs poorly, demonstrating that DirectedLex is very different from that approach. As observed by Zhou et al. (2007), sampling based on a random walk following out-links performs worse than following in-links. Up to 60% recall, DirectedLex performs best, both in the regime of 5% labels for training and in the regime of 20% labels for training.

Figure 4: Recall and precision in the WebSpam classification experiment (two panels: 5% and 20% labels for training). Each data point shown was computed as an average over 100 runs. The largest standard deviation of the mean precision across the plotted recall values was less than 1.5%. The algorithm of Zhou et al. (2007) appears as RandWalk (along in-links). We also show RandWalk along out-links. Our directed lex-minimization algorithm appears as DirectedLex. We also show DirectedLex with link directions reversed, along with UndirectedLex and Laplacian.

D ℓ_0-Vertex Regularization Proofs

In this appendix, we prove Theorem 7.1 and Theorem 7.2. For the purposes of proving the second theorem, we introduce an alternative version of problem (3). The optimization problem here requires us to minimize the ℓ_0-regularization budget required to obtain an inf-minimizer with gradient below a given threshold:

min_{v ∈ ℝ^n} ‖v(T) − v_0(T)‖_0 subject to ‖grad_G[v]‖_∞ ≤ α.   (6)

We will also need the following graph construction.

Definition D.1
The α-pressure terminal graph of a partially-labeled graph (G, v_0) is a directed unweighted graph G_α = (T(v_0), Ê) such that (s, t) ∈ Ê if and only if there is a terminal path P from s to t in G with ∇_P(v_0) > α.

Note that the α-pressure terminal graph has O(n) vertices but may be dense, even when G is not.

Algorithm 13: Term-Pressure(G, v_0, α): Given a well-posed instance (G, v_0) and α ≥ 0, outputs the α-pressure terminal graph G_α.

0. Initialize G_α with vertex set V_α = T(v_0) and edge set Ê = ∅.
For each terminal s ∈ T(v_0):
1. Compute the distances to every other terminal t by running Dijkstra's algorithm, allowing shortest paths that run through other terminals.
2. Use the resulting distances to check, for every other terminal t, whether there is a terminal path P from s to t with ∇_P(v_0) > α. If there is, add the edge (s, t) to Ê.

Lemma D.2 The α-pressure terminal graph of a voltage problem (G, v_0) can be computed in O((m + n log n)n) time using algorithm Term-Pressure (Algorithm 13).

Proof: The correctness of the algorithm follows from the fact that Dijkstra's algorithm will identify all shortest distances between the terminals, and the pressure check will ensure that terminal pairs (s, t) are added to Ê if and only if they are the endpoints of a terminal path P with ∇_P(v_0) > α. The running time is dominated by performing Dijkstra's algorithm once for each terminal. A single run of Dijkstra's algorithm takes O(m + n log n) time, and this is performed at most n times, for a total running time of O((m + n log n)n). ✷

We make three observations that will turn out to be crucial for proving Theorems 7.1 and 7.2.
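A short Python sketch of the Term-Pressure construction (Algorithm 13): one Dijkstra run per terminal, followed by the gradient test v_0(s) − v_0(t) > α · dist(s, t) for every other terminal. The dict-based representation and names are illustrative.

```python
import heapq
from itertools import count

def terminal_pressure_graph(adj, v0, alpha):
    """Sketch of Term-Pressure: returns the edge set of the alpha-pressure
    terminal graph. Edge (s, t) is added when the steepest terminal path
    from s to t has gradient (v0[s] - v0[t]) / dist(s, t) > alpha.
    Shortest paths are allowed to pass through other terminals."""
    edges = set()
    for s in v0:                              # one Dijkstra per terminal
        dist, done, tie = {s: 0.0}, set(), count()
        heap = [(0.0, next(tie), s)]
        while heap:
            d, _, x = heapq.heappop(heap)
            if x in done:
                continue
            done.add(x)
            for y, l in adj[x].items():
                if d + l < dist.get(y, float('inf')):
                    dist[y] = d + l
                    heapq.heappush(heap, (d + l, next(tie), y))
        for t in v0:
            # gradient test, rearranged to avoid division
            if t != s and t in dist and v0[s] - v0[t] > alpha * dist[t]:
                edges.add((s, t))
    return edges
```

The edges produced are directed from the higher-voltage terminal to the lower one, consistent with Observation D.5 below: the resulting graph is acyclic.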
Observation D.3 G_α is a subgraph of G_β for α ≥ β.

Proof: Suppose edge (s, t) appears in G_α. Then for some terminal path P, ∇_P(v_0) > α ≥ β, so the edge also appears in G_β. ✷

Observation D.4 G_α is transitively closed.

Proof: Suppose edges (s, t) and (t, r) appear in G_α. Let P_(s,t), P_(t,r), P_(s,r) be the respective shortest paths in G between these terminal pairs. Then

∇_{P_(s,r)}(v_0) = (v_0(s) − v_0(r)) / ℓ(P_(s,r))
≥ (v_0(s) − v_0(r)) / (ℓ(P_(s,t)) + ℓ(P_(t,r)))
= (v_0(s) − v_0(t) + v_0(t) − v_0(r)) / (ℓ(P_(s,t)) + ℓ(P_(t,r)))
≥ min { (v_0(s) − v_0(t)) / ℓ(P_(s,t)), (v_0(t) − v_0(r)) / ℓ(P_(t,r)) } > α.   (7)

So edge (s, r) also appears in G_α. This is sufficient for G_α to be transitively closed. ✷

Observation D.5 G_α is a directed acyclic graph.

Proof: Suppose for a contradiction that a directed cycle appears in G_α. Let s and t be two vertices in this cycle. Let P_(s,t) and P_(t,s) be the respective shortest paths in G between these terminal pairs. Because G_α is transitively closed, both edges (s, t) and (t, s) must appear in G_α. But (s, t) ∈ Ê implies v_0(s) − v_0(t) > α ℓ(P_(s,t)) ≥ 0, and similarly (t, s) ∈ Ê implies v_0(t) − v_0(s) > α ℓ(P_(t,s)) ≥ 0. This is a contradiction. ✷

The usefulness of the α-pressure terminal graph is captured in the following lemma. We define a vertex cover of a directed graph to be a vertex set that constitutes a vertex cover in the same graph with all edges taken to be undirected.

Lemma D.6
Given a partially-labeled graph (G, v_0) and a set U ⊆ V, there exists a voltage assignment v ∈ ℝ^n that satisfies {t ∈ T(v_0) : v(t) ≠ v_0(t)} ⊆ U and ‖grad_G[v]‖_∞ ≤ α, if and only if U is a vertex cover in the α-pressure terminal graph G_α of (G, v_0).

Proof: We first show the "only if" direction. Suppose for a contradiction that there exists a voltage assignment v for which ‖grad_G[v]‖_∞ ≤ α, but U is not a vertex cover in G_α. Let (s, t) be an edge of G_α that is not covered by U. The presence of this edge in G_α implies that there exists a terminal path P from s to t in G for which ∇_P(v_0) > α. But, by Lemma 3.5, this means there is no assignment v for G that agrees with v_0 on s and t and has ‖grad_G[v]‖_∞ ≤ α. This contradicts our assumption.

Now we show the "if" direction. Consider an arbitrary vertex cover U of G_α. Suppose for a contradiction that there does not exist a voltage assignment v for G with ‖grad_G[v]‖_∞ ≤ α and {t ∈ T(v_0) : v(t) ≠ v_0(t)} ⊆ U. Define a partial voltage assignment v_U given by

v_U(t) = v_0(t) if t ∈ T(v_0) \ U, and v_U(t) = ∗ otherwise.

The preceding statement is equivalent to saying that there is no v that extends v_U and has ‖grad_G[v]‖_∞ ≤ α. By Lemma 3.5, this means there is a terminal path between some s, t ∈ T(v_U) with gradient strictly larger than α. But this means an edge (s, t) is present in G_α and is not covered. This contradicts our assumption that U is a vertex cover. ✷

We are now ready to prove Theorem 7.2.
Proof of Theorem 7.2:
We describe and prove the algorithm Outlier. The algorithm reduces problem (3) to problem (6): Suppose v* is an optimal assignment for problem (3). It achieves a maximum gradient α* = ‖grad_G[v*]‖_∞. Using Dijkstra's algorithm we compute the pairwise shortest distances between all terminals in G. From these distances and the terminal voltages, we compute the gradient on the shortest path between each terminal pair. By Lemma 3.5, α* must equal one of these gradients. So we can solve problem (3) by iterating over the set of gradients between terminals and solving problem (6) for each of these O(n²) gradients. Among the assignments with ‖v(T) − v_0(T)‖_0 ≤ k, we then pick the solution that minimizes ‖grad_G[v]‖_∞.

In fact, we can do better. By Observation D.3, G_α is a subgraph of G_β for α ≥ β. This means a vertex cover of G_β is also a vertex cover of G_α, and hence the minimum vertex cover for G_β is at least as large as the minimum vertex cover for G_α. This means we can do a binary search on the set of O(n²) terminal gradients to find the minimum gradient for which there exists an assignment with ‖v(T) − v_0(T)‖_0 ≤ k. This way, we make only O(log n) calls to problem (6) in order to solve problem (3).

We use the following algorithm to solve problem (6):

1. Compute the α-pressure terminal graph G_α of G using the algorithm Term-Pressure.
2. Compute a minimum vertex cover U of G_α using the algorithm Konig-Cover from Theorem 7.3.
3. Define a partial voltage assignment v_U given by v_U(t) = v_0(t) if t ∈ T(v_0) \ U, and v_U(t) = ∗ otherwise.
4. Using Algorithm 5, compute voltages v that extend v_U, and output v.

From Lemma D.2, it follows that step 1 computes the α-pressure terminal graph in polynomial time. From Theorem 7.3 it follows that step 2 computes a minimum vertex cover of the α-pressure terminal graph in polynomial time, because our Observations D.4 and D.5 establish that the graph is a TC-DAG. From Lemma D.6 and Theorem 4.6, it follows that the output voltages solve program (6). ✷

To prove Theorem 7.1, we use the standard greedy approximation algorithm for MIN-VC (Vazirani (2001)).

Theorem D.7 (2-Approximation Algorithm for Vertex Cover) The following algorithm gives a 2-approximation to the Minimum Vertex Cover problem on a graph G = (V, E):

0. Initialize U = ∅.
1. Pick an edge (u, v) ∈ E that is not covered by U.
2. Add u and v to the set U.
3. Repeat from step 1 if there are still edges not covered by U.
4. Output U.

We are now in a position to prove Theorem 7.1.
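Phrased over an edge list, the greedy procedure of Theorem D.7 is a single scan; a minimal Python sketch:

```python
def greedy_vertex_cover(edges):
    """2-approximation for Minimum Vertex Cover (Theorem D.7): repeatedly
    pick an edge not yet covered and add both of its endpoints."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover
```

The edges actually picked form a matching, and any vertex cover must contain at least one endpoint of each matched edge; since the algorithm adds two vertices per matched edge, its output is at most twice the minimum.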
Proof of Theorem 7.1:
Given an arbitrary k and a partially-labeled graph (G, v_0), let α* be the optimum value of program (3). Observe that by Lemma D.6, this implies that G_{α*} has a vertex cover of size at most k. Given the partial assignment v_0, for every vertex set U, we define

v_U(t) = v_0(t) if t ∈ T(v_0) \ U, and v_U(t) = ∗ otherwise.

We claim the following algorithm Approx-Outlier outputs a voltage assignment v with ‖grad_G[v]‖_∞ ≤ α* and ‖v(T) − v_0(T)‖_0 ≤ 2k.

Algorithm Approx-Outlier:

0. Initialize U = ∅.
1. Using the algorithm SteepestPath (Algorithm 7), find a steepest terminal path in G w.r.t. v_U. Denote this path P and let s and t be its terminal endpoints. If there is no terminal path with positive gradient, skip to step 4.
2. Add s and t to the set U.
3. If |U| ≤ 2k − 2, then repeat from step 1.
4. Using the algorithm CompInfMin (Algorithm 5), compute voltages v that extend v_U, and output v.

From the stopping conditions, it is clear that |U| ≤ 2k. If in step 1 we ever find that no terminal paths have positive gradient, then the v that extends v_U will have ‖grad_G[v]‖_∞ = 0 ≤ α*, by Lemma 3.5. Similarly, if we find a steepest path with gradient less than α* w.r.t. v_U, then for this U there exists v that extends v_U and has ‖grad_G[v]‖_∞ ≤ α*. This will continue to hold when we add vertices to U. Therefore, for the final U, there will exist a v that extends v_U and has ‖grad_G[v]‖_∞ ≤ α*.

If we never find a steepest terminal path P with ∇_P(v_0) ≤ α*, then each steepest path we find corresponds to an edge in G_{α*} that is not yet covered by U, and our algorithm in fact implements the greedy approximation algorithm for vertex cover described in Theorem D.7. This implies that the final U is a vertex cover of G_{α*} of size at most 2k. By Lemma D.6, this implies that there exists a voltage assignment u extending v_U that has ‖grad_G[u]‖_∞ ≤ α*. This implies, by Theorem 4.6, that the v we output has ‖grad_G[v]‖_∞ ≤ α*.

In all cases, the v we output extends v_U, so ‖v(T) − v_0(T)‖_0 ≤ |U| ≤ 2k. ✷

E Proof of Hardness of ℓ_0 Regularization for ℓ_2

We will prove Theorem 7.4 by a reduction from minimum bisection. To this end, let G = (V, E) be any graph.
We will reduce the minimum bisection problem on G to our regularization problem. Let n = |V|. The graph on which we will perform regularization will have vertex set V ∪ V̂, where V̂ is a set of n vertices that are in 1-to-1 correspondence with V. We assume that every edge in G has weight 1.

We now connect every vertex in V̂ to the corresponding vertex in V by an edge of weight B, for some large B to be determined later. We also connect all of the vertices in V̂ to each other by edges of weight B³. So, we have a complete graph of weight-B³ edges on V̂, a matching of weight-B edges connecting V̂ to V, and the original graph G on V. The input potential function will be

v_0(a) = 0 for a ∈ V̂, and 1 for a ∈ V.

Now set k = n/2. We claim that we will be able to determine the value of the minimum bisection from the solution to the regularization problem.

If S is the set of vertices on which v_0 and w differ, then we know that w is harmonic on S: for every a ∈ S, w(a) is the weighted average of the values at its neighbors. In the following, we exploit the fact that |S| ≤ n/2.

Claim E.1 For every a ∈ S ∩ V̂, w(a) ≤ 2/(nB²).

Proof: Let a be the vertex in S ∩ V̂ that maximizes w(a). Then a is connected to at least n/2 neighbors in V̂ with w-value equal to 0 by edges of weight B³. On the other hand, a has only one neighbor that is not in V̂; that vertex has w-value at most 1, and it is connected to a by an edge of weight B. Call that vertex c. We have

((n − 1)B³ + B) w(a) = B w(c) + Σ_{b ∈ V̂, b ≠ a} B³ w(b)
= B w(c) + Σ_{b ∈ V̂ ∩ S, b ≠ a} B³ w(b) + Σ_{b ∈ V̂ \ S} B³ w(b)
≤ B + Σ_{b ∈ V̂ ∩ S, b ≠ a} B³ w(a)
≤ B + (n/2 − 1) B³ w(a).

Subtracting (n/2 − 1)B³ w(a) from both sides gives ((n/2)B³ + B) w(a) ≤ B, which implies the claim. ✷

Claim E.2 For a ∈ S ∩ V, w(a) ≤ 2n/B.

Proof: Vertex a has exactly one neighbor in V̂. Let's call that neighbor c. We know that w(c) ≤ 2/(nB²). On the other hand, vertex a has fewer than n − 1 neighbors in V, and each of these has w-value at most 1. Let d_a denote the degree of a in G. Then

(B + d_a) w(a) ≤ d_a + B · (2/(nB²)) = d_a + 2/(nB).

So,

w(a) ≤ (d_a + 2/(nB)) / (d_a + B) ≤ (n + 2/(nB)) / (B + n) ≤ 2n/B. ✷

We now estimate the value of the regularized objective function. To this end, we assume that |S| = k = n/2. Let T = S ∩ V and t = |T|. We will prove that S ⊂ V, and so S = T and t = n/2. Let δ denote the number of edges on the boundary of T in V. Once we know that t = n/2, δ is the size of a bisection.

Claim E.3 The contribution of the edges between V and V̂ to the objective function is at least (n − t)B − 4/B and at most (n − t)B + 4tn²/B.

Proof: For the lower bound, we just count the edges between vertices in V \ T and V̂. There are n − t of these edges, and each of them has weight B. The endpoint in V \ T has w-value 1, and the endpoint in V̂ has w-value at most 2/(nB²). So, the contribution of these edges is at least

(n − t) B (1 − 2/(nB²))² ≥ (n − t) B (1 − 4/(nB²)) ≥ (n − t)B − 4/B.

For the upper bound, we observe that the difference in w-values across each of these n − t edges is at most 1, so their total contribution is at most (n − t)B. Since for every vertex a ∈ T, w(a) ≤ 2n/B, and for every vertex b ∈ V̂, w(b) ≤ 2/(nB²), the contribution due to the edges between T and V̂ is at most t (2n/B)² B = 4tn²/B. ✷

We will see that this is the dominant term in the objective function. The next-most important term comes from the edges in G.

Claim E.4 The contribution of the edges in G to the objective function is at least δ(1 − 2n/B)² and at most δ + (t²/2)(2n/B)².

Proof: Let (a, b) ∈ E. If neither a nor b is in T, then w(a) = w(b) = 1, and so this edge has no contribution. If a ∈ T but b ∉ T, then the difference in w-values on them is between 1 − 2n/B and 1. So, the contribution of such edges to the objective function is between δ(1 − 2n/B)² and δ. Finally, if both a and b are in T, then the difference in w-values on them is at most 2n/B, and so the contribution of all such edges to the objective function is at most (t²/2)(2n/B)². ✷

Claim E.5 The edges between pairs of vertices in V̂ contribute at most 2/B to the objective function.

Proof: As 0 ≤ w(a) ≤ 2/(nB²) for every a ∈ V̂, every edge between two vertices in V̂ can contribute at most B³ (2/(nB²))² = 4/(n²B). As there are fewer than n²/2 such edges, their total contribution to the objective function is at most (n²/2)(4/(n²B)) = 2/B. ✷

Lemma E.6 If n ≥ 4 and B = 2n⁴, the value of the objective function is at least (n − t)B + δ − 1/2 and at most (n − t)B + δ + 1/2.

Proof: Summing the contributions in the preceding three claims, we see that the value of the objective function is at least

(n − t)B − 4/B + δ(1 − 2n/B)² ≥ (n − t)B + δ − 4/B − 4nδ/B ≥ (n − t)B + δ − 2n³/B ≥ (n − t)B + δ − 1/2,

as δ ≤ (n/2)². Similarly, the objective function is at most

(n − t)B + 4tn²/B + δ + (t²/2)(2n/B)² + 2/B ≤ (n − t)B + δ + 2n³/B + n⁴/(2B²) + 2/B ≤ (n − t)B + δ + 1/n + 1/(8n⁴) + 1/n⁴ ≤ (n − t)B + δ + 1/2. ✷

Claim E.7 If n ≥ 4 and B = 2n⁴, then S ⊂ V.

Proof: The dominant term (n − t)B of the objective function is minimized by making t as large as possible, so t = n/2 and S ⊂ V. ✷

Theorem E.8 The value of the objective function reveals the value of the minimum bisection in G.

Proof: The value of the objective function will be between (n/2)B + δ − 1/2 and (n/2)B + δ + 1/2. So, the objective function will be smallest when δ is as small as possible. Moreover, since δ is an integer and the value lies within 1/2 of (n/2)B + δ, the value of the objective function determines δ. ✷
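The reduction graph above can be sketched in a few lines. The weights below (1 on the edges of G, B on the matching, B³ on the clique over V̂) follow the construction as reconstructed in this section and should be read as illustrative; solving the resulting ℓ_2 problem is omitted.

```python
def reduction_graph(G_edges, n, B):
    """Sketch of the Appendix E reduction: vertices 0..n-1 are V, vertices
    n..2n-1 are the shadow copies V_hat. Returns a dict mapping each edge
    (as an ordered pair) to its weight."""
    w = {}
    for (u, v) in G_edges:
        w[(u, v)] = 1.0                  # original unit-weight edges of G
    for i in range(n):
        w[(i, n + i)] = float(B)         # weight-B matching V -- V_hat
    for i in range(n, 2 * n):
        for j in range(i + 1, 2 * n):
            w[(i, j)] = float(B) ** 3    # weight-B^3 clique on V_hat
    return w
```

For a 4-cycle with B = 2n⁴ = 512, the construction has 4 + 4 + 6 = 14 edges; the input potentials would be 0 on V̂ and 1 on V, with budget k = n/2 = 2.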