Approximating Sparse Quadratic Programs
Danny Hermelin, Leon Kellerhals, Rolf Niedermeier, Rami Pugatch
arXiv [cs.DS], Aug.
Ben-Gurion University of the Negev, Department of Industrial Engineering and Management, Beer Sheva, Israel
Technische Universität Berlin, Chair of Algorithmics and Computational Complexity, Berlin, Germany
Abstract.
Given a matrix $A \in \mathbb{R}^{n \times n}$, we consider the problem of maximizing $x^T A x$ subject to the constraint $x \in \{-1, 1\}^n$. This problem, called MaxQP by Charikar and Wirth [FOCS'04], generalizes
MaxCut and has natural applications in data clustering and in the study of disordered magnetic phases of matter. Charikar and Wirth showed that the problem admits an $\Omega(1/\lg n)$ approximation via semidefinite programming, and Alon, Makarychev, Makarychev, and Naor [STOC'05] showed that the same approach yields an $\Omega(1)$ approximation when $A$ corresponds to a graph of bounded chromatic number. Both these results rely on solving the semidefinite relaxation of MaxQP, whose currently best running time is $\tilde{O}(n^{1.5} \cdot \min\{N, n^{1.5}\})$, where $N$ is the number of nonzero entries in $A$ and $\tilde{O}$ ignores polylogarithmic factors.

Here, we abandon the semidefinite approach and design purely combinatorial approximation algorithms for special cases of MaxQP where $A$ is sparse (i.e., has $O(n)$ nonzero entries). Our algorithms are superior to the semidefinite approach in terms of running time, yet are still competitive in terms of their approximation guarantees. More specifically, we show that:
– Unit MaxQP, where $A \in \{-1, 0, 1\}^{n \times n}$, admits a $1/(3d)$-approximation in $O(n^{1.5})$ time when the corresponding graph has no isolated vertices and at most $dn$ edges.
– MaxQP admits an $\Omega(1/\lg a_{\max})$-approximation in $O(n^{1.5} \lg a_{\max})$ time, where $a_{\max}$ is the maximum absolute value in $A$, when the corresponding graph is $d$-degenerate.
– Unit MaxQP admits a $(1 - \varepsilon)$-approximation in $O(n)$ time when the corresponding graph is $H$-minor free.
– MaxQP admits a $(1 - \varepsilon)$-approximation in $O(n)$ time when the corresponding graph and each of its minors have bounded local treewidth.

In this paper we are interested in the following (integer) quadratic problem, which was coined
MaxQP by Charikar and Wirth [14]. Given an $n \times n$ symmetric matrix $A$ with zero-valued diagonal entries and $a_{i,j} \in \mathbb{Q}$ for all $i, j \in \{1, \ldots, n\}$, we want to

$$\text{maximize } \mathrm{val}_x(A) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{i,j} x_i x_j \quad \text{s.t. } x_i \in \{-1, 1\} \text{ for all } i \in \{1, \ldots, n\}. \qquad (1)$$

Observe that the requirement that all diagonal values of $A$ are zero is to avoid the term $\sum_i a_{i,i}$, which is constant in (1). Furthermore, a non-symmetric matrix $A$ can be replaced with an equivalent symmetric matrix $A'$ by setting $a'_{i,j} = a'_{j,i} = \frac{1}{2}(a_{i,j} + a_{j,i})$, and so the requirement that $A$ is symmetric is just for convenience's sake.

Our interest in MaxQP lies in the fact that it is a generic example of integer quadratic programming that naturally appears in different contexts. Below we review three examples:
– Graph cuts:
Readers familiar with the standard quadratic program formulation of
MaxCut [22] will notice the similarity to (1). Indeed, given a graph $G = (V, E)$ with vertex set $V = \{1, \ldots, n\}$ and edge weights $a_{i,j} \ge 0$ for each $\{i, j\} \in E$, the corresponding MaxQP instance on $-\frac{1}{2} \cdot A$ has an optimum solution of value $2k - \sum_{\{i,j\} \in E} a_{i,j}$ iff $G$ has a maximum cut of total weight $k$. Thus, MaxQP with only negative entries can be used to solve
MaxCut exactly, implying that even this special case is NP-hard. Furthermore, this special case translates to the closely related
MaxCut Gain problem [14, 27].
– Correlation clustering:
In correlation clustering [6, 13, 15, 33], we are provided with pairwise judgments of the similarity of $n$ data items. In the simplest version of the problem there are three possible inputs for each pair: similar (i.e., positive), dissimilar (i.e., negative), or no judgment. In a given clustering of the $n$ items, a pair of items is said to be in agreement (disagreement) if it is a positive (negative) pair within one cluster or a negative (positive) pair across two distinct clusters. In MaxCorr, the goal is to maximize the correlation of the clustering; that is, the absolute difference between the number of pairs in agreement and the number of pairs in disagreement, across all clusters. Note that when only two clusters are allowed, this directly corresponds to
Unit MaxQP, the variant of
MaxQP where $a_{i,j} \in \{-1, 0, 1\}$ for each entry $a_{i,j}$ of $A$.
– Ising spin glass model:
Spin glass models are used in physics to study disordered magnetic phases of matter. Such systems are notoriously hard to solve, and various techniques to approximate the free energy have been developed. In the Ising spin glass model [9, 34], each node in the graph represents a single spin which can either point up (+1) or down (−1), and neighboring spins $(i, j)$ may have either positive or negative coupling energy $a_{i,j}$ between them. The energy of this system (when there is no external field) is given by its Hamiltonian $H = -\frac{1}{2} \sum_{i,j} a_{i,j} \alpha(i) \alpha(j)$, where $\alpha(i) \in \{-1, 1\}$ is the spin at site $i$. A famous problem in the physics of spin glasses is the characterization of the ground state, i.e., the state that minimizes the energy of the system. This problem is precisely MaxQP.

It is convenient to view
MaxQP in graph-theoretic terms. Let $G = (V, E)$ be the graph associated with $A$, where $V = \{1, \ldots, n\}$ and $E = \{\{i, j\} : a_{i,j} \neq 0\}$. The first algorithmic results for MaxQP were due to Bieche et al. [12] and Barahona et al. [11], who studied the problem in the context of the Ising spin glass model. They showed that when $G$ is restricted to be planar, the problem is polynomial-time solvable via a reduction to maximum weight matching. At the same time, Barahona proved that the problem is NP-hard for three-dimensional grids [9] and for apex graphs (graphs with a vertex whose removal leaves the graph planar) [10].

As MaxQP is NP-hard, even for restricted instances, our focus is naturally on polynomial-time approximation algorithms. We note that the fact that the entries of $A$ may be both positive and negative makes MaxQP quite unique in this context. First of all, there is an immediate equivalence between the maximization version of
MaxQP and its minimization version, as maximizing $\mathrm{val}_x(A)$ is the same as minimizing $\mathrm{val}_x(-A)$. Furthermore, solutions might have negative values. This poses an extra challenge, since a solution with a non-positive value is not an $f(n)$-approximate solution, for any function $f$, in case the optimum is positive (which it always is whenever $A \neq 0$; see [14] and Lemma 6). In particular, a solution $x$ chosen uniformly at random has $\mathrm{val}_x(A) = 0$ in expectation, and unlike for MaxCut, such a solution is unlikely to be useful as any kind of approximation.

Alon and Naor [2] were the first to show that these difficulties can be overcome by carefully rounding a semidefinite relaxation of
MaxQP. In particular, they studied the problem when $G$ is restricted to a bipartite graph, and showed that using a rounding technique that relies on the famous Grothendieck inequality, one can obtain an approximation factor guarantee of $\approx 0.56$ for the bipartite case. Later, together with Makarychev and Makarychev [1], they showed that the integrality gap of the semidefinite relaxation is $O(\lg \chi(G))$ and $\Omega(\lg \omega(G))$, where $\chi(G)$ and $\omega(G)$ are the chromatic and clique numbers of $G$, respectively. In particular, this gap is constant for several interesting graph classes such as $d$-degenerate graphs and $H$-minor free graphs, and it generalizes the previous result in [2], as $\chi(G) \le 2$ when $G$ is bipartite.

Theorem 1 ([1, 2]). MaxQP restricted to graphs of $O(1)$ chromatic number can be approximated within a factor of $\Omega(1)$ in polynomial time.
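The remarks above about objective (1) can be checked by exhaustive enumeration on toy instances. The Python sketch below is illustrative only: the helper names `val`, `symmetrize`, `brute_force`, and `cut_weight` are hypothetical, and enumeration is feasible only for tiny $n$. It evaluates (1) directly, confirms that symmetrizing $A$ leaves every value unchanged, shows that with a zero diagonal the average value over all $2^n$ solutions is zero, and checks the identity $\mathrm{val}_x(-\frac{1}{2} W) = 2 \cdot \mathrm{cut}(x) - \sum_{\{i,j\} \in E} w_{i,j}$ behind the MaxCut reduction.

```python
from fractions import Fraction
from itertools import product


def val(A, x):
    """Objective (1): the sum of a_ij * x_i * x_j over all ordered pairs (i, j)."""
    n = len(A)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def symmetrize(A):
    """Symmetric A' with a'_ij = a'_ji = (a_ij + a_ji) / 2, kept as exact fractions."""
    n = len(A)
    return [[Fraction(A[i][j] + A[j][i], 2) for j in range(n)] for i in range(n)]


def brute_force(A):
    """Return (optimum, average value) over all 2^n solutions; tiny n only."""
    values = [val(A, x) for x in product((-1, 1), repeat=len(A))]
    return max(values), Fraction(sum(values), len(values))


def cut_weight(W, x):
    """Total weight of the edges {i, j} whose endpoints receive different signs."""
    n = len(W)
    return sum(W[i][j] for i in range(n) for j in range(i + 1, n) if x[i] != x[j])


# A non-symmetric matrix with zero diagonal and mixed signs.
A = [[0, 2, -1],
     [0, 0, 3],
     [1, -2, 0]]
opt, avg = brute_force(A)
print(opt, avg)  # prints: 3 0

# MaxCut reduction: for a nonnegative symmetric W, val_x(-W/2) = 2 * cut(x) - total weight.
W = [[0, 1, 4],
     [1, 0, 2],
     [4, 2, 0]]
minus_half_W = [[Fraction(-w, 2) for w in row] for row in W]
total = sum(W[i][j] for i in range(3) for j in range(i + 1, 3))
print(all(val(minus_half_W, x) == 2 * cut_weight(W, x) - total
          for x in product((-1, 1), repeat=3)))  # prints: True
```

Exact `Fraction` arithmetic is used so that the factor $\frac{1}{2}$ introduced by symmetrization does not cause floating-point mismatches in the comparisons.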
Regarding the general version of the problem, where $G$ can be an arbitrary graph, an $O(\lg n)$ integrality gap for the semidefinite relaxation was first shown by Nesterov [31]. However, his proof was non-constructive. Charikar and Wirth [14] made his proof constructive, and provided a rounding procedure for the relaxation that guarantees $\Omega(1/\lg n)$-approximate solutions regardless of the structure of $G$.

Theorem 2 ([14, 31]). MaxQP can be approximated within a factor of
$\Omega(1/\lg n)$ in polynomial time.

As for the time complexity of the algorithms in Theorems 1 and 2 above, Arora, Hazan, and Kale [4] provided improved running times for several semidefinite programs, including the relaxation of
MaxQP. They showed that this relaxation can be solved (to within any constant factor) in time $\tilde{O}(n^{1.5} \cdot \min\{N, n^{1.5}\})$, where $N$ is the number of nonzero entries in $A$ and $\tilde{O}$ ignores polylogarithmic factors. Thus, for general matrices $A$ this running time is $O(n^3)$, and for matrices with $O(n)$ nonzero entries it is $O(n^{2.5})$.

There has also been work on approximation lower bounds for MaxQP. Alon and Naor [2] showed that
MaxQP restricted to bipartite graphs cannot be approximated within $16/17 + \varepsilon$ unless P = NP, while Charikar and Wirth [14] showed that, assuming P ≠ NP, the problem admits no $(11/13 + \varepsilon)$-approximation when $G$ is an arbitrary graph. Both these results follow somewhat directly from the $16/17 + \varepsilon$ lower bound for MaxCut [25]. In contrast, Arora et al. [3] showed a much stronger lower bound by proving that there exists a constant $c > 0$ such that MaxQP cannot be approximated within $\Omega(1/\lg^c n)$, albeit under the stronger assumption that $\mathrm{NP} \not\subseteq \mathrm{DTime}(n^{\lg^{O(1)} n})$.

In this paper we focus on sparse graphs, i.e., graphs where the number of edges $m$ is $O(n)$. This corresponds to matrices $A$ having $O(n)$ nonzero entries. Note that MaxQP remains APX-hard in this case as well (see Theorem 8 in Appendix B). Nevertheless, we show that one can abandon the semidefinite approach in favor of simpler "purely combinatorial" algorithms, while still maintaining comparable performance. In particular, our algorithms are faster than those obtained from the semidefinite approach, whose fastest known implementation requires $O(n^{2.5})$ time on such instances [4]. Furthermore, most of them are quite easy to implement.

Our first result concerns Unit MaxQP, the special case of
MaxQP where $a_{i,j} \in \{-1, 0, 1\}$ for each entry $a_{i,j}$ of $A$ (recall the MaxCorr problem above). Here we obtain an $\Omega(1)$-approximation algorithm for general sparse graphs that do not have any particular structure.
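Theorem 3 and the later results repeatedly need to turn partial solutions into solutions of non-negative value, which is exactly what the linear-time scan of Lemma 1 (stated in the preliminaries below) provides. The following Python sketch of that scan is a minimal illustration under our own assumptions: vertices are numbered $0, \ldots, n-1$, the graph is a dictionary mapping each edge $(u, v)$ (listed once) to its weight $a_{u,v}$, and the names `nonnegative_solution` and `value` are hypothetical.

```python
import random


def nonnegative_solution(n, edges, x=None):
    """Sketch of the scan in the proof of Lemma 1.

    edges maps an unordered pair (u, v), u != v, to the weight a_uv of edge {u, v};
    vertices are 0, ..., n - 1 and x is the initial solution (all +1 if omitted).
    Returns x* with val_{x*}(G) >= 0 in O(n + m) time.
    """
    x = list(x) if x is not None else [1] * n
    # back[i] lists the edges of E(i), i.e. those joining i to some j < i.
    back = [[] for _ in range(n)]
    for (u, v), a in edges.items():
        back[max(u, v)].append((min(u, v), a))
    x_star = [0] * n
    for i in range(n):
        # z_i uses the initial value x_i and the already fixed values x*_j, j < i.
        z = sum(a * x[i] * x_star[j] for j, a in back[i])
        # Flipping x_i negates z_i, so x*_i makes the contribution of E(i) equal |z_i|.
        x_star[i] = -x[i] if z < 0 else x[i]
    return x_star


def value(edges, x):
    """val_x(G): the sum of a_uv * x_u * x_v over the edges {u, v} of G."""
    return sum(a * x[u] * x[v] for (u, v), a in edges.items())


# Usage on a random sparse instance with weights in {-1, +1}.
rng = random.Random(0)
n = 50
edges = {}
for _ in range(100):
    u, v = rng.sample(range(n), 2)
    edges[(min(u, v), max(u, v))] = rng.choice((-1, 1))
x0 = [rng.choice((-1, 1)) for _ in range(n)]
print(value(edges, x0), value(edges, nonnegative_solution(n, edges, x0)))
```

The first printed value may be negative; the second is guaranteed to be non-negative, since the objective splits into the contributions $z^*_i = |z_i| \ge 0$ of the sets $E(i)$.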
Theorem 3.
Let $d > 0$. Then Unit MaxQP restricted to graphs $G = (V, E)$ with no isolated vertices and $|E| \le d \cdot n$ can be approximated within a factor of $1/(3d)$ in $O(n^{1.5})$ time.

Note that there are several interesting graph classes included in the theorem above but excluded by Theorem 1. For instance, consider a graph consisting of a clique of size $\sqrt{n}$ together with a perfect matching on the remaining vertices; this graph has only $O(n)$ edges. The result of Alon et al. [1] implies that the semidefinite relaxation has an integrality gap of $\Omega(\lg n)$ on such a graph, while the algorithm in Theorem 3 provides an $\Omega(1)$-approximation.

Our next result extends Theorem 3 to the weighted case, but at a cost to the approximation factor guarantee. Furthermore, it applies to a less general graph class, namely the class of $d$-degenerate graphs. Recall that a graph is $d$-degenerate if each of its subgraphs has a vertex of degree at most $d$. Let $a_{\max} = \max_{i,j} |a_{i,j}|$ denote the maximum absolute value in $A$.

Theorem 4.
Let $d > 0$. Then MaxQP restricted to $d$-degenerate graphs can be approximated within a factor of $\Omega(1/\lg a_{\max})$ in $O(n^{1.5} \lg a_{\max})$ time.

Note that Theorem 1 provides an $\Omega(1)$-approximation for $d$-degenerate graphs, yet the algorithm in Theorem 4 is faster by a factor of roughly $n$.

We next consider sparse graph classes with additional structure. Recall that $G$ is $H$-minor free if one cannot obtain from $G$ an isomorphic copy of $H$ by a series of edge contractions, edge deletions, and vertex deletions. We show that, for the unit case, one can obtain a $(1 - \varepsilon)$-approximation algorithm on $H$-minor free graphs, for any $\varepsilon > 0$.

Theorem 5. For $\varepsilon > 0$ and any graph $H$, there is an $O(n)$ time $(1 - \varepsilon)$-approximation algorithm for Unit MaxQP restricted to $H$-minor free graphs.

An apex graph contains one specific vertex such that deleting it makes the graph planar. For the special case that $H$ is an apex graph, we present an algorithm with the same $(1 - \varepsilon)$ factor guarantee, for any $\varepsilon > 0$, that also handles arbitrary weights.

Theorem 6.
For $\varepsilon > 0$ and any apex graph $H$, there is an $O(n)$ time $(1 - \varepsilon)$-approximation algorithm for MaxQP restricted to $H$-minor free graphs.

The class of apex-minor free graphs is better known as the class of minor-closed graphs with bounded local treewidth; the equivalence of the two classes was shown by Eppstein [20]. This class includes planar and bounded genus graphs.

Finally, we note that our results have direct consequences for the
MaxCorr problem: Charikar and Wirth [14] proved that an $\alpha$-approximation algorithm for MaxQP implies an $\alpha/(2 + \alpha)$-approximation algorithm for MaxCorr. Combining this with Theorems 3, 5, and 6 directly gives us the following:
Corollary 1.
MaxCorr can be approximated within a factor of
– $1/(6d + 1) - \varepsilon$ on graphs $G = (V, E)$ with $|E| \le d \cdot |V|$ in $O(n^{1.5})$ time;
– $1/3 - \varepsilon$ on $H$-minor free graphs in $O(n)$ time;
– $1/3 - \varepsilon$ on apex-minor free graphs in $O(n)$ time.

Throughout the paper we use $G = (V, E)$ to denote the graph associated with our input matrix $A$; that is, $V = \{1, \ldots, n\}$ and $E = \{\{i, j\} : a_{i,j} \neq 0\}$. Thus, $n = |V|$, and we let $m = |E|$. We slightly abuse notation by allowing a solution $x$ to denote either a vector in $\{-1, 1\}^n$ indexed by $V$ or a function $x : V \to \{-1, 1\}$. For a solution $x$, we let $\mathrm{val}_x(G) = \sum_{\{i,j\} \in E} a_{i,j} x_i x_j$, and we let $\mathrm{opt}(G) = \max_x \mathrm{val}_x(G)$. We use $\|A\|$ to denote the total absolute weight of the edges, i.e., $\|A\| = \sum_{\{i,j\} \in E} |a_{i,j}|$. Note that $\mathrm{opt}(G) \le \|A\|$.

We use standard graph-theoretic terminology when dealing with the graph $G$, as in, e.g., [17]. In particular, for a subset $V' \subseteq V$, we let $G[V']$ denote the subgraph of $G$ induced by $V'$, i.e., the subgraph with vertex set $V'$ and edge set $\{\{u, v\} \in E : u, v \in V'\}$. We let $G - V'$ denote $G[V \setminus V']$, and for a subset of edges $E' \subseteq E$ we let $G - E'$ denote the graph $(V, E \setminus E')$. For a pair of disjoint subsets $V_1, V_2 \subseteq V$, we let $E(V_1, V_2)$ denote the set of edges $E(V_1, V_2) = \{\{u, v\} \in E : u \in V_1, v \in V_2\}$. Finally, we use $N(v) = \{u : \{u, v\} \in E\}$ to denote the neighborhood of a vertex $v \in V$ of $G$.

Note that for a solution $x$ chosen uniformly at random, the value $a_{i,j} x_i x_j$ is zero in expectation for any edge $\{i, j\} \in E$. This implies that $\mathrm{opt}(G) \ge 0$. Moreover, a solution $x$ with $\mathrm{val}_x(G) \ge 0$ can be computed deterministically in linear time:

Lemma 1.
One can compute in $O(m + n)$ time a solution $x$ for which $\mathrm{val}_x(G) \ge 0$.

Proof. For each vertex $i \in V$, define the subset of edges $E(i) = \{\{i, j\} \in E : j < i\}$. Consider an arbitrary initial solution $x$. We compute a solution $x^*$ by scanning the vertices from 1 to $n$. For a given vertex $i$, let $z_i = \sum_{\{i,j\} \in E(i)} x_i x^*_j a_{i,j}$ be the contribution of the edges in $E(i)$ when $i$ keeps its initial value and all lower-indexed vertices take their already fixed values. If $z_i < 0$, we set $x^*_i = -x_i$, and otherwise we set $x^*_i = x_i$. Note that $z^*_i = \sum_{\{i,j\} \in E(i)} x^*_i x^*_j a_{i,j} = |z_i|$ is now non-negative. As the value of $x^*_i$ does not change $z^*_j$ for any $j < i$, when we finish our scan we have $z^*_i \ge 0$ for all $i \in \{1, \ldots, n\}$. Thus, $\mathrm{val}_{x^*}(G) = \sum_i z^*_i \ge 0$. Each edge is inspected once, so the scan takes $O(m + n)$ time.

Lemma 2.
Let $V_1, V_2 \subseteq V$ be two disjoint subsets of vertices, and let $x_1$ and $x_2$ be two solutions for $G[V_1]$ and $G[V_2]$ of value $z_1$ and $z_2$, respectively. Then at least one of the solutions $x_1 \cup x_2$ and $-x_1 \cup x_2$ has value at least $z_1 + z_2$ for $G[V_1 \cup V_2]$.

Proof. Suppose $x_1 \cup x_2$ has value less than $z_1 + z_2$. This means that the total contribution of the edges in $E(V_1, V_2)$ is negative in this solution. Observe that in $-x_1 \cup x_2$ each edge of $E(V_1, V_2)$ with negative contribution under $x_1 \cup x_2$ now has positive contribution, and vice versa, while the contribution of every edge of $G[V_1]$ is unchanged, since negating the values of both endpoints of an edge does not change their product. Hence $-x_1 \cup x_2$ has value at least $z_1 + z_2$, and the lemma follows.

Combining Lemma 1 and Lemma 2 above, we get an important property of $\mathrm{opt}(G)$, namely that it is monotone with respect to induced subgraphs.

Lemma 3.
Let $H$ be an induced subgraph of $G$. Then, given a solution $x$ for $H$, one can compute in linear time a solution $x^*$ for $G$ with $\mathrm{val}_{x^*}(G) \ge \mathrm{val}_x(H)$.

Proof. Let $V' \subseteq V$ be the vertices of $G$ which are not present in $H$. According to Lemma 1, we can compute a solution $x'$ for $G[V']$ with value at least zero in linear time. According to Lemma 2, either $x \cup x'$ or $x \cup -x'$ has value at least $\mathrm{val}_{x'}(G[V']) + \mathrm{val}_x(H) \ge \mathrm{val}_x(H)$. Thus, taking $x^*$ to be the solution with the higher value out of $x \cup x'$ and $x \cup -x'$ proves the lemma.

In this section we present approximation algorithms for
MaxQP using a lower bound we develop for the value of the optimum solution. In particular, we provide complete proofs for Theorem 3 and Theorem 4. Beginning with the unit weight case, i.e., the case when $A \in \{-1, 0, 1\}^{n \times n}$, we obtain a lower bound analogous to the classical MaxCut bound of Edwards [18], although our approach follows the later proof of Erdős et al. [21]. This will directly imply Theorem 3. We then show how to extend our lower bound to general weights in case $G$ is triangle-free, i.e., the case where $G$ contains no three pairwise adjacent vertices. In the last subsection we show how to remove the triangle-freeness restriction in case $G$ is $d$-degenerate, providing a proof for Theorem 4.

A set $S \subseteq V$ of vertices is a star in $G$ if $|S| \ge 2$, $G[S]$ is connected, and $G[S]$ has at most one vertex of degree greater than 1 (called the center of $S$). We say a star $S$ is uniform if the edges of $G[S]$ are either all positive or all negative. A star packing of $G$ is a family of pairwise disjoint subsets of vertices $\mathcal{S} = \{S_1, \ldots, S_s\}$ such that each $S_i$ is a uniform star in $G$. We let $V_{\mathcal{S}} = \bigcup_{S_i \in \mathcal{S}} S_i$ and $I_{\mathcal{S}} = V \setminus V_{\mathcal{S}}$. We refer to $s = |\mathcal{S}|$ as the size of $\mathcal{S}$, and to the value $m(\mathcal{S}) = \sum_i (|S_i| - 1)$ (the total number of edges in $\mathcal{S}$) as the magnitude of $\mathcal{S}$.

Star packings will be useful throughout the section for showing lower bounds on $\mathrm{opt}(G)$. The direct connection between these two concepts is given in the lemma below.

Lemma 4.
Given a star packing $\mathcal{S}$ of magnitude $m(\mathcal{S})$, one can compute in linear time a solution $x$ with $\mathrm{val}_x(G) \ge m(\mathcal{S})$.

Proof. By Lemma 3, it suffices to compute a solution $x$ for $V_{\mathcal{S}}$ with $\mathrm{val}_x(G[V_{\mathcal{S}}]) \ge m(\mathcal{S})$. Let $\mathcal{S} = \{S_1, \ldots, S_s\}$. We construct such a solution $x$ by induction on $s$. For $s = 1$, we assign all vertices of $S_1$ the same value in $\{-1, 1\}$ in case all edges of $G[S_1]$ are positive, and we assign the leaves and the center vertex of $S_1$ opposite values if all edges of $G[S_1]$ are negative. Thus, $\mathrm{val}_x(G[S_1]) = |S_1| - 1 = m(\mathcal{S})$. Suppose then that $s > 1$, and let $\mathcal{S}' = \mathcal{S} \setminus \{S_s\}$. By induction, we have a solution $x'$ for $V_{\mathcal{S}'}$ with $\mathrm{val}_{x'}(G[V_{\mathcal{S}'}]) \ge m(\mathcal{S}')$. Let $x_s : S_s \to \{-1, 1\}$ be such that $\mathrm{val}_{x_s}(G[S_s]) = |S_s| - 1$, as in the case $s = 1$. Then, by Lemma 2, either $x' \cup x_s$ or $x' \cup -x_s$ has value at least $m(\mathcal{S}') + |S_s| - 1 = m(\mathcal{S})$, and we are done.

We construct a particular star packing $\mathcal{S}^*$ for $G$. We begin by letting $\mathcal{S}^*$ be any matching of maximum size in $G$. Observe that since $\mathcal{S}^*$ is a maximum matching, the set $I_{\mathcal{S}^*}$ is independent in $G$, and there are no other star packings in $G$ of greater size. Both of these invariants will be maintained throughout our construction. We next greedily add edges to $\mathcal{S}^*$ by exhaustively applying the following rule:

Rule 1: If $S_i \cup \{v\}$ is a uniform star for some $v \in I_{\mathcal{S}^*}$ and some $S_i \in \mathcal{S}^*$, then add $v$ to $S_i$.

Once Rule 1 cannot be applied, every edge between a vertex in a positive star and $I_{\mathcal{S}^*}$ is negative, and vice versa. We then exhaustively apply the following rule:

Rule 2:
If there is a center $c_i$ of a star $S_i \in \mathcal{S}^*$ with $|S_i| \ge 3$ which has more than $|S_i| - 1$ neighbors $N \subseteq I_{\mathcal{S}^*}$, then replace $S_i$ with $\{c_i\} \cup N$ in $\mathcal{S}^*$.

It is clear that $\mathcal{S}^*$ remains a star packing after we finish applying both rules above. We next provide a lower bound on the magnitude of $\mathcal{S}^*$. For each star $S \in \mathcal{S}^*$, let $N_S \subseteq I_{\mathcal{S}^*}$ denote the set of neighbors of $S$ in $I_{\mathcal{S}^*}$; that is, $N_S = \{u \in I_{\mathcal{S}^*} : \{u, v\} \in E, v \in S\}$. Then we have:

Lemma 5. $|N_S| \le |S| - 1$ for each $S \in \mathcal{S}^*$.

Proof. Suppose $|S| \ge 3$, and let $c \in S$ be the center of $S$. First observe that any edge between $S \setminus \{c\}$ and $I_{\mathcal{S}^*}$ can be used to create a new star $S_{s+1}$, contradicting the fact that $\mathcal{S}^*$ is of maximum size. Assume that all edges in $G[S]$ are positive (the negative case is symmetric). Then, since Rule 1 cannot be applied, all edges between $c$ and $I_{\mathcal{S}^*}$ are negative. Furthermore, since Rule 2 cannot be applied, there are no more than $|S| - 1$ such edges, and so $|N_S| \le |S| - 1$. Now suppose that $S = \{u, v\}$, and that $\{u, v\}$ is positive (again, the case where $\{u, v\}$ is negative is symmetric). Since Rule 1 cannot be applied, all edges between $u$ or $v$ and $I_{\mathcal{S}^*}$ are negative. Moreover, since Rule 2 cannot be applied, neither $u$ nor $v$ can be adjacent to more than one vertex in $I_{\mathcal{S}^*}$. Finally, if $u$ is adjacent to $u' \in I_{\mathcal{S}^*}$ and $v$ is adjacent to $v' \in I_{\mathcal{S}^*}$ with $u' \neq v'$, then we can replace $S$ with $\{u, u'\}$ and $\{v, v'\}$, contradicting the fact that $\mathcal{S}^*$ is of maximum size. Thus, $|N_S| \le |S| - 1$.

Lemma 6.
Let $|E| \le d \cdot n$. Then $m(\mathcal{S}^*) \ge m/(3d)$.

Proof. We present a mapping from $V$ to the edges of $\mathcal{S}^*$ which maps at most three vertices to a single edge, proving that $m(\mathcal{S}^*) \ge n/3$. The lemma then follows immediately from the fact that $n \ge m/d$. For a vertex $v$ belonging to some star $S_i$ of $\mathcal{S}^*$, we map $v$ to any edge incident to $v$ in $G[S_i]$. Thus, exactly one edge of $G[S_i]$ has two vertices in its preimage, while the remaining edges have only one. After mapping all vertices in the stars of $\mathcal{S}^*$, we proceed to map the remaining vertices in $I_{\mathcal{S}^*}$ as follows. We go through the stars $S_i$ one at a time, and map the not yet mapped vertices of $I_{\mathcal{S}^*}$ that are adjacent to vertices in $S_i$. By Lemma 5, there are at most $|S_i| - 1$ such vertices, so we can map them to distinct edges of $G[S_i]$. This increases the size of the preimage of each edge in $G[S_i]$ to at most three. After going over all stars $S_i$ we have mapped all vertices of $I_{\mathcal{S}^*}$, since $G$ has no isolated vertices and $I_{\mathcal{S}^*}$ is independent, so every vertex of $I_{\mathcal{S}^*}$ has a neighbor in some star. We thus obtain a mapping from $V$ to the edges of $\mathcal{S}^*$ with the promised property.

Proof of Theorem 3.
Due to Lemmata 4 and 6, the star packing $\mathcal{S}^*$ yields a solution $x$ with $\mathrm{val}_x(G) \ge m/(3d)$. Since $\mathrm{opt}(G) \le \|A\| = m$, this solution is $1/(3d)$-approximate. The running time for computing $\mathcal{S}^*$ is dominated by the computation of the maximum matching for the initial star packing, taking $O(m\sqrt{n}) = O(n^{1.5})$ time [30]; the exhaustive application of Rules 1 and 2 and the computation of the solution from the star packing both take $O(m + n) = O(n)$ time.

As an intermediate step towards the proof of Theorem 4, we extend the lower bound of the previous subsection to arbitrary weights in case $G$ is triangle-free. For weighted graphs, we let the magnitude of a star packing $\mathcal{S}$ be the total absolute value of the edges in $\mathcal{S}$, i.e., $m(\mathcal{S}) = \sum_{S \in \mathcal{S}} \sum_{\{i,j\} \in E(G[S])} |a_{i,j}|$.

Let $a_{\min} = \min_{i,j} |a_{i,j}|$ and $a_{\max} = \max_{i,j} |a_{i,j}|$ denote the minimum and maximum absolute values of the nonzero entries of $A$, respectively. Let us first consider the case where the ratio between these two values is at most 2, i.e., $a_{\max} \le 2 \cdot a_{\min}$. Observe that in this case the lower bound given in Lemma 6 can easily be extended to $\|A\|/(6d)$. This is because the star packing $\mathcal{S}^*$ has magnitude at least $m(\mathcal{S}^*) \ge n/3 \cdot a_{\min}$, while $\mathrm{opt}(G) \le \|A\| \le dn \cdot a_{\max} \le 2dn \cdot a_{\min}$.

Lemma 7. $m(\mathcal{S}^*) \ge \|A\|/(6d)$ in case $a_{\max} \le 2 \cdot a_{\min}$.

For the general case, we partition the edges of $G$ into $\ell = \lceil \lg a_{\max} \rceil$ subsets $E_0, \ldots, E_{\ell-1}$, where $E_i$ contains all edges $\{u, v\}$ with $|a_{u,v}| \in [2^i, 2^{i+1} - 1]$. For each $i \in \{0, \ldots, \ell - 1\}$, let $A_i$ denote the submatrix of $A$ corresponding to $E_i$, and let $G_i = G[E_i]$. Then, as shown in the previous subsection, we can compute in polynomial time a star packing $\mathcal{S}^*_i$ with $m(\mathcal{S}^*_i) \ge \|A_i\|/(6d)$. By the pigeonhole principle this gives us:

Lemma 8. $m(\mathcal{S}^*_i) \ge \|A\|/(6d \lg a_{\max})$ for some $i \in \{0, \ldots, \ell - 1\}$.

Now the crucial observation here is that, as $G$ is triangle-free, each $\mathcal{S}^*_i$ is also a star packing in $G$.
Indeed, if $S \in \mathcal{S}^*_i$ is a star on at least three vertices, then there can be no edges in $G$ between degree-1 vertices of $G_i[S]$, as any such edge would close a triangle with the center. Thus, by Lemma 4, each $\mathcal{S}^*_i$ corresponds to a solution $x_i$ with $\mathrm{val}_{x_i}(G) \ge m(\mathcal{S}^*_i)$. Combining this with Lemma 8 above proves that, for triangle-free $G$, the solution $x_i$ of maximum value is an $\Omega(1/\lg a_{\max})$-approximate solution.

Towards proving Theorem 4, we show how to obtain a triangle-free subgraph of $G$ whose total edge weight is a constant fraction of $\|A\|$. For this we utilize the local-ratio technique [8] commonly used in approximation algorithms [7].

Recall that if $G$ is a $d$-degenerate graph, then there exists an ordering $v_1, \ldots, v_n$ of the vertices of $G$ such that $|\{\{v_i, v_j\} \in E : i < j\}| \le d$ for each $i \in \{1, \ldots, n\}$ (and this ordering can be computed in linear time). To simplify notation, we assume the natural ordering on the vertices $\{1, \ldots, n\}$ of $G$ satisfies this property. We let $\vec{N}_i = \{j : \{i, j\} \in E \text{ and } i < j\}$. Furthermore, we let $G_i = G[\{i\} \cup \vec{N}_i]$ for each $i \in \{1, \ldots, n\}$, and use $E_i$ to denote the edge set of $G_i$.

We present an algorithm which we call the triangle deletion algorithm. The algorithm recursively computes a triangle transversal set $F \subseteq E$, that is, an edge set $F$ such that $G - F$ is triangle-free. We use $w(i, j)$ to denote $|a_{i,j}|$ for each $i, j \in \{1, \ldots, n\}$, and we let $w(E') = \sum_{\{i,j\} \in E'} w(i, j)$ for any subset of edges $E' \subseteq E$. The algorithm starts with $F = \emptyset$.

TriangleDeletion($G$, $w$):
1. Let $F \subseteq E$ be all edges $\{i, j\}$ with $w(i, j) = 0$.
2. Find the smallest $i \in \{1, \ldots, n\}$ such that $G_i - F$ contains a triangle. If no such $i$ exists, then return $F$.
3. Let $\varepsilon$ be the minimum $w(i, j)$ of any edge $\{i, j\} \in E_i$.
4. Set $w_1(i, j) = \varepsilon$ if $\{i, j\}$ is an edge in $G_i$, and otherwise $w_1(i, j) = 0$.
5. Let $F = $ TriangleDeletion($G$, $w - w_1$).
6. If $E_i \subseteq F$, then remove some edge $\{i, j\} \in E_i$ from $F$.
7. Return $F$.

Lemma 9.
The triangle deletion algorithm returns a set of edges $F \subseteq E$ such that:
1. $G - F$ contains no triangle.
2. $w(E \setminus F) \ge 2/(d^2 + d) \cdot w(E)$.

Proof. First observe that the algorithm is guaranteed to terminate, as in each recursive step at least one edge gets its weight decreased to zero. We prove the two properties in the lemma via induction on the recursive steps of the algorithm. The base case is the case in which no graph $G_i - F$ contains a triangle (see step 2). Since every triangle of $G$ is contained in $G_i$ for its smallest vertex $i$, clearly $G - F$ is triangle-free. Furthermore, $w(F) = 0$ and so $w(E \setminus F) = w(E)$, and the second condition is satisfied as well.

Consider a recursive call in which the algorithm does not terminate at step 2, i.e., in which there exists a smallest $i \in \{1, \ldots, n\}$ such that $G_i - F$ contains a triangle. Let $F_1$ be the set computed in step 5 of the algorithm. Then by induction we know that $G - F_1$ contains no triangle. Furthermore, by definition of $i$, any triangle in $G$ containing vertex $i$ must be completely included in $G_i$. Thus, if $E_i \subseteq F_1$, removing some edge $\{i, j\} \in F_1 \cap E_i$ from $F_1$ in step 6 does not add a triangle to $G - F_1$. It follows that the set $F$ returned at step 7 satisfies the first condition of the lemma.

To see that it also satisfies the second condition, let $w_2 = w - w_1$, where $w_1$ is the weight function constructed at step 4 of the algorithm. By induction we have $w_2(E \setminus F) \ge 2/(d^2 + d) \cdot w_2(E)$ after step 5 of the algorithm, and this also holds at step 7 since we do not add edges to $F$ at step 6. Now, observe that by construction of $w_1$, we have $w_1(E) = \varepsilon \cdot |E_i| \le \varepsilon \cdot (d^2 + d)/2$, since $G_i$ has at most $d + 1$ vertices. Furthermore, at step 7 the set $E \setminus F$ contains at least one edge of $E_i$, and so $w_1(E \setminus F) \ge \varepsilon$. Together this implies that $w_1(E \setminus F) \ge 2/(d^2 + d) \cdot w_1(E)$. Thus, from the two inequalities above we get
$$w(E \setminus F) = w_1(E \setminus F) + w_2(E \setminus F) \ge \frac{2}{d^2 + d} \cdot (w_1(E) + w_2(E)) = \frac{2}{d^2 + d} \cdot w(E),$$
and so $F$ satisfies the second condition of the lemma as well.

Proof of Theorem 4.
First observe that as $G$ is $d$-degenerate we have $m \le dn$. Further, we may assume that $G$ has no isolated vertices, since deleting them does not affect the degeneracy. Our algorithm obtains a triangle-free subgraph $G'$ of $G$ using the triangle deletion algorithm above. Letting $A'$ denote the matrix corresponding to $G'$, we have $\|A'\| \ge 2/(d^2 + d) \cdot \|A\|$ by Lemma 9. Next, our algorithm uses Lemma 8 to obtain a star packing $\mathcal{S}^*_i$ of magnitude
$$m(\mathcal{S}^*_i) \ge \frac{\|A'\|}{6d \lg a_{\max}} \ge \frac{2\|A\|}{(6d^3 + 6d^2) \lg a_{\max}}.$$
Finally, using Lemma 4, the algorithm computes a solution $x$ with $\mathrm{val}_x(G) \ge m(\mathcal{S}^*_i)$. As $\mathrm{opt}(G) \le \|A\|$, this solution has an approximation ratio of $2/((6d^3 + 6d^2) \lg a_{\max}) = \Omega(1/\lg a_{\max})$.

As for the time complexity of our algorithm, observe that the triangle deletion algorithm runs in $O(n + m) = O(n)$ time. The next step of the algorithm requires computing $O(\lg a_{\max})$ star packings, each taking $O(n^{1.5})$ time to compute. Altogether this gives us a running time of $O(n^{1.5} \lg a_{\max})$.

H-Minor Free Graphs

In this section we present approximation algorithms for sparse
MaxQP instances that have some additional structure. Namely, we prove Theorems 5 and 6. Our algorithms all revolve around the Baker technique for planar graphs [5] and its generalizations [16, 19, 23], all using what we refer to here as a treewidth partition: a partition of the vertices of $G$ into $V_0, \ldots, V_{k-1}$ such that $G - V_i$ has bounded treewidth for any subset $V_i$ in the partition. As treewidth plays a central role here, we begin by formally defining this notion.

A tree decomposition is a pair $(\mathcal{T}, \mathcal{X})$ where $\mathcal{X}$ is a family of vertex subsets of $G$, called bags, and $\mathcal{T}$ is a tree with $\mathcal{X}$ as its node set. The decomposition is required to satisfy (i) $\{X \in \mathcal{X} : v \in X\}$ is nonempty and connected in $\mathcal{T}$ for each $v \in V$, and (ii) for each $\{u, v\} \in E$ there is a bag $X \in \mathcal{X}$ that contains both $u$ and $v$. The width of a tree decomposition $(\mathcal{T}, \mathcal{X})$ is $\max_{X \in \mathcal{X}} |X| - 1$, and the treewidth of $G$ is the smallest width amongst all its tree decompositions. The proof of the following lemma is deferred to Appendix A.

Lemma 10.
MaxQP restricted to graphs of treewidth at most $k$ can be solved in $O(2^k k \cdot n)$ time.

Our starting point for the $(1 - \varepsilon)$-approximation for MaxQP on $H$-minor free graphs with $H$ being an apex graph (Theorem 6) is a layer decomposition $L_0, \ldots, L_\ell \subseteq V$ of $G$, where $L_0 = \{v\}$ for some arbitrary vertex $v \in V$, and $L_i = \{u : d(v, u) = i\}$ are all vertices at distance $i$ from $v$, for each $i \in \{1, \ldots, \ell\}$. This is the standard starting point of all Baker-type algorithms, and it can be computed via breadth-first search from $v$ in linear time. Note that $L_0, \ldots, L_\ell$ is a partition of $V$, and that for each $i \in \{0, \ldots, \ell\}$, each vertex in $L_i$ has neighbors only in $L_{i-1} \cup L_i \cup L_{i+1}$ (here and elsewhere in this section we set $L_{-1} = L_{\ell+1} = \emptyset$ when necessary).

Given $0 < \varepsilon \le 1$, we let $k$ be the smallest integer such that $4/k \le \varepsilon$. For each $i \in \{0, \ldots, k - 1\}$, let $\mathcal{L}_i$ denote the union of all layers whose index is congruent to $i$ modulo $k$; that is, $\mathcal{L}_i = \bigcup_{j \equiv i \pmod{k}} L_j$. We define two subgraphs of $G$: the graph $G_i$ is the graph induced by $V \setminus \mathcal{L}_i$, and the graph $H_i$ is the graph induced by $N[\mathcal{L}_i]$. Note that there is some overlap between the vertices of $G_i$ and $H_i$, but each edge of $G$ appears in exactly one of these subgraphs. Also note that since there is an apex graph $H$ that $G$ does not contain as a minor, $G$ and each of its minors have bounded local treewidth [20]; thus both $G_i$ and $H_i$ are bounded treewidth graphs [23].

Our algorithm computes $k$ different solutions for $G$, and selects the best one (i.e., the one which maximizes (1)) as its solution. For $i \in \{0, \ldots, k - 1\}$, we first compute an optimal solution for $G_i$ in linear time using the algorithm given in Lemma 10. We then extend this solution to a solution $x_i$ for $G$ as is done in Lemma 3. In this way we obtain in linear time $k$ solutions $x_0, \ldots, x_{k-1}$ with $\mathrm{val}_{x_i}(G) \ge \mathrm{opt}(G_i)$ for each $i \in \{0, \ldots, k - 1\}$. In Lemma 11 we argue that the solution of maximum objective value is $(1 - \varepsilon)$-approximate to the optimum of $G$; the proof of Theorem 6 will then follow as a direct corollary.

Lemma 11.
There is a solution $x \in \{x_0, \ldots, x_{k-1}\}$ with $\mathrm{val}_x(G) \ge (1-\varepsilon) \cdot \mathrm{opt}(G)$.

Proof. Let $x^*$ denote an optimal solution for $G$. Then, as the edge set of $G$ is partitioned into the edges of $G_i$ and $H_i$, we have
$$\mathrm{opt}(G_i) + \mathrm{opt}(H_i) \ge \mathrm{val}_{x^*}(G_i) + \mathrm{val}_{x^*}(H_i) = \mathrm{val}_{x^*}(G) = \mathrm{opt}(G)$$
for each $i \in \{0, \ldots, k-1\}$. Next observe that any two subgraphs $H_i$ and $H_{i'}$ with $|i - i'| \ge 4$ are vertex-disjoint and non-adjacent in $G$. It follows that for any $j \in \{0, 1, 2, 3\}$, the graph $\bigcup_{i \equiv j \pmod 4} H_i$ is an induced subgraph of $G$, and so $\mathrm{opt}(G) \ge \sum_{i \equiv j \pmod 4} \mathrm{opt}(H_i)$ by Lemma 3. Thus, we have
$$4 \cdot \mathrm{opt}(G) \ge \sum_{j=0}^{3} \sum_{i \equiv j \pmod 4} \mathrm{opt}(H_i) = \sum_{i=0}^{k-1} \mathrm{opt}(H_i).$$
Combining the two inequalities above, we get
$$\sum_{i=0}^{k-1} \mathrm{val}_{x_i}(G) \ge \sum_{i=0}^{k-1} \mathrm{opt}(G_i) \ge k \cdot \mathrm{opt}(G) - \sum_{i=0}^{k-1} \mathrm{opt}(H_i) \ge k \cdot \mathrm{opt}(G) - 4 \cdot \mathrm{opt}(G) = (k-4) \cdot \mathrm{opt}(G).$$
It follows that the best solution among $x_0, \ldots, x_{k-1}$ has value at least $(1 - 4/k) \cdot \mathrm{opt}(G)$, which is at least $(1-\varepsilon) \cdot \mathrm{opt}(G)$ since $4/k \le \varepsilon$.

H-minor free instances

To obtain the $(1-\varepsilon)$-approximation for Unit MaxQP on $H$-minor free graphs for any fixed graph $H$ (Theorem 5), we make use of the lower bound obtained in Section 3.1, our algorithm for MaxQP restricted to bounded-treewidth graphs, and the following theorem by Demaine et al. [16]:
Theorem 7 ([16]). For a fixed graph $H$, there is a constant $c_H$ such that, for any integer $k \ge 2$ and for every $H$-minor free graph $G$, the vertices of $G$ can be partitioned into $k$ sets such that the graph obtained by taking the union of any $k-1$ of these sets has treewidth at most $c_H \cdot k$. Furthermore, such a partition can be found in polynomial time.

Note that this theorem gives a partition similar to the one used in the previous subsection, albeit a slightly weaker one. In particular, there is no restriction on the edges connecting vertices in different subsets of the partition, as was the case in the previous subsection. It is for this reason that arbitrary weights are difficult to handle, and we need to resort to the lower bound of Lemma 6. Fortunately, for the unweighted case, we can use the fact that there exists some constant $h$ depending only on $H$ such that $G$ has at most $hn$ edges (see, e.g., [17]). In particular, it can be shown that $h = O(n' \sqrt{\lg n'})$ [29], where $n'$ is the number of vertices of $H$. Combining this fact with Lemma 6, we get:

Lemma 12. $\mathrm{opt}(G) \ge m/h$.

Our algorithm proceeds as follows. Fix $k \ge h/\varepsilon$, and let $V_0, \ldots, V_{k-1}$ denote the partition of $V$ computed by the algorithm from Theorem 7. For each $i \in \{0, \ldots, k-1\}$, let $E_i$ denote the set of edges $E(V_i, \bigcup_{j \ne i} V_j)$, and let $m_i = |E_i|$. Furthermore, let $G_i = G - E_i$. As both $G[V_i]$ and $G[V \setminus V_i]$ have bounded treewidth, we can compute an optimal solution for each of these subgraphs (and therefore also for $G_i$, which is their disjoint union) using the algorithm in Lemma 10. Using Lemma 2, we can extend the optimal solutions for $G[V_i]$ and $G[V \setminus V_i]$ to a solution $x_i$ for $G$ with value $\mathrm{val}_{x_i}(G) \ge \mathrm{opt}(G_i)$. On the other hand, since each of the $m_i$ edges in $E_i$ contributes at most $1$ to the objective, the optimal solution of $G$ cannot do better than $\mathrm{opt}(G_i) + m_i$, that is, $\mathrm{opt}(G_i) + m_i \ge \mathrm{opt}(G)$.
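A toy-scale Python sketch of the loop just described may help make its structure concrete. It is illustrative only and rests on several assumptions that are ours, not the paper's: the partition $V_0, \ldots, V_{k-1}$ is supplied by hand rather than computed via Theorem 7; the hypothetical function `exact_opt` enumerates all sign vectors instead of running the treewidth algorithm of Lemma 10; and the gluing step in the hypothetical `best_of_k` is one simple way (not necessarily the construction of Lemma 2) to guarantee $\mathrm{val}_{x_i}(G) \ge \mathrm{opt}(G_i)$, namely flipping every sign in $V_i$ whenever the edges of $E_i$ contribute negatively.

```python
from itertools import product


def val(x, edges):
    """Objective value: sum of a_uv * x_u * x_v, each edge (u, v, a_uv) counted once."""
    return sum(a * x[u] * x[v] for u, v, a in edges)


def exact_opt(vertices, edges):
    """Exact optimum by enumerating all sign vectors. Exponential: a toy
    stand-in for the treewidth DP of Lemma 10, not a replacement for it."""
    vertices = sorted(vertices)
    best_value, best_x = None, None
    for signs in product((-1, 1), repeat=len(vertices)):
        x = dict(zip(vertices, signs))
        value = val(x, edges)
        if best_value is None or value > best_value:
            best_value, best_x = value, x
    return best_value, best_x


def best_of_k(vertices, edges, parts):
    """For each part V_i, solve G[V_i] and G[V - V_i] exactly, glue the two
    solutions, and flip every sign in V_i if the edges of E_i contribute
    negatively. Flipping V_i leaves the values on G[V_i] and G[V - V_i]
    unchanged and negates the contribution of E_i, so each glued solution
    x_i satisfies val_{x_i}(G) >= opt(G_i). Returns the best of the k solutions."""
    vertices = set(vertices)
    best_value, best_x = None, None
    for part in parts:
        inside = set(part)
        outside = vertices - inside
        e_in = [e for e in edges if e[0] in inside and e[1] in inside]
        e_out = [e for e in edges if e[0] in outside and e[1] in outside]
        e_cut = [e for e in edges if (e[0] in inside) != (e[1] in inside)]
        _, x_in = exact_opt(inside, e_in)
        _, x_out = exact_opt(outside, e_out)
        x = {**x_in, **x_out}
        if val(x, e_cut) < 0:
            for u in inside:
                x[u] = -x[u]
        value = val(x, edges)
        if best_value is None or value > best_value:
            best_value, best_x = value, x
    return best_value, best_x


# Toy usage: a signed 6-cycle with one chord; the parts are chosen by hand.
edges = [(0, 1, 1), (1, 2, -1), (2, 3, 1), (3, 4, 1), (4, 5, -1), (5, 0, 1), (0, 3, 1)]
parts = [{0, 1}, {2, 3}, {4, 5}]
value, x = best_of_k(range(6), edges, parts)
```

With unit weights, each edge of $E_i$ contributes at most $1$ in absolute value, so on such inputs the returned value is at least $\mathrm{opt}(G) - \min_i m_i$, which is the per-part form of the inequality $\mathrm{opt}(G_i) + m_i \ge \mathrm{opt}(G)$.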
Combining the two inequalities above, we can bound the sum of the objective values obtained by all our solutions by
$$\sum_{i=0}^{k-1} \mathrm{val}_{x_i}(G) \ge \sum_{i=0}^{k-1} \mathrm{opt}(G_i) \ge \sum_{i=0}^{k-1} \bigl(\mathrm{opt}(G) - m_i\bigr) = k \cdot \mathrm{opt}(G) - \sum_{i=0}^{k-1} m_i \ge (k-h) \cdot \mathrm{opt}(G),$$
where the last inequality follows from Lemma 12. Thus at least one of these solutions has value at least $\frac{k-h}{k} \cdot \mathrm{opt}(G)$, which is at least $(1-\varepsilon) \cdot \mathrm{opt}(G)$ by our choice of the parameter $k$.

To analyze the time complexity of our algorithm, observe that computing each solution $x_i$ requires $O(n)$ time according to Lemma 10 and Lemma 2. Thus, the time complexity of the algorithm is dominated by the time required to compute the partition promised by Theorem 7. Demaine et al. [16] showed that this partition can be computed in linear time given the graph decomposition promised by Robertson and Seymour's graph minor theory [32]. In turn, Grohe et al. [24] presented an $O(n)$ time algorithm for this decomposition, improving earlier constructions [16, 26]. Thus, the total running time of our algorithm can also be bounded by $O(n)$. This completes the proof of Theorem 5.

We presented efficient combinatorial approximation algorithms for sparse instances of MaxQP without resorting to the semidefinite relaxation, as done by Alon and Naor [2] and Charikar and Wirth [14]. From a theoretical perspective, we leave open whether there is a combinatorial algorithm approximating $d$-degenerate MaxQP instances in polynomial time. Further, is it possible to approximate sparse Unit MaxQP instances up to a constant factor in linear time? Finally, the simplicity of our algorithms motivates a study of their usability in practice, especially for characterizing ground states of spin glass models.
References

[1] Noga Alon, Konstantin Makarychev, Yury Makarychev, and Assaf Naor. Quadratic forms on graphs. In Proceedings of the 37th Annual ACM Symposium on Theory Of Computing (STOC), pages 486–493, 2005.
[2] Noga Alon and Assaf Naor. Approximating the cut-norm via Grothendieck's inequality. SIAM Journal on Computing, 35(4):787–803, 2006.
[3] Sanjeev Arora, Eli Berger, Elad Hazan, Guy Kindler, and Muli Safra. On non-approximability for quadratic programs. In Proceedings of the 46th Annual IEEE Symposium on Foundations Of Computer Science (FOCS), pages 206–215, 2005.
[4] Sanjeev Arora, Elad Hazan, and Satyen Kale. Fast algorithms for approximate semidefinite programming using the multiplicative weights update method. In , pages 339–348, 2005.
[5] Brenda S. Baker. Approximation algorithms for NP-complete problems on planar graphs. Journal of the ACM, 41(1):153–180, 1994.
[6] Nikhil Bansal, Avrim Blum, and Shuchi Chawla. Correlation clustering. Machine Learning, 56(1-3):89–113, 2004.
[7] Reuven Bar-Yehuda, Keren Bendel, Ari Freund, and Dror Rawitz. Local ratio: a unified framework for approximation algorithms. ACM Computing Surveys, 36(4):422–463, 2004.
[8] Reuven Bar-Yehuda and Shimon Even. A local-ratio theorem for approximating the weighted vertex cover problem. Annals of Discrete Mathematics, 25:27–45, 1985.
[9] Francisco Barahona. On the computational complexity of Ising spin glass models. Journal of Physics A: Mathematical, Nuclear and General, 15:3241–3253, 1982.
[10] Francisco Barahona. The max-cut problem on graphs not contractible to $K_5$. Operations Research Letters, 2(3):107–111, 1983.
[11] Francisco Barahona, Roger Maynard, Rammal Rammal, and Jean-Pierre Uhry. Morphology of ground states of two-dimensional frustration model. Journal of Physics A: Mathematical, Nuclear and General, 15:673–699, 1982.
[12] Isabelle Bieche, Roger Maynard, Rammal Rammal, and Jean-Pierre Uhry. On the ground states of the frustration model of a spin glass by a matching method of graph theory. Journal of Physics A: Mathematical, Nuclear, and General, 13:2553–2576, 1980.
[13] Moses Charikar, Venkatesan Guruswami, and Anthony Wirth. Clustering with qualitative information. Journal of Computer and System Sciences, 71(3):360–383, 2005.
[14] Moses Charikar and Anthony Wirth. Maximizing quadratic programs: extending Grothendieck's inequality. In Proceedings of the 45th Annual IEEE Symposium on Foundations Of Computer Science (FOCS), pages 54–60, 2004.
[15] Erik D. Demaine, Dotan Emanuel, Amos Fiat, and Nicole Immorlica. Correlation clustering in general weighted graphs. Theoretical Computer Science, 361(2-3):172–187, 2006.
[16] Erik D. Demaine, Mohammad Taghi Hajiaghayi, and Ken-ichi Kawarabayashi. Algorithmic graph minor theory: Decomposition, approximation, and coloring. In Proceedings of the 46th Annual IEEE Symposium on Foundations Of Computer Science (FOCS), pages 637–646, 2005.
[17] Reinhard Diestel. Graph Theory, volume 173 of Graduate Texts in Mathematics. Springer, 5th edition, 2016.
[18] Carol S. Edwards. Some extremal properties of bipartite subgraphs. Canadian Journal of Mathematics, 25(3):475–485, 1973.
[19] David Eppstein. Subgraph isomorphism in planar graphs and related problems. Journal of Graph Algorithms & Applications, 3(3):1–27, 1999.
[20] David Eppstein. Diameter and treewidth in minor-closed graph families. Algorithmica, 27(3):275–291, 2000.
[21] Paul Erdős, András Gyárfás, and Yoshiharu Kohayakawa. The size of the largest bipartite subgraphs. Discrete Mathematics, 177(1-3):267–271, 1997.
[22] Michel X. Goemans and David P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42(6):1115–1145, 1995.
[23] Martin Grohe. Local tree-width, excluded minors, and approximation algorithms. Combinatorica, 23(4):613–632, 2003.
[24] Martin Grohe, Ken-ichi Kawarabayashi, and Bruce A. Reed. A simple algorithm for the graph minor decomposition – Logic meets structural graph theory. In Proceedings of the 24th Annual ACM-SIAM Symposium On Discrete Algorithms (SODA), pages 414–431, 2013.
[25] Johan Håstad. Some optimal inapproximability results. Journal of the ACM, 48(4):798–859, 2001.
[26] Ken-ichi Kawarabayashi and Paul Wollan. A simpler algorithm and shorter proof for the graph minor decomposition. In Proceedings of the 43rd ACM Symposium on Theory Of Computing (STOC), pages 451–458, 2011.
[27] Subhash Khot and Ryan O'Donnell. SDP gaps and UGC-hardness for max-cut-gain. Theory of Computing, 5(1):83–117, 2009.
[28] Ton Kloks. Treewidth, Computations and Approximations, volume 842 of Lecture Notes in Computer Science. Springer, 1994.
[29] Alexandr V. Kostochka. Lower bound of the Hadwiger number of graphs by their average degree. Combinatorica, 4(4):307–316, 1984.
[30] Silvio Micali and Vijay V. Vazirani. An $O(\sqrt{|V|}\,|E|)$ algorithm for finding maximum matching in general graphs. In Proceedings of the 21st Annual Symposium on Foundations Of Computer Science (FOCS), pages 17–27, 1980.
[31] Yurii Nesterov. Global quadratic optimization via conic relaxation. CORE Discussion Papers 1998060, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE), 1998.
[32] Neil Robertson and Paul D. Seymour. Graph minors. XVI. Excluding a non-planar graph. Journal of Combinatorial Theory, Series B, 89(1):43–76, 2003.
[33] Chaitanya Swamy. Correlation clustering: maximizing agreements via semidefinite programming. In Proceedings of the 15th Annual ACM-SIAM Symposium On Discrete Algorithms (SODA), pages 526–527, 2004.
[34] Michel Talagrand. Spin Glasses: A Challenge for Mathematicians, volume 46 of A Series of Modern Surveys in Mathematics. Springer, 1st edition, 2003.
A An Exact Algorithm for Bounded Treewidth Graphs
We next prove Lemma 10 by presenting an algorithm for MaxQP restricted to graphs of treewidth at most $k$ running in $O(2^k k \cdot n)$ time. For this we require the concept of nice tree decompositions [28]. A tree decomposition $(\mathcal{T}, \mathcal{X})$ is rooted if there is a designated bag $R \in \mathcal{X}$ that is the root of $\mathcal{T}$. A rooted tree decomposition is nice if each bag $X \in \mathcal{X}$ is either (i) a leaf node ($X$ contains exactly one vertex and has no children in $\mathcal{T}$), (ii) an introduce node ($X$ has one child $Y$ in $\mathcal{T}$ with $Y \subset X$ and $|X \setminus Y| = 1$), (iii) a forget node ($X$ has one child $Y$ in $\mathcal{T}$ with $X \subset Y$ and $|Y \setminus X| = 1$), or (iv) a join node ($X$ has two children $Y, Z$ in $\mathcal{T}$ with $X = Y = Z$). Given a tree decomposition, one can compute a corresponding nice tree decomposition of the same width in linear time [28]. Our algorithm employs the standard dynamic programming technique on nice tree decompositions.

Proof of Lemma 10.
Let $(\mathcal{T}, \mathcal{X})$ be a nice tree decomposition of $G$ of width $k$ with root bag $R$. For a node $X \in \mathcal{X}$, let $\mathcal{T}_X$ be the subtree of $\mathcal{T}$ rooted at $X$. Furthermore, let $G_X$ be the subgraph of $G$ induced by the vertices in the bags of $\mathcal{T}_X$ (while $G[X]$ is the subgraph of $G$ induced only by the vertices in $X$). We describe a table with an entry $D[X, x]$ for each bag $X \in \mathcal{X}$ and each solution $x : X \to \{-1, 1\}$. The entry $D[X, x]$ contains the value of an optimum solution for $G_X$ in which the values of the vertices in $X$ are fixed by the solution $x$.

If $X$ is a leaf node, then $G_X$ contains no edges and so $D[X, x] = 0$. If $X$ is an introduce node, then let $v \in X \setminus Y$ be the introduced vertex, where $Y$ is the child of $X$ in $\mathcal{T}$, and let $x \setminus x_v$ be the solution $x$ restricted to the vertices of $Y$. Then $D[X, x]$ additionally contains the value of all edges incident to $v$ in $G_X$; by the properties of a tree decomposition, all neighbors of $v$ in $G_X$ lie in $X$, so
$$D[X, x] = D[Y, x \setminus x_v] + \sum_{u \in N(v) \cap X} x_u x_v a_{u,v}.$$
If $X$ is a forget node, then let $v \in Y \setminus X$ be the forgotten vertex, where $Y$ is the child of $X$ in $\mathcal{T}$. Then every value except for $x_v$ is set in $x$, so we must choose $x_v$ such that the value is maximized:
$$D[X, x] = \max_{x_v \in \{-1, 1\}} D[Y, x \cup x_v].$$
Finally, if $X$ is a join node, then let $Y$ and $Z$ be the children of $X$ in $\mathcal{T}$. Note that $D[Y, x] + D[Z, x]$ counts the value of $G[X]$ twice, so
$$D[X, x] = D[Y, x] + D[Z, x] - \mathrm{val}_x(G[X]).$$
The tree decomposition contains $O(n)$ nodes, and for each node there are at most $O(2^k)$ solutions; thus we need to compute $O(2^k \cdot n)$ entries $D[\cdot, \cdot]$, each of which can be computed in $O(k)$ time. The optimum value is the maximum over all $O(2^k)$ solutions for the root bag $R$. So we can compute $\mathrm{opt}(A)$ in $O(2^k k \cdot n)$ time.

B A Hardness Result
Alon and Naor [2] show that MaxQP restricted to bipartite graphs is not approximable in polynomial time with a ratio of $16/17 + \varepsilon$ unless P = NP. Using the same idea, we show that this approximation lower bound also holds for Unit MaxQP on 2-degenerate bipartite graphs.
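The subdivision used in the proof of Theorem 8 can be sanity-checked by brute force on a toy graph. The Python sketch here is illustrative only and not part of the original argument; the helper names and the labeling of each subdivision vertex as the tuple `("s", u, w)` are our own assumptions. Since the two edges through a subdivision vertex $s$ contribute $x_u x_s - x_s x_w = x_s(x_u - x_w)$, the best choice of $x_s$ yields $2$ if $\{u, w\}$ is cut and $0$ otherwise, so the best extension of any sign vector on $V$ should have value exactly twice the corresponding cut.

```python
from itertools import product


def subdivide(edges):
    """Replace each edge {u, w} of G by a path u - s - w through a new vertex
    s = ("s", u, w). Edge {u, s} gets weight +1 and edge {s, w} weight -1."""
    weighted = []
    for u, w in edges:
        s = ("s", u, w)
        weighted += [(u, s, 1), (s, w, -1)]
    return weighted


def val(x, weighted):
    """Objective value: sum of a_pq * x_p * x_q over the weighted edges."""
    return sum(a * x[p] * x[q] for p, q, a in weighted)


def cut_size(x, edges):
    """Number of edges of G whose endpoints receive different signs."""
    return sum(1 for u, w in edges if x[u] != x[w])


def best_extension_value(x, edges, weighted):
    """Keep the signs x on V fixed and maximize val over all signs on V'."""
    subdiv = [("s", u, w) for u, w in edges]
    best = None
    for signs in product((-1, 1), repeat=len(subdiv)):
        full = {**x, **dict(zip(subdiv, signs))}
        value = val(full, weighted)
        best = value if best is None else max(best, value)
    return best


# Toy instance: a triangle 0-1-2 plus the pendant edge {2, 3}.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
weighted = subdivide(edges)
for signs in product((-1, 1), repeat=4):
    x = dict(enumerate(signs))
    assert best_extension_value(x, edges, weighted) == 2 * cut_size(x, edges)
```

In this toy check every sign vector on $V$ extends to a solution of $G'$ whose value is exactly twice its cut, which is the correspondence the proof uses to transfer the inapproximability of MaxCut.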
Theorem 8. If Unit MaxQP on 2-degenerate bipartite graphs admits a polynomial-time $(16/17 + \varepsilon)$-approximation, then P = NP.

Proof.
We reduce from unweighted MaxCut, which does not admit a $(16/17 + \varepsilon)$-approximation unless P = NP [25]. Given an undirected unweighted graph $G = (V, E)$, we compute a graph $G' = (V \cup V', E')$ by subdividing each edge in $E$; that is, for every edge $\{u, w\} \in E$ we add a vertex $v$ to $V'$ and the edges $\{u, v\}, \{v, w\}$ to $E'$. One of these two edges has weight $1$, while the other has weight $-1$. Clearly, $G'$ is bipartite, with $V'$ forming one side of the bipartition. As all vertices in $V'$ have degree two, $G'$ is 2-degenerate as well.

Let $x$ be a solution for $G'$. Observe that for every vertex $v \in V'$ we can assume that at least one of its incident edges contributes positively to $\mathrm{val}_x(G')$; otherwise, multiply $x_v$ by $-1$. Furthermore, note that the cut in $G$ corresponding to $x$ (restricted to $V$) is of size $\mathrm{val}_x(G')/2$: if both edges incident to $v$ contribute positively to $\mathrm{val}_x(G')$, then the edge in $G$ subdivided by $v$ is cut. Otherwise, the two edges contribute $0$ to $\mathrm{val}_x(G')$, and the corresponding edge in $G$ is not cut.

It follows that if there is a $(16/17 + \varepsilon)$-approximation for MaxQP, then there is one for