MAX-CUT VIA KURAMOTO-TYPE OSCILLATORS
STEFAN STEINERBERGER
Abstract.
We consider the Max-Cut problem. Let $G = (V, E)$ be a graph with adjacency matrix $(a_{ij})_{i,j=1}^{n}$. Burer, Monteiro & Zhang proposed to find, for $n$ angles $\{\theta_1, \theta_2, \dots, \theta_n\} \subset [0, 2\pi]$, minima of the energy
$$f(\theta_1, \dots, \theta_n) = \sum_{i,j=1}^{n} a_{ij} \cos(\theta_i - \theta_j)$$
because configurations achieving a global minimum lead to a partition of size $0.878 \cdot \operatorname{Max-Cut}(G)$. This approach is known to be computationally viable and leads to very good results in practice. We prove that by replacing $\cos(\theta_i - \theta_j)$ with an explicit function $g_{\varepsilon}(\theta_i - \theta_j)$, global minima of this new functional lead to a partition of size $(1 - \varepsilon) \cdot \operatorname{Max-Cut}(G)$. This suggests some interesting algorithms that perform well. It also shows that the problem of finding approximate global minima of energy functionals of this type is NP-hard in general.
1. Introduction
1.1. Max-Cut.
We consider a classical problem: given a graph $G = (V, E)$, what is the best decomposition of its vertices into two sets such that the number of edges between the two sets is maximal?
Max-Cut is known to be NP-hard.
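To make the definition concrete, the following is a minimal sketch that computes Max-Cut by exhaustive search on a tiny graph. The function names `cut_size` and `max_cut_brute_force` and the edge-list representation are assumptions made for this illustration; they do not come from the paper. The search takes exponential time, in line with NP-hardness, and is only meant for very small graphs.

```python
def cut_size(edges, side):
    """Count the edges whose two endpoints lie on different sides."""
    return sum(1 for u, v in edges if side[u] != side[v])


def max_cut_brute_force(n, edges):
    """Try every partition of vertices 0..n-1 into two sets (n >= 1)."""
    best_size, best_side = -1, None
    # Vertex 0 is fixed on side 0, so each partition is visited exactly once.
    for mask in range(2 ** (n - 1)):
        side = [0] + [(mask >> k) & 1 for k in range(n - 1)]
        size = cut_size(edges, side)
        if size > best_size:
            best_size, best_side = size, side
    return best_size, best_side


# Example: a cycle on 5 vertices. An odd cycle is not bipartite,
# so at most 4 of its 5 edges can be cut.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
size, side = max_cut_brute_force(5, edges)
print(size, side)
```

The loop visits $2^{n-1}$ partitions, which is why relaxations such as the ones discussed in this paper are of interest for larger graphs.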
Figure 1. Decomposing the vertices of a graph into two sets so that many edges run between them.
It is easy to see that by picking the subsets uniformly at random, we will get, in expectation, a partition such that at least $|E|/2 \ge 0.5 \cdot \operatorname{Max-Cut}(G)$ edges run between them. In a seminal paper, Goemans & Williamson [11] constructed a $0.878$-approximation algorithm for Max-Cut, where the constant is given by
$$0.878\ldots = \frac{2}{\pi} \min_{0 \le \theta \le \pi} \frac{\theta}{1 - \cos(\theta)}.$$
The algorithm uses semi-definite programming and randomized rounding (randomized rounding is explained in greater detail below). If the Unique Games Conjecture is true, this is the best possible approximation ratio for Max-Cut [13] that can be computed in polynomial time. It is known unconditionally that approximating Max-Cut by any factor better than $16/17 \approx 0.941$ is also NP-hard [1, 12, 21].
Key words and phrases. MaxCut, Maximum Cut, Randomized Rounding, Oscillator.
S.S. is supported by the NSF (DMS-1763179) and the Alfred P. Sloan Foundation.
1.2. Kuramoto Oscillators.
Kuramoto Oscillators refers to a broad class of problems where we are given $n$ particles $\theta_1, \dots, \theta_n \in \mathbb{S}^1 \cong [0, 2\pi]$. These particles, which often depend on time, are assumed to be coupled in some nontrivial way (we refer to the surveys [8, 9]). A particularly nice setting is to define the energy
$$f(\theta_1, \dots, \theta_n) = \sum_{i,j=1}^{n} a_{ij} \cos(\theta_i - \theta_j),$$
where $a_{ij} \in \{0, 1\}$ is the entry of an adjacency matrix of a graph whose structure models the dependency between the particles. The particles are then assumed to move along the gradient of the energy; the overarching question is whether the underlying graph structure forces some type of universal behavior on the particles. The energy landscape of this particular energy, for example, is quite intricate. Taylor [19] proved that if each vertex is connected to at least $\mu(n - 1)$ vertices for $\mu \ge$ [...], then $f(\theta_1, \dots, \theta_n)$ does not have local maxima that are not also global. This was then improved by Ling, Xu & Bandeira [16] to $\mu \ge$ [...] $\mu \ge$ [...] $\mu_c = 0.75$; they also identify networks with $\mu = 0.$[...]
1.3. The Approach of Burer, Monteiro & Zhang.
Burer, Monteiro & Zhang [6] proposed a particular rank-two relaxation of the Goemans-Williamson approach [11]. We recall that Goemans-Williamson suggested to relax
$$2 \cdot |E| - 4 \cdot \operatorname{Max-Cut}(G) = \min_{x_i \in \{-1, 1\}} \sum_{i,j=1}^{n} a_{ij} x_i x_j$$
by replacing the $x_i \in \{-1, 1\}$ with unit vectors $v_i \in \mathbb{R}^n$ and $x_i x_j$ with $\langle v_i, v_j \rangle$. This is clearly a more general problem but one that is amenable to being solved with SDP methods in polynomial time. In the last step, they perform a randomized rounding step and prove that this leads to a $0.878 \cdot \operatorname{Max-Cut}$ approximation. Burer, Monteiro & Zhang [6] suggest that it might be possible to bypass the SDP step by arguing directly on the relaxed problem in $\mathbb{R}^2$. Parametrizing unit vectors in $\mathbb{R}^2$ by an angle $\theta \in \mathbb{S}^1$, we see that
$$\langle v_{\theta_i}, v_{\theta_j} \rangle = \cos(\theta_i - \theta_j)$$
and this leads to the notion of energy $f : (\mathbb{S}^1)^n \cong [0, 2\pi]^n \to \mathbb{R}$,
$$f(\theta_1, \dots, \theta_n) = \sum_{i,j=1}^{n} a_{ij} \cos(\theta_i - \theta_j).$$
Burer, Monteiro & Zhang propose to minimize this energy instead and then use the same randomized rounding step as in the Goemans-Williamson approach. The success of this particular relaxation will depend on two competing factors.
• Upside. There is no longer any need for solving a semi-definite program (which becomes computationally expensive when $n$ is large); one simply has to find a configuration of particles $\{\theta_1, \dots, \theta_n\} \subset [0, 2\pi]$ for which the energy is small. Moreover, there is no need to find the global minimum (but: the smaller the energy, the better the configuration).
• Downside. Without the SDP, there is no particular insight into how one would start looking for a good configuration $\{\theta_1, \dots, \theta_n\} \subset [0, 2\pi]$. Gradient descent methods are at the mercy of the energy landscape; it might potentially be hard to find a configuration for which the energy is small.
In practice, the (hypothetical) downside does not seem to cause any difficulties: the Burer, Monteiro & Zhang (BUR02) approach is known to work very well. Indeed, in an extensive 2018 comparison, Dunning, Gupta & Silberholz [10] compared 37 different heuristics over 3296 problem instances, concluding: “The best overall heuristic on the expanded instance library with respect to the performance of its mean solution across the five replicates for each instance was max-cut heuristic BUR02, which not only matched the best performance on 22.9% of instances but also had strictly better performance than any other heuristic on 16.2% of instances and a mean deviation of only 0.3%.” It is not entirely understood why this relaxation works so well and this is being actively studied; we refer to Boumal, Voroninski & Bandeira [2, 3], Ling [15] and Ling, Xu & Bandeira [16].
2. The Result
2.1. Main Idea.
Our main idea is the following: the cosine arises naturally when considering the inner product between two vectors since
$$\langle v_{\theta_i}, v_{\theta_j} \rangle = \cos(\theta_i - \theta_j).$$
However, since we are not actually using any type of SDP approach, we do not really have to use the cosine. Maybe there are other functions $g : \mathbb{S}^1 \to [-1, 1]$ that are as good or possibly even better? The only constraint is that the randomized rounding step, when applied to a minimal energy configuration, should work well. We will consider more general notions of a Kuramoto-type energy of the form
$$f(\theta_1, \dots, \theta_n) = \sum_{i,j=1}^{n} a_{ij} \cdot g(\theta_i - \theta_j),$$
where $g : \mathbb{S}^1 \cong [0, 2\pi] \to \mathbb{R}$ is assumed to
(1) be differentiable everywhere,
(2) be symmetric in the sense of $g(x) = g(-x)$ and
(3) assume its maximum in $g(0) = 1$ and its minimum in $g(\pi) = -1$.
We call such a function admissible. Minimizing the energy then encourages vertices $v_1, v_2$ that are connected by an edge $(v_1, v_2) \in E$ to move to antipodal points on the circle. If the underlying graph is bipartite, this will indeed be the unique minimal energy configuration. For more general graphs, this is not so simple and one would expect a minimal energy configuration to depend on the graph. We want minimal energy configurations to be well-behaved with respect to randomized rounding. For any given set of angles $\{\theta_1, \dots, \theta_n\} \subset \mathbb{S}^1$, the randomized rounding procedure results in an assignment of points into two sets as follows (see Fig. 2): pick a random line going through the origin, which splits the points into the two groups induced by the two half-spaces, and use those sets as a partition.
Figure 2. Randomized Rounding: for given $\{\theta_1, \dots, \theta_n\} \subset \mathbb{S}^1$, we can pick a random line through the origin and then partition the vertices of the graph according to the two half-spaces.
We can analyze the expected behavior of randomized rounding completely in terms of the energy $f(\theta_1, \dots, \theta_n)$. For the minimal energy configuration of Kuramoto-type energies of this flavor, we obtain the following approximation result.
Theorem. Let $g : \mathbb{S}^1 \to \mathbb{R}$ be an admissible function and let $\{\theta_1, \theta_2, \dots, \theta_n\} \subset \mathbb{S}^1$ be a minimal energy configuration of the associated energy. Then the expected number of edges for a randomized rounding partition satisfies
$$\mathbb{E}\,\text{edges} \ge \left( \min_{0 \le x \le \pi} \frac{2}{\pi} \frac{x}{1 - g(x)} \right) \cdot \operatorname{Max-Cut}(G).$$
If $g(x) = \cos(x)$, we recover the classical $0.878 \cdot \operatorname{Max-Cut}(G)$ result. However, for more general $g(x)$, the constant can be arbitrarily close to 1. We also show that the size of the energy functional has immediate implications for the quality of the randomized rounding step by proving the inequality
$$\mathbb{E}\,\text{edges} \ge \left( \min_{0 \le x \le \pi} \frac{2}{\pi} \frac{x}{1 - g(x)} \right) \cdot \left( \frac{|E|}{2} - \frac{f(\theta_1, \dots, \theta_n)}{4} \right).$$
Thus, as in BUR02 [6], we do not necessarily need to find a global minimum; it suffices to find configurations with small energy (and the smaller, the better).
Figure 3. The function $g(x) = \frac{9}{10} \left( \cos(x) + \frac{\cos(3x)}{9} \right)$ on $[-\pi, \pi]$. Minima of this energy give $0.93 \cdot \operatorname{Max-Cut}$.
Our result is especially interesting when $\varepsilon > 0$ is small: for $\varepsilon < 1/17$, any $(1 - \varepsilon)$-approximation of Max-Cut is necessarily NP-hard. In fact, if the Unique Games Conjecture [13] is true, then the Goemans-Williamson approximation ratio of $0.878 \cdot \operatorname{Max-Cut}$ is the best that one can do in polynomial time. This has an interesting consequence for the energy landscape of the energy $f(\theta_1, \dots, \theta_n)$ for functions $g$ for which the constant is $> 16/17$: it must then, in general, be NP-hard to find a configuration $\{\theta_1, \dots, \theta_n\} \subset \mathbb{S}^1$ with energy close to the global minimum. We believe this to be an interesting statement about a large class of Kuramoto-type energy functionals.
2.2. Related results.
We are not aware of any results of this type. Most closely related in spirit is perhaps the idea of using oscillators to solve problems of this type [7, 18, 23]. Wang & Roychowdhury [22], for example, consider systems of coupled self-sustaining nonlinear oscillators. The main idea is that these are governed by a Lyapunov function that is closely related to the Ising Hamiltonian of the coupling graph, which allows for approximations to Max-Cut.
2.3. Examples.
We start with a completely explicit example and take an Erdős-Rényi random graph $G(500,$ [...]$)$ with $|V| = 500$ vertices and $|E| = 1549$ edges. Several runs of Goemans-Williamson (GW) show $\operatorname{Max-Cut}(G) \ge$ [...]. We then minimize the energy
$$f(\theta_1, \dots, \theta_n) = \sum_{i,j=1}^{n} a_{ij} \cos(\theta_i - \theta_j)$$
using a random initialization for the angles $\theta_1, \dots, \theta_n$ and standard gradient descent. The result is shown in Fig. 4. We see that the points seem to be distributed all over the circle and we get somewhat nice uniform control: the arising cut is never too small and for certain angles clearly improves on the GW method.
Figure 4. The distribution of points and the size of the cut obtained as a function of the angle of the random line.
The question is now whether this can be improved by picking a function different from the cosine. There is a theoretical criterion (coming from the Theorem) on how this function should look in the sense that
$$\min_{0 \le x \le \pi} \frac{2}{\pi} \frac{x}{1 - g(x)} \quad \text{should be close to } 1.$$
There are many such functions; a better understanding of which function $g(x)$ to choose would be interesting (see §2.5). [...]
$$g(x) = \frac{99225}{117469} \left( \cos(x) + \frac{\cos(3x)}{9} + \frac{\cos(5x)}{25} + \frac{\cos(7x)}{49} + \frac{\cos(9x)}{81} \right)$$
which comes from the Fourier series (normalized to $g(0) = 1 = -g(\pi)$) of (see §2.5)
$$1 - \frac{2}{\pi} \cdot d_{\mathbb{S}^1}(0, x) = \begin{cases} 1 - \frac{2x}{\pi} & \text{if } 0 \le x \le \pi \\ 1 - \frac{2(2\pi - x)}{\pi} & \text{if } \pi \le x \le 2\pi, \end{cases}$$
where $d_{\mathbb{S}^1}(\cdot, \cdot)$ is the shortest distance on $\mathbb{S}^1$ (always at most $\pi$). In particular,
$$\min_{0 \le x \le \pi} \frac{2}{\pi} \frac{x}{1 - g(x)} = 0.$$[...] [...] $\cdot \operatorname{Max-Cut}$ approximation. We run gradient descent (using the previously obtained final configuration of angles from the BUR02 method as the initial set) and arrive at a nice result: the best cut has an additional 15 edges and all the cuts are uniformly closer to the maximum. Moreover, there is an additional ‘crystallization’ of the points, hard to see in the picture, which are more structured (see §2.4).
Figure 5. The distribution of points and the size of the cut obtained as a function of the angle of the random line.
This example appears to be quite typical for Erdős-Rényi random graphs. Typically both the quality of the largest cut and the expected size of a random cut increase (our proof suggests why the expected size would increase).
Graph                        |V|    |E|    GW     BUR02   Our Method
Mesner Graph $M_{22}$        77     616    400    420     420
Livingstone Graph            266    1463   955    981     991
Berlekamp-Van Lint-Seidel    243    2673   1572   1590    1606
Cameron Graph                231    3465   1870   1884    1896
Table 1. Lower bounds on Max-Cut obtained by three methods.
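The experiments above can be imitated on a small scale. The following is a minimal sketch, not the author's implementation: it evaluates the constant $\min_{0 \le x \le \pi} \frac{2}{\pi} \frac{x}{1 - g(x)}$ on a grid, decreases the energy by gradient steps with a simple step-halving rule, and records the cut size for a sweep of lines through the origin. All function names (`g_fourier`, `rounding_constant`, `minimize`, `cut_sizes_over_lines`, `random_graph`), the step-size rule, the grid resolutions and the graph size are assumptions chosen for illustration. The demo warm-starts the second minimization from the cosine result, as described in the text.

```python
import numpy as np

ODD = np.array([1, 3, 5, 7, 9])
K = 99225.0 / 117469.0  # normalization so that g(0) = 1 = -g(pi)


def g_fourier(x):
    """Truncated Fourier series of 1 - (2/pi) * d(0, x) from Section 2.3."""
    x = np.asarray(x, dtype=float)
    return K * np.sum(np.cos(np.multiply.outer(x, ODD)) / ODD**2, axis=-1)


def dg_fourier(x):
    """Derivative of g_fourier."""
    x = np.asarray(x, dtype=float)
    return -K * np.sum(np.sin(np.multiply.outer(x, ODD)) / ODD, axis=-1)


def rounding_constant(g, samples=20001):
    """Grid approximation of min over (0, pi] of (2/pi) * x / (1 - g(x))."""
    x = np.linspace(1e-3, np.pi, samples)
    return float(np.min((2.0 / np.pi) * x / (1.0 - g(x))))


def energy(theta, A, g):
    """f(theta) = sum_{i,j} a_ij * g(theta_i - theta_j)."""
    return float(np.sum(A * g(theta[:, None] - theta[None, :])))


def gradient(theta, A, dg):
    """Gradient of the energy; uses that A is symmetric and g is even."""
    return 2.0 * np.sum(A * dg(theta[:, None] - theta[None, :]), axis=1)


def minimize(theta, A, g, dg, step=0.1, iters=300):
    """Gradient descent that halves the step whenever the energy would rise."""
    theta = np.array(theta, dtype=float)
    f = energy(theta, A, g)
    for _ in range(iters):
        trial = theta - step * gradient(theta, A, dg)
        f_trial = energy(trial, A, g)
        if f_trial < f:
            theta, f = trial, f_trial
        else:
            step *= 0.5
    return np.mod(theta, 2.0 * np.pi)


def cut_sizes_over_lines(theta, A, num_lines=360):
    """Cut size induced by lines through the origin at angles in [0, pi)."""
    phis = np.linspace(0.0, np.pi, num_lines, endpoint=False)
    sizes = []
    for phi in phis:
        side = np.mod(theta - phi, 2.0 * np.pi) < np.pi
        crossing = side[:, None] != side[None, :]
        sizes.append(int(np.sum(A * crossing)) // 2)  # each edge counted twice
    return phis, np.array(sizes)


def random_graph(n, p, seed=0):
    """Symmetric 0/1 adjacency matrix of an Erdős-Rényi graph G(n, p)."""
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(float)


A = random_graph(40, 0.15, seed=0)
theta0 = np.random.default_rng(1).uniform(0.0, 2.0 * np.pi, 40)
theta_cos = minimize(theta0, A, np.cos, lambda x: -np.sin(x))
theta_g = minimize(theta_cos, A, g_fourier, dg_fourier)  # warm start
for name, th in [("cosine", theta_cos), ("Fourier g", theta_g)]:
    _, sizes = cut_sizes_over_lines(th, A)
    print(name, "best cut:", sizes.max(), "worst cut:", sizes.min())
print("constants:", rounding_constant(np.cos), rounding_constant(g_fourier))
```

The step-halving rule only guarantees that the energy never increases; it makes no claim about reaching a global minimum, which is exactly the difficulty discussed in Section 2.5.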
One could wonder whether these are artifacts coming from the randomness of the Erdős-Rényi graphs. We decided to compare performance on some structured graphs for which we were unable to find the value of Max-Cut in the literature. Several runs of each method lead to the bounds on Max-Cut in Table 1.
2.4. The Crystallization Phenomenon.
Minimal energy configurations of our functional tend to be somewhat structured. We start with an example. The 600-cell is the finite regular four-dimensional polytope composed of 600 tetrahedra. Its skeleton $G = (V, E)$ has $|V| = 120$ vertices and $|E| = 720$ edges. $\operatorname{Max-Cut}(G)$ seems to be unknown. The Goemans-Williamson algorithm run over many instances yields $\operatorname{Max-Cut}(G) \ge$ [...] $\operatorname{Max-Cut}(G) \ge$ [...] $\operatorname{Max-Cut}(G) \ge$ [...]
Figure 6. Final configuration of BUR02 (left) and our method (right) when applied to the skeleton graph of the 600-cell. Both configurations show $\operatorname{Max-Cut}(G) \ge$ [...]
[...] $g$ that we use, we have
$$\mathbb{E}\,\text{edges} \ge$$ [...] $\cdot \operatorname{Max-Cut}(G)$
for the minimal energy configuration (though, of course, we cannot be sure of having found a minimal energy configuration). This has a very powerful implication because it means that for virtually every line, the induced partition is necessarily very close to Max-Cut. Whenever the arising distribution of points is not simply concentrated at two antipodal points, the final configuration has to show different ways in which a partition of vertices with a number of edges close to Max-Cut can be achieved. We believe that this explains the arising crystallization that we observe. It also indicates that such minima should actually induce a rather interesting ordering of the vertices of the graph in terms of groups that have strong interactions with antipodal groups. One would assume that the highly structured picture in Fig. 6 somehow reflects the underlying structure of the graph.
2.5. Which g should one use?
One important question is the choice of the function $g$. Our main result suggests that we should pick
$$g(x) \sim 1 - \frac{2}{\pi} d_{\mathbb{S}^1}(0, x)$$
so that global minima correspond to a good approximation of Max-Cut. However, this is counter-balanced by optimization concerns: the global minimum having good properties will not be of any use to us if we cannot get close to it. At this point, we have no good theoretical reason to choose any particular $g(x)$ and we believe this to be an interesting problem.
Question. What are good choices for $g$? Which properties of $g$ lead to the functional having ‘nice’ energy landscapes?
We found that $g(x)$ being close to $1 - (2/\pi) \cdot d_{\mathbb{S}^1}(x, 0)$ is indeed beneficial for the quality of the solution but also makes optimization harder. Smooth functions tend to be easier to optimize, hence our choice to use a truncated Fourier series approximation of $1 - (2/\pi) \cdot d_{\mathbb{S}^1}(x, 0)$. [...] $g$ and then change the choice of $g$ after a while.
Trigonometric Polynomials.
We mention one particular reason that might speak in favor of using trigonometric polynomials (and is completely unconnected to any considerations about the energy landscape). Fix $\{\theta_1, \dots, \theta_n\} \subset \mathbb{S}^1$. We can pick an arbitrary $\theta_i$, keep the remaining angles fixed and ask ourselves how the function
$$\sum_{i,j=1}^{n} a_{ij} \cdot g(\theta_i - \theta_j)$$
behaves as a function of $\theta_i$. If $g$ is a trigonometric polynomial of degree $d$, then this sum is, as a function of $\theta_i$, also a trigonometric polynomial of degree $d$ because trigonometric polynomials of degree $d$ are an invariant subspace under translation. This means that this function, as a function of $\theta_i$, is globally quite simple and we can find its global minimum. This is particularly striking in the case of BUR02: the function
$$h(\theta_i) = \sum_{\substack{j=1 \\ j \ne i}}^{n} a_{ij} \cdot \cos(\theta_i - \theta_j) + \sum_{\substack{j=1 \\ j \ne i}}^{n} a_{ji} \cdot \cos(\theta_j - \theta_i)$$
is a function of the form
$$h(\theta_i) = A \cdot \cos(\theta_i - B),$$
where $A$ and $B$ depend on all the other variables. However, such a function is very easy to minimize globally: taking $A \ge 0$, set $\theta_i = B + \pi \pmod{2\pi}$. This persists when passing from the cosine to trigonometric polynomials of degree $d$ (which is a $2d$-dimensional vector space with rather nicely behaved functions in it that always have a lot of structure and are easier to minimize than generic functions). This allows for non-local optimization schemes along the following lines: pick a variable $\theta_i$, freeze all the other variables, compute where one would place $\theta_i$ to minimize the energy and move it there. One would expect that the effectiveness of such a scheme depends on the function $g(x)$, which brings us back to the question raised above.
3. Proof
Proof.
We will now prove the Theorem. The argument is identical to the classical randomized rounding argument except that we are working with an arbitrary function $g$ and track its dependence. Suppose the Max-Cut solution is given by the splitting $V = A \cup B$. Then we can set all the vertices in $A$ to have angle $\theta_a = 0$ and all the vertices in $B$ to have angle $\theta_b = \pi$ and compute the energy of this configuration. There are $\operatorname{Max-Cut}(G)$ edges getting weight $-1$ and $|E| - \operatorname{Max-Cut}(G)$ edges getting weight $1$. Every edge is counted twice, therefore
$$\min_{\theta_1, \dots, \theta_n} f(\theta_1, \dots, \theta_n) \le 2 \cdot |E| - 4 \cdot \operatorname{Max-Cut}(G).$$
Suppose conversely that we have a configuration with small energy given by the configuration of angles $\{\theta_1, \theta_2, \dots, \theta_n\} \subset [0, 2\pi]$. The likelihood of two specific vertices $i, j \in V$ ending up in different partitions is given by the likelihood of $\theta_i$ and $\theta_j$ being separated by a random hyperplane. That quantity has a simple expression given by
$$\mathbb{P}(\theta_i, \theta_j \text{ in different halfspaces}) = \frac{|\theta_i - \theta_j|_{\mathbb{S}^1}}{\pi},$$
where $|\cdot|_{\mathbb{S}^1}$ denotes the shortest distance on $\mathbb{S}^1$ (and is thus always at most $\pi$). Using linearity of expectation, we can compute the expected number of edges across a randomly chosen line
$$\mathbb{E}\,\text{edges} = \frac{1}{2} \sum_{i,j=1}^{n} a_{ij} \cdot \mathbb{P}(\theta_i, \theta_j \text{ in different halfspaces}) = \frac{1}{2} \sum_{i,j=1}^{n} a_{ij} \cdot \frac{|\theta_i - \theta_j|_{\mathbb{S}^1}}{\pi}.$$
At this point, we use that the distance function satisfies, tautologically,
$$\frac{|\theta_i - \theta_j|_{\mathbb{S}^1}}{\pi} \ge \left( \min_{0 \le x \le \pi} \frac{2}{\pi} \frac{x}{1 - g(x)} \right) \cdot \frac{1 - g(\theta_i - \theta_j)}{2},$$
so that
$$\frac{1}{2} \sum_{i,j=1}^{n} a_{ij} \frac{|\theta_i - \theta_j|_{\mathbb{S}^1}}{\pi} \ge \left( \min_{0 \le x \le \pi} \frac{2}{\pi} \frac{x}{1 - g(x)} \right) \sum_{i,j=1}^{n} a_{ij} \frac{1 - g(\theta_i - \theta_j)}{4}.$$
This sum simplifies to
$$\sum_{i,j=1}^{n} a_{ij} \frac{1 - g(\theta_i - \theta_j)}{4} = \frac{|E|}{2} - \frac{1}{4} \sum_{i,j=1}^{n} a_{ij} g(\theta_i - \theta_j) = \frac{|E|}{2} - \frac{f(\theta_1, \dots, \theta_n)}{4}.$$
Therefore, we have
$$\mathbb{E}\,\text{edges} \ge \left( \min_{0 \le x \le \pi} \frac{2}{\pi} \frac{x}{1 - g(x)} \right) \left( \frac{|E|}{2} - \frac{1}{4} \sum_{i,j=1}^{n} a_{ij} g(\theta_i - \theta_j) \right).$$
The remaining question is simply how small we can make this Kuramoto-type energy: by the argument above, we have
$$\min_{\theta_1, \dots, \theta_n} f(\theta_1, \dots, \theta_n) \le 2 \cdot |E| - 4 \cdot \operatorname{Max-Cut}(G).$$
Thus, if $\{\theta_1, \dots, \theta_n\}$ is a minimal energy configuration of the Kuramoto energy,
$$\mathbb{E}\,\text{edges} \ge \left( \min_{0 \le x \le \pi} \frac{2}{\pi} \frac{x}{1 - g(x)} \right) \cdot \operatorname{Max-Cut}(G),$$
which completes the argument. □
References
[1] M. Bellare, O. Goldreich, and M. Sudan, Free bits, PCPs and nonapproximability – towards tight results, SIAM J. Comput., 27 (1998), pp. 804–915.
[2] N. Boumal, V. Voroninski, and A. Bandeira, The non-convex Burer-Monteiro approach works on smooth semidefinite programs, NIPS’16: Proceedings of the 30th International Conference on Neural Information Processing Systems 2016, p. 2765–2773.
[3] N. Boumal, V. Voroninski, and A. Bandeira, Deterministic Guarantees for Burer-Monteiro Factorizations of Smooth Semidefinite Programs, Comm. Pure Appl. Math, to appear.
[4] S. Burer and R.D.C. Monteiro. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Mathematical Programming, 95 (2003): p. 329–357.
[5] S. Burer and R.D.C. Monteiro. Local minima and convergence in low-rank semidefinite programming. Mathematical Programming, 103 (2005): p. 427–444.
[6] S. Burer, R.D.C. Monteiro, and Y. Zhang. Rank-two relaxation heuristics for Max-Cut and other binary quadratic programs. SIAM Journal on Optimization, 12 (2002): p. 503–521.
[7] J. Chou, B. Suraj, G. Siddhartha, and W. Herzog. Analog coupled oscillator based weighted Ising machine. Scientific Reports 9, no. 1 (2019): p. 1–10.
[8] F. Dörfler and F. Bullo. Synchronization in complex networks of phase oscillators: A survey. Automatica, 50 (6): 1539–1564, 2014.
[9] F. Dörfler, M. Chertkov, and F. Bullo. Synchronization in complex oscillator networks and smart grids. Proceedings of the National Academy of Sciences, 110 (6): 2005–2010, 2013.
[10] I. Dunning, S. Gupta, J. Silberholz, What works best when? A systematic evaluation of heuristics for Max-Cut and QUBO, INFORMS Journal on Computing 30 (2018): p. 608–624.
[11] M. Goemans and D. Williamson, Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming, Journal of the ACM, 42 (1995): p. 1115–1145.
[12] J. Håstad, Some optimal inapproximability results, in Proceedings of the 29th ACM Symposium on Theory of Computing, El Paso, TX, 1997, pp. 1–10.
[13] S. Khot, G. Kindler, E. Mossel and R. O’Donnell, Optimal inapproximability results for MAX-CUT and other 2-variable CSPs?, SIAM Journal on Comp., 37 (2007): p. 319–357.
[14] Y. Kuramoto. Self-entrainment of a population of coupled non-linear oscillators. In International Symposium on Mathematical Problems in Theoretical Physics (175), p. 420–422.
[15] S. Ling, Solving Orthogonal Group Synchronization via Convex and Low-Rank Optimization: Tightness and Landscape Analysis, arXiv:2006.00902.
[16] S. Ling, R. Xu, and A. S. Bandeira. On the landscape of synchronization networks: A perspective from nonconvex optimization. SIAM J. Optim., 29, 1879–1907.
[17] J. Lu and S. Steinerberger, Synchronization of Kuramoto oscillators in dense networks, Nonlinearity 33 (2020), 5905.
[18] A. Mallick, M. Bashar, D. Truesdell, B. Calhoun, S. Joshi, N. Shukla, Using synchronized oscillators to compute the maximum independent set. Nature Comm. 11 (2020), p. 1–7.
[19] R. Taylor. There is no non-zero stable fixed point for dense networks in the homogeneous Kuramoto model. J. Phys. A: Math. Theor., 45:055102, 2012.
[20] A. Townsend, M. Stillman, and S. H. Strogatz. Circulant networks of identical Kuramoto oscillators: Seeking dense networks that do not globally synchronize and sparse ones that do. Preprint, arXiv:1906.10627.
[21] L. Trevisan, G. Sorkin, M. Sudan, D. Williamson, Gadgets, Approximation, and Linear Programming, Proceedings of the 37th IEEE Symposium on Foundations of Computer Science (2000): p. 617–626.
[22] T. Wang and J. Roychowdhury, OIM: Oscillator-based Ising Machines for Solving Combinatorial Optimisation Problems, UCNC 2019: Unconventional Computation and Natural Computation, p. 232–256.
[23] T. Wang, L. Wu and J. Roychowdhury, New computational results and hardware prototypes for oscillator-based Ising machines. In Proceedings of the 56th Annual Design Automation Conference 2019 (pp. 1–2).
Department of Mathematics, University of Washington, Seattle