ILP-based Local Search for Graph Partitioning

Abstract

Computing high-quality graph partitions is a challenging problem with numerous applications. In this paper, we present a novel meta-heuristic for the balanced graph partitioning problem. Our approach is based on integer linear programs that solve the partitioning problem to optimality. However, since those programs typically do not scale to large inputs, we adapt them to heuristically improve a given partition. We do so by defining a much smaller model that allows us to use symmetry breaking and other techniques that make the approach scalable. For example, in Walshaw's well-known benchmark tables we are able to improve roughly half of all entries when the number of blocks is high.


Alexandra Henzinger, Alexander Noe, and Christian Schulz

G.2.2 Graph Theory

Keywords and phrases

Graph Partitioning, Integer Linear Programming

arXiv [cs.DS], Feb.

Balanced graph partitioning is an important problem in computer science and engineering with a large number of application domains, such as VLSI circuit design, data mining and distributed systems [37]. It is well known that this problem is NP-complete [8] and that no approximation algorithm with a constant ratio factor exists for general graphs unless P=NP [8]. Still, there is a large body of literature on methods (with worst-case exponential time) that solve the graph partitioning problem to optimality. This includes methods dedicated to the bipartitioning case [3, 4, 12, 13, 14, 15, 23, 21, 29, 38] and some methods that solve the general graph partitioning problem [16, 39]. Most of these methods rely on the branch-and-bound framework [27]. However, these methods can typically solve only very small problems as their running time grows exponentially, or, if they can solve large bipartitioning instances using a moderate amount of time [12, 13], the running time highly depends on the bisection width of the graph. Methods that solve the general graph partitioning problem [16, 39] have huge running times for graphs with up to a few hundred vertices. Thus in practice mostly heuristic algorithms are used.

Typically the graph partitioning problem asks for a partition of a graph into $k$ blocks of about equal size such that there are few edges between them. Here, we focus on the case when the bounds on the size are very strict, including the case of perfect balance when the maximal block size has to equal the average block size.

Our focus in this paper is on solution quality, i.e. minimizing the number of edges that run between blocks. During the past two decades numerous researchers have tried to improve the best graph partitions in Walshaw's well-known partitioning benchmark [40, 41]. Overall, more than forty different approaches have participated in this benchmark. Indeed, high solution quality is of major importance in applications such as VLSI design [1, 2], where even minor improvements in the objective can have a large impact on the production costs and quality of a chip. High-quality solutions are also favorable in applications where the graph needs to be partitioned only once and then the partition is used over and over again, implying that the running time of the graph partitioning algorithms is of minor concern [11, 18, 26, 28, 31, 30]. Thirdly, high-quality solutions are even important in areas in which the running time overhead is paramount [40], such as finite element computations [36] or the direct solution of sparse linear systems [20]. Here, high-quality graph partitions can be useful for benchmarking purposes, i.e. measuring how much more running time can be saved by higher quality solutions.

In order to compute high-quality solutions, state-of-the-art local search algorithms exchange vertices between blocks of the partition trying to decrease the cut size while also maintaining balance. This highly restricts the set of possible improvements. Recently, we introduced new techniques that relax the balance constraint for vertex movements but globally maintain balance by combining multiple local searches [35]. This was done by reducing this combination problem to finding negative cycles in a graph. In this paper, we extend the neighborhood of the combination problem by employing integer linear programming. This enables us to find even more complex combinations and hence to further improve solutions. More precisely, our approach is based on integer linear programs that solve the partitioning problem to optimality. However, out of the box those programs typically do not scale to large inputs, in particular because the graph partitioning problem has a very large amount of symmetry: given a partition of the graph, each permutation of the block IDs gives a solution having the same objective and balance. Hence, we adapt the integer linear program to improve a given input partition. We do so by defining a much smaller graph, called model, and solve the graph partitioning problem on the model to optimality by the integer linear program. More specifically, we select vertices close to the cut of the given input partition for potential movement and contract all remaining vertices of a block into a single vertex. A feasible partition of this model corresponds to a partition of the input graph having the same balance and objective. Moreover, this model enables us to use symmetry breaking, which allows us to scale to much larger inputs. To make the approach even faster, we combine it with initial bounds on the objective provided by the input partition, and we also provide the input partition to the integer linear program solver. Overall, we arrive at a system that is able to improve more than half of all entries in Walshaw's benchmark when the number of blocks is high.

The rest of the paper is organized as follows. We begin in Section 2 by introducing basic concepts. After presenting some related work in Section 3 we outline the integer linear program as well as our novel local search algorithm in Section 4. Here, we start by explaining the very basic idea that allows us to find combinations of simple vertex movements. We then explain our strategies to improve the running time of the solver and strategies to select vertices for movement. A summary of extensive experiments done to evaluate the performance of our algorithms is presented in Section 5. Finally, we conclude in Section 6.

Let $G = (V = \{0, \ldots, n-1\}, E)$ be an undirected graph. We consider positive, real-valued edge and vertex weight functions $\omega$ and $c$, respectively, and extend them to sets, i.e., $\omega(E') := \sum_{x \in E'} \omega(x)$ and $c(V') := \sum_{x \in V'} c(x)$. Let $N(v) := \{u : \{v, u\} \in E\}$ denote the neighbors of $v$. The degree of a vertex $v$ is $d(v) := |N(v)|$. A vertex is a boundary vertex if it is adjacent to at least one vertex in a different block. We are looking for disjoint blocks of vertices $V_1, \ldots, V_k$ that partition $V$; i.e., $V_1 \cup \cdots \cup V_k = V$. The balancing constraint demands that each block has weight $c(V_i) \le (1+\epsilon) \lceil c(V)/k \rceil =: L_{\max}$ for some imbalance parameter $\epsilon$. We call a block $V_i$ overloaded if its weight exceeds $L_{\max}$. The objective of the problem is to minimize the total cut $\omega\left(E \cap \bigcup_{i<j} V_i \times V_j\right)$.
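The definitions above can be made concrete with a minimal sketch. The function names and the dictionary-based edge representation are assumptions made for illustration only; they are not taken from the authors' implementation.

```python
import math

def l_max(c, k, eps):
    """Upper bound on the block weight: (1 + eps) * ceil(c(V) / k)."""
    return (1 + eps) * math.ceil(sum(c) / k)

def cut_weight(edges, block):
    """Total weight of edges whose endpoints lie in different blocks.

    `edges` maps (u, v) -> weight, `block[v]` is the block ID of vertex v.
    """
    return sum(w for (u, v), w in edges.items() if block[u] != block[v])

def is_balanced(c, block, k, eps):
    """True if no block is overloaded, i.e. every block weight is <= L_max."""
    load = [0] * k
    for v, b in enumerate(block):
        load[b] += c[v]
    return max(load) <= l_max(c, k, eps)

def boundary_vertices(edges, block):
    """Vertices with at least one neighbor in a different block."""
    return sorted({x for (u, v) in edges if block[u] != block[v] for x in (u, v)})

# Path 0-1-2-3-4-5 with unit vertex and edge weights, split in the middle.
path_edges = {(i, i + 1): 1 for i in range(5)}
unit_weights = [1] * 6
halves = [0, 0, 0, 1, 1, 1]
```

On this path, the perfectly balanced split (`eps = 0`) cuts only the edge between vertices 2 and 3, so exactly those two vertices are boundary vertices.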

We now explain our algorithm that combines integer linear programming and local search. We start by explaining the integer linear program that can solve the graph partitioning problem to optimality. However, out-of-the-box this program does not scale to large inputs, in particular because the graph partitioning problem has a very large amount of symmetry. Thus, we reduce the size of the graph by first computing a partition using an existing heuristic and based on it collapsing parts of the graph. Roughly speaking, we compute a small graph, called model, in which we only keep a small amount of selected vertices for potential movement and perform graph contractions on the remaining ones. A partition of the model corresponds to a partition of the input network having the same objective and balance. The computed model is then solved to optimality using the integer linear program. As we will see, this process enables us to use symmetry breaking in the linear program, which in turn drastically speeds up computation times.

We now introduce a generalization of an integer linear program formulation for balanced bipartitioning [7] to the general graph partitioning problem. First, we introduce binary decision variables for all edges and vertices of the graph. More precisely, for each edge $e = \{u, v\} \in E$, we introduce the variable $e_{uv} \in \{0, 1\}$ which is one if $e$ is a cut edge and zero otherwise. Moreover, for each $v \in V$ and block $k$, we introduce the variable $x_{v,k} \in \{0, 1\}$ which is one if $v$ is in block $k$ and zero otherwise. Hence, we have a total of $|E| + k|V|$ variables. We use the following constraints to ensure that the result is a valid $k$-partition:

$$\forall \{u, v\} \in E,\ \forall k: \quad e_{uv} \ge x_{u,k} - x_{v,k} \qquad (1)$$

$$\forall \{u, v\} \in E,\ \forall k: \quad e_{uv} \ge x_{v,k} - x_{u,k} \qquad (2)$$

$$\forall k: \quad \sum_{v \in V} x_{v,k}\, c(v) \le L_{\max} \qquad (3)$$

$$\forall v \in V: \quad \sum_{k} x_{v,k} = 1 \qquad (4)$$

The first two constraints ensure that $e_{uv}$ is set to one if the vertices $u$ and $v$ are in different blocks. For an edge $\{u, v\} \in E$ and a block $k$, the right-hand side of one of these two inequalities is one if exactly one of the vertices $u$ and $v$ is in block $k$. If both vertices are in the same block then the right-hand side is zero for all values of $k$. Hence, the variable can either be zero or one in this case. However, since the variable participates in the objective function and the problem is a minimization problem, it will be zero in an optimum solution. The third constraint ensures that the balance constraint is satisfied for each block. Finally, the last constraint ensures that each vertex is assigned to exactly one block. To sum up, our program has $2k|E| + k + |V|$ constraints and $k \cdot (6|E| + 2|V|)$ non-zeros. Since we want to minimize the weight of cut edges, the objective function of our program is written as:

$$\min \sum_{\{u,v\} \in E} e_{uv} \cdot \omega(\{u, v\}) \qquad (5)$$

The graph partitioning problem has a large amount of symmetry: each permutation of the block IDs gives a solution with equal objective and balance.
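To illustrate the formulation and its symmetry on a toy instance, the following sketch replaces the ILP solver by plain enumeration. This is an assumption made purely for illustration, since enumeration is only viable for a handful of vertices. Each assignment tuple encodes the variables $x_{v,k}$, so constraint (4) holds by construction, constraint (3) is checked explicitly, and the cut weight equals objective (5). The function name `enumerate_optima` is hypothetical.

```python
from itertools import product

def enumerate_optima(n, edges, c, k, l_max):
    """Return the optimal cut and all feasible assignments attaining it.

    `edges` maps (u, v) -> weight; `block[v] == b` stands for x_{v,b} = 1.
    """
    best_cut, optima = None, []
    for block in product(range(k), repeat=n):
        # Constraint (3): every block weight must be at most l_max.
        load = [0] * k
        for v in range(n):
            load[block[v]] += c[v]
        if max(load) > l_max:
            continue
        # In an optimum, e_uv = 1 exactly on cut edges, so (5) is the cut weight.
        cut = sum(w for (u, v), w in edges.items() if block[u] != block[v])
        if best_cut is None or cut < best_cut:
            best_cut, optima = cut, [block]
        elif cut == best_cut:
            optima.append(block)
    return best_cut, optima

# Two triangles {0,1,2} and {3,4,5} joined by the edge {2,3}; unit weights.
edges = {(0, 1): 1, (1, 2): 1, (0, 2): 1,
         (3, 4): 1, (4, 5): 1, (3, 5): 1, (2, 3): 1}
best_cut, optima = enumerate_optima(6, edges, [1] * 6, k=2, l_max=3)
print(best_cut, optima)
```

On this instance the optimum cut of 1 is attained by exactly two assignments, which differ only by swapping the two block IDs. For $k$ blocks, every solution has up to $k!$ such relabelings with identical objective and balance.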
Hence, the integer linear program described above will scan many branches that contain essentially the same solutions so that the program does not scale to large instances. Moreover, it is not immediately clear how to improve the scalability of the program by using symmetry breaking or other techniques.

Our goal in this section is to develop a local search algorithm using the integer linear program above. Given a partition as input to be improved, our main idea is to contract vertices "that are far away" from the cut of the partition. In other words, we want to keep vertices close to the cut and contract all remaining vertices into one vertex for each block of the input partition. This ensures that a partition of the contracted graph yields a partition of the input graph with the same objective and balance. Hence, we apply the integer linear program to the model and solve the partitioning problem on it to optimality. Note, however, that due to the performed contractions this does not imply an optimal solution on the input graph.

Figure 1: From left to right: a graph that is partitioned into four blocks, the set $\mathcal{K}$ close to the boundary that will stay in the model, and lastly the model in which the sets $V_i \setminus \mathcal{K}$ have been contracted.

We now outline the details of the algorithm. Our local algorithm has two inputs, a graph $G$ and a partition $V_1, \ldots, V_k$ of its vertices. For now assume that we have a set of vertices $\mathcal{K} \subset V$ which we want to keep in the coarse model, i.e. a set of vertices which we do not want to contract. We outline in Section 4.4 which strategies we use to select the vertices $\mathcal{K}$. For the purpose of contraction we define $k$ sets $\mathcal{V}_i := V_i \setminus \mathcal{K}$. We obtain our coarse model by contracting each of these vertex sets. The contraction of a vertex set $\mathcal{V}_i$ works as follows: the set of vertices is contracted into a single vertex $\mu_i$. The weight of $\mu_i$ is set to the sum of the weights of all vertices in the set that is contracted. There is an edge between two vertices $\mu_i$ and $v$ in the contracted graph if there is an edge between a vertex of the set and $v$ in the original graph $G$. The weight of an edge $(\mu_i, v)$ is set to the sum of the weights of the edges that run between the vertices of the set and $v$. After all contractions have been performed the coarse model contains $k + |\mathcal{K}|$ vertices, and potentially far fewer edges than the input graph. Figure 1 gives an abstract example of our model.

There are two things that are important to see: first, due to the way we perform contraction, the given partition of the input network yields a partition of our coarse model that has the same objective and balance simply by putting $\mu_i$ into block $i$ and keeping the block of the input for the vertices in $\mathcal{K}$. Moreover, if we compute a new partition of our coarse model, we can build a partition in the original graph with the same properties by putting the vertices $\mathcal{V}_i$ into the block of their coarse representative $\mu_i$ together with the vertices of $\mathcal{K}$ that are in this block. Hence, we can solve the integer linear program on the coarse model to compute a partition for the input graph. After the solver terminates, i.e. has found an optimum solution of our model or has reached a predefined time limit $T$, we transfer the best solution to the original graph. Note that the latter is possible since an integer linear program solver typically computes intermediate solutions that may not be optimal.

Independent of the vertices $\mathcal{K}$ that are selected to be kept in the coarse model, the approach above allows us to define optimizations to solve our integer linear program faster. We apply four strategies: (i) symmetry breaking, (ii) providing a start solution to the solver, (iii) adding the objective of the input as a constraint, and (iv) using the parallel solving facilities of the underlying solver. We outline the first three strategies in greater detail:

Symmetry Breaking.

If the set $\mathcal{K}$ is small, then the solver will find a solution much faster. Ideally, our algorithm selects the vertices $\mathcal{K}$ such that $c(\mu_i) + c(\mu_j) > L_{\max}$. In other words, no two contracted vertices can be clustered in one block. We can use this to break symmetry in our integer linear program by adding constraints that fix the block of $\mu_i$ to block $i$, i.e. we set $x_{\mu_i,i} = 1$ and $x_{\mu_i,j} = 0$ for $i \ne j$. Moreover, for those vertices we can remove the constraint which ensures that the vertex is assigned to a single unique block, since we assigned those vertices to a block using the new additional constraints.

Providing a Start Solution to the Solver.

The integer linear program performs a significant amount of work in branches which correspond to solutions that are worse than the input partitioning. Only very few solutions, if any, are better than the given partition. However, we already know a fairly good partition (the given partition from the input) and give this partition to the solver by setting corresponding initial values for all variables. This ensures that the integer linear program solver can omit many branches and hence reduces the time needed to solve the integer linear program.
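The contraction into a coarse model and the condition used for symmetry breaking can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the ID scheme for the contracted vertices (`n + i` for $\mu_i$), and the dictionary-based graph representation are hypothetical and not the authors' implementation.

```python
from collections import defaultdict

def build_model(n, edges, c, block, keep, k):
    """Contract all vertices of block i that are not in `keep` into mu_i.

    Kept vertices retain their ID; mu_i gets the ID n + i.
    `edges` maps (u, v) -> weight. Returns (vertex weights, edge weights).
    """
    def rep(v):
        return v if v in keep else n + block[v]

    weight = defaultdict(int)
    for v in range(n):
        weight[rep(v)] += c[v]          # c(mu_i) = sum of contracted weights
    for i in range(k):
        weight.setdefault(n + i, 0)     # mu_i exists even if nothing is contracted

    model_edges = defaultdict(int)
    for (u, v), w in edges.items():
        a, b = rep(u), rep(v)
        if a != b:                      # edges inside a contracted set vanish
            model_edges[(min(a, b), max(a, b))] += w
    return dict(weight), dict(model_edges)

def can_fix_blocks(weight, n, k, l_max):
    """True if c(mu_i) + c(mu_j) > L_max for all i != j."""
    mu = [weight[n + i] for i in range(k)]
    return all(mu[i] + mu[j] > l_max
               for i in range(k) for j in range(i + 1, k))

# Path 0-1-...-7 with unit weights, split in the middle; keep the cut endpoints.
path_edges = {(i, i + 1): 1 for i in range(7)}
block = [0, 0, 0, 0, 1, 1, 1, 1]
weight, model_edges = build_model(8, path_edges, [1] * 8, block, keep={3, 4}, k=2)
fixable = can_fix_blocks(weight, 8, 2, l_max=4)
```

In this example the model has $k + |\mathcal{K}| = 4$ vertices, the only cut edge of the input survives unchanged, and the two contracted vertices together weigh more than $L_{\max}$, so each $\mu_i$ could be fixed to block $i$.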

Solution Quality as a Constraint.

Since we are only interested in improved partitions, we can add an additional constraint that disallows solutions which have a worse objective than the input partition. Indeed, the objective function of the linear program is linear, and hence the additional constraint is also linear. Depending on the objective value, this reduces the number of branches that the linear program solver needs to look at. However, note that this comes at the cost of an additional constraint that needs to be evaluated.
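Written with the symbols of objective (5), and denoting the cut of the input partition by $C_{\text{in}}$ (a name introduced here only for illustration), the constraint can be sketched in a non-strict and a strict form:

```latex
\sum_{\{u,v\} \in E} e_{uv} \cdot \omega(\{u,v\}) \le C_{\text{in}}
\qquad \text{or} \qquad
\sum_{\{u,v\} \in E} e_{uv} \cdot \omega(\{u,v\}) < C_{\text{in}}
```

Linear programming solvers generally expect non-strict inequalities. If all edge weights are integers, the strict form could be stated as $\le C_{\text{in}} - 1$; how the strict variant is encoded is not specified in the text, so this is only one possible choice.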

The algorithm above works for different vertex sets $\mathcal{K}$ that should be kept in the coarse model. There is an obvious trade-off: on the one hand, the set $\mathcal{K}$ should not be too large, otherwise the coarse model would be large and hence the linear programming solver needs a large amount of time to find a solution. On the other hand, the set should also not be too small, since this restricts the number of possible vertex movements, and hence the approach is unlikely to find an improved solution. We now explain different strategies to select the vertex set $\mathcal{K}$. In any case, while we add vertices to the set $\mathcal{K}$, we compute the number of non-zeros in the corresponding ILP. We stop adding vertices when the number of non-zeros in the corresponding ILP is larger than a parameter $N$.

Vertices Close to Input Cut.

The intuition of the first strategy, Boundary, is that changes or improvements of the partition will occur reasonably close to the input partition. In this simple strategy our algorithm tries to use all boundary vertices as the set $\mathcal{K}$. In order to adhere to the constraint on the number of non-zeros in the ILP, we add the vertices of the boundary uniformly at random and stop if the number of non-zeros $N$ is reached. If the algorithm managed to add all boundary vertices whilst not exceeding the specified number of non-zeros, we do the following extension: we perform a breadth-first search that is initialized with a random permutation of the boundary vertices. All additional vertices that are reached by the BFS are added to $\mathcal{K}$. As soon as the number of non-zeros $N$ is reached, the algorithm stops.

Start at Promising Vertices.

Especially for high values of $k$ the boundary contains many vertices. The Boundary strategy quickly adds a lot of random vertices while ignoring vertices that have high gain. However, note that even in good partitions it is possible that vertices with positive gain exist but cannot be moved due to the balance constraint. Hence, our second strategy, Gain$_\rho$, tries to fix this issue by starting a breadth-first search initialized with only high gain vertices. More precisely, we initialize the BFS with each vertex having gain $\ge \rho$ where $\rho$ is a tuning parameter. Our last strategy, TopVertices$_\delta$, starts by sorting the boundary vertices by their gain. We break ties uniformly at random. Vertices are then traversed in decreasing order (highest gain vertices first) and for each start vertex $v$ our algorithm adds all vertices with distance $\le \delta$ to the model. The algorithm stops as soon as the number of non-zeros exceeds $N$.

Early gain-based local search heuristics for the $\epsilon$-balanced graph partitioning problem searched for pairwise swaps with positive gain [17, 25]. More recent algorithms generalized this idea to also search for cycles or paths with positive total gain [35]. An important advantage of our new approach is that we solve the combination problem to optimality, i.e. our algorithm finds the best combination of vertex movements of the vertices in $\mathcal{K}$ w.r.t. the input partition of the original graph. Therefore we can also find more complex optimizations that cannot be reduced to positive gain cycles and paths.

We implemented the algorithms using C++17 and compiled all codes using g++-7.2.0 with full optimization (-O3). We use Gurobi 7.5.2 as an ILP solver and always use its parallel version. We perform experiments on the Phase 2 Haswell nodes of the SuperMUC supercomputer. The Phase 2 of SuperMUC consists of 3072 nodes, each with two Haswell Xeon E5-2697 v3 processors. Each node has 28 cores at 2.6GHz, as well as 64GB of main memory and runs the SUSE Linux Enterprise Server (SLES) operating system. Unless otherwise mentioned, our approach uses the shared-memory parallel variant of Gurobi using all 28 cores of a single node of the machine. In general, we perform five repetitions per instance and report the average running time as well as cut. Unless otherwise mentioned, we use a time limit for the integer linear program. When the time limit is passed, the integer linear program solver outputs the best solution that has been discovered so far. This solution does not have to be optimal. Note that we do not perform experiments with Metis [24] and Scotch [32] here, since previous papers, e.g. [33, 34], have already shown that the solution quality obtained is much worse than results achieved in the Walshaw benchmark. When averaging over multiple instances, we use the geometric mean in order to give every instance the same influence on the final score.

Performance Plots.

These plots relate the fastest running time to the running time of each other ILP-based local search algorithm on a per-instance basis. For each algorithm, these ratios are sorted in increasing order. The plots show the ratio $t_{\text{best}} / t_{\text{algorithm}}$ on the y-axis to highlight the instances in which each algorithm performs badly. For plots in which we measure solution quality, the y-axis shows the ratio $\text{cut}_{\text{best}} / \text{cut}_{\text{algorithm}}$. A point close to zero indicates that the running time/quality of the algorithm was considerably worse than the fastest/best algorithm on the same instance. A value of one therefore indicates that the corresponding algorithm was one of the fastest/best algorithms to compute the solution. Thus an algorithm is considered to outperform another algorithm if its corresponding ratio values are above those of the other algorithm. In order to include instances that hit the time limit, we set the corresponding values below zero for ratio computations.
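The ratios behind such a performance plot, and the geometric mean used for averaging, can be computed as in the following sketch. The function names, the sample running times, and the concrete marker value `-0.1` for timeouts are assumptions for illustration; the text only states that such values are set below zero.

```python
import math

def performance_ratios(times, timed_out_value=-0.1):
    """Sorted ratios t_best / t_algorithm per algorithm.

    `times` maps an algorithm name to its running times per instance,
    with None marking a run that hit the time limit.
    """
    algos = list(times)
    num_instances = len(next(iter(times.values())))
    ratios = {a: [] for a in algos}
    for i in range(num_instances):
        finished = [times[a][i] for a in algos if times[a][i] is not None]
        best = min(finished) if finished else None
        for a in algos:
            t = times[a][i]
            ratios[a].append(timed_out_value if t is None else best / t)
    # Each curve in the plot is the increasingly sorted list of ratios.
    return {a: sorted(r) for a, r in ratios.items()}

def geometric_mean(values):
    """Geometric mean of positive values, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Made-up running times in seconds for two variants on three instances.
times = {"Basic": [8.0, None, 4.0], "BasicSym": [2.0, 5.0, 1.0]}
ratios = performance_ratios(times)
speedup = geometric_mean([8.0 / 2.0, 4.0 / 1.0])  # instances solved by both
```

A curve that stays at one, as for the faster variant in this toy data, means that the algorithm was the fastest on every instance.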

Table 1: Basic properties of our benchmark instances (columns: Graph, n, m). Recoverable entries: add20 (n = 2 395, m = 7 462), wing (n = 62 032), boneS01 (n = 127 224).

Instances.

We perform experiments on two sets of instances. Set A is used to determine the performance of the integer linear programming optimizations and to tune the algorithm. We obtained these instances from the Florida Sparse Matrix collection [10] and the 10th DIMACS Implementation Challenge [5] to test our algorithm. Set B consists of all graphs from Chris Walshaw's graph partitioning benchmark archive [40, 41]. This archive is a collection of instances from finite-element applications and VLSI design, and is one of the default benchmarking sets for graph partitioning.

Table 1 gives basic properties of the graphs from both benchmark sets. We ran the unoptimized integer linear program that solves the graph partitioning problem to optimality from Section 4.1 on the five smallest instances from the Walshaw benchmark set. With a time limit of 30 minutes, the solver has only been able to compute a solution for two graphs with $k = 2$. For higher values of $k$ the solver was unable to find any solution in the time limit. Even applying feasible optimizations does not increase the number of ILPs solved. Hence, we omit further experiments in which we run an ILP solver on the full graph.

We now evaluate the impact of the optimization strategies for the ILP that we presented in Section 4.3. In this section, we use the variant of our local search algorithm in which $\mathcal{K}$ is obtained by starting depth-one breadth-first search at the 25 highest gain vertices, and set the limit on the non-zeros in the ILP to $N = \infty$. However, we expect the results in terms of speedup to be similar for different vertex selection strategies. To evaluate the ILP performance, we run KaFFPa using the strong preconfiguration on each of the graphs from set A using $\epsilon = 0$ and $k \in \{2, 4, 8, 16, 32, 64\}$ and then use the computed partition as input to each ILP (with the different optimizations). As the optimizations do not change the objective value achieved in the ILP, we only report running times of our different approaches. We set the time limit of the ILP solver to 30 minutes.

Figure 2: Left: performance plot for five variants of our algorithm: Basic does not contain any optimizations; BasicSym enables symmetry breaking; BasicSymSSol additionally gives the input partitioning to the ILP solver. The two variants BSSSConst= and BSSSConst< are the same as BasicSymSSol with additional constraints: BSSSConst= has the additional constraint that the objective has to be smaller or equal to the start solution, BSSSConst< has the constraint that the solution must be better than the start solution. Right: performance of the slowest (Basic) and fastest ILPs (BasicSymSSol) depending on the number of non-zeros in the ILP.

We use five variants of our algorithm: Basic does not contain any optimizations; BasicSym enables symmetry breaking; BasicSymSSol additionally gives the input partitioning to the ILP solver. The two variants BSSSConst= and BSSSConst< are the same as BasicSymSSol with additional constraints: BSSSConst= has the additional constraint that the objective has to be smaller or equal to the start solution, BSSSConst< has the constraint that the objective value of a solution must be better than the objective value of the start solution. Figure 2 summarises the results.

In our experiments, the basic configuration reaches the time limit in 95 out of the 300 runs. Overall, enabling symmetry breaking drastically speeds up computations. On all of the instances which the Basic configuration could solve within the time limit, each other configuration is faster than the Basic configuration. Symmetry breaking speeds up computations by a factor of 41 in the geometric mean on those instances. The largest obtained speedup on those instances was a factor of 5663 on the graph adaptive for $k = 32$. The configuration solves all but the two instances (boneS01, $k = 32$) and (Dubcova3, $k = 16$) within the time limit. Additionally providing the start solution (BasicSymSSol) gives an additional speedup of 22% on average. Over the Basic configuration, the average speedup is 50 with the largest speedup being 6495 and the smallest speedup being 47%. This configuration can solve all instances within the time limit except the instance boneS01 for $k = 32$. Providing the objective function as a constraint (or strictly smaller constraint) does not further reduce the running time of the solver. Instead, the additional constraints even increase the running time. We attribute this to the fact that the solver has to do additional work to evaluate the constraint. We conclude that BasicSymSSol is the fastest configuration of the ILP. Hence, we use this configuration in all the following experiments. Moreover, from Figure 2 we can see that this configuration can solve most of the instances within the time limit if the number of non-zeros in the ILP is below $10^{\ldots}$. Hence, we set the parameter $N$ to $10^{\ldots}$ in the following section.
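Since the parameter $N$ bounds the number of non-zeros, it may help to see how that number follows from constraints (1) to (4). The sketch below builds the constraint rows explicitly for the unoptimized formulation and counts their non-zero coefficients. The function names and the row representation are assumptions for illustration, and the effect of symmetry breaking on the count is not modeled.

```python
def build_rows(num_vertices, edges, c, k):
    """Constraint rows of (1)-(4) as {variable: coefficient} dictionaries."""
    rows = []
    for (u, v) in edges:
        for b in range(k):
            # (1): e_uv - x_{u,b} + x_{v,b} >= 0
            rows.append({("e", u, v): 1, ("x", u, b): -1, ("x", v, b): 1})
            # (2): e_uv - x_{v,b} + x_{u,b} >= 0
            rows.append({("e", u, v): 1, ("x", v, b): -1, ("x", u, b): 1})
    for b in range(k):
        # (3): sum_v c(v) * x_{v,b} <= L_max, with positive weights c(v)
        rows.append({("x", v, b): c[v] for v in range(num_vertices)})
    for v in range(num_vertices):
        # (4): sum_b x_{v,b} = 1
        rows.append({("x", v, b): 1 for b in range(k)})
    return rows

def count_nonzeros(rows):
    """Number of non-zero coefficients over all constraint rows."""
    return sum(1 for row in rows for coeff in row.values() if coeff != 0)

def within_budget(rows, max_nonzeros):
    """True if the model respects a non-zero budget such as the parameter N."""
    return count_nonzeros(rows) <= max_nonzeros

# Toy model with 6 vertices, 7 edges and k = 2 blocks.
edge_list = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
rows = build_rows(6, edge_list, [1] * 6, k=2)
num_constraints, num_nonzeros = len(rows), count_nonzeros(rows)
```

For this toy model the counts agree with the closed forms $2k|E| + k + |V| = 36$ and $k \cdot (6|E| + 2|V|) = 108$ given with the formulation.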

We now evaluate the vertex selection strategies to ﬁnd the set of vertices K that modelthe ILP. We look at all strategies described in Section 4.4, i.e. Boundary , Gain ρ with theparameter ρ ∈ {− , − , , } as well as TopVertices δ for δ ∈ { , , } . To evaluate thediﬀerent selection strategies, we use the best of ﬁve runs of KaFFPa strong on each of thegraphs from set A using (cid:15) = 0 and k ∈ { , , , , , } and then use the computed partitionas input to the ILP (with diﬀerent sets K ). Table 2 summarizes the results of the experiment,i.e. the number of cases in which our algorithm was able to improve the result, the averagerunning time in seconds for these selection strategies as well as the number of cases in whichthe strategy computed the best result (the partition having the lowest cut). We set the timelimit to 2 days to be able to ﬁnish almost all runs without running into timeout. For theaverage running time we exclude all graphs in which at least one algorithm did not ﬁnish in2 days (rgg_15 k = 16, delaunay_n15 k = 4, G2_circuit k = 4 , Boundary strategy is able to improve theinput for small values of k , but with increasing number of blocks k improvements decreaseto no improvement in all runs with k = 64. Because of the limit on the number of non-zeros,the ILP contains only random boundary vertices for large values of k in this case. Hence,there are not suﬃciently many high gain vertices in the model and fewer improvements forlarge values of k are expected. For small values of k ∈ { , } , the Boundary strategy can

Table 2

From top to bottom: Number of improvements found by diﬀerent vertex selection rulesrelative to the total number of instances, average running time of the strategy on the subset ofinstances (graph, k ) in which all strategies ﬁnished within the time limit, and the relative number ofinstances in which the strategy computed the lowest cut. Best values are highlighted in bold. Gain TopVertices Boundary k ρ = 0 ρ = − ρ = − δ = 1 δ = 2 δ = 3Relative Number of Improvements2

70% 70% 70%

60% 60% 60% 48%16 30% 50%

40% 30% 30% 40%32

60% 60%

46% 50% 50% 20% 20%64

70% 70%

50% 30% 20% 20% 0%Average Running Time2 189.943s 292.573s 357.145s

16 118.532s 52.547s 90.363s 53.385s 141.814s 243.957s

32 40.300s 24.607s 94.146s 27.156s 80.252s 116.023s

64 15.866s 21.908s 24.253s 14.627s 30.558s 44.813s

Relative Number Best Algorithm2 20%

50% 10% 10% 0%

10% 0% 0% 30%8 0% 20%

10% 10% 10% 26%16 0% 10%

10% 0% 10% 20%32 0% 8%

0% 0% 0% 4%64 0% 16%

0% 0% 0% 0% . Henzinger, A. Noe and C. Schulz XX:11 t b e s t / t a l g o c u t b e s t / c u t a l g o t b e s t / t a l g o BoundaryGain =0 Gain = 1

Gain = 2

TopVertices =1 TopVertices =2 TopVertices =3 Figure 3

Left: performance plot for all vertex selection strategies

Right: cut value of vertexselection strategies in comparison to the best result given by any strategy. improve as many as the

Gain ρ = − strategy but the average running times are higher.For k = { , , , } , the strategy Gain ρ = − has the highest number of improvements, for k = { , } it is surpassed by the strategy Gain ρ = − . However, the strategy Gain ρ = − ﬁndsthe best cuts in most cases among all tested strategies. Due to the way these strategies aredesigned, they are able to put a lot of high gain vertices into the model as well as verticesthat can be used to balance vertex movements. The TopVertices strategies are overall alsoable to ﬁnd a large number of improvements. However, the found improvements are typicallysmaller than the

Gain strategies. This is due to the fact that the TopVertices strategies grow BFS balls with a predefined depth around high gain vertices first, and later on are not able to include vertices that could be used to balance their movement. Hence, there are fewer potential vertex movements that could yield an improvement.

For almost all strategies, we can see that the average running time decreases as the number of blocks k increases. This happens because we limit the number of non-zeros N in our ILP. As the number of non-zeros grows linearly with the underlying model size, the models are far smaller for higher values of k. Using symmetry breaking, we already fixed the block of the k vertices µ_i which represent the vertices not part of K. Thus the ILP solver can quickly prune branches which would place vertices connected heavily to one of these vertices in a different block. Additionally, our data indicate that a large number of small areas in our model results in faster solve times than a model that contains few large areas. The performance plot in Figure 3 shows that the strategies Boundary, TopVertices δ = 1 and Gain ρ = − have lower running times than other strategies. These strategies all select a large number of vertices to initialize the breadth-first search. Therefore they output a vertex set K that is the union of many small areas around these vertices. Variants that initialize the breadth-first search with fewer vertices have fewer areas; however, each of the areas is larger.

In this section, we present the results when running our best configuration on all graphs from Walshaw's benchmark archive. Note that the rules of the benchmark imply that running time is not an issue, but algorithms should achieve the smallest possible cut value while satisfying the balance constraint. We run our algorithm in the following setting: We take existing partitions from the archive and use those as input to our algorithm. As indicated by the experiments in Section 5.3, the vertex selection strategies Gain ρ ∈ {− , − } perform best for different values of k. Thus we use the variant Gain ρ = − for k ≤ 16 and both Gain ρ = − and Gain ρ = − otherwise in this section. We repeat the experiment once for each instance (graph, k) and run our algorithm for k ∈ {2, 4, 8, 16, 32, 64} and ε ∈ {0%, 1%, 3%, 5%}. For larger values of k ∈ { , }, we strengthen our strategy and use N = 5 · as a bound for the number of non-zeros. Table 3 summarizes the results and Table 7 in the Appendix gives detailed per-instance results.

Table 3 Relative number of improved instances in the Walshaw Benchmark starting from current entries reported in the Walshaw benchmark.

k / ε    0%    1%    3%    5%
2        6%   12%    6%    6%
4       18%    9%    6%   18%
8       26%   24%   12%   15%
16      50%   26%   29%   29%
32      62%   47%   47%   53%
64      68%   59%   71%   76%
sum     38%   29%   28%   33%

When running our algorithm using the currently best partitions provided in the benchmark, we are able to improve 38% of the currently reported perfectly balanced results. We are able to improve a larger share of results for larger values of k; more specifically, out of the partitions with k ≥ 16, we can improve 60% of all perfectly balanced partitions. This is due to the fact that the graph partitioning problem becomes more difficult for larger values of k. There is a wide range of improvements with the smallest improvement being 0 . k = 32 and ε = 3% and with the largest improvement that we found being 1.72% for fe_body for k = 32 and ε = 0%. The largest absolute improvement we found is 117 for bcsstk32 with k = 64 and ε = 0%. In general, the total number of improvements decreases as more imbalance is allowed. This is expected, since traditional local search methods then have more freedom to move vertices. However, the number of improvements still shows that the method is able to improve a large number of partitions for large values of k even if more imbalance is allowed.

We presented a novel meta-heuristic for the balanced graph partitioning problem. Our approach is based on an integer linear program that solves a model to combine unconstrained vertex movements into a global feasible improvement. Given an input partition, we were able to use symmetry breaking and other techniques that make the approach scale to large inputs. In Walshaw's well-known benchmark tables, we were able to improve a large number of the partitions given in the benchmark.

In the future, we plan to further improve our implementation and integrate it into the KaHIP framework. We would like to look at other objective functions as long as they can be modelled linearly. Moreover, we want to investigate whether this kind of contraction can be useful for other ILPs. It may be interesting to find cores for contraction by using the information provided by an evolutionary algorithm like KaFFPaE [34], i.e. if many of the individuals of the population of the evolutionary algorithm agree that two vertices should be put together in a block, then those should be contracted in our model. Lastly, besides using other exact techniques like branch-and-bound to solve our combination model, it may also be worthwhile to use a heuristic algorithm instead.
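The contraction idea mentioned as future work can be illustrated with a small sketch. It is not part of the presented implementation and makes several assumptions: partitions are given as plain lists mapping vertex to block, only vertices joined by an edge are candidates for contraction, and the names `agreement_edges`, `contract` and the `threshold` parameter are hypothetical. Agreement is tested inside each individual, so differing block labels between individuals do not matter.

```python
def agreement_edges(edges, partitions, threshold=1.0):
    """Edges whose endpoints share a block in at least `threshold` of the partitions."""
    selected = []
    for u, v in edges:
        # Count the individuals that place u and v in the same block.
        agree = sum(1 for p in partitions if p[u] == p[v])
        if agree >= threshold * len(partitions):
            selected.append((u, v))
    return selected


def contract(num_vertices, edges_to_contract):
    """Merge the endpoints of the given edges; return one representative per vertex."""
    parent = list(range(num_vertices))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges_to_contract:
        ru, rv = find(u), find(v)
        if ru != rv:
            # Keep the smaller id as representative (arbitrary but deterministic).
            parent[max(ru, rv)] = min(ru, rv)
    return [find(v) for v in range(num_vertices)]


# Toy input (assumed): a path on six vertices and three bipartitions of it.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
]
core_edges = agreement_edges(edges, partitions)
print(core_edges)
print(contract(6, core_edges))
```

With `threshold=1.0` only edges on which every individual agrees are contracted; lowering the threshold would contract more aggressively at the risk of fixing vertex pairs that a better partition separates.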

Acknowledgements

References

[1] C. J. Alpert and A. B. Kahng. Recent Directions in Netlist Partitioning: A Survey. Integration, the VLSI Journal, 19(1-2):1–81, 1995.
[2] C. J. Alpert, A. B. Kahng, and S. Z. Yao. Spectral Partitioning with Multiple Eigenvectors. Discrete Applied Mathematics, 90(1):3–26, 1999.
[3] M. Armbruster. Branch-and-Cut for a Semidefinite Relaxation of Large-Scale Minimum Bisection Problems. PhD thesis, 2007.
[4] M. Armbruster, M. Fügenschuh, C. Helmberg, and A. Martin. A Comparative Study of Linear and Semidefinite Branch-and-Cut Methods for Solving the Minimum Graph Bisection Problem. In Proc. of the 13th International Conference on Integer Programming and Combinatorial Optimization, volume 5035 of LNCS, pages 112–124. Springer, 2008.
[5] D. A. Bader, H. Meyerhenke, P. Sanders, C. Schulz, A. Kappes, and D. Wagner. Benchmarking for Graph Clustering and Partitioning. In Encyclopedia of Social Network Analysis and Mining, pages 73–82. Springer, 2014.
[6] C. Bichot and P. Siarry, editors. Graph Partitioning. Wiley, 2011.
[7] R. Brillout. A Multi-Level Framework for Bisection Heuristics. 2009.
[8] T. N. Bui and C. Jones. Finding Good Approximate Vertex and Edge Partitions is NP-Hard. Information Processing Letters, 42(3):153–159, 1992.
[9] A. Buluç, H. Meyerhenke, I. Safro, P. Sanders, and C. Schulz. Recent Advances in Graph Partitioning. In Algorithm Engineering, pages 117–158. Springer, 2016.
[10] T. Davis. The University of Florida Sparse Matrix Collection.
[11] D. Delling, A. V. Goldberg, T. Pajor, and R. F. Werneck. Customizable Route Planning. In Proc. of the 10th International Symposium on Experimental Algorithms, volume 6630 of LNCS, pages 376–387. Springer, 2011.
[12] D. Delling, A. V. Goldberg, I. Razenshteyn, and R. F. Werneck. Exact Combinatorial Branch-and-Bound for Graph Bisection. In Proc. of the 12th Workshop on Algorithm Engineering and Experimentation (ALENEX’12), pages 30–44, 2012.
[13] D. Delling and R. F. Werneck. Better Bounds for Graph Bisection. In Proc. of the 20th European Symposium on Algorithms, volume 7501 of LNCS, pages 407–418, 2012.
[14] A. Feldmann and P. Widmayer. An O(n^4) Time Algorithm to Compute the Bisection Width of Solid Grid Graphs. In Proc. of the 19th European Conference on Algorithms, volume 6942 of LNCS, pages 143–154. Springer, 2011.
[15] A. Felner. Finding Optimal Solutions to the Graph Partitioning Problem with Heuristic Search. Annals of Mathematics and Artificial Intelligence, 45:293–322, 2005.
[16] C. E. Ferreira, A. Martin, C. C. De Souza, R. Weismantel, and L. A. Wolsey. The Node Capacitated Graph Partitioning Problem: A Computational Study. Mathematical Programming, 81(2):229–256, 1998.
[17] C. M. Fiduccia and R. M. Mattheyses. A Linear-Time Heuristic for Improving Network Partitions. In Proc. of the 19th Conference on Design Automation, pages 175–181, 1982.
[18] J. Fietz, M. Krause, C. Schulz, P. Sanders, and V. Heuveline. Optimized Hybrid Parallel Lattice Boltzmann Fluid Flow Simulations on Complex Geometries. In Proc. of Euro-Par 2012 Parallel Processing, volume 7484 of LNCS, pages 818–829. Springer, 2012.
[19] P. Galinier, Z. Boujbel, and M. C. Fernandes. An Efficient Memetic Algorithm for the Graph Partitioning Problem. Annals of Operations Research, 191(1):1–22, 2011.
[20] A. George. Nested Dissection of a Regular Finite Element Mesh. SIAM Journal on Numerical Analysis, 10(2):345–363, 1973.
[21] W. W. Hager, D. T. Phan, and H. Zhang. An Exact Algorithm for Graph Partitioning. Mathematical Programming, 137(1-2):531–556, 2013.
[22] M. Hein and S. Setzer. Beyond Spectral Clustering - Tight Relaxations of Balanced Graph Cuts. In Advances in Neural Information Processing Systems, pages 2366–2374, 2011.
[23] S. E. Karisch, F. Rendl, and J. Clausen. Solving Graph Bisection Problems with Semidefinite Programming. INFORMS Journal on Computing, 12(3):177–191, 2000.
[24] G. Karypis and V. Kumar. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs. SIAM Journal on Scientific Computing, 20(1):359–392, 1998.
[25] B. W. Kernighan and S. Lin. An Efficient Heuristic Procedure for Partitioning Graphs. The Bell System Technical Journal, 49(1):291–307, 1970.
[26] T. Kieritz, D. Luxen, P. Sanders, and C. Vetter. Distributed Time-Dependent Contraction Hierarchies. In Proc. of the 9th International Symposium on Experimental Algorithms, volume 6049 of LNCS, pages 83–93. Springer, 2010.
[27] A. H. Land and A. G. Doig. An Automatic Method of Solving Discrete Programming Problems. Econometrica, 28(3):497–520, 1960.
[28] U. Lauther. An Extremely Fast, Exact Algorithm for Finding Shortest Paths in Static Networks with Geographical Background, 2004.
[29] A. Lisser and F. Rendl. Graph Partitioning using Linear and Semidefinite Programming. Mathematical Programming, 95(1):91–101, 2003. doi:10.1007/s10107-002-0342-x.
[30] D. Luxen and D. Schieferdecker. Candidate Sets for Alternative Routes in Road Networks. In Proc. of the 11th International Symposium on Experimental Algorithms (SEA’12), volume 7276 of LNCS, pages 260–270. Springer, 2012.
[31] R. H. Möhring, H. Schilling, B. Schütz, D. Wagner, and T. Willhalm. Partitioning Graphs to Speedup Dijkstra’s Algorithm. Journal of Experimental Algorithmics (JEA), 11(2006), 2007.
[32] F. Pellegrini. Scotch Home Page.
[33] P. Sanders and C. Schulz. Engineering Multilevel Graph Partitioning Algorithms. In Proc. of the 19th European Symp. on Algorithms, volume 6942 of LNCS, pages 469–480. Springer, 2011.
[34] P. Sanders and C. Schulz. Distributed Evolutionary Graph Partitioning. In Proc. of the 12th Workshop on Algorithm Engineering and Experimentation (ALENEX’12), pages 16–29, 2012.
[35] P. Sanders and C. Schulz. Think Locally, Act Globally: Highly Balanced Graph Partitioning. In Proc. of the 12th Int. Symp. on Experimental Algorithms (SEA’13), LNCS. Springer, 2013.
[36] K. Schloegel, G. Karypis, and V. Kumar. Graph Partitioning for High Performance Scientific Simulations. In The Sourcebook of Parallel Computing, pages 491–541, 2003.
[37] C. Schulz and D. Strash. Graph Partitioning Formulations and Applications to Big Data. In Encyclopedia on Big Data Technologies, 2018, to appear.
[38] M. Sellmann, N. Sensen, and L. Timajev. Multicommodity Flow Approximation used for Exact Graph Partitioning. In Proc. of the 11th European Symposium on Algorithms, volume 2832 of LNCS, pages 752–764. Springer, 2003.
[39] N. Sensen. Lower Bounds and Exact Algorithms for the Graph Partitioning Problem Using Multicommodity Flows. In Proc. of the 9th European Symposium on Algorithms, volume 2161 of LNCS, pages 391–403. Springer, 2001.
[40] A. J. Soper, C. Walshaw, and M. Cross. A Combined Evolutionary Search and Multilevel Optimisation Approach to Graph-Partitioning. Journal of Global Optimization, 29(2):225–241, 2004.
[41] C. Walshaw. Walshaw Partitioning Benchmark. http://staffweb.cms.gre.ac.uk/~wc06/partition/.
[42] C. Walshaw and M. Cross. JOSTLE: Parallel Multilevel Graph-Partitioning Software – An Overview. In Mesh Partitioning Techniques and Domain Decomposition Techniques, pages 27–58. 2007.

A Additional Tables

Table 4 Improvement of existing partitions from the Walshaw benchmark with ε = 0% using our ILP approach. In each k-column the results computed by our approach are on the left and the current Walshaw cuts are on the right. Results achieved by Gain ρ = − are marked with ˆ and results achieved by Gain ρ = − are marked with *.

Graph / k 2 4 8 16 32 64
add20 596 596 1151 1151 1681 1681 2040 2040 *2360 ˆ2947
*ˆ246 247 408 408
add32 11 11 34 34 67 67 118 118 213 213 485 485
bcsstk33 10171 10171 21717 21717 34437 34437 54680 54680 77414 77414 107185 107185
whitaker3 127 127 381 381 656 656 1085 1085 1668 1668 2491 2491
crack 184 184 366 366 679 679 1088 1088 *1678
*8333 *ˆ15774 *ˆ31848 *39474 *46568 *34733 *ˆ933 934 1551 1551 ˆ2564 *5499 *9442 *ˆ11710 ˆ12893 *ˆ13947 ˆ16188 *2907 ˆ4025 *ˆ70407 *171148 *13280 *23857 *37143 *57354 *ˆ5574 ˆ8177 *ˆ20008 *ˆ36249 *60013 *90778 *1722 ˆ2797 *4728 ˆ812 813 1323 1323 *ˆ2074 ˆ3870 ˆ5592 ˆ7622 ˆ17382 *25805 *6888 *11414 *ˆ17352 *24879 *34234 ˆ12838 *20389 *31132 *45677 *15921 *25694 *38576 *ˆ56094 ˆ12667 ˆ20061 ˆ15194 *37566 *55467 *77391 *17193 *29188 *42639 *61100 ˆ83987 *13061 *25834 *42161 *65469 ˆ96446 *ˆ10101 *27092 *45991 ˆ77391 *121911 ˆ172966


Table 5 Improvement of existing partitions from the Walshaw benchmark with ε = 1% using our ILP approach. In each k-column the results computed by our approach are on the left and the current Walshaw cuts are on the right. Results achieved by Gain ρ = − are marked with ˆ and results achieved by Gain ρ = − are marked with *.

Graph / k 2 4 8 16 32 64
add20 585 585 1147 1147 *ˆ1680
*11731 *ˆ15734 ˆ2470 *ˆ31710 *ˆ39396 *46529 *ˆ54950 *5452 ˆ13931 ˆ16091 ˆ3981 *ˆ13134 *23333 *37057 *57000 ˆ8128 *19612 *59501 *89893 ˆ2748 *ˆ4664 ˆ5576 ˆ7585 *17120 ˆ25604 *6843 *ˆ17264 *24799 ˆ34159 ˆ20146 *30975 *45304 *25620 ˆ38410 *55867 ˆ385 387 1813 1813 *4060 ˆ12523 *19851 *6476 *25225 *37341 *55258 *76964 *ˆ8656 ˆ16745 *28749 *42349 *60617 ˆ83451 *ˆ25626 *42067 *64684 ˆ96145 *26611 *45424 *76533 *120470 ˆ171866

Table 6 Improvement of existing partitions from the Walshaw benchmark with ε = 3% using our ILP approach. In each k-column the results computed by our approach are on the left and the current Walshaw cuts are on the right. Results achieved by Gain ρ = − are marked with ˆ and results achieved by Gain ρ = − are marked with *.

Graph / k 2 4 8 16 32 64
add20 560 560 1134 1134 1673 1673 2030 2030 2346 2346 2920 2920
data 185 185 369 369 638 638 1088 1088 1768 1768 *2781
*105737 *2456 *ˆ2487 *11630 *ˆ15612 ˆ2431 *ˆ31440 *39197 *46231 *ˆ3503 *5522 *ˆ5352 *ˆ11584 *13887 *15950 *ˆ1423 *2884 ˆ3979 *168271 *22949 *36567 *56025 *ˆ34869 ˆ58739 *89478 *ˆ2708 *ˆ4522 *ˆ2034 ˆ3783 *11253 *ˆ16981 *ˆ25362 *17107 *24623 *33779 *ˆ7049 *19863 *30579 *44811 *ˆ25379 *38093 *55358 ˆ12283 *ˆ6430 *24901 *ˆ36999 *54800 *76548 ˆ16633 *60334 *82809 ˆ64354 *ˆ95575 *ˆ44724 *ˆ75665 ˆ119131 ˆ170295


Table 7 Improvement of existing partitions from the Walshaw benchmark with ε = 5% using our ILP approach. In each k-column the results computed by our approach are on the left and the current Walshaw cuts are on the right. Results achieved by Gain ρ = − are marked with ˆ and results achieved by Gain ρ = − are marked with *.

Graph / k 2 4 8 16 32 64
add20 536 536 1120 1120 1657 1657 2027 2027 2341 2341 2920 2920
data 181 181 363 363 628 628 1076 1076 1743 1743 2747 2747
3elt 87 87 197 197 329 329 557 557 930 930 1498 1498
uk 18 18 39 39 75 75 137 137 236 236 394 394
add32 10 10 33 33 63 63 117 117 212 212 476 476
bcsstk33 9914 9914 20158 20158 33908 33908 54119 54119 ˆ76070 *105297
*ˆ2425 *ˆ2479 *11533 *ˆ15514 ˆ2406 ˆ31216 *ˆ38823 *45987 ˆ2478 ˆ5460 *ˆ5253 *9281 *ˆ11540 *13857 *15875 ˆ2042 *2855 *ˆ3959 *166787 *ˆ2660 *ˆ12823 *22718 *36354 *55250 ˆ5305 ˆ7956 *88595 *ˆ2677 ˆ4500 *1289 *ˆ2013 *1589 *ˆ5512 ˆ7529 *11052 *25100 *ˆ11147 *16983 ˆ24270 *33387 *12308 *19677 *30355 *44368 *7722 ˆ37632 *54677 ˆ12033 *ˆ19391 ˆ14978 *24174 *ˆ36608 *54160 *75753 *16528 *ˆ42024 *ˆ59608 *81989 *ˆ12858 *41097 *63397 *94123 *74266 *118998 ˆ169260ˆ169260