Multilevel Hypergraph Partitioning with Vertex Weights Revisited
Tobias Heuer, Karlsruhe Institute of Technology, Karlsruhe, Germany
Nikolai Maas, Karlsruhe Institute of Technology, Karlsruhe, Germany
Sebastian Schlag, Karlsruhe Institute of Technology, Karlsruhe, Germany
Abstract
The balanced hypergraph partitioning problem (HGP) is to partition the vertex set of a hypergraph into k disjoint blocks of bounded weight, while minimizing an objective function defined on the hyperedges. Whereas real-world applications often use vertex and edge weights to accurately model the underlying problem, the HGP research community commonly works with unweighted instances. In this paper, we argue that, in the presence of vertex weights, current balance constraint definitions either yield infeasible partitioning problems or allow unnecessarily large imbalances, and propose a new definition that overcomes these problems. We show that state-of-the-art hypergraph partitioners often struggle considerably with weighted instances and tight balance constraints (even with our new balance definition). Thus, we present a recursive-bipartitioning technique that is able to reliably compute balanced (and hence feasible) solutions. The proposed method balances the partition by pre-assigning a small subset of the heaviest vertices to the two blocks of each bipartition (using an algorithm originally developed for the job scheduling problem) and optimizes the actual partitioning objective on the remaining vertices. We integrate our algorithm into the multilevel hypergraph partitioner KaHyPar and show that our approach is able to compute balanced partitions of high quality on a diverse set of benchmark instances.
Mathematics of computing → Hypergraphs; Mathematics of computing → Graph algorithms
Keywords and phrases multilevel hypergraph partitioning, balanced partitioning, vertex weights
Supplementary Material
Source Code: https://github.com/kahypar/kahypar
Benchmark Set & Experimental Results: http://algo2.iti.kit.edu/heuer/sea21/
Introduction

Hypergraphs are a generalization of graphs where each hyperedge can connect more than two vertices. The k-way hypergraph partitioning problem (HGP) asks for a partition of the vertex set into k disjoint blocks, while minimizing an objective function defined on the hyperedges. Additionally, a balance constraint requires that the weight of each block is smaller than or equal to a predefined upper bound (most often L_k := (1 + ε)⌈c(V)/k⌉ for some parameter ε, where c(V) is the sum of all vertex weights). The hypergraph partitioning problem is NP-hard [32] and it is even NP-hard to find good approximations [8]. The most commonly used heuristic to solve HGP in practice is the multilevel paradigm [1, 11, 29], which consists of three phases: First, the hypergraph is coarsened to obtain a hierarchy of smaller hypergraphs. After an initial partitioning algorithm is applied to the smallest hypergraph, coarsening is undone, and, at each level, refinement algorithms are used to improve the quality of the solution.

The two most prominent application areas of HGP are very large scale integration (VLSI) design [3, 29] and parallel computation of the sparse matrix-vector product [11]. In the former, HGP is used to divide a circuit into two or more blocks such that the number of external wires interconnecting circuit elements in different blocks is minimized. In this setting, each vertex is associated with a weight equal to the area of the respective circuit element [2] and tightly-balanced partitions minimize the total area required by the physical circuit [18]. In the latter, HGP is used to optimize the communication volume for parallel computations of sparse matrix-vector products [11].
In the simplest hypergraph model, vertices correspond to rows and hyperedges to columns of the matrix (or vice versa), and a partition of the hypergraph yields an assignment of matrix entries to processors [11]. The work of a processor (which can be measured in terms of the number of non-zero entries [7]) is integrated into the model by assigning each vertex a weight equal to its degree [11]. Tightly-balanced partitions hence ensure that the work is distributed evenly among the processors.

Despite the importance of weighted instances for real-world applications, the HGP research community mainly uses unweighted hypergraphs in experimental evaluations [38]. The main rationale is that even unweighted instances become weighted implicitly due to vertex contractions during the coarsening phase. Many partitioners therefore incorporate techniques that prevent the formation of heavy vertices [13, 24, 27] during coarsening to facilitate finding a feasible solution during the initial partitioning phase [38]. However, in practice, many weighted hypergraphs derived from real-world applications already contain heavy vertices – rendering the mitigation strategies of today's multilevel hypergraph partitioners ineffective. The popular ISPD98 VLSI benchmark set [2], for example, includes instances in which vertices can weigh up to 10% of the total weight of the hypergraph.
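To make the matrix-to-hypergraph translation concrete, the following minimal sketch (our own illustrative code, not taken from any partitioner) builds the net structure of a sparse matrix with columns as vertices and rows as nets, and assigns each vertex its degree as weight:

```python
def row_net_hypergraph(rows):
    """Build a hypergraph from a sparse matrix given as a list of rows,
    where each row is an iterable of column indices of non-zero entries.
    Columns become vertices, every non-empty row becomes a net, and each
    vertex is weighted by its degree (number of nets containing it)."""
    nets = [sorted(set(row)) for row in rows if row]
    num_vertices = max((v for net in nets for v in net), default=-1) + 1
    weights = [0] * num_vertices
    for net in nets:
        for v in net:
            weights[v] += 1  # vertex weight = degree
    return nets, weights

# A 3x3 matrix with non-zeros at (0,0), (0,1), (1,1), (1,2), (2,0):
nets, weights = row_net_hypergraph([[0, 1], [1, 2], [0]])
```

With this weighting, a tightly-balanced partition of the vertices corresponds to an even distribution of non-zero entries among processors, as described above.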
Contributions and Outline
After introducing basic notation in Section 2 and presenting related work in Section 3, we first formulate an alternative balance constraint definition in Section 4 that overcomes some drawbacks of existing definitions in the presence of vertex weights. In Section 5, we then present an algorithm that enables partitioners based on the recursive bipartitioning (RB) paradigm to reliably compute balanced partitions for weighted hypergraphs. Our approach is based on the observation that usually only a small subset of the heaviest vertices is critical to satisfy the balance constraint. We show that pre-assigning these vertices to the two blocks of each bipartition (i.e., treating them as fixed vertices) and optimizing the actual objective function on the remaining vertices yields provable balance guarantees for the resulting k-way partition. We implemented our algorithms in the open source HGP framework KaHyPar [38]. The experimental evaluation presented in Section 6 shows that our new approach (called
KaHyPar-BP) is able to compute balanced partitions for all instances of a large real-world benchmark set (without increasing the running time or decreasing the solution quality), while other partitioners such as the latest versions of KaHyPar, hMetis, and PaToH produced imbalanced partitions on 4.9% up to 42% of the instances for ε = 0.01 (4.3% up to 23.1% for ε = 0.03).

Preliminaries

A weighted hypergraph H = (V, E, c, ω) is defined as a set of vertices V and a set of hyperedges/nets E with vertex weights c : V → R_{>0} and net weights ω : E → R_{>0}, where each net e is a subset of the vertex set V (i.e., e ⊆ V). We extend c and ω to sets in the natural way, i.e., c(U) := Σ_{v∈U} c(v) and ω(F) := Σ_{e∈F} ω(e). Given a subset V′ ⊆ V, the subhypergraph H_{V′} is defined as H_{V′} := (V′, {e ∩ V′ | e ∈ E : e ∩ V′ ≠ ∅}, c, ω).

A k-way partition of a hypergraph H is a partition of the vertex set V into k non-empty disjoint subsets Π_k = {V_1, ..., V_k}. We refer to a k-way partition Ψ_k = {P_1, ..., P_k} of a subset P ⊆ V as a k-way prepacking. We call a vertex v ∈ P a fixed vertex and a vertex v ∈ V \ P an ordinary vertex. During partitioning, fixed vertices are not allowed to be moved to a different block of the partition. A k-way partition Π_k is ε-balanced if each block V_i satisfies the balance constraint: c(V_i) ≤ L_k := (1 + ε)⌈c(V)/k⌉ for some parameter ε. The k-way hypergraph partitioning problem initialized with a k-way prepacking Ψ_k = {P_1, ..., P_k} is to find an ε-balanced k-way partition Π_k = {V_1, ..., V_k} of a hypergraph H that minimizes an objective function and satisfies ∀i ∈ {1, ..., k} : P_i ⊆ V_i. In this paper, we optimize the connectivity metric (λ − 1)(Π_k) := Σ_{e∈E} (λ(e) − 1) · ω(e), where λ(e) := |{V_i ∈ Π_k | V_i ∩ e ≠ ∅}|.

The most balanced partition problem is to find a k-way partition Π_k of a weighted hypergraph H = (V, E, c, ω) such that max(Π_k) := max_{V′∈Π_k} c(V′) is minimized. For an optimal solution Π_OPT, there exists no other k-way partition Π′_k with max(Π′_k) < max(Π_OPT). We use OPT(H, k) := max(Π_OPT) to denote the weight of the heaviest block of an optimal solution. Note that the problem is equivalent to the most common version of the job scheduling problem: Given a sequence J = ⟨j_1, ..., j_n⟩ of n computing jobs, each associated with a processing time p_i for i ∈ [1, n], the task is to find an assignment of the n jobs to k identical machines (each job j_i runs exclusively on a machine for exactly p_i time units) such that the latest completion time of a job is minimized.

Related Work

In the following, we will focus on work closely related to our main contributions. For an extensive overview on hypergraph partitioning we refer the reader to existing literature [3, 5, 35, 38]. Well-known multilevel HGP software packages with certain distinguishing characteristics include
PaToH [4, 11] (originating from scientific computing), hMetis [29, 30] (originating from VLSI design), KaHyPar [26, 27] (general purpose, n-level), Mondriaan [41] (sparse matrix partitioning), UMPa [14] (multi-objective), and Zoltan [16] (distributed partitioner).
Partitioning with Vertex Weights.
The most widely used techniques to improve the quality of a k-way partition are move-based local search heuristics [19, 31] that greedily move vertices according to a gain value (i.e., the improvement in the objective function). Vertex moves violating the balance constraint are usually rejected, which can significantly deteriorate solution quality in the presence of varying vertex weights [10]. This issue is addressed using techniques that allow intermediate balance violations [18] or use temporary relaxations of the balance constraint [9, 10]. Caldwell et al. [10] proposed to preassign each vertex with a weight greater than the average block weight L_k to a separate block before partitioning (treated as fixed vertices) and to build the actual k-way partition around them. All of these techniques were developed and evaluated for flat (i.e., non-multilevel) partitioning algorithms. In the multilevel setting, even unweighted instances become implicitly weighted due to vertex contractions in the coarsening phase, which is why the formation of heavy vertices is prevented by penalizing the contraction of vertices with large weights [13, 24, 40] or enforcing a strict upper bound for vertex weights throughout the coarsening process [1, 27]. If the input hypergraph is unweighted, the aforementioned techniques often suffice to find a feasible solution [38]. PaToH [12] additionally uses bin packing techniques during initial partitioning.
Job Scheduling Problem.
The job scheduling problem is NP-hard [20] and we refer the reader to existing literature [23, 36] for a comprehensive overview of the research topic. In this work, we make use of the longest processing time (LPT) algorithm proposed by Graham [22]. We will explain the algorithm in the context of the most balanced partition problem defined in Section 2: For a weighted hypergraph H = (V, E, c, ω), the algorithm iterates over the vertices of V sorted in decreasing vertex-weight order and assigns each vertex to the block of the k-way partition with the lowest weight. The algorithm can be implemented to run in O(|V| log |V|) time, and for a k-way partition Π_k produced by the algorithm it holds that max(Π_k) ≤ (4/3 − 1/(3k)) OPT(H, k).
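The LPT rule described above can be sketched compactly in Python (a hedged illustration using a binary heap in place of an addressable priority queue; the names are ours):

```python
import heapq

def lpt(weights, k):
    """Longest processing time rule: place items in decreasing weight order,
    each into the currently lightest of k blocks. Returns the k block weights
    and the block index assigned to each item; runs in O(n log n) time."""
    heap = [(0, block) for block in range(k)]  # (block weight, block id)
    heapq.heapify(heap)
    assignment = [0] * len(weights)
    order = sorted(range(len(weights)), key=lambda v: weights[v], reverse=True)
    for v in order:
        load, block = heapq.heappop(heap)      # lightest block so far
        assignment[v] = block
        heapq.heappush(heap, (load + weights[v], block))
    return [load for load, _ in sorted(heap, key=lambda x: x[1])], assignment
```

By Graham's analysis, the heaviest block produced by this procedure is at most a factor of 4/3 − 1/(3k) above the optimum of the most balanced partition problem.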
KaHyPar.

The Karlsruhe Hypergraph Partitioning framework takes the multilevel paradigm to its extreme by only contracting a single vertex in every level of the hierarchy. KaHyPar provides recursive bipartitioning [37] as well as direct k-way partitioning algorithms [1] (direct k-way uses RB in the initial partitioning phase). It uses a community detection algorithm as preprocessing step to restrict contractions to densely connected regions of the hypergraph during coarsening [27]. Furthermore, it employs a portfolio of bipartitioning algorithms for initial partitioning of the coarsest hypergraph [25, 37], and, during the refinement phase, improves the partition with a highly engineered variant of the classical FM local search [1] and a refinement technique based on network flows [21, 26].

During RB-based partitioning, KaHyPar ensures that the solution is balanced by adapting the imbalance ratio for each bipartition individually. Let H_{V′} be the subhypergraph of the current bipartition that should be partitioned recursively into k′ ≤ k blocks. Then,

ε′ := ( (1 + ε) · (c(V)/k) · (k′/c(V′)) )^(1/⌈log₂(k′)⌉) − 1    (1)

is used as imbalance ratio for the bipartition of H_{V′}. The equation is based on the observation that the worst-case block weight of the resulting k′-way partition of H_{V′} obtained via RB is smaller than (1 + ε′)^⌈log₂(k′)⌉ · ⌈c(V′)/k′⌉, if ε′ is used for all further bipartitions. Requiring that this weight must be smaller than or equal to L_k = (1 + ε)⌈c(V)/k⌉ leads to the formula defined in Equation 1.

An Alternative Balance Constraint

A k-way partition of a weighted hypergraph H = (V, E, c, ω) is balanced if the weight of each block is below some predefined upper bound. In the literature, the most commonly used bounds are L_k := (1 + ε)⌈c(V)/k⌉ (standard definition) and L^max_k := L_k + max_{v∈V} c(v) [19, 38, 39].
The latter was initially proposed by Fiduccia and Mattheyses [19] for bipartitioning to ensure that the highest-gain vertex can always be moved to the opposite block.

Both definitions exhibit shortcomings in the presence of heavy vertices: As soon as the hypergraph contains even a single vertex with c(v) > L_k, no feasible solution exists when the block weights are constrained by L_k, while for L^max_k it follows that L^max_k > L_k – allowing large variations in block weights even if ε is small. In the following, we therefore propose a new balance constraint that (i) guarantees the existence of an ε-balanced k-way partition and (ii) avoids unnecessarily large imbalances.

While the optimal solution of the most balanced partition problem would yield a partition with the best possible balance, it is not feasible in practice to use L^OPT_k := (1 + ε)OPT(H, k) as balance constraint, because finding such a k-way partition is NP-hard [20]. Hence, we propose to use the bound provided by the LPT algorithm instead (where LPT(H, k) denotes the weight of the heaviest block of the partition computed by the LPT algorithm):

L^LPT_k := (1 + ε) LPT(H, k) ≤ (4/3 − 1/(3k)) L^OPT_k.    (2)

Note that if the hypergraph is unweighted, the LPT algorithm will always find an optimal solution with OPT(H, k) = ⌈|V|/k⌉ and thus, L^LPT_k is equal to L_k. Since all of today's partitioning algorithms bound the maximum block weight by L_k, Section 6 gives more details on how we employ this new balance constraint definition in our experimental evaluation.

Figure 1: Illustration of a deeply (left, green line) and a non-deeply balanced bipartition (left, red line). The numbers in the circles denote the vertex weights. In both cases, the hypergraph is partitioned into k = 4 blocks with ε = 0 via recursive bipartitioning. Thus, the weight of the heaviest block must be smaller than or equal to L_4 = 6, and for the first bipartition, we use L_2 = 12 as an upper bound.

Balanced Recursive Bipartitioning

Most multilevel hypergraph partitioners either employ recursive bipartitioning directly [11, 16, 29, 37, 41] or use RB-based algorithms in the initial partitioning phase to compute an initial k-way partition of the coarsest hypergraph [1, 4, 14, 30]. In both settings, a k-way partition is derived by first computing a bipartition Π_2 = {V_1, V_2} of the (input/coarse) hypergraph H and then recursing on the subhypergraphs H_{V_1} and H_{V_2} by partitioning V_1 into ⌈k/2⌉ and V_2 into ⌊k/2⌋ blocks. Although KaHyPar adaptively adjusts the allowed imbalance at each bipartitioning step (using the imbalance factor ε′ as defined in Equation 1), an unfortunate distribution of the vertices in some bipartition Π_2 can easily lead to instances for which it is impossible to find a balanced solution during the recursive calls – even though the current bipartition Π_2 satisfies the adjusted balance constraint. An example is shown in Figure 1 (left): Although the current bipartition (indicated by the red line) is perfectly balanced, it will not be possible to recursively partition the subhypergraph induced by the vertices of one of the two blocks into two blocks of equal weight, because each of its three vertices has a weight of four. To capture this problem, we introduce the notion of deep balance:

▶ Definition 1 (Deep Balance).
Let H = (V, E, c, ω) be a weighted hypergraph for which we want to compute an ε-balanced k-way partition, and let H_{V′} be a subhypergraph of H which should be partitioned into k′ ≤ k blocks via recursive bipartitioning. The subhypergraph H_{V′} is deeply balanced w.r.t. k′ if there exists a k′-way partition Π_{k′} of H_{V′} such that max(Π_{k′}) ≤ L_k := (1 + ε)⌈c(V)/k⌉. A bipartition Π_2 = {V_1, V_2} of H_{V′} is deeply balanced w.r.t. k′ if the subhypergraphs H_{V_1} and H_{V_2} are deeply balanced with respect to ⌈k′/2⌉ resp. ⌊k′/2⌋.

If a subhypergraph H_{V′} is deeply balanced with respect to k′, there always exists a k′-way partition Π_{k′} of H_{V′} such that the weight of the heaviest block satisfies the original balance constraint L_k imposed on the partition of the input hypergraph H. Moreover, there also always exists a deeply balanced bipartition Π_2 := {V_1, V_2} (V_1 is the union of the first ⌈k′/2⌉ and V_2 of the last ⌊k′/2⌋ blocks of Π_{k′}). Hence, an RB-based partitioning algorithm that is able to compute deeply balanced bipartitions on deeply balanced subhypergraphs will always compute ε-balanced k-way partitions (assuming the input hypergraph is deeply balanced).

Deep Balance and Adaptive Imbalance Adjustments.
Computing deeply balanced bipartitions in the RB setting guarantees that the resulting k-way partition is ε-balanced. Thus, the concept of deep balance could replace the adaptive imbalance factor ε′ employed in KaHyPar [37] (see Equation 1). However, as we will see in the following example, combining both approaches gives the partitioner more flexibility (in terms of feasible vertex moves during refinement). Assume that we want to compute a 4-way partition via recursive bipartitioning and that the first bipartition Π_2 := {V_1, V_2} is deeply balanced with c(V_1) = (1 + ε)⌈c(V)/2⌉. The deep-balance property ensures that we can further partition V_1 into two blocks such that the weight of the heavier block is smaller than L_4. However, this bipartition has to be perfectly balanced:

L_2 = (1 + ε)⌈c(V_1)/2⌉ = (1 + ε)⌈(1 + ε)⌈c(V)/2⌉/2⌉ ≤ (1 + ε)⌈c(V)/4⌉ = L_4  ⇒  ε ≈ 0.    (3)

If we had computed the first bipartition with an adjusted imbalance factor ε′, then max(Π_2) ≤ (1 + ε′)⌈c(V)/2⌉ = √(1 + ε) · ⌈c(V)/2⌉ – providing more flexibility for subsequent bipartitions. In the following, we therefore focus on computing deeply ε′-balanced bipartitions.

Deep Balance and Multilevel Recursive Bipartitioning.
In general, computing a deeply balanced bipartition Π_2 := {V_1, V_2} w.r.t. k is NP-hard, as we must show that there exists a k-way partition Π_k of H with max(Π_k) ≤ L_k, which can be reduced to the most balanced partition problem presented in Section 2. However, we can first compute a k-way partition Π_k := {V′_1, ..., V′_k} using the LPT algorithm, thereby approximating an optimal solution. If max(Π_k) ≤ L_k, we can then construct a deeply balanced bipartition Π_2 = {V_1, V_2} by choosing V_1 := V′_1 ∪ ... ∪ V′_{⌈k/2⌉} and V_2 := V′_{⌈k/2⌉+1} ∪ ... ∪ V′_k. Unfortunately, this approach completely ignores the optimization of the objective function – yielding balanced partitions of low quality. If such a bipartition were to be used as initial solution in the multilevel setting, the objective could still be optimized during the refinement phase. However, this would necessitate that refinement algorithms are aware of the concept of deep balance and that they only perform vertex moves that do not destroy the deep-balance property of the starting solution. Since this is infeasible in practice, we propose a different approach that involves fixed vertices.

The key idea of our approach is to compute a prepacking Ψ_2 = {P_1, P_2} of the m = |P_1| + |P_2| heaviest vertices of the hypergraph and to show that this prepacking suffices to ensure that each ε′-balanced bipartition Π_2 = {V_1, V_2} with P_1 ⊆ V_1 and P_2 ⊆ V_2 is deeply balanced. Note that the upcoming definitions and theorems are formulated from the perspective of the first bipartition of the input hypergraph H to simplify notation. They can be generalized to subhypergraphs H_{V′} in a similar fashion as was done in Definition 1. Furthermore, we say that the bipartition Π_2 = {V_1, V_2} respects a prepacking Ψ_2 = {P_1, P_2} if P_1 ⊆ V_1 and P_2 ⊆ V_2, and that the bipartition is balanced if max(Π_2) ≤ L_2 := (1 + ε′)⌈c(V_1 ∪ V_2)/2⌉ (with ε′ as defined in Equation 1). The following definition formalizes our idea.

▶ Definition 2
(Sufficiently Balanced Prepacking). Let H = (V, E, c, ω) be a hypergraph for which we want to compute an ε-balanced k-way partition via recursive bipartitioning. We call a prepacking Ψ_2 of H sufficiently balanced if every balanced bipartition Π_2 respecting Ψ_2 is deeply balanced with respect to k.

Our approach to compute ε-balanced k-way partitions is outlined in Algorithm 1. We first compute a bipartition Π_2. Before recursing on each of the two induced subhypergraphs, we check if Π_2 is deeply balanced using the LPT algorithm in a similar fashion as described in the beginning of this paragraph. If it is not deeply balanced, we compute a sufficiently balanced prepacking Ψ_2 and re-compute Π_2 – treating the vertices of the prepacking as fixed vertices. If this second bipartitioning call was able to compute a balanced bipartition, we found a deeply balanced partition and proceed to partition the subhypergraphs recursively. Note that, in general, we may not detect that Π_2 is deeply balanced, or we may fail to find a sufficiently balanced prepacking Ψ_2 or a balanced bipartition Π_2, since all involved problems are NP-hard. However, as we will see in Section 6, this only happens rarely in practice.

Algorithm 1
Recursive Bipartitioning Algorithm

Data: Hypergraph H for which we seek an ε-balanced k-way partition and subhypergraph H_{V′} of H which is to be bipartitioned recursively into k′ ≤ k blocks.

1  Function recursiveBipartitioning(H, k, ε, H_{V′}, k′):
2      L_2 ← (1 + ε′)⌈c(V′)/2⌉                                      // with ε′ as defined in Equation 1
3      Π_2 := {V_1, V_2} ← multilevelBipartitioning(H_{V′}, L_2, ∅)  // ∅ = empty prepacking
4      if k′ = 2 then return Π_2
5      else if Π_2 is not deeply balanced w.r.t. k′ then
6          Ψ_2 ← sufficientlyBalancedPrepacking(H, k, ε, H_{V′}, k′)  // see Algorithm 2
7          Π_2 ← multilevelBipartitioning(H_{V′}, L_2, Ψ_2)           // treating Ψ_2 as fixed vertices
8      Π_{k_1} ← recursiveBipartitioning(H, k, ε, H_{V_1}, k_1) with k_1 := ⌈k′/2⌉
9      Π_{k_2} ← recursiveBipartitioning(H, k, ε, H_{V_2}, k_2) with k_2 := ⌊k′/2⌋
10     return Π_{k_1} ∪ Π_{k_2}
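The deep-balance test used in Algorithm 1 can be approximated exactly as described earlier: run LPT on each side of the bipartition and compare the resulting block weights against L_k. A minimal sketch under our own naming (not KaHyPar's API); note that the test is one-sided, since LPT may fail to find an optimal packing even when one exists:

```python
import heapq
import math

def lpt_loads(weights, k):
    """Block weights of a k-way LPT assignment of the given vertex weights."""
    heap = [0] * k
    heapq.heapify(heap)
    for w in sorted(weights, reverse=True):
        heapq.heappush(heap, heapq.heappop(heap) + w)
    return heap

def seems_deeply_balanced(weights_v1, weights_v2, k_prime, L_k):
    """Check whether a bipartition {V1, V2} (given by its vertex weights) is
    deeply balanced w.r.t. k_prime: LPT must pack V1 into ceil(k_prime/2) and
    V2 into floor(k_prime/2) blocks of weight at most L_k each."""
    k1, k2 = math.ceil(k_prime / 2), k_prime // 2
    return (max(lpt_loads(weights_v1, k1)) <= L_k and
            max(lpt_loads(weights_v2, k2)) <= L_k)
```

For the situation of Figure 1 (k = 4, ε = 0, L_4 = 6), a bipartition that places three vertices of weight four on one side is rejected, while one with weights {4, 4, 2, 2} on each side passes.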
Computing a Sufficiently Balanced Prepacking.

The prepacking Ψ_2 is constructed by incrementally assigning vertices to Ψ_2 in decreasing order of weight and checking a property P after each assignment that, if satisfied, implies that the current prepacking is sufficiently balanced. In the proof of property P, we will extend a k-way prepacking Ψ_k to an ε-balanced k-way partition Π_k using the LPT algorithm and use the following upper bound on the weight of the heaviest block of Π_k.

▶ Lemma 3 (LPT Bound). Let H = (V, E, c, ω) be a weighted hypergraph, Ψ_k be a k-way prepacking for a set of fixed vertices P ⊆ V, and let O := ⟨v_1, ..., v_m | v_i ∈ V \ P⟩ be the sequence of all ordinary vertices of V \ P sorted in decreasing order of weight. If we assign the remaining vertices O to the blocks of Ψ_k by using the LPT algorithm, we can extend Ψ_k to a k-way partition Π_k of H such that the weight of the heaviest block is bounded by:

max(Π_k) ≤ max{ (1/k)·c(P) + h_k(O), max(Ψ_k) },  with  h_k(O) := max_{i∈{1,...,m}} ( c(v_i) + (1/k) Σ_{j=1}^{i−1} c(v_j) ).

The proof of Lemma 3 can be found in Appendix A. O is sorted in decreasing order of weight because for any permutation O′ of O, it holds that h_k(O) ≤ h_k(O′) – resulting in the tightest bound for max(Π_k).

Assuming that the number k of blocks is even (i.e., k_1 = k_2 = k/2) to simplify notation, the balance property P is defined as follows (the generalized version can be found in Appendix B):

▶ Definition 4 (Balance Property P). Let H = (V, E, c, ω) be a hypergraph for which we want to compute an ε-balanced k-way partition and let Ψ_2 be a prepacking of H for a set of fixed vertices P ⊆ V. Furthermore, let O_t := ⟨v_1, ..., v_t⟩ be the sequence of the t heaviest ordinary vertices of V \ P sorted in decreasing order of weight such that t is the smallest number that satisfies max(Ψ_2) + c(O_t) ≥ L_2 (see Line 2, Algorithm 1).
We say that a prepacking Ψ_2 satisfies the balance property P if the following two conditions hold: (i) the prepacking Ψ_2 is deeply balanced, and (ii) (2/k)·max(Ψ_2) + h_{k/2}(O_t) ≤ L_k.

In the following, we will show that the LPT algorithm can be used to construct a k/2-way partition Π_{k/2} for both blocks of any balanced bipartition Π_2 = {V_1, V_2} that respects Ψ_2, such that the weight of the heaviest block can be bounded by the left term of Condition (ii). This implies that max(Π_{k/2}) ≤ L_k (right term of Condition (ii)) and thus proves that any balanced bipartition Π_2 respecting Ψ_2 is deeply balanced. Note that choosing t as the smallest number that satisfies max(Ψ_2) + c(O_t) ≥ L_2 minimizes the left term of Condition (ii) (since h_k(O_t) ≤ h_k(O_{t+1})).

▶ Theorem 5.
A prepacking Ψ_2 of a hypergraph H = (V, E, c, ω) that satisfies the balance property P is sufficiently balanced with respect to k.

Proof.
For convenience, we use k′ := k/2. Let Π_2 = {V_1, V_2} be an arbitrary balanced bipartition that respects the prepacking Ψ_2 = {P_1, P_2}, i.e., max(Π_2) ≤ L_2. Since Ψ_2 is deeply balanced (see Definition 4(i)), there exists a k′-way prepacking Ψ_{k′} of P_1 such that max(Ψ_{k′}) ≤ L_k. We define the sequence of the ordinary vertices of block V_1 sorted in decreasing weight order as O_1 := ⟨v_1, ..., v_m | v_i ∈ V_1 \ P_1⟩. We can extend Ψ_{k′} to a k′-way partition Π_{k′} of V_1 by assigning the vertices of O_1 to the blocks in Ψ_{k′} using the LPT algorithm. Lemma 3 then establishes an upper bound on the weight of the heaviest block:

max(Π_{k′}) ≤ max{ (1/k′)·c(P_1) + h_{k′}(O_1), max(Ψ_{k′}) } ≤ max{ (1/k′)·c(P_1) + h_{k′}(O_1), L_k }    (by Lemma 3 and max(Ψ_{k′}) ≤ L_k).

Let O_t be the sequence of the t heaviest ordinary vertices of V \ P with P := P_1 ∪ P_2 as defined in Definition 4.

▷ Claim 6.
It holds that: (1/k′)·c(P_1) + h_{k′}(O_1) ≤ (1/k′)·max(Ψ_2) + h_{k′}(O_t).

For a proof of Claim 6 see Appendix C. We can conclude that

(1/k′)·c(P_1) + h_{k′}(O_1) ≤ (1/k′)·max(Ψ_2) + h_{k′}(O_t) ≤ L_k    (by Claim 6 and Definition 4(ii)).

This proves that the subhypergraph H_{V_1} is deeply balanced. The proof for block V_2 can be done analogously, which then implies that Π_2 is deeply balanced. Since Π_2 is an arbitrary balanced bipartition respecting Ψ_2, it follows that Ψ_2 is sufficiently balanced. ◀

Algorithm 2 outlines our approach to efficiently compute a sufficiently balanced prepacking Ψ_2. In Line 6, we compute a k′-way prepacking Ψ_{k′} of the i heaviest vertices with the LPT algorithm, and if Ψ_{k′} satisfies max(Ψ_{k′}) ≤ L_k, then Line 7 constructs a deeply balanced prepacking Ψ_2 (which fulfils Condition (i) of Definition 4). We store the blocks P′_j of Ψ_{k′} together with their weights c(P′_j) as key in an addressable priority queue such that we can determine and update the block with the smallest weight in time O(log k′) (Line 6). In Line 9, we compute the smallest t that satisfies max(Ψ_2) + c(O_t) ≥ L_2 via a binary search in logarithmic time over an array containing the vertex weight prefix sums of the sequence O, which can be precomputed in linear time. Furthermore, we construct a range maximum query data structure over the array H_{k′/2} = ⟨c(v_1), c(v_2) + (2/k′)·c(v_1), ..., c(v_n) + (2/k′)·Σ_{j=1}^{n−1} c(v_j)⟩. Calculating h_{k′/2}(O_t) (Line 10) then corresponds to a range maximum query in the interval [i+1, i+t] in H_{k′/2}, which can be answered in constant time after H_{k′/2} has been precomputed in time O(n) [6]. In total, the running time of the algorithm is O(n(log k′ + log n)). Note that if the algorithm reaches Line 12, we could not prove that any of the intermediately constructed prepackings were sufficiently balanced, in which case Ψ_2 represents a bipartition of H_{V′} computed by the LPT algorithm.

Algorithm 2
Prepacking Algorithm

Data: Hypergraph H = (V, E, c, ω) for which we seek an ε-balanced k-way partition and subhypergraph H_{V′} = (V′, E′, c, ω) of H which is to be bipartitioned recursively into k′ ≤ k blocks.

1  Function sufficientlyBalancedPrepacking(H, k, ε, H_{V′}, k′):
2      Ψ_2 = ⟨P_1, P_2⟩ ← ⟨∅, ∅⟩ and Ψ_{k′} = ⟨P′_1, ..., P′_{k′}⟩ ← ⟨∅, ..., ∅⟩  // Initialization
3      L_2 ← (1 + ε′)⌈c(V′)/2⌉ and L_k ← (1 + ε)⌈c(V)/k⌉  // with ε′ as defined in Equation 1
4      O ← ⟨v_1, ..., v_n | v_i ∈ V′⟩  // V′ sorted in decreasing order of weight ⇒ O(n log n)
5      for i = 1 to n do
6          Add v_i ∈ O to bin P′_j ∈ Ψ_{k′} with smallest weight  // LPT algorithm
7          Ψ_2 ← {P′_1 ∪ ... ∪ P′_x, P′_{x+1} ∪ ... ∪ P′_{k′}} with x := ⌈k′/2⌉
8          if max(Ψ_2) ≤ L_2 and max(Ψ_{k′}) ≤ L_k then  // ⇒ Ψ_2 is deeply (ε′-)balanced
9              t ← min({t | max(Ψ_2) + c(O_t) ≥ L_2})  // O_t := ⟨v_{i+1}, ..., v_{i+t}⟩
10             if (2/k′)·max(Ψ_2) + h_{k′/2}(O_t) ≤ L_k then  // Condition (ii) of Definition 4
11                 return Ψ_2  // ⇒ Ψ_2 is sufficiently balanced (Theorem 5)
12     return Ψ_2  // No sufficiently balanced prepacking found ⇒ treat all vertices as fixed vertices
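The quantity h_k(O) from Lemma 3 is a maximum over prefix sums, which is exactly what the range-maximum structure above precomputes. A hedged sketch (our own helper names, without the O(1) range-maximum machinery; this recomputes h_k directly in linear time):

```python
def h(weights_desc, k):
    """h_k(O) = max_i ( c(v_i) + (1/k) * sum_{j<i} c(v_j) ) for a sequence O
    of ordinary-vertex weights sorted in decreasing order (Lemma 3)."""
    best, prefix = 0.0, 0.0
    for w in weights_desc:
        best = max(best, w + prefix / k)
        prefix += w
    return best

def satisfies_condition_ii(max_psi, O_t, k, L_k):
    """Condition (ii) of Definition 4: (2/k) * max(Psi_2) + h_{k/2}(O_t) <= L_k."""
    return (2 / k) * max_psi + h(O_t, k // 2) <= L_k
```

For example, h([4, 3, 2], 2) evaluates the three candidates 4, 3 + 4/2, and 2 + 7/2, and returns the largest.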
Experimental Evaluation

We integrated the prepacking technique (see Algorithms 1 and 2) into the recursive bipartitioning algorithm of KaHyPar. Our implementation is available from . The code is written in C++17 and compiled using g++ with the flags -O3 -mtune=native -march=native. Since KaHyPar offers both a recursive bipartitioning and a direct k-way partitioning algorithm (which uses the RB algorithm in the initial partitioning phase), we refer to the RB-version using our improvements as KaHyPar-BP-R and to the direct k-way version as KaHyPar-BP-K (BP = Balanced Partitioning).

Instances.
The following experimental evaluation is based on two benchmark sets. The
RealWorld benchmark set consists of 50 hypergraphs originating from the VLSI design and scientific computing domains. It contains instances from the ISPD98 VLSI Circuit Benchmark Suite [2] (18 instances), the DAC 2012 Routability-Driven Placement Benchmark Suite [42] (9 instances), 16 instances from the Stanford Network Analysis Platform (SNAP) [33], and 7 highly asymmetric matrices of Davis et al. [15] (referred to as
ASM). For VLSI instances (
ISPD98 and
DAC), we use the area of a circuit element as the weight of its corresponding vertex. We translate sparse matrices (
SNAP and
ASM instances) to hypergraphs using the row-net model [13] and use the degree of a vertex as its weight. The vertex weight distributions of the individual instance types are depicted in Figure 3 in Appendix D. Additionally, we generate ten
Artificial instances that use the net structure of the ten largest
ISPD98 instances. Instead of using the area as weight, we assign new vertex weights that yield instances for which it is difficult to satisfy the balance constraint: Each vertex is assigned either unit weight or a weight chosen randomly from a uniform distribution in [1, W] ⊆ N⁺. Both the probability that a vertex has non-unit weight and the parameter W are determined (depending on the total number of vertices) such that the expected number of vertices with non-unit weight is 120 and the expected total weight of these vertices is half the expected total weight of the resulting hypergraph.
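This weight-generation scheme can be sketched as follows. The closed-form choice of W below is our own back-of-the-envelope derivation from the two stated expectations (setting the expected heavy-vertex total equal to the expected unit-vertex total), not necessarily the authors' exact procedure:

```python
import random

def artificial_weights(n, heavy=120, seed=42):
    """Assign each of n vertices either unit weight or a uniform random
    weight in [1, W]. The probability p and the bound W are chosen so that
    the expected number of non-unit vertices is `heavy` and their expected
    total weight equals the expected total weight of the unit-weight
    vertices (i.e., half the expected total weight of the hypergraph)."""
    p = heavy / n
    # E[heavy total] = heavy * (1 + W) / 2 must equal E[unit total] = n - heavy:
    W = max(1, round(2 * (n - heavy) / heavy - 1))
    rng = random.Random(seed)
    return [rng.randint(1, W) if rng.random() < p else 1 for _ in range(n)]
```

For a hypergraph with 100,000 vertices, this yields W ≈ 1664, so a handful of vertices carry roughly half of the total weight in expectation.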
All experiments are performed on a single core of a cluster with Intel Xeon Gold 6230 processors running at 2.1 GHz. We compare KaHyPar-BP-R and KaHyPar-BP-K with the latest recursive bipartitioning (KaHyPar-R) and direct k-way version (KaHyPar-K) of KaHyPar [21], the default (PaToH-D) and quality preset (PaToH-Q) of PaToH, as well as the recursive bipartitioning (hMetis-R) and direct k-way version (hMetis-K) of hMetis. We use k ∈ {2, 4, 8, 16, 32, 64, 128}, ε ∈ {0.01, 0.03, 0.1}, ten repetitions using different seeds for each combination of k and ε, and a time limit of eight hours. We call a combination of a hypergraph H = (V, E, c, ω), k, and ε an instance.

Before partitioning an instance, we remove all vertices v ∈ V from H with a weight greater than L_k = (1 + ε)⌈c(V)/k⌉, as proposed by Caldwell et al. [10], and adapt k to k′ := k − |V_R|, where V_R denotes the set of removed vertices. We repeat this step recursively until there is no vertex with a weight greater than L_{k′} := (1 + ε)⌈c(V \ V_R)/k′⌉. The input for each partitioner is the subhypergraph H_{V \ V_R} of H, for which we compute a k′-way partition with L^{LPT}_{k′} as the maximum allowed block weight. Since all evaluated partitioners internally employ L_{k′} as balance constraint, we initialize each partitioner with a modified imbalance factor ε̂ instead of ε, which is calculated as follows:

L_{k′} = (1 + ε̂)⌈c(V \ V_R)/k′⌉ = (1 + ε)·LPT(H_{V \ V_R}, k′) = L^{LPT}_{k′}  ⇒  ε̂ = L^{LPT}_{k′} / ⌈c(V \ V_R)/k′⌉ − 1.

We consider the resulting k′-way partition Π_{k′} to be imbalanced if it is not ε̂-balanced. Each partitioner optimizes the connectivity metric, which we also refer to as the quality of a partition. Partition Π_{k′} can be extended to a k-way partition Π_k by adding each of the removed vertices v ∈ V_R to Π_k as a separate block. Note that adding the removed vertices increases the connectivity metric of a k′-way partition only by a constant value α ≥
0. Thus, we report the quality of Π_{k′}, since (λ − 1)(Π_k) will always be equal to (λ − 1)(Π_{k′}) + α.

For each instance, we average quality and running times using the arithmetic mean (over all seeds). To further average over multiple instances, we use the geometric mean for absolute running times to give each instance a comparable influence. Runs with imbalanced partitions are not excluded from the averaged running times. If all ten runs of a partitioner produce imbalanced partitions on an instance, we consider the instance as imbalanced and mark it with ✗ in the plots. The benchmark sets and detailed statistics of their properties are publicly available from http://algo2.iti.kit.edu/heuer/sea21/.

Table 1
Percentage of instances for which all ten computed partitions were imbalanced. (Rows: KaHyPar-BP-K, KaHyPar-BP-R, KaHyPar-K, KaHyPar-R, hMetis-K, hMetis-R, PaToH-Q, and PaToH-D per ε; columns: ISPD98, DAC, ASM, SNAP, and Artificial.)
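The vertex-removal preprocessing and the modified imbalance factor ε̂ described in the methodology above can be sketched in a few lines of Python. This is an illustrative re-implementation under the stated definitions, not KaHyPar's actual code; `weights` stands for the vertex weights c(v), and the LPT routine follows Graham's longest-processing-time rule [22].

```python
import heapq
import math

def lpt_makespan(weights, k):
    """Max block weight when assigning weights (heaviest first) to k blocks,
    always placing the next item on a currently lightest block (LPT rule)."""
    blocks = [0] * k
    heapq.heapify(blocks)
    for w in sorted(weights, reverse=True):
        heapq.heappush(blocks, heapq.heappop(blocks) + w)
    return max(blocks)

def preprocess(weights, k, eps):
    """Repeatedly remove vertices heavier than L_k = (1+eps)*ceil(c(V)/k),
    decreasing k by the number of removed vertices, and derive the modified
    imbalance factor eps_hat used to initialize the partitioners.
    Assumes k stays positive throughout."""
    removed = []
    while True:
        L_k = (1 + eps) * math.ceil(sum(weights) / k)
        heavy = [w for w in weights if w > L_k]
        if not heavy:
            break
        removed += heavy
        weights = [w for w in weights if w <= L_k]
        k -= len(heavy)
    # L^LPT_k' = (1+eps) * LPT(H[V \ V_R], k') is the allowed block weight;
    # eps_hat re-expresses it relative to the average block weight ceil(c/k').
    L_lpt = (1 + eps) * lpt_makespan(weights, k)
    eps_hat = L_lpt / math.ceil(sum(weights) / k) - 1
    return weights, k, removed, eps_hat
```

For example, with weights [9, 2, 2, 2, 2, 2], k = 3, and ε = 0, the vertex of weight 9 is removed, k is adapted to k′ = 2, and ε̂ compensates for the LPT-based block-weight bound.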
Table 2
Occurrence of prepacked vertices (i.e., vertices that are fixed to a specific block during partitioning) for each combination of k and ε when using KaHyPar-BP-R on RealWorld instances: minimum/average/maximum percentage of prepacked vertices (left), and percentage of instances for which the prepacking is executed at least once (right). (Rows: values of k; columns: Min/Avg/Max per ε.)

To compare the solution quality of different algorithms, we use performance profiles [17]. Let A be the set of all algorithms we want to compare, I the set of instances, and q_A(I) the quality of algorithm A ∈ A on instance I ∈ I. For each algorithm A, we plot the fraction of instances (y-axis) for which q_A(I) ≤ τ · min_{A′ ∈ A} q_{A′}(I), where τ is on the x-axis. For τ = 1, the y-value indicates the percentage of instances for which an algorithm A ∈ A performs best. Note that these plots relate the quality of an algorithm to the best solution and thus do not permit a full ranking of three or more algorithms.

Balanced Partitioning.
In Table 1, we report the percentage of imbalanced instances produced by each partitioner for each instance type and ε. Both KaHyPar-BP-K and KaHyPar-BP-R compute balanced partitions for all tested benchmark sets and parameters. For the remaining partitioners, the number of imbalanced solutions increases as the balance constraint becomes tighter. For the previous KaHyPar versions, the number of imbalanced partitions is most pronounced on VLSI instances: for ε = 0.01, KaHyPar-K and KaHyPar-R compute infeasible solutions for 6.3% (10.3%) of the ISPD98 and for 9.5% (19.0%) of the DAC instances. Comparing the distribution of vertex weights reveals that these instances tend to have a larger proportion of heavier vertices compared to the ASM and SNAP instances (see Figure 3 in Appendix D). The largest benefit of using our approach can be observed on the artificially generated instances, where KaHyPar-K and KaHyPar-R only computed balanced partitions for 72.9% (71.4%) of the instances for ε = 0.01. The number of imbalanced partitions produced by PaToH and hMetis-R is comparable to that of KaHyPar-R: PaToH computes significantly fewer feasible solutions on sparse matrix instances (ASM and SNAP) for ε = 0.01, while hMetis-R performs considerably worse on the Artificial benchmark set. Out of all partitioners, hMetis-K yields the most imbalanced instances across all benchmark sets. As can be seen in Table 3 in Appendix F, the number of imbalanced partitions produced by each competing partitioner increases with decreasing ε and increasing k.

Figure 2
Performance profiles comparing the solution quality of KaHyPar-BP-K and KaHyPar-BP-R with KaHyPar-K (left), KaHyPar-R (left), PaToH (middle), and hMetis (middle) on our RealWorld benchmark set, and with all systems on our Artificial benchmark set (right) (ε = 0.01).

Table 2 shows (i) how often our prepacking algorithm is triggered at least once in KaHyPar-BP-R (see Line 5 in Algorithm 1) and (ii) the percentage of vertices that are treated as fixed vertices (see Table 4 in Appendix G for the results of
KaHyPar-BP-K). Except for k = 128, on average less than 25% of the vertices are treated as fixed vertices (even less than 10% for smaller k and smaller ε).

Quality and Running Times.
Comparing the different KaHyPar configurations in Figure 2 (left), we can see that our new configurations provide the same solution quality as their non-prepacking counterparts. Furthermore, we see that, in general, the direct k-way algorithm still performs better than its RB counterpart [38]. Figure 2 (middle) therefore compares the strongest configuration KaHyPar-BP-K with PaToH and hMetis. We see that KaHyPar-BP-K performs considerably better than the competitors. If we compare KaHyPar-BP-K with each partitioner individually on the RealWorld benchmark set, KaHyPar-BP-K produces partitions with higher quality than those of KaHyPar-K, KaHyPar-BP-R, KaHyPar-R, hMetis-R, hMetis-K, PaToH-Q, and PaToH-D on 48.9%, 70.2%, 73.2%, 76.4%, 84.3%, 92.9%, and 97.9% of the instances, respectively.
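These pairwise percentages correspond to performance-profile values at τ = 1 for the respective pair of algorithms. A minimal, illustrative implementation of the profile computation described earlier (with made-up input data, not the paper's evaluation scripts) could look like this:

```python
def performance_profile(quality, taus):
    """quality: dict algorithm -> list of objective values, one per instance
    (lower is better). Returns, per algorithm, the fraction of instances
    with q_A(I) <= tau * min_A' q_A'(I) for each tau."""
    n = len(next(iter(quality.values())))
    best = [min(q[i] for q in quality.values()) for i in range(n)]
    return {
        a: [sum(q[i] <= tau * best[i] for i in range(n)) / n for tau in taus]
        for a, q in quality.items()
    }
```

Plotting each algorithm's list against τ reproduces the shape of the profiles in Figure 2.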
KaHyPar-BP-K outperforms KaHyPar-BP-R on the RealWorld benchmark set. On artificial instances, both algorithms produce partitions with comparable quality for ε ∈ {0.03, 0.1}, while the results are less clear for ε = 0.01. Regarding running times (see Appendix I), our prepacking only slightly affects KaHyPar. On average, KaHyPar-BP-K is slightly faster than KaHyPar-K, as our new algorithm has replaced the previous balancing strategy in KaHyPar (restarting the bipartition with a tighter bound on the weight of the heaviest block if the bipartition is imbalanced). The running time difference is less pronounced for KaHyPar-BP-R and KaHyPar-R. This can be explained by the fact that, in KaHyPar-BP-R, our prepacking algorithm is executed on the input hypergraph, whereas it is executed on the coarsest hypergraph in KaHyPar-BP-K.

In this work, we revisited the problem of computing balanced partitions for weighted hypergraphs in the multilevel setting and showed that many state-of-the-art hypergraph partitioners struggle to find balanced solutions on hypergraphs with weighted vertices, especially for tight balance constraints. We therefore developed an algorithm that enables partitioners based on the recursive bipartitioning scheme to reliably compute balanced partitions. The method is based on the concept of deeply balanced bipartitions and is implemented by pre-assigning a small subset of the heaviest vertices to the two blocks of each bipartition. For this pre-assignment, we established a property that can be verified in polynomial time and, if fulfilled, leads to provable balance guarantees for the resulting k-way partition. We integrated the approach into the recursive bipartitioning algorithm of KaHyPar. Our new algorithms
KaHyPar-BP-K and KaHyPar-BP-R are capable of computing balanced solutions on all instances of a diverse benchmark set, without negatively affecting the solution quality or running time of KaHyPar.

Interesting opportunities for future research include replacing the LPT algorithm with an algorithm that additionally optimizes the partitioning objective to construct sufficiently balanced prepackings with improved solution quality [34], and integrating rebalancing strategies similar to the techniques proposed for non-multilevel partitioners [9, 10, 18] into multilevel refinement algorithms.
References

[1] Y. Akhremtsev, T. Heuer, P. Sanders, and S. Schlag. Engineering a Direct k-way Hypergraph Partitioning Algorithm. In 19th Workshop on Algorithm Engineering and Experiments (ALENEX), pages 28–42. SIAM, 2017.
[2] C. J. Alpert. The ISPD98 Circuit Benchmark Suite. In International Symposium on Physical Design (ISPD), pages 80–85, 1998.
[3] C. J. Alpert and A. B. Kahng. Recent Directions in Netlist Partitioning: A Survey. Integration: The VLSI Journal, 19(1-2):1–81, 1995.
[4] C. Aykanat, B. B. Cambazoglu, and B. Uçar. Multi-Level Direct k-Way Hypergraph Partitioning with Multiple Constraints and Fixed Vertices. Journal of Parallel and Distributed Computing, 68(5):609–625, 2008.
[5] D. A. Bader, H. Meyerhenke, P. Sanders, and D. Wagner, editors. Graph Partitioning and Graph Clustering, 10th DIMACS Implementation Challenge Workshop, volume 588 of Contemporary Mathematics. American Mathematical Society, 2013.
[6] M. A. Bender and M. Farach-Colton. The LCA Problem Revisited. In Latin American Symposium on Theoretical Informatics, pages 88–94. Springer, 2000.
[7] R. H. Bisseling, B. O. Fagginger Auer, A. N. Yzelman, T. van Leeuwen, and Ü. V. Çatalyürek. Two-Dimensional Approaches to Sparse Matrix Partitioning. Combinatorial Scientific Computing, pages 321–349, 2012.
[8] T. N. Bui and C. Jones. Finding Good Approximate Vertex and Edge Partitions is NP-Hard. Information Processing Letters, 42(3):153–159, 1992.
[9] A. E. Caldwell, A. B. Kahng, and I. L. Markov. Improved Algorithms for Hypergraph Bipartitioning. In Asia South Pacific Design Automation Conference (ASP-DAC), pages 661–666, 2000.
[10] A. E. Caldwell, A. B. Kahng, and I. L. Markov. Iterative Partitioning with Varying Node Weights. VLSI Design, 11(3):249–258, 2000.
[11] Ü. V. Çatalyürek and C. Aykanat. Decomposing Irregularly Sparse Matrices for Parallel Matrix-Vector Multiplication. In International Workshop on Parallel Algorithms for Irregularly Structured Problems, pages 75–86. Springer, 1996.
[12] Ü. V. Çatalyürek and C. Aykanat. PaToH: Partitioning Tool for Hypergraphs. In Encyclopedia of Parallel Computing. Springer, 2011.
[13] Ü. V. Çatalyürek and C. Aykanat. Hypergraph-Partitioning-Based Decomposition for Parallel Sparse-Matrix Vector Multiplication. IEEE Transactions on Parallel and Distributed Systems, 10(7):673–693, 1999.
[14] Ü. V. Çatalyürek, M. Deveci, K. Kaya, and B. Uçar. UMPa: A Multi-Objective, Multi-Level Partitioner for Communication Minimization. In Graph Partitioning and Graph Clustering, 10th DIMACS Implementation Challenge Workshop, pages 53–66, 2012.
[15] T. Davis, I. S. Duff, and S. Nakov. Design and Implementation of a Parallel Markowitz Threshold Algorithm. SIAM Journal on Matrix Analysis and Applications, 41(2):573–590, 2020.
[16] K. D. Devine, E. G. Boman, R. T. Heaphy, R. H. Bisseling, and Ü. V. Çatalyürek. Parallel Hypergraph Partitioning for Scientific Computing. In International Parallel and Distributed Processing Symposium (IPDPS), 2006.
[17] E. D. Dolan and J. J. Moré. Benchmarking Optimization Software with Performance Profiles. Mathematical Programming, 91(2):201–213, 2002.
[18] S. Dutt and H. Theny. Partitioning Around Roadblocks: Tackling Constraints with Intermediate Relaxations. In International Conference on Computer-Aided Design (ICCAD), pages 350–355, 1997.
[19] C. M. Fiduccia and R. M. Mattheyses. A Linear-Time Heuristic for Improving Network Partitions. In 19th Design Automation Conference (DAC), pages 175–181, 1982.
[20] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness, volume 174. W. H. Freeman, San Francisco, 1979.
[21] L. Gottesbüren, M. Hamann, S. Schlag, and D. Wagner. Advanced Flow-Based Multilevel Hypergraph Partitioning. In 18th International Symposium on Experimental Algorithms (SEA). Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2020.
[22] R. L. Graham. Bounds on Multiprocessing Timing Anomalies. SIAM Journal on Applied Mathematics, 17(2):416–429, 1969.
[23] R. L. Graham, E. L. Lawler, J. K. Lenstra, and A. H. G. Rinnooy Kan. Optimization and Approximation in Deterministic Sequencing and Scheduling: A Survey. In Annals of Discrete Mathematics, volume 5, pages 287–326. Elsevier, 1979.
[24] S. A. Hauck. Multi-FPGA Systems. PhD thesis, 1995.
[25] T. Heuer. Engineering Initial Partitioning Algorithms for direct k-way Hypergraph Partitioning. Bachelor thesis, Karlsruhe Institute of Technology, 2015.
[26] T. Heuer, P. Sanders, and S. Schlag. Network Flow-Based Refinement for Multilevel Hypergraph Partitioning. ACM Journal of Experimental Algorithmics (JEA), 24(1):2.3:1–2.3:36, 2019.
[27] T. Heuer and S. Schlag. Improving Coarsening Schemes for Hypergraph Partitioning by Exploiting Community Structure. In 16th International Symposium on Experimental Algorithms (SEA), Leibniz International Proceedings in Informatics (LIPIcs), pages 21:1–21:19. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2017.
[28] G. Karypis. A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices, Version 5.1.0. Technical report, University of Minnesota, 2013.
[29] G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar. Multilevel Hypergraph Partitioning: Application in VLSI Domain. In 34th Design Automation Conference (DAC), pages 526–529, 1997.
[30] G. Karypis and V. Kumar. Multilevel k-way Hypergraph Partitioning. VLSI Design, 11(3):285–300, 2000.
[31] B. W. Kernighan and S. Lin. An Efficient Heuristic Procedure for Partitioning Graphs. The Bell System Technical Journal, 49(2):291–307, 1970.
[32] T. Lengauer. Combinatorial Algorithms for Integrated Circuit Layout. John Wiley & Sons, Inc., 1990.
[33] J. Leskovec and A. Krevl. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data, 2014.
[34] N. Maas. Multilevel Hypergraph Partitioning with Vertex Weights Revisited. Bachelor thesis, Karlsruhe Institute of Technology, 2020.
[35] D. A. Papa and I. L. Markov. Hypergraph Partitioning and Clustering. In Handbook of Approximation Algorithms and Metaheuristics. 2007.
[36] M. Pinedo. Scheduling, volume 29. Springer, 2012.
[37] S. Schlag, V. Henne, T. Heuer, H. Meyerhenke, P. Sanders, and C. Schulz. k-way Hypergraph Partitioning via n-Level Recursive Bisection. In 18th Workshop on Algorithm Engineering and Experiments (ALENEX), pages 53–67. SIAM, 2016.
[38] S. Schlag. High-Quality Hypergraph Partitioning. PhD thesis, Karlsruhe Institute of Technology, 2020.
[39] C. Schulz. High Quality Graph Partitioning. PhD thesis, Karlsruhe Institute of Technology, 2013.
[40] H. Shin and C. Kim. A Simple Yet Effective Technique for Partitioning. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 1(3):380–386, 1993.
[41] B. Vastenhouw and R. H. Bisseling. A Two-Dimensional Data Distribution Method for Parallel Sparse Matrix-Vector Multiplication. SIAM Review, 47(1):67–95, 2005.
[42] N. Viswanathan, C. J. Alpert, C. C. N. Sze, Z. Li, and Y. Wei. The DAC 2012 Routability-Driven Placement Contest and Benchmark Suite. In Design Automation Conference (DAC), pages 774–782. ACM, 2012.
A Proof of Lemma 3

▶ Lemma 3 (LPT Bound). Let H = (V, E, c, ω) be a weighted hypergraph, let Ψ_k be a k-way prepacking for a set of fixed vertices P ⊆ V, and let O := ⟨v_1, ..., v_m | v_i ∈ V \ P⟩ be the sequence of all ordinary vertices of V \ P sorted in decreasing order of weight. If we assign the remaining vertices O to the blocks of Ψ_k using the LPT algorithm, we can extend Ψ_k to a k-way partition Π_k of H such that the weight of the heaviest block is bounded by

max(Π_k) ≤ max{ (1/k)·c(P) + h_k(O), max(Ψ_k) },  with  h_k(O) := max_{i ∈ {1,...,m}} ( c(v_i) + (1/k)·Σ_{j=1}^{i−1} c(v_j) ).

Proof. We define Ψ_k := {P_1, ..., P_k} and Π_k := {V_1, ..., V_k}. Assume that the LPT algorithm assigns the i-th vertex v_i of O to block V_j ∈ Π_k, and let V_j^(i) denote the subset of block V_j that only contains vertices of ⟨v_1, ..., v_i⟩ ⊆ O and P_j. Since the LPT algorithm always assigns a vertex to a block with the smallest weight (see Section 3), the weight of V_j^(i−1) must be smaller than or equal to (1/k)·(c(P) + Σ_{l=1}^{i−1} c(v_l)) (the average weight of all previously assigned vertices); otherwise, V_j^(i−1) would not be a block with the smallest weight. Therefore,

c(V_j^(i)) = c(V_j^(i−1)) + c(v_i) ≤ (1/k)·(c(P) + Σ_{l=1}^{i−1} c(v_l)) + c(v_i) ≤ (1/k)·c(P) + h_k(O).

Hence, (1/k)·c(P) + h_k(O) is an upper bound on the weight of every block to which the LPT algorithm assigns at least one vertex. If the LPT algorithm does not assign any vertex to a block V_j ∈ Π_k, its weight is equal to c(P_j) ≤ max(Ψ_k). It follows that

max(Π_k) ≤ max{ (1/k)·c(P) + h_k(O), max(Ψ_k) }. ◀

B Generalized Balance Property

▶ Definition 7 (Generalized Balance Property). Let H = (V, E, c, ω) be a hypergraph for which we want to compute an ε-balanced k-way partition, and let Ψ := {P_1, P_2} be a prepacking of H for a set of fixed vertices P ⊆ V. Furthermore, let O_{t_1} resp. O_{t_2} be the sequence of the t_1 resp. t_2 heaviest ordinary vertices of V \ P sorted in decreasing vertex weight order such that t_1 resp. t_2 is the smallest number that satisfies c(P_1) + c(O_{t_1}) ≥ L resp. c(P_2) + c(O_{t_2}) ≥ L (see Line 2, Algorithm 1). We say that the prepacking Ψ satisfies the balance property with respect to k if the following conditions hold:
(i) Ψ is deeply balanced,
(ii) (1/k_1)·c(P_1) + h_{k_1}(O_{t_1}) ≤ L_{k_1} with k_1 := ⌈k/2⌉,
(iii) (1/k_2)·c(P_2) + h_{k_2}(O_{t_2}) ≤ L_{k_2} with k_2 := ⌊k/2⌋.

The proof of Theorem 5 can be adapted to show that for any balanced bipartition Π := {V_1, V_2} that respects the prepacking Ψ, there exists a k_1- resp. k_2-way partition Π_{k_1} resp. Π_{k_2} of V_1 resp. V_2 with max(Π_{k_1}) ≤ (1/k_1)·c(P_1) + h_{k_1}(O_{t_1}) ≤ L_{k_1} (Definition 7 (ii)) and max(Π_{k_2}) ≤ (1/k_2)·c(P_2) + h_{k_2}(O_{t_2}) ≤ L_{k_2} (Definition 7 (iii)).

C Proof of Claim 6

▶ Lemma 8.
Let L = ⟨a_1, ..., a_n⟩ be a sequence of elements sorted in decreasing weight order with respect to a weight function c: L → R_{≥0} (for a subsequence A := ⟨a_1, ..., a_l⟩ of L, we define c(A) := Σ_{i=1}^{l} c(a_i)), let L′ be an arbitrary subsequence of L sorted in decreasing weight order, and let L_m = ⟨a_1, ..., a_m⟩ be the subsequence of the m ≤ n heaviest elements in L. Then the following conditions hold:
(i) If c(L′) ≤ c(L_m), then h_k(L′) ≤ h_k(L_m).
(ii) If c(L′) > c(L_m), then h_k(L′) − (1/k)·c(L′) ≤ h_k(L_m) − (1/k)·c(L_m).

Proof. For convenience, we define L′ := ⟨b_1, ..., b_l⟩. Note that ∀ i ∈ {1, ..., min(m, l)}: c(a_i) ≥ c(b_i), since L_m contains the m heaviest elements in decreasing order. We define i* := arg max_{i ∈ {1,...,l}} ( c(b_i) + (1/k)·Σ_{j=1}^{i−1} c(b_j) ), i.e., the index that maximizes h_k(L′).

(i) + (ii): If i* ≤ m, then

h_k(L′) = c(b_{i*}) + (1/k)·Σ_{j=1}^{i*−1} c(b_j) ≤ c(a_{i*}) + (1/k)·Σ_{j=1}^{i*−1} c(a_j) ≤ h_k(L_m),

using c(b_j) ≤ c(a_j) for all j ∈ [1, i*].

(i): If m < i* ≤ l, then

h_k(L′) = c(b_{i*}) + (1/k)·Σ_{j=1}^{i*−1} c(b_j) = c(b_{i*}) − (1/k)·Σ_{j=i*}^{l} c(b_j) + (1/k)·c(L′) ≤ (1 − 1/k)·c(b_{i*}) + (1/k)·c(L′) ≤ (1 − 1/k)·c(a_m) + (1/k)·c(L_m) = c(a_m) + (1/k)·Σ_{j=1}^{m−1} c(a_j) ≤ h_k(L_m),

using c(b_{i*}) ≤ c(a_m) and c(L′) ≤ c(L_m).

(ii): If m < i* ≤ l, then

h_k(L′) − (1/k)·c(L′) = c(b_{i*}) + (1/k)·Σ_{j=1}^{i*−1} c(b_j) − (1/k)·c(L′) = c(b_{i*}) − (1/k)·Σ_{j=i*}^{l} c(b_j) ≤ (1 − 1/k)·c(b_{i*}) ≤ (1 − 1/k)·c(a_m) = c(a_m) + (1/k)·Σ_{j=1}^{m−1} c(a_j) − (1/k)·c(L_m) ≤ h_k(L_m) − (1/k)·c(L_m). ◀

▷ Claim 6.
It holds that (1/k′)·c(P_1) + h_{k′}(O_1) ≤ (1/k′)·max(Ψ) + h_{k′}(O_t).

Proof. Recall that Ψ = {P_1, P_2} and Π = {V_1, V_2} with P_1 ⊆ V_1 and P_2 ⊆ V_2, that O_1 is equal to V_1 \ P_1, and that O_t represents the t heaviest vertices of (V_1 ∪ V_2) \ (P_1 ∪ P_2) with max(Ψ) + c(O_t) ≥ L, as defined in Definition 4. The following proof distinguishes two cases based on Lemma 8.

If c(O_1) ≤ c(O_t), then

(1/k′)·c(P_1) + h_{k′}(O_1) ≤ (1/k′)·c(P_1) + h_{k′}(O_t)   (Lemma 8 (i))
≤ (1/k′)·max(Ψ) + h_{k′}(O_t)   (c(P_1) ≤ max(Ψ)).

If c(O_1) > c(O_t), then

(1/k′)·c(P_1) + h_{k′}(O_1) = (1/k′)·c(P_1) + h_{k′}(O_1) − (1/k′)·c(O_1) + (1/k′)·c(O_1)
≤ (1/k′)·(c(P_1) + c(O_1)) + h_{k′}(O_t) − (1/k′)·c(O_t)   (Lemma 8 (ii))
= (1/k′)·(c(V_1) − c(O_t)) + h_{k′}(O_t)   (c(P_1) + c(O_1) = c(V_1))
≤ (1/k′)·(L − c(O_t)) + h_{k′}(O_t)   (c(V_1) ≤ L)
≤ (1/k′)·max(Ψ) + h_{k′}(O_t)   (max(Ψ) + c(O_t) ≥ L). ◀

D Vertex Weight Distributions
Figure 3
Overview of the vertex weight distribution for each instance type (ISPD98, DAC, ASM, SNAP). The histograms (bin width = 0.2) show the number of vertices (y-axis) with a certain share of the total weight of their corresponding hypergraph (x-axis). The share of a vertex v ∈ V of the total weight of a weighted hypergraph H = (V, E, c, ω) is c(v)/c(V).

E Configuration of Evaluated Partitioners

hMetis does not directly optimize the (λ − 1) metric. Instead, it optimizes the sum-of-external-degrees (SOED), which is closely related to the connectivity metric: (λ − 1)(Π) = SOED(Π) − cut(Π). We therefore configure hMetis to optimize SOED and calculate the (λ − 1) metric from it afterwards [30]. Additionally, hMetis-R defines the maximum allowed imbalance of a partition differently [28]. For example, an imbalance value of 5 means that a block weight between 0.45 · c(V) and 0.55 · c(V) is allowed at each bisection step. We therefore translate the imbalance parameter ε to a modified parameter ε′ such that the correct allowed block weight is matched after log2(k) bisections:

ε′ := 100 · ( (1 + ε)·⌈c(V)/k⌉ / c(V) )^{1/log2(k)} − 50.

PaToH is evaluated with both the default (
PaToH-D) and the quality preset (PaToH-Q). However, there are also more fine-grained parameters available for PaToH, as described in [12]. In our case, the balance parameter is of special interest, as it might affect the ability of PaToH to find a balanced partition. Therefore, we evaluated the performance of PaToH on our benchmark set with each of the possible options Strict, Adaptive, and Relaxed. The configuration using the Strict option (which is also the default) consistently produced the fewest imbalanced partitions and had similar quality to the other configurations. Consequently, we only report the results of this configuration.
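The ε-to-ε′ translation for hMetis-R described above can be sketched as follows. This is an illustrative helper under the stated assumption that each of the log2(k) bisections may deviate by ε′ percentage points from a perfect 50/50 split; the function name is ours, not part of any partitioner's API.

```python
import math

def hmetis_rb_imbalance(eps, k, total_weight):
    """Translate the imbalance parameter eps into the per-bisection
    imbalance value eps' such that after log2(k) bisections the allowed
    block weight (1 + eps) * ceil(total_weight / k) is matched, i.e.,
    ((50 + eps') / 100)^log2(k) * c(V) = (1 + eps) * ceil(c(V) / k)."""
    allowed = (1 + eps) * math.ceil(total_weight / k)
    return 100 * (allowed / total_weight) ** (1 / math.log2(k)) - 50
```

For instance, for k = 2, ε = 0.1, and c(V) = 100, this yields ε′ = 5 (up to floating-point rounding), consistent with an imbalance value of 5 permitting block weights between 0.45 · c(V) and 0.55 · c(V).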
F Number of Imbalanced Partitions per k and ε

Table 3
Percentage of imbalanced instances produced by each partitioner on our RealWorld benchmark set for each combination of k and ε. (Rows: KaHyPar-BP-K, KaHyPar-BP-R, KaHyPar-K, KaHyPar-R, hMetis-K, hMetis-R, PaToH-Q, and PaToH-D per ε; columns: k ∈ {2, 4, 8}, k ∈ {16, 32}, k ∈ {64, 128}, and the total percentage.)
G Prepacking Algorithm Statistics for KaHyPar-BP-K

Table 4
Occurrence of prepacked vertices (i.e., vertices that are fixed to a specific block during partitioning) for each combination of k and ε when using KaHyPar-BP-K on RealWorld instances: minimum/average/maximum percentage of prepacked vertices (left), and percentage of instances for which the prepacking is executed at least once (right). (Rows: values of k; columns: Min/Avg/Max per ε.)

H Quality Comparison for ε = 0.03 and ε = 0.1
Figure 4
Comparing the solution quality of each evaluated partitioner for ε = 0.03 (left) and ε = 0.1 (right) on our RealWorld benchmark set. Instances that exceeded the time limit are marked in the plots.

Figure 5
Comparing the solution quality of each evaluated partitioner for ε = 0.03 (left) and ε = 0.1 (right) on our Artificial benchmark set. Instances that exceeded the time limit are marked in the plots.
I Absolute Running Times

Figure 6
Comparing the running time of each evaluated partitioner for different values of ε on our RealWorld benchmark set. The number under each boxplot denotes the average running time of the corresponding partitioner. Instances that exceeded the time limit are marked in the plots.

Figure 7
Comparing the running time of each evaluated partitioner for different values of ε on our Artificial benchmark set. The number under each boxplot denotes the average running time of the corresponding partitioner. Instances that exceeded the time limit are marked in the plots.