A Fast Optimal Double Row Legalization Algorithm
Stefan Hougardy [email protected] Institute for Discrete Mathematics, University of Bonn, Bonn, Germany
Meike Neuwohner [email protected] Institute for Discrete Mathematics, University of Bonn, Bonn, Germany
Ulrike Schorr [email protected] Design Systems Inc., Munich, Germany
ABSTRACT
In Placement Legalization, it is often assumed that (almost) all standard cells possess the same height and can therefore be aligned in cell rows, which can then be treated independently. However, this is no longer true for recent technologies, where a substantial number of cells of double- or even arbitrary multiple-row height is to be expected. Due to interdependencies between the cell placements within several rows, the legalization task becomes considerably harder. In this paper, we show how to optimize quadratic cell movement for pairs of adjacent rows comprising cells of single- as well as double-row height with a fixed left-to-right ordering in time O(n · log(n)), whereby n denotes the number of cells involved. As opposed to prior works, we thereby do not artificially bound the maximum cell movement and can guarantee to find an optimum solution. Experimental results show an average percentage decrease of over 26% in the total quadratic movement when compared to a legalization approach that fixes cells of more than single-row height after Global Placement.

CCS CONCEPTS
• Hardware → Placement.

KEYWORDS
Placement; Legalization; double-row-height cells
1 INTRODUCTION

The Standard Placement Problem captures the task of locating hundreds of thousands or even millions of standard cells, which are usually assumed to exhibit uniform heights, within the rectangular chip area. Thereby, multiple objectives such as minimizing the total length of inter-cell electrical connections (nets) or achieving desirable timing properties have to be respected. Given the fact that even the underlying packing problem is strongly
NP-hard [9], the placement task is most commonly split into the three sub-problems of Global Placement, Legalization and Detailed Placement. Global Placement aims at finding cell locations that approximately minimize the total netlength for a certain net model and obey bounds on local packing density, but does not have to ensure internal disjointness of shapes. The Legalization step deals with resolving the remaining overlaps by shifting cells locally, trying to minimize either netlength or the total (squared) cell displacement. The latter is desirable because it honors the quality of the Global Placement result (e.g. w.r.t. timing) and balances cell movement. Detailed Placement usually incorporates several post-optimization routines. When only cells of single-row height are present, the Standard Cell Legalizers "Tetris" [13] and "Abacus" [20] produce good results. They process the cells one by one, ordered by the x-coordinates of their Global Placement positions, and place each cell at the closest free position [13] or at the end of a nearby row, choosing the one that allows for the minimum possible total cell movement [20]. Another strategy, which is employed within the BonnTools project [16], [3], uses a min-cost-flow approach to first assign the cells to zones, unblocked parts of a row [1]. Fixing the left-to-right ordering of the cells contained within each zone to the one imposed by the Global Placement locations, legal cell positions are then obtained by minimizing the total squared cell displacement (or (weighted) bounding box netlength) within each zone. The latter task is captured by the Single Row Problem, which also occurs as a sub-problem of the Abacus Legalizer. It was first studied by Kahng, Tucker and Zelikovsky [15], who suggested the
Clumping Algorithm to tackle it. While their implementation runs in O(n · log(n)) for unit net weights (where n denotes the number of nets), the fastest implementation, which is due to Suhl [21], achieves a running time of O(n · log(n)) even for general net weights. A similar result has been obtained in the context of scheduling [10]. When the goal is to optimize quadratic cell movement, the Clumping Algorithm can easily be implemented to run in time linear in the number of cells. While the mentioned approaches work well in the presence of uniform cell heights, it is not obvious how to generalize them to a setting where cells of double- or even arbitrary multiple-row height may occur. Wang et al. [23] try to adapt the Clumping Algorithm to the double-row case, but manage to guarantee optimality only in a very restricted setting. In contrast to this, Wu and Chu [24] suggest to handle cells of double-row height by, depending on the placement density, either inflating or matching cells of single-row height to ensure uniform cell heights again. However, as was already pointed out in [19], this strategy can neither handle distinct power alignment constraints nor cells covering more than two rows. Besides, both merging and inflating cells may drastically reduce the placement flexibility as well as lead to a significant area overhead. Many other authors, therefore, settle for a dynamic programming solution instead of generalizing the Clumping Algorithm, guaranteeing a reasonable runtime by artificially bounding the maximum displacement allowed for each cell by a small number of placement sites.
In exchange, they show how to make their dynamic program aware of several other desirable objective traits or incorporate a larger degree of freedom by allowing for a local reordering of cells, even between multiple rows [6], [11], [12], [19]. Other approaches comprise solving a linear complementarity problem to approximately minimize the squared cell movement and then resolving the remaining overlaps [5], [18], [25], applying integer linear programming to legalize sufficiently small regions of the chip separately [14], or making use of a cell insertion scheme [7], combined with bipartite matching and min-cost-flow algorithms [17]. In this paper, we present a fast O(n log n)-time algorithm (where n denotes the number of cells) minimizing the total quadratic displacement for cells of single- and double-row height that need to be accommodated in two adjacent rows obeying a fixed ordering of the cells covering each row. In contrast to previous dynamic programming approaches, we do not need to artificially restrict the number of available positions for each cell, which may be beneficial for regions of low density and when dealing with coarser grid sizes for double-row cells, which our algorithm can take into account. Moreover, our approach can be extended to support rectangular movebounds for the cells. The rest of this paper is organized as follows: In Section 2, we discuss the Single Row Problem, the Clumping Algorithm and its implementation for piecewise quadratic cost functions. In Section 3, we then introduce the Double Row Problem and show how to reduce it to the Single Row Problem in Section 4. Finally, Section 5 presents our experimental results.
2 THE SINGLE ROW PROBLEM

This section comprises the base results our reduction from the Double Row to the Single Row Problem builds upon.
• Section 2.1 reviews the Clumping Algorithm and its analysis.
• Theorem 2.3 points out how an optimum solution to the Single Row Problem changes when the domain is restricted.
• Section 2.2 discusses an efficient implementation of the Clumping Algorithm for piecewise quadratic cost functions.
2.1 The Clumping Algorithm

Definition 2.1 (Single Row Problem).
Instance: A tuple (C, w, x_min, x_max, (f_i)_{i=1}^{n}) consisting of
• a set C := {C_1, ..., C_n} of cells,
• cell widths w : C → R_{>0},
• a minimum and maximum coordinate x_min, x_max ∈ R satisfying Σ_{i=1}^{n} w(C_i) ≤ x_max − x_min and
• convex, continuous functions f_i : R → R for i = 1, ..., n.
Task: Find coordinates (x_i)_{i=1}^{n} minimizing Σ_{i=1}^{n} f_i(x_i) subject to
• x_min ≤ x_1,
• x_i + w(C_i) ≤ x_{i+1} for i = 1, ..., n − 1 and
• x_n + w(C_n) ≤ x_max.
For i = 1, ..., n, we write [q_i^−, q_i^+] := argmin{ f_i(x) : x ∈ [x_min + Σ_{j=1}^{i−1} w(C_j), x_max − Σ_{j=i}^{n} w(C_j)] }.

The Single Row Problem can be solved by the aforementioned
Clumping Algorithm [15]. The given formulation of the Clumping Algorithm (Algorithm 1) is based on [2].

Theorem 2.2 ([15]). The Clumping Algorithm finds an optimum placement.
We prove a slightly stronger statement which we will need at a later point. In order to formulate it, we have to introduce the notion of a block, which we define as follows: For a cell C_i ∈ L, the block B(i) represented by C_i is defined to be the consecutive set of cells B(i) := {C_j : i ≤ j ≤ n ∧ ∄ C_k ∈ L : i < k ≤ j}. The blocks present at a given point during the run of the Clumping Algorithm indicate sets of cells that the algorithm forces to be placed contiguously (or has clumped together) at that time. Note that the partition into blocks can only get coarser throughout the run of the algorithm.

Algorithm 1: Clumping Algorithm
Input: An instance of the Single Row Problem given by an ordered list L = (C_1, ..., C_n) of cells, cell widths w : {C_1, ..., C_n} → R_{>0}, a row interval [x_min, x_max] and convex cost functions (f_i)_{i=1}^{n}.
Output: Optimum positions (x_i)_{i=1}^{n}.
  Add an auxiliary element C_0 to the front of L and set x_0 ← x_min and w_0 ← 0
  for i ← 1 to n do
    Compute q_i^− and q_i^+
    w_i ← w(C_i)
  for i ← 1 to n do
    PLACE(C_i, L)
  for i ← 1 to n with C_i ∉ L do
    x_i ← x_{i−1} + w(C_{i−1})
  return (x_i)_{i=1}^{n}

Algorithm 2: PLACE(C_i, L)
  C_ℓ ← predecessor of C_i in L
  if x_ℓ + w_ℓ ≤ q_i^+ then
    x_i ← max{x_ℓ + w_ℓ, q_i^−}
  else
    COLLAPSE(C_ℓ, C_i, L)
    PLACE(C_ℓ, L)

Algorithm 3: COLLAPSE(C_ℓ, C_i, L)
  Redefine f_ℓ as x ↦ f_ℓ(x) + f_i(x + w_ℓ) and update q_ℓ^− and q_ℓ^+ (w.r.t. [x_min + Σ_{j=1}^{ℓ−1} w(C_j), x_max − Σ_{j=ℓ}^{n} w(C_j)])
  w_ℓ ← w_ℓ + w_i
  Remove C_i from L

Theorem 2.3. Let I′ := (C, w, x′_min, x′_max, (f_i)_{i=1}^{n}) be an instance of the Single Row Problem, let x_min ≤ x′_min < x′_max ≤ x_max and let I denote the instance of the Single Row Problem that arises from replacing x′_min and x′_max by x_min and x_max, respectively. Then there exists an optimum solution (x*_i)_{i=1}^{n} for I′ such that for any block B(i) formed during the run of Algorithm 1 on I, the cells in B(i) are placed contiguously.

Proof. By induction on the number of calls to
COLLAPSE. Initially, the statement is clearly true because every cell constitutes a block on its own. Consider a call to COLLAPSE where two blocks B(ℓ) and B(i) are united by deleting C_i from L, and pick an optimum solution (x*_j)_{j=1}^{n} for I′ respecting all previously formed blocks. If additionally x*_{i−1} + w(C_{i−1}) = x*_i, we are done, so assume x*_ℓ + Σ_{j=ℓ}^{i−1} w(C_j) = x*_{i−1} + w(C_{i−1}) < x*_i. By construction of the algorithm, we have q_ℓ^− ≤ x_ℓ ≤ q_ℓ^+, w_ℓ = Σ_{j=ℓ}^{i−1} w(C_j) and x_ℓ + w_ℓ > q_i^+. If x*_i > q_i^+, then we can shift B(i) to the left until it hits max{x*_{i−1} + w(C_{i−1}), q_i^+} and thereby decrease the total cost since the cost function f_i of B(i) is strictly monotonically increasing on [q_i^+, x_max − Σ_{j=i}^{n} w(C_j)] ⊇ [q_i^+, x′_max − Σ_{j=i}^{n} w(C_j)], a contradiction to the assumed optimality of (x*_j)_{j=1}^{n}. Hence x*_i ≤ q_i^+. Then x*_i − w_ℓ < x_ℓ ≤ q_ℓ^+, so we can shift B(ℓ) to the right until it hits the left boundary of B(i) without increasing the total cost since the cost function f_ℓ of B(ℓ) is monotonically decreasing on [x_min + Σ_{j=1}^{ℓ−1} w(C_j), q_ℓ^+] ⊇ [x′_min + Σ_{j=1}^{ℓ−1} w(C_j), q_ℓ^+]. □

Remark.
Together with the fact that the Clumping Algorithm places each block B(i) within its optimum range [q_i^−, q_i^+] and hence also within [x_min, x_max − w_i] (whereby q_i and w_i refer to the respective values after B(i) has been formed), Theorem 2.3 implies optimality and therefore in particular the correctness of Theorem 2.2.

Theorem 2.4. Let I and I′ be as in Theorem 2.3 and let (x*_i)_{i=1}^{n} be the solution computed by a run of the Clumping Algorithm on I. Then an optimum solution (x′*_i)_{i=1}^{n} for I′ is given by
x′*_i = min{ x′_max − Σ_{j=i}^{n} w(C_j), max{ x′_min + Σ_{j=1}^{i−1} w(C_j), x*_i } } for i = 1, ..., n.

Proof. Feasibility follows easily from the fact that we have x′_max − x′_min ≥ Σ_{i=1}^{n} w(C_i) by definition of the Single Row Problem. By Theorem 2.3, it therefore suffices to show that (x′*_i)_{i=1}^{n} places each block B(i) arising from the run of the Clumping Algorithm on I optimally. Pick such a block B(i) and call the cumulated cost function to which f_i is set during the course of the algorithm f̄_i. Then, by definition of the Clumping Algorithm, we have x*_i ∈ [q̄_i^−, q̄_i^+]. We distinguish the three cases
• x*_i < x′_min + Σ_{j=1}^{i−1} w(C_j),
• x*_i ∈ [x′_min + Σ_{j=1}^{i−1} w(C_j), x′_max − Σ_{j=i}^{n} w(C_j)] and
• x′_max − Σ_{j=i}^{n} w(C_j) < x*_i.
In the first case, x′*_i = x′_min + Σ_{j=1}^{i−1} w(C_j) is set to the leftmost feasible position and furthermore, f̄_i is monotonically increasing to the right of x*_i < x′*_i, showing that B(i) is placed optimally. In the second case, x′*_i = x*_i is placed within the optimum range of f̄_i ↾ [x_min + Σ_{j=1}^{i−1} w(C_j), x_max − Σ_{j=i}^{n} w(C_j)] and therefore in particular occupies an optimum position for this function. Finally, in the third case, we get x′*_i = x′_max − Σ_{j=i}^{n} w(C_j), which is the rightmost feasible position C_i may attain. Given that f̄_i is monotonically decreasing on [x′_min + Σ_{j=1}^{i−1} w(C_j), x′*_i] ⊆ [x_min + Σ_{j=1}^{i−1} w(C_j), q̄_i^+], optimality follows again. □

Note that if all of the f_i are quadratic functions stored as triples (a, b, c) of coefficients such that f_i : x ↦ a·x² + b·x + c, the Clumping Algorithm can be implemented to run in linear time, as pointed out, for example, in [21], since the computation of minima as well as shifting a quadratic function in x-direction or adding it to another one only requires a constant number of arithmetic operations on the respective coefficients.

Our strategy to solve the problem of minimizing squared movement within two adjacent rows containing cells of both single- and double-row height with a prescribed left-to-right ordering is based on a reduction of an instance of the latter problem to an instance of the Single Row Problem with piecewise quadratic objective functions. In the following subsection, we therefore discuss how to implement the Clumping Algorithm in this case.
2.2 Piecewise Quadratic Cost Functions

Definition 2.5 (piecewise quadratic function). For [a, b] ⊆ R, we call a continuous function g : [a, b] → R piecewise quadratic if there exist a nonnegative integer k and
• real numbers a =: x_0 < x_1 < · · · < x_k < x_{k+1} := b and
• quadratic functions (g_j : R → R)_{j=0}^{k}
such that g ↾ [x_j, x_{j+1}] = g_j ↾ [x_j, x_{j+1}] for all j = 0, ..., k. The positions (x_j)_{j=1}^{k} are called kinks of g. Note that there exists a unique representation of g with g_j ≠ g_{j+1} for all j = 0, ..., k − 1, to which we refer when talking about the set of kinks of a piecewise quadratic function.

Our goal is to achieve a running time of
O((n + K) · log(min{n, K})) for the Clumping Algorithm, where n denotes the number of cells and K specifies the total number of kinks occurring among all cost functions. Therefore, we suggest an implementation of the algorithm that is based on the one proposed in [21] for the case of piecewise linear objective functions. Due to the page limit, we do not present a detailed description, but rather give a short overview of the data structures used as well as a brief outline of the analysis.

Representation of cost functions of cells.
We associate the quadratic function x ↦ a·x² + b·x + c with the triple (a, b, c) and store the restriction f_i ↾ [x_min, x_max] of the piecewise quadratic cost function f_i as follows: Let x_min =: p^i_{k_i+1} < p^i_{k_i} < · · · < p^i_1 < p^i_0 := x_max be such that {p^i_1, ..., p^i_{k_i}} is the set of kinks of f_i ↾ [x_min, x_max] and let f_i ↾ [p^i_{j+1}, p^i_j] be given by the quadratic function f_{ij}, j = 0, ..., k_i. Then we represent f_i by the ordered list F_i := ((p^i_{j+1}, f_{ij}))_{j=0}^{k_i} consisting of pairs of quadratic functions defining f_i ↾ [x_min, x_max] on a certain interval and the left boundary of their domain. Throughout the algorithm, for each cell C_i that has already been processed and is currently placed at the position x_i, we maintain the index j(i) ∈ {0, ..., k_i} for which p^i_{j(i)+1} < x_i ≤ p^i_{j(i)}, respectively j(i) = k_i if x_i = x_min. Observe that if we implicitly assume all cells to be located at x_max initially and further consider a cell C_j ∈ B(i) as being placed at x_i + Σ_{m=i}^{j−1} w(C_m), cells never move to the right during a run of the Clumping Algorithm. To see this, note that by definition of q_i^− and q_i^+, each cell is located within [x_min, x_max] by construction. Moreover, whenever x_ℓ is reassigned after a call to COLLAPSE(C_ℓ, C_i, L), then, denoting by C_k the predecessor of C_ℓ in L, we get max{x_k + w_k, q_ℓ^−} = x_ℓ ≤ q_ℓ^+ and x_ℓ + w_ℓ > q_i^+ ≥ q_i^− before COLLAPSE is performed. Hence, after the update of q_ℓ^−, we have q_ℓ^− ≤ x_ℓ, implying that x_ℓ is decreased, remains unchanged, or another call to COLLAPSE is launched. In the first case, all already processed cells C_j with j > ℓ belong to B(ℓ) and therefore move to the left as well. As a consequence, the total time needed to maintain the indices j(i) can be bounded by O(Σ_{i=1}^{n} k_i) = O(K) since none of these indices is ever decreased.

Representation of cost functions of blocks.
In order to realize calls to PLACE and COLLAPSE efficiently, we need some additional data which we store for the blocks consisting of cells we have already processed. Thereby, the key observation is the fact that in order to implement the function PLACE, only local information on the given convex cost function is required since for a convex real function, the question whether the interval where it attains its minimum lies to the left or right of or contains a certain coordinate can be answered by considering local monotonicity properties. In this spirit, for each block B(i), we store the following data:
• a heap H(i) that contains for each C_j ∈ B(i) the position p^j_{j(j)+1} − Σ_{m=i}^{j−1} w(C_m) unless j(j) = k_j and
• the quadratic function g_i defining f_i on the non-empty interval (max H(i), x_i] (whereby max ∅ := −∞).
We outline how to use them in order to implement PLACE and COLLAPSE. Consider a call to PLACE(C_i, L) and remember that we implicitly assume that x_j = x_max for 1 ≤ j ≤ n initially. Further observe that this convention ensures that throughout the algorithm, for C_ℓ, C_i ∈ L with ℓ < i, we have x_ℓ + w_ℓ ≤ x_i. In order to execute PLACE, the first thing we have to decide is whether x_ℓ + w_ℓ ≤ q_i^+. While we can compute the value of the left hand side in constant time, q_i^+ is not necessarily known to us. However, what we do know is that by convexity of f_i, q_i^+ is the unique position in [x_min + Σ_{j=1}^{i−1} w(C_j), x_max − Σ_{j=i}^{n} w(C_j)] such that f_i, restricted to this interval, is monotonically decreasing to its left and strictly monotonically increasing to its right. As a consequence, if f_i ↾ (max H(i), x_i] (which is given by the quadratic function g_i) is monotonically decreasing, we can be sure that q_i^+ ≥ x_i ≥ x_ℓ + w_ℓ. On the other hand, as long as f_i ↾ (max H(i), x_i] is strictly monotonically increasing, we can decrease x_i to max{x_ℓ + w_ℓ, max H(i)}, and, whenever this maximum is attained by max H(i), pop all corresponding entries from the heap, increment the corresponding indices j(·) by one and insert a new heap entry unless they reach k_j, and update g_i. Note that if one precomputes all of the values Σ_{j=1}^{i−1} w(C_j), i = 1, ..., n, recursively in linear time, which allows to determine Σ_{m=i}^{j−1} w(C_m) in constant time throughout the algorithm, each of these update steps takes constant time per heap entry. In each case where the maximum is not attained by max H(i), we can infer that q_i^+ < x_ℓ + w_ℓ and therefore launch a call of COLLAPSE.
Finally, if there is some z ∈ (max H(i), x_i) where g_i changes from being monotonically decreasing to being strictly monotonically increasing, then z = q_i^+ and we are able to decide whether or not x_ℓ + w_ℓ ≤ q_i^+ holds. In case the latter is true, we also have to determine max{x_ℓ + w_ℓ, q_i^−}. To this end, observe that by convexity of f_i, q_i^− is the unique coordinate in [x_min + Σ_{j=1}^{i−1} w(C_j), x_max − Σ_{j=i}^{n} w(C_j)] such that f_i, restricted to the latter interval, is strictly monotonically decreasing to the left, and monotonically increasing to the right of q_i^−. By applying a similar strategy as before, we can therefore either compute q_i^− ∈ (max H(i), x_i] or set x_i to max{x_ℓ + w_ℓ, max H(i)} ≥ q_i^−. As a consequence, we are left with discussing the implementation of COLLAPSE(C_ℓ, C_i, L). Since we do not explicitly recompute q_ℓ^− and q_ℓ^+ and the updates of w_ℓ and L can be easily performed in constant time when implementing L as a doubly linked list, we only have to take care of the redefinition of f_ℓ. To this end, note that the local quadratic function g_ℓ can be updated by setting g_ℓ(x) ← g_ℓ(x) + g_i(x + w_ℓ) by a constant number of arithmetic operations on the respective coefficients. As far as the heap H(ℓ) is concerned, we have to shift all entries in H(i) by w_ℓ to the left and then merge H(i) into H(ℓ). By employing Leftist Heaps and storing key differences instead of the actual keys (see [22] for further details), the shifting can be performed in constant and the merging in logarithmic (w.r.t. the total number of heap elements) time. A logarithmic or even constant time bound also applies for all other heap operations we perform, which comprise the creation of empty heaps, the extraction and deletion of maximum heap entries as well as the insertion of new elements.
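The "key differences instead of actual keys" trick referenced above can be sketched as follows: the root stores an absolute key, every other node stores its key minus its parent's key, so shifting all keys is a single update at the root while merging stays logarithmic. This is an illustrative leftist max-heap sketch under these assumptions, not the authors' code.

```cpp
#include <cassert>
#include <utility>

// Leftist max-heap node storing a relative key: absolute at the root,
// key-minus-parent-key everywhere else (hence <= 0 below the root).
struct Node {
    double d;               // relative key (absolute at the root)
    int rank = 1;           // leftist rank (null-path length + 1)
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(double key) : d(key) {}
};

// Merge two heaps whose roots carry keys in the same frame of reference.
Node* merge(Node* a, Node* b) {
    if (!a) return b;
    if (!b) return a;
    if (a->d < b->d) std::swap(a, b);   // larger key on top
    b->d -= a->d;                        // b's key becomes relative to a
    a->right = merge(a->right, b);
    if (!a->left || (a->right && a->right->rank > a->left->rank))
        std::swap(a->left, a->right);    // restore the leftist property
    a->rank = (a->right ? a->right->rank : 0) + 1;
    return a;
}

Node* insert(Node* root, double key) { return merge(root, new Node(key)); }

double maxKey(Node* root) { return root->d; }

Node* popMax(Node* root) {
    Node* l = root->left;
    Node* r = root->right;
    if (l) l->d += root->d;  // promote subtrees back to absolute keys
    if (r) r->d += root->d;
    Node* rest = merge(l, r);
    delete root;
    return rest;
}

void shiftAll(Node* root, double c) { if (root) root->d += c; } // O(1) shift
```

Only the root changes under a shift, so the heap H(i) can be moved by w_ℓ in constant time before being merged into H(ℓ).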
By observing that the maximum heap size is bounded by min{n, K} since each heap contains at most one entry per cell, but also at most one entry per kink, and that the total number of heap operations is O(n + K) since for every (pair of) shifting and merging, we remove an entry from L, and every kink position is added to and removed from a heap at most once, we obtain the claimed runtime bound.

3 THE DOUBLE ROW PROBLEM

In this section, we
• formally introduce the Double Row Problem and
• reformulate the feasibility constraints as those of an instance of the Single Row Problem defined on the set of cells of double-row height.
As the name of the problem indicates, the task is to place a set of cells of single- and double-row height within a given rectangular window covering two rows, minimizing a sum of continuous, convex objective functions on the positions of the individual cells. Thereby, the left-to-right ordering of those cells occupying a certain row is fixed and the cells are not allowed to overlap.

Definition 3.1 (Double Row Problem).
Instance:
• a non-empty set C := {C_1, ..., C_n} of double-row cells,
• sets of cells B := {b_{ij} : i = 0, ..., n, j = 1, ..., s_i} and T := {t_{ij} : i = 0, ..., n, j = 1, ..., u_i} to be placed in the bottom respectively top row, where s_i, u_i ∈ N for i = 0, ..., n,
• cell widths w : C ∪ B ∪ T → R_{>0},
• a minimum and maximum coordinate x_min, x_max ∈ R such that x_min + Σ_{i=1}^{n} w(C_i) + Σ_{i=0}^{n} max{ Σ_{j=1}^{s_i} w(b_{ij}), Σ_{j=1}^{u_i} w(t_{ij}) } ≤ x_max and
• convex, continuous cost functions f_i : R → R for i = 1, ..., n, g_{ij} : R → R for i = 0, ..., n, j = 1, ..., s_i and h_{ij} : R → R for i = 0, ..., n, j = 1, ..., u_i.

Figure 1: The Double Row Problem.

Task:
Find coordinates (x_i)_{i=1}^{n}, (y_{ij})_{i=0,...,n, j=1,...,s_i} and (z_{ij})_{i=0,...,n, j=1,...,u_i} minimizing Σ_{i=1}^{n} f_i(x_i) + Σ_{i=0}^{n} ( Σ_{j=1}^{s_i} g_{ij}(y_{ij}) + Σ_{j=1}^{u_i} h_{ij}(z_{ij}) ) subject to
• x_i + w(C_i) ≤ x_{i+1} for i = 0, ..., n,
• x_i + w(C_i) ≤ y_{i1} for i = 0, ..., n,
• y_{ij} + w(b_{ij}) ≤ y_{i,j+1} for i = 0, ..., n, j = 1, ..., s_i − 1,
• y_{i,s_i} + w(b_{i,s_i}) ≤ x_{i+1} for i = 0, ..., n,
• x_i + w(C_i) ≤ z_{i1} for i = 0, ..., n,
• z_{ij} + w(t_{ij}) ≤ z_{i,j+1} for i = 0, ..., n, j = 1, ..., u_i − 1,
• z_{i,u_i} + w(t_{i,u_i}) ≤ x_{i+1} for i = 0, ..., n,
where x_0 := x_min, w(C_0) := 0, x_{n+1} := x_max and each constraint only applies if all of its variables exist. For i = 0, ..., n, we define B_i := {b_{ij} : j = 1, ..., s_i} and T_i := {t_{ij} : j = 1, ..., u_i}.

Proposition 3.2. Given a tuple (x*_i)_{i=1}^{n} and an instance of the Double Row Problem as defined above, there exists a feasible solution to the Double Row Problem with x_i = x*_i for i = 1, ..., n if and only if
x*_i + w(C_i) + max{ Σ_{j=1}^{s_i} w(b_{ij}), Σ_{j=1}^{u_i} w(t_{ij}) } ≤ x*_{i+1} for i = 0, ..., n,
where x*_0 := x_0 := x_min, w(C_0) := 0 and x*_{n+1} := x_{n+1} := x_max. We call such a tuple (x*_i)_{i=1}^{n} feasible.

Remark.
Note that a tuple (x*_i)_{i=1}^{n} is feasible if and only if it defines a feasible solution to the instance of the Single Row Problem with cell set C, cell widths
w′(C_i) := w(C_i) + max{ Σ_{j=1}^{s_i} w(b_{ij}), Σ_{j=1}^{u_i} w(t_{ij}) }
and enclosing x-interval [x′_min, x′_max] given by
x′_min := x_min + max{ Σ_{j=1}^{s_0} w(b_{0j}), Σ_{j=1}^{u_0} w(t_{0j}) } and x′_max := x_max.

4 REDUCTION TO THE SINGLE ROW PROBLEM

For the remainder of this paper, we restrict ourselves to the case of piecewise quadratic cost functions and show how to reduce the respective variant of the Double Row Problem to the Single Row one. As we have already seen how to deal with the subject of feasibility, it remains to transfer costs from the single-row cells to the double-row ones, i.e. to determine the minimum cost of a feasible extension of a feasible tuple (x*_i)_{i=1}^{n} and to express it as Σ_{i=1}^{n} f′_i(x*_i) for some piecewise quadratic objective functions f′_i.
• We examine the structure of an optimum extension of a feasible tuple to coordinates for the single-row height cells.
• Lemma 4.1 expresses the total cost of such an extension, up to a constant, as a sum Σ_{i=1}^{n} F_i(x*_i).
• We show that each of the functions F_i is convex and piecewise quadratic and linearly bound the total number of kinks.
• We then derive our main result stated in Theorem 4.2.
Consider the coordinates (ȳ_{ij})_{i=0,...,n, j=1,...,s_i} and (z̄_{ij})_{i=0,...,n, j=1,...,u_i} arising from runs of the Clumping Algorithm on the instances of the Single Row Problem given by (B_i, w ↾ B_i, x_min, x_max, (g_{ij})_{j=1}^{s_i}) and (T_i, w ↾ T_i, x_min, x_max, (h_{ij})_{j=1}^{u_i}) for i = 0, ..., n. Note that once a feasible tuple (x*_i)_{i=1}^{n} of coordinates for the double-row cells has been fixed, coordinates (y_{ij}) and (z_{ij}) extend them to a feasible solution of the Double Row Problem if and only if for each i ∈ {0, ..., n}, (y_{ij})_{j=1}^{s_i} and (z_{ij})_{j=1}^{u_i} constitute feasible solutions of the instances of the Single Row Problem given by (B_i, w ↾ B_i, x*_i + w(C_i), x*_{i+1}, (g_{ij})_{j=1}^{s_i}) and (T_i, w ↾ T_i, x*_i + w(C_i), x*_{i+1}, (h_{ij})_{j=1}^{u_i}), respectively, whereby again x*_0 := x_min, w(C_0) := 0 and x*_{n+1} := x_max. Note that these instances are feasible by feasibility of (x*_i)_{i=1}^{n}. But now, since for each i = 0, ..., n, we have x_min ≤ x*_i + w(C_i) ≤ x*_{i+1} ≤ x_max, Theorem 2.4 tells us that an optimum extension (y*_{ij}) and (z*_{ij}) of (x*_i)_{i=1}^{n} is given by

y*_{ij} = min{ x*_{i+1} − Σ_{m=j}^{s_i} w(b_{im}), max{ x*_i + w(C_i) + Σ_{m=1}^{j−1} w(b_{im}), ȳ_{ij} } }   (1)

and

z*_{ij} = min{ x*_{i+1} − Σ_{m=j}^{u_i} w(t_{im}), max{ x*_i + w(C_i) + Σ_{m=1}^{j−1} w(t_{im}), z̄_{ij} } }.   (2)

This allows us to express the total cost of the solution in terms of the coordinates (x*_i)_{i=1}^{n}:

Lemma 4.1. Let (ȳ_{ij}) and (z̄_{ij}) be as before and define

F_i : x ↦ f_i(x)   (3)
+ Σ_{j=1}^{s_{i−1}} g_{i−1,j}( min{ x − Σ_{m=j}^{s_{i−1}} w(b_{i−1,m}), ȳ_{i−1,j} } )   (4)
+ Σ_{j=1}^{s_i} g_{ij}( max{ x + w(C_i) + Σ_{m=1}^{j−1} w(b_{im}), ȳ_{ij} } )   (5)
+ Σ_{j=1}^{u_{i−1}} h_{i−1,j}( min{ x − Σ_{m=j}^{u_{i−1}} w(t_{i−1,m}), z̄_{i−1,j} } )   (6)
+ Σ_{j=1}^{u_i} h_{ij}( max{ x + w(C_i) + Σ_{m=1}^{j−1} w(t_{im}), z̄_{ij} } )   (7)

and c := Σ_{i=1}^{n−1} Σ_{j=1}^{s_i} g_{ij}(ȳ_{ij}) + Σ_{i=1}^{n−1} Σ_{j=1}^{u_i} h_{ij}(z̄_{ij}). Then for a feasible tuple (x*_i)_{i=1}^{n}, the total cost of an optimum solution to the Double Row Problem with x_i = x*_i for i = 1, ..., n amounts to Σ_{i=1}^{n} F_i(x*_i) − c.

Proof. Recall that an optimum extension (y*_{ij}) and (z*_{ij}) of (x*_i)_{i=1}^{n} is given by (1) and (2).
We are done if we can show that for any cell, the part of the cost term involving its objective function matches the cost of its position in the given solution. For the cells (C_i)_{i=1}^{n}, this is clear. For a cell b_{0j} with j ∈ {1, ..., s_0}, the desired statement follows from x*_0 + w(C_0) + Σ_{m=1}^{j−1} w(b_{0m}) = x_min + Σ_{m=1}^{j−1} w(b_{0m}) ≤ ȳ_{0j}, and a similar argument applies for i = n. For a cell b_{ij} with i ∈ {1, ..., n − 1} and j ∈ {1, ..., s_i}, we exemplarily consider the case where ȳ_{ij} ≤ x*_i + w(C_i) + Σ_{m=1}^{j−1} w(b_{im}) since the cases x*_i + w(C_i) + Σ_{m=1}^{j−1} w(b_{im}) < ȳ_{ij} < x*_{i+1} − Σ_{m=j}^{s_i} w(b_{im}) and x*_{i+1} − Σ_{m=j}^{s_i} w(b_{im}) ≤ ȳ_{ij} can be treated similarly. In the mentioned case, we get
y*_{ij} = min{ x*_{i+1} − Σ_{m=j}^{s_i} w(b_{im}), max{ x*_i + w(C_i) + Σ_{m=1}^{j−1} w(b_{im}), ȳ_{ij} } } = max{ x*_i + w(C_i) + Σ_{m=1}^{j−1} w(b_{im}), ȳ_{ij} }
and min{ x*_{i+1} − Σ_{m=j}^{s_i} w(b_{im}), ȳ_{ij} } = ȳ_{ij}, so
g_{ij}( max{ x*_i + w(C_i) + Σ_{m=1}^{j−1} w(b_{im}), ȳ_{ij} } ) + g_{ij}( min{ x*_{i+1} − Σ_{m=j}^{s_i} w(b_{im}), ȳ_{ij} } ) − g_{ij}(ȳ_{ij}) = g_{ij}(y*_{ij}) + g_{ij}(ȳ_{ij}) − g_{ij}(ȳ_{ij}) = g_{ij}(y*_{ij}).
The cells in T can be treated analogously. □

Up to the constant c, which only depends on the given instance of the Double Row Problem, but not on the tuple (x*_i)_{i=1}^{n}, we can hence express the costs of an optimum solution extending a feasible tuple (x*_i)_{i=1}^{n} as a sum of the cost functions (F_i)_{i=1}^{n} applied to the individual coordinates. Note that each of the summands contributing to F_i and hence F_i itself is piecewise quadratic since linear shifting as well as replacement by a constant function to the left or right of a certain coordinate (ensuring continuity) preserves this property.
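For concreteness, the feasibility side of this reduction (Proposition 3.2 and the Remark following it) amounts to a one-pass width transformation on the double-row cells. The sketch below uses illustrative names, with group i holding the widths of the single-row cells between C_i and C_{i+1}; it is not the authors' code.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Reduced Single Row instance on the double-row cells: each width is inflated
// by the wider of the two groups of single-row cells that must fit after it,
// and the left boundary absorbs group 0.
struct ReducedInstance {
    std::vector<double> w;  // w'(C_i) = w(C_i) + max{sum of B_i, sum of T_i}
    double xmin;            // x'_min = x_min + max{sum of B_0, sum of T_0}
    double xmax;            // x'_max = x_max
};

ReducedInstance reduceToSingleRow(
    const std::vector<double>& wC,               // widths of C_1, ..., C_n
    const std::vector<std::vector<double>>& wB,  // wB[i]: widths of B_i, i = 0..n
    const std::vector<std::vector<double>>& wT,  // wT[i]: widths of T_i, i = 0..n
    double xmin, double xmax) {
    auto total = [](const std::vector<double>& v) {
        return std::accumulate(v.begin(), v.end(), 0.0);
    };
    ReducedInstance out;
    const int n = static_cast<int>(wC.size());
    for (int i = 1; i <= n; ++i)
        out.w.push_back(wC[i - 1] + std::max(total(wB[i]), total(wT[i])));
    out.xmin = xmin + std::max(total(wB[0]), total(wT[0]));
    out.xmax = xmax;
    return out;
}
```

A tuple of double-row coordinates is then feasible for the Double Row Problem exactly when it is feasible for this inflated Single Row instance.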
In addition to that, it is not hard to see that the total number of kinks of the cost functions $(F_i)_{i=1}^{n}$ can be bounded by $2 \cdot (|\mathcal{B}| + |\mathcal{T}|) + K$, where $K$ denotes the total number of kinks present in the cost functions of the single- and double-row cells. To show that all $F_i$ are actually convex, it suffices to show that each of the summands (3)-(7) induces a convex function. This is clear for (3), and we exemplarily show it for (5). Let $\mathcal{L}_i$ denote the list of cells arising from the run of the Clumping Algorithm on the aforementioned instance of the Single Row Problem with cell set $\mathcal{B}_i$. Given that for $b_{ij} \in \mathcal{L}_i$, the cells in the block $B(b_{ij})$ starting at $b_{ij}$ are placed contiguously, we can rewrite (5) as $\sum_{b_{ij} \in \mathcal{L}_i} G_{ij}(\max\{x + w(C_i) + \sum_{l=1}^{j-1} w(b_{il}),\ \bar{y}_{ij}\})$, where $G_{ij}$ denotes the cumulated cost function of the block represented by $b_{ij}$. Recall that by definition of the Clumping Algorithm, $\bar{y}_{ij}$ occupies a minimum position of $G_{ij}$ for $b_{ij} \in \mathcal{L}_i$. Given that for a continuous, convex function $g : [a, b] \to \mathbb{R}$ and $x_0 \in \operatorname{argmin}\{g(x),\ x \in [a, b]\}$, the function mapping $x \in [a, b]$ to $g(\max\{x, x_0\})$ is convex, it follows that (5) defines a convex function in $x$. By applying analogous arguments to the remaining summands, we can infer that each $F_i$ is convex as a sum of convex functions. This completes our reduction from the Double Row Problem to the Single Row Problem, and it remains to discuss the runtime it requires.
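The convexity step used for (5) — if $g$ is convex and $x_0$ minimizes it, then $x \mapsto g(\max\{x, x_0\})$ is again convex — is easy to sanity-check numerically. The following throwaway helper (ours, not part of the algorithm) tests midpoint convexity on a uniform grid:

```cpp
#include <functional>

// Check f(mid) <= (f(left) + f(right)) / 2 for every grid triple over
// [a, b]; a numerical stand-in for convexity, up to a small tolerance.
bool midpointConvexOnGrid(const std::function<double(double)>& f,
                          double a, double b, int samples) {
    const double step = (b - a) / samples;
    for (int i = 0; i + 2 <= samples; ++i) {
        double x1 = a + i * step;
        double x2 = x1 + step;
        double x3 = x1 + 2 * step;
        if (f(x2) > 0.5 * (f(x1) + f(x3)) + 1e-9) return false;
    }
    return true;
}
```

With $g(x) = (x-2)^2$ and its minimizer $x_0 = 2$, the flattened function $g(\max\{x, 2\})$ passes the check, whereas flattening at a non-minimizer such as $x_0 = 0$ produces a constant-then-decreasing shape that fails it, matching the hypothesis of the argument above.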
Note that the positions $(\bar{y}_{ij})_{0 \le i \le n,\, 1 \le j \le m_i}$ and $(\bar{z}_{ij})_{0 \le i \le n,\, 1 \le j \le k_i}$ can be computed in total time $O((|\mathcal{B}| + |\mathcal{T}| + K) \cdot \log(|\mathcal{B}| + |\mathcal{T}|))$, where again $K$ denotes the total number of kinks of all cost functions appearing in the given instance of the Double Row Problem. A time of $O((|\mathcal{C}| + |\mathcal{B}| + |\mathcal{T}| + K) \cdot \log(|\mathcal{C}| + |\mathcal{B}| + |\mathcal{T}| + K))$ then suffices to build up and solve the instance of the Single Row Problem on the set of double-row cells to which we reduce, and optimum coordinates for the single-row cells can be deduced from the computed positions of the cells in $\mathcal{C}$ in linear time. Putting everything together, we can formulate the following theorem:

Theorem 4.2. The Double Row Problem with piecewise quadratic cost functions with a total number of $K$ kinks can be solved in time $O((|\mathcal{C}| + |\mathcal{B}| + |\mathcal{T}| + K) \cdot \log(|\mathcal{C}| + |\mathcal{B}| + |\mathcal{T}| + K))$.

EXPERIMENTAL RESULTS

We implemented the proposed algorithm in the C++ programming language and embedded it into the legalization framework described in [1]. More precisely, we first run the legalization algorithm from [1], which legalizes all cells of more than single-row height via a greedy projection approach and then proceeds by assigning all cells of single-row height to so-called zones, unblocked segments of cell rows, through a min-cost-flow algorithm. Within each zone, the left-to-right ordering is inferred from the Global Placement positions. While the algorithm from [1] proceeds by optimizing squared cell movement only within each zone, making use of the Clumping Algorithm, we instead apply the Double Row Algorithm to the instances of the Double Row Problem arising from the given left-to-right ordering in every second pair of rows, treating all cells of more than double-row height as blockages.

All experiments were performed single-threaded on Intel Xeon 3.3 GHz CPUs with 384 GB RAM. We conduct two experiments on two different sets of benchmarks.
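For reference, the Clumping Algorithm mentioned above, on which both the per-zone optimization of [1] and our computation of the unconstrained positions rely, can be sketched as follows. This is a simplified, unit-weight variant with illustrative names of our own; the actual implementations additionally handle cell weights, row boundaries, and general piecewise quadratic costs:

```cpp
#include <vector>

struct Cell {
    double width;
    double target;  // desired left-edge coordinate
};

// Clumping/Abacus-style single-row placement: minimize
// sum_i (x_i - target_i)^2 subject to x_{i+1} >= x_i + width_i for a
// fixed left-to-right order. Cells whose unconstrained optima would
// overlap are merged into clusters; a cluster is placed at the mean of
// its offset-corrected targets.
std::vector<double> placeRow(const std::vector<Cell>& cells) {
    struct Cluster {
        double q;   // sum over cluster cells of (target - offset in cluster)
        double w;   // total width of the cluster
        int n;      // number of cells in the cluster
        int first;  // index of the leftmost cell
        double pos() const { return q / n; }  // optimal left edge
    };
    std::vector<Cluster> stack;
    for (int i = 0; i < static_cast<int>(cells.size()); ++i) {
        Cluster c{cells[i].target, cells[i].width, 1, i};
        // Merge with the predecessor while the two clusters would overlap.
        while (!stack.empty() &&
               stack.back().pos() + stack.back().w > c.pos()) {
            Cluster p = stack.back();
            stack.pop_back();
            p.q += c.q - c.n * p.w;  // c's cells sit p.w right of p's left edge
            p.w += c.w;
            p.n += c.n;
            c = p;
        }
        stack.push_back(c);
    }
    std::vector<double> x(cells.size());
    for (const Cluster& c : stack) {
        double left = c.pos();
        for (int i = 0; i < c.n; ++i) {
            x[c.first + i] = left;
            left += cells[c.first + i].width;
        }
    }
    return x;
}
```

A stack suffices because a newly appended cell can only force merges with clusters to its left, which is what makes the overall run linear after sorting.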
The first one aims at establishing the competitiveness of our legalization approach when compared to recent works on the matter of mixed-cell-height legalization. The second experiment demonstrates the effectiveness of the Double Row Algorithm in improving squared cell movement.

For the first experiment, we run our algorithm on benchmark instances from the ICCAD-2017 CAD Contest on Multi-Deck Standard-Cell Legalization [8]. In doing so, we omit fence region constraints as well as soft constraints, but stick to the required power-rail alignment. As most prior works optimize linear instead of squared cell movement, we employ our proposed legalization method to minimize linear movement during the Double Row Algorithm. Observe that this is possible since for each cell, once its row assignment is fixed, the distance to its Global Placement location constitutes a piecewise linear and hence in particular piecewise quadratic function. However, we point out that minimizing l1 movement is not the main purpose of our algorithm and that, in particular, the assignment to zones is designed to optimize squared instead of linear movement. Hence, the subsequent comparison should be regarded as proof that our algorithm, even though not explicitly devised to do so, can compete with state-of-the-art legalizers concerning linear cell movement. We compare the average l1 cell movement

Table 1: Comparison between the average cell movement in terms of horizontal placement sites.
Instance | GP HPWL (m) | ΔHPWL (DAC'17 / ISPD'19 / TCAD'13 / Ours) | Av. L1 Movement in Sites (DAC'17 / ISPD'19 / TCAD'13 / Ours / Ours÷ISPD'19) | Max. L1 Movement in Sites (DAC'17 / ISPD'19 / TCAD'13 / Ours) | CPU sec (DAC'17 / ISPD'19 / Ours)
des_perf_1 | 1.217 | 16.21% / 6.66% / 4.52% / 4.52% | 10.86 / 6.97 / 6.66 / 6.66 / 95.55% | 200.82 / 48.95 / 57.22 / 57.22 | 11.23 / 11.75 / 9.97
des_perf_a_md1 | 2.160 | 3.27% / 2.48% / 2.20% / 2.19% | 6.71 / 5.94 / 5.85 / 5.79 / 97.47% | 607.30 / 607.30 / 607.30 / 607.30 | 2.30 / 2.79 / 8.05
des_perf_a_md2 | 2.177 | 3.35% / 2.51% / 2.23% / 2.23% | 6.77 / 5.93 / 6.08 / 6.07 / 102.36% | 403.86 / 403.86 / 403.86 / 403.86 | 2.19 / 6.82 / 8.53
des_perf_b_md1 | 2.106 | 1.75% / 1.52% / 1.61% / 1.59% | 5.17 / 4.77 / 4.78 / 4.72 / 98.95% | 79.34 / 38.45 / 48.19 / 45.19 | 2.01 / 3.64 / 6.79
des_perf_b_md2 | 2.137 | 2.05% / 1.72% / 1.50% / 1.49% | 5.74 / 5.25 / 5.38 / 5.31 / 101.14% | 198.74 / 39.76 / 50.68 / 50.68 | 2.31 / 3.12 / 8.06
edit_dist_1_md1 | 4.004 | 1.47% / 1.39% / 1.27% / 1.26% | 6.22 / 5.79 / 5.75 / 5.69 / 98.27% | 109.34 / 95.45 / 67.55 / 67.55 | 3.49 / 5.19 / 9.67
edit_dist_a_md2 | 5.103 | 1.17% / 1.01% / 0.92% / 0.91% | 6.02 / 5.51 / 5.57 / 5.51 / 100.00% | 164.00 / 164.00 / 164.00 / 164.00 | 2.59 / 2.24 / 10.78
edit_dist_a_md3 | 5.328 | 2.69% / 1.48% / 1.02% / 1.02% | 9.11 / 7.08 / 6.96 / 6.93 / 97.88% | 233.00 / 233.00 / 233.00 / 233.00 | 5.91 / 15.68 / 15.87
fft_2_md2 | 0.444 | 11.21% / 8.78% / 7.14% / 7.02% | 8.84 / 7.54 / 7.89 / 7.76 / 102.92% | 102.94 / 73.60 / 59.55 / 60.55 | 0.70 / 2.89 / 2.81
fft_a_md2 | 1.092 | 0.98% / 0.95% / 1.13% / 1.13% | 5.03 / 4.86 / 4.74 / 4.70 / 96.71% | 345.50 / 345.50 / 343.48 / 346.50 | 0.69 / 0.60 / 2.15
fft_a_md3 | 0.949 | 1.08% / 1.08% / 1.22% / 1.22% | 4.73 / 4.55 / 4.43 / 4.42 / 97.14% | 109.62 / 109.62 / 102.59 / 102.59 | 0.63 / 0.40 / 1.91
pci_bridge32_a_md1 | 0.454 | 3.61% / 3.38% / 3.00% / 2.95% | 6.01 / 5.64 / 5.83 / 5.76 / 102.13% | 72.48 / 63.76 / 63.76 / 63.76 | 0.61 / 2.29 / 2.01
pci_bridge32_a_md2 | 0.565 | 8.33% / 4.38% / 3.68% / 3.62% | 9.43 / 7.14 / 7.55 / 7.45 / 104.34% | 186.08 / 121.35 / 121.35 / 121.35 | 0.53 / 3.34 / 3.76
pci_bridge32_b_md1 | 0.660 | 2.55% / 2.26% / 2.13% / 2.11% | 6.35 / 6.01 / 5.79 / 5.72 / 95.17% | 322.71 / 332.71 / 313.99 / 313.99 | 0.52 / 0.70 / 2.41
pci_bridge32_b_md2 | 0.574 | 2.80% / 2.53% / 2.57% / 2.57% | 5.92 / 5.53 / 5.43 / 5.42 / 98.01% | 640.12 / 430.04 / 430.04 / 430.04 | 0.50 / 0.66 / 1.89
pci_bridge32_b_md3 | 0.583 | 3.63% / 3.17% / 3.14% / 3.13% | 6.74 / 6.10 / 6.13 / 6.12 / 100.33% | 398.57 / 398.57 / 398.58 / 398.58 | 0.51 / 1.58 / 2.21
average | — | 4.13% / 2.83% / 2.46% / 2.44% | 6.85 / 5.91 / 5.93 / 5.88 / 99.27% | 260.90 / 219.12 / 216.57 / 216.64 | 2.30 / 3.98 / 5.06
Table 2: Comparison between the squared cell movement resulting from the legalization algorithm described in TCAD'13 and our algorithm.
Instance | GP HPWL (m) | Cells (Single / Double) | Squared Cell Movement

achieved by our algorithm to the results obtained by [5] and the state-of-the-art paper [18], as reported in [18], as well as to the legalization approach from [1]. Table 1 displays the relative increase (ΔHPWL) of the half-perimeter wire length after Global Placement (GP HPWL), the average l1 cell movement (measured in horizontal placement sites), the maximum l1 cell movement (again measured in placement sites) and the runtime in CPU seconds for the algorithms in [5] (DAC'17), [18] (ISPD'19) and [1] (TCAD'13) and the algorithm suggested in this paper (Ours). Concerning the average cell movement, which we are mainly interested in for this comparison, the column labeled "Ours/ISPD'19" contains the percentage the average cell movement obtained by "Ours" constitutes of the average cell movement reported by ISPD'19 [18]. The final row labeled "average" displays the average of all prior values in the respective column. In particular, the respective entry in the column "Ours/ISPD'19" refers to the average of the above percentages. One can see that on average, our proposed algorithm achieves results comparable to the algorithm in [18], which in turn produces considerably better results than [5] when it comes to average cell movement. However, the deviation between the different instances is relatively high: while there are some on which our algorithm significantly outperforms the method from [18] (including those where no cells of triple- and quadruple-row height are present), the converse is true for several other test cases. One possible explanation for this might be the fact that the greedy legalization of cells of more than double-row height only works well if they are sufficiently spaced out in the Global Placement solution, which is true for only some of the given benchmarks.
When it comes to running time, maximum movement, and increase in HPWL, our algorithm can be seen to yield comparable or even better results.

In our second experiment, we compare the total quadratic cell movement achieved by the algorithm described in [1], run so as to minimize squared cell movement, to that of our new method. As the number of double-row cells on the ICCAD-2017 CAD Contest benchmarks [8] is rather small, we employ a set of benchmarks generated by the authors of [7] by modifying instances from the ISPD 2015 Detailed Routing-Driven Placement Contest [4]. While these are more suitable for the primary application of our algorithm, we decided against using them for a comparison to other legalizers since they are not publicly available and the parsing process appears to be more error-prone due to a non-standard format. For completeness, we nevertheless state that our experiments revealed an average cell movement better than the one obtained by [7], [5] and [23], but worse than what is claimed in [14] (at the cost of a considerably higher runtime) and [17].

The results of our second experiment can be read from Table 2, which displays the squared cell movement achieved by the algorithm described in TCAD'13 [1] and the algorithm proposed in this paper. The first column contains the instance name, while the columns labeled "Single" and "Double" display the number of cells of single- and double-row height, respectively, present on the given test case; the fraction the double-row cells constitute of the total number of cells can be found in the following column.

Figure 2: superblue12.
Figure 3: matrix_mult_1 after our algorithm. Blue lines indicate movement w.r.t. the output of TCAD'13.
On test cases where single-row cells are packed densely around cells of double-row height, the improvements achieved by the application of the Double Row Algorithm are quite significant, which can be explained by the fact that even a single double-row cell being fixed in position may lead to the displacement of huge blocks of consecutive cells of single-row height in densely packed regions (see Figure 3). On the other hand, if many of the cells of double-row height do not interfere with those of single-row height at all, in that there is sufficient horizontal whitespace around them, comparably small improvements are obtained despite a considerable number of cells of double-row height being present (see Figure 2). However, as the legalization task becomes more difficult precisely in those cases where the Global Placement packs the cells relatively densely in some regions, the Double Row Algorithm can be considered a worthwhile extension of the considered legalization framework.
CONCLUSION

In this paper, we have presented a fast algorithm that minimizes quadratic (or linear) cell displacement for pairs of cell rows comprising cells of both single- and double-row height with predefined target locations and a fixed left-to-right ordering. Even though the surrounding legalization framework is designed to optimize squared instead of linear cell displacement, our results are competitive with state-of-the-art works on mixed-cell-height legalization. Moreover, experimental results comparing the squared cell displacement when fixing all cells of double-row height and when employing the Double Row Algorithm, respectively, clearly speak in favor of its effectiveness.
REFERENCES
[1] U. Brenner. 2013. BonnPlace Legalization: Minimizing Movement by Iterative Augmentation. TCAD 32, 8 (2013), 1215–1227.
[2] U. Brenner and J. Vygen. 2000. Faster Optimal Single-Row Placement with Fixed Ordering. In Proceedings Design, Automation and Test in Europe. 117–121.
[3] U. Brenner and J. Vygen. 2004. Legalizing a Placement with Minimum Total Movement. TCAD 23, 12 (2004), 1597–1613.
[4] I. Bustany, D. Chinnery, J. Shinnerl, and V. Yutsis. 2015. ISPD 2015 Benchmarks with Fence Regions and Routing Blockages for Detailed-Routing-Driven Placement. In Proceedings of the ISPD. 157–164.
[5] J. Chen, Z. Zhu, W. Zhu, and Y. Chang. 2017. Toward Optimal Legalization for Mixed-Cell-Height Circuit Designs. In Proceedings of the DAC. 6 pages.
[6] Y. Cheng, D. Huang, W. Mak, and T. Wang. 2018. A Practical Detailed Placement Algorithm under Multi-Cell Spacing Constraints. In Proceedings of the ICCAD. 8 pages.
[7] W. Chow, C. Pui, and E. Young. 2016. Legalization Algorithm for Multiple-Row Height Standard Cell Design. In Proceedings of the DAC. 1–6.
[8] N. Darav, I. Bustany, A. Kennings, and R. Mamidi. 2017. ICCAD-2017 CAD Contest in Multi-Deck Standard Cell Legalization and Benchmarks. In ICCAD. 867–871.
[9] M. Garey and D. Johnson. 1978. "Strong" NP-Completeness Results: Motivation, Examples, and Implications. J. ACM 25, 3 (1978), 499–508.
[10] M. Garey, R. Tarjan, and G. Wilfong. 1988. One-Processor Scheduling with Symmetric Earliness and Tardiness Penalties. Mathematics of Operations Research 13, 2 (1988), 330–348.
[11] C. Han, K. Han, A. Kahng, H. Lee, L. Wang, and B. Xu. 2017. Optimal Multi-Row Detailed Placement for Yield and Model-Hardware Correlation Improvements in Sub-10nm VLSI. In ICCAD. 667–674.
[12] C. Han, A. Kahng, L. Wang, and B. Xu. 2019. Enhanced Optimal Multi-Row Detailed Placement for Neighbor Diffusion Effect Mitigation in Sub-10 nm VLSI. TCAD 38, 9 (2019), 1703–1716.
[13] D. Hill. 2002. Method and system for high speed detailed placement of cells within an integrated circuit design. U.S. Patent 6370673.
[14] C. Hung, P. Chou, and W. Mak. 2017. Mixed-Cell-Height Standard Cell Placement Legalization. In Proceedings of the Great Lakes Symposium on VLSI. 149–154.
[15] A. Kahng, P. Tucker, and A. Zelikovsky. 1999. Optimization of Linear Placements for Wirelength Minimization with Free Sites. In Proceedings of the Asia and South Pacific Design Automation Conference. 241–244.
[16] B. Korte, D. Rautenbach, and J. Vygen. 2007. BonnTools: Mathematical Innovation for Layout and Timing Closure of Systems on a Chip. Proc. IEEE 95 (2007), 555–572.
[17] H. Li, W. Chow, G. Chen, E. Young, and B. Yu. 2018. Routability-Driven and Fence-Aware Legalization for Mixed-Cell-Height Circuits. In Proceedings of the DAC. 1–6.
[18] X. Li, J. Chen, W. Zhu, and Y. Chang. 2019. Analytical Mixed-Cell-Height Legalization Considering Average and Maximum Movement Minimization. In Proceedings of the ISPD. 27–34.
[19] Y. Lin, B. Yu, X. Xu, J. Gao, N. Viswanathan, W. Liu, Z. Li, C. Alpert, and D. Pan. 2016. MrDP: Multiple-row Detailed Placement of Heterogeneous-sized Cells for Advanced Nodes. In ICCAD. 1–8.
[20] P. Spindler, U. Schlichtmann, and F. Johannes. 2008. Abacus: Fast Legalization of Standard Cell Circuits with Minimal Movement. In Proceedings of the ISPD. 47–53.
[21] U. Suhl. 2010. Row-Placement in VLSI Design: The Clumping Algorithm and a Generalization. Diploma thesis. University of Bonn, Research Institute for Discrete Mathematics.
[22] R. Tarjan. 1983. Data Structures and Network Algorithms. SIAM.
[23] C. Wang, Y. Wu, J. Chen, Y. Chang, S. Kuo, W. Zhu, and G. Fan. 2017. An Effective Legalization Algorithm for Mixed-Cell-Height Standard Cells. In Proceedings of the ASP-DAC. 450–455.
[24] G. Wu and C. Chu. 2015. Detailed Placement Algorithm for VLSI Design with Double-Row Height Standard Cells. TCAD 35 (2015), 1569–1573.
[25] Z. Zhu, X. Li, Y. Chen, J. Chen, W. Zhu, and Y. Chang. 2018. Mixed-Cell-Height Legalization Considering Technology and Region Constraints. In Proceedings of the ICCAD.