Kwangsoo Han

University of California, San Diego

Publication

Featured research published by Kwangsoo Han.

design automation conference | 2015

A global-local optimization framework for simultaneous multi-mode multi-corner clock skew variation reduction

Kwangsoo Han; Andrew B. Kahng; Jong-Pil Lee; Jiajia Li; Siddhartha Nath

As combinations of signoff corners grow in modern SoCs, minimization of clock skew variation across corners is important. Large skew variation can cause difficulties in multi-corner timing closure because fixing violations at one corner can lead to violations at other corners. Such “ping-pong” effects lead to significant power and area overheads and time to signoff. We propose a novel framework encompassing both global and local clock network optimizations to minimize the sum of skew variations across different PVT corners between all sequentially adjacent sink pairs. The global optimization uses linear programming to guide buffer insertion, buffer removal and routing detours. The local optimization is based on machine learning-based predictors of latency change; these are used for iterative optimization with tree surgery, buffer sizing and buffer displacement operators. Our optimization achieves up to 22% total skew variation reduction across multiple testcases implemented in foundry 28nm technology, as compared to a best-practices CTS solution using a leading commercial tool.
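The objective named in this abstract, the sum of skew variations across corners over sequentially adjacent sink pairs, can be illustrated with a small sketch. The abstract does not spell out the exact metric, so the definition below is an assumption: the skew of a pair at one corner is the latency difference between its two sinks, and the variation of a pair is the spread (max minus min) of that skew over all corners, with no per-corner normalization. The function names, corner names, and latency values are hypothetical.

```python
def pair_skew(latency, pair):
    """Skew of one sink pair at one corner: latency difference (assumed definition)."""
    launch, capture = pair
    return latency[launch] - latency[capture]


def pair_skew_variation(latencies_by_corner, pair):
    """Spread of a pair's skew across corners (assumed: max minus min, unnormalized)."""
    skews = [pair_skew(lat, pair) for lat in latencies_by_corner.values()]
    return max(skews) - min(skews)


def total_skew_variation(latencies_by_corner, pairs):
    """Sum of skew variations over all sequentially adjacent sink pairs."""
    return sum(pair_skew_variation(latencies_by_corner, p) for p in pairs)


# Hypothetical clock latencies in ps at two PVT corners.
latencies_by_corner = {
    "slow": {"ff1": 100, "ff2": 120, "ff3": 110},
    "fast": {"ff1": 60, "ff2": 65, "ff3": 70},
}
# Hypothetical sequentially adjacent (launch, capture) sink pairs.
pairs = [("ff1", "ff2"), ("ff2", "ff3")]

print(total_skew_variation(latencies_by_corner, pairs))  # 30
```

In this toy data the pair (ff1, ff2) has skew -20 ps at the slow corner and -5 ps at the fast corner, so fixing a violation at one corner shifts timing at the other by a different amount. That mismatch is what the optimization tries to shrink.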

international conference on computer aided design | 2015

Scalable Detailed Placement Legalization for Complex Sub-14nm Constraints

Kwangsoo Han; Andrew B. Kahng; Hyein Lee

Technology scaling to 10nm and below introduces complex intra-row and inter-row constraints in standard-cell detailed placement. Examples of such constraints are found in rules for drain-drain abutment, minimum implant region area and width, oxide diffusion (OD) notching and jogging, etc. Typically, these rules are too complex for the normal global-detailed placement flow to fully consider. On the other hand, guardbanding the library cell design so that arbitrary cell placement adjacencies are all “correct by construction” has increasingly high area cost. This motivates the introduction of a final legalization phase for standard-cell placement tools in advanced (particularly 10nm and 7nm) foundry nodes. In this work, we develop a mixed integer-linear programming (MILP)-based placer, called DFPlacer, for final-phase design rule violation (DRV) fixing. DFPlacer finds (near-)DRV-free solutions considering various complex layout constraints including minimum implant width, drain-drain abutment, and oxide diffusion jogs. To overcome the runtime limitation of MILP-based approaches, we implement a distributable optimization strategy based on partitioning of the block layout into windows of cells that can be independently legalized. Using layouts in an abstracted 7nm library, we find that DFPlacer fixes 99% of DRVs on average with minimal impacts on area and timing. We also study an area-DRV tradeoff between two types of standard-cell library strategies, namely, with and without dummy poly gates.

great lakes symposium on vlsi | 2014

OCV-aware top-level clock tree optimization

Tuck-Boon Chan; Kwangsoo Han; Andrew B. Kahng; Jae-gon Lee; Siddhartha Nath

The clock trees of high-performance synchronous circuits have many clock logic cells (e.g., clock gating cells, multiplexers and dividers) in order to achieve aggressive clock gating and required performance across a wide range of operating modes and conditions. As a result, clock tree structures have become very complex and difficult to optimize with automatic clock tree synthesis (CTS) tools. In advanced process nodes, CTS becomes even more challenging due to on-chip variation (OCV) effects. In this paper, we present a new CTS methodology that optimizes clock logic cell placements and buffer insertions in the top level of a clock tree. We formulate the top-level clock tree optimization problem as a linear program that minimizes a weighted sum of timing slacks, clock uncertainty and wirelength. Experimental results in a commercial 28nm FDSOI technology show that our method can improve post-CTS worst negative slack across all modes/corners by up to 320ps compared to a leading commercial provider's CTS flow.

international conference on computer aided design | 2014

Benchmarking of mask fracturing heuristics

Tuck Boon Chan; Puneet Gupta; Kwangsoo Han; Abde Ali Kagalwalla; Andrew B. Kahng; Emile Sahouria

Aggressive resolution enhancement techniques such as inverse lithography (ILT) often lead to complex, non-rectilinear mask shapes which make mask writing extremely slow and expensive. To reduce shot count of complex mask shapes, mask writers allow overlapping shots, due to which the problem of fracturing mask shapes with minimum shot count is NP-hard. The need to correct for e-beam proximity effect makes mask fracturing even more challenging. Although a number of fracturing heuristics have been proposed, there has been no systematic study to analyze the quality of their solutions. In this work, we propose a new method to generate benchmarks with known optimal solutions that can be used to evaluate the suboptimality of mask fracturing heuristics. We also propose a method to generate tight upper and lower bounds for actual ILT mask shapes by formulating mask fracturing as an integer linear program and solving it using branch and price. Our results show that a state-of-the-art prototype [version of] capability within a commercial EDA tool for e-beam mask shot decomposition can be suboptimal by as much as 3.7× for generated benchmarks, and by as much as 3.6× for actual ILT shapes.

design automation conference | 2017

Vertical M1 Routing-Aware Detailed Placement for Congestion and Wirelength Reduction in Sub-10nm Nodes

Peter Debacker; Kwangsoo Han; Andrew B. Kahng; Hyein Lee; Praveen Raghavan; Lutong Wang

Aggressive pitch scaling in sub-10nm nodes has introduced complex design rules which make routing extremely challenging. Cell architectures have also been changed to meet the design rules. For example, metal layers below M1 are used to gain additional routing resources. New cell architectures wherein inter-row M1 routing is allowed force consideration of vertical alignment of cells. In this work, we propose a mixed-integer linear programming (MILP)-based, detailed placement optimization to maximize direct vertical M1 routing utilization for congestion and wirelength reduction.

international symposium on physical design | 2018

Prim-Dijkstra Revisited: Achieving Superior Timing-driven Routing Trees

Charles J. Alpert; Wing-Kai Chow; Kwangsoo Han; Andrew B. Kahng; Zhuo Li; Derong Liu; Sriram Venkatesh

The Prim-Dijkstra (PD) construction [1] was first presented over 20 years ago as a way to efficiently trade off between shortest-path and minimum-wirelength routing trees. This approach has stood the test of time, having been integrated into leading semiconductor design methodologies and electronic design automation tools. PD optimizes the conflicting objectives of wirelength (WL) and source-sink pathlength (PL) by blending the classic Prim and Dijkstra spanning tree algorithms. However, as this work shows, PD can sometimes demonstrate significant suboptimality for both WL and PL. This quality degradation can be especially costly for advanced nodes because (i) wire delays form a much larger component of total stage delay, i.e., timing-driven routing is critical, and (ii) modern designs are severely power-constrained (e.g., mobile, IoT), which makes low-capacitance wiring important. Consequently, achieving a good timing and power tradeoff for routing is required to build a market-leading product [2]. This work introduces a new problem formulation that incorporates the total detour cost in the objective function to optimize the detour to every sink in the tree, not just the worst detour. We then propose a new PD-II construction which directly improves upon the original PD construction by repairing the tree to simultaneously reduce both WL and PL. The PD-II approach achieves improvement for both objectives, making it a clear win over PD, for virtually zero additional runtime cost. PD-II is a spanning tree algorithm (which is useful for seeding global routing); however, since Steiner trees are needed for timing estimation, this work also includes a post-processing algorithm called DAS to convert PD-II trees into balanced Steiner trees. Experimental results demonstrate that this construction outperforms the recent state-of-the-art academic tool, SALT [36], for high-fanout nets, achieving up to 36.46% PL improvement with similar WL on average for 20K nets of size ≥ 32 terminals from DAC 2012 contest benchmark designs [37].
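The blending of Prim and Dijkstra that this abstract refers to can be sketched as follows. This is a minimal illustration of the original PD tradeoff only, not of PD-II or DAS, which the abstract does not describe in enough detail to reproduce. It assumes Manhattan distances and a blending parameter named `alpha` (the name is an assumption): each step attaches the outside terminal `v` to the tree node `u` that minimizes `alpha * pathlength(u) + dist(u, v)`, so `alpha = 0` behaves like Prim and `alpha = 1` like Dijkstra. The simple cubic-time loop is chosen for clarity, not speed.

```python
def manhattan(a, b):
    """Rectilinear distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def prim_dijkstra(points, alpha, source=0):
    """Build a PD spanning tree; return (parent, pathlen) lists indexed by terminal."""
    n = len(points)
    parent = [None] * n   # parent[v] is the tree node v attaches to
    pathlen = [0] * n     # source-to-v pathlength within the tree
    in_tree = [False] * n
    in_tree[source] = True
    for _ in range(n - 1):
        best = None
        for u in range(n):
            if not in_tree[u]:
                continue
            for v in range(n):
                if in_tree[v]:
                    continue
                # Blended key: alpha weights the Dijkstra (pathlength) term.
                key = alpha * pathlen[u] + manhattan(points[u], points[v])
                if best is None or key < best[0]:
                    best = (key, u, v)
        _, u, v = best
        parent[v] = u
        pathlen[v] = pathlen[u] + manhattan(points[u], points[v])
        in_tree[v] = True
    return parent, pathlen


def wirelength(points, parent):
    """Total tree wirelength (WL): sum of edge lengths."""
    return sum(manhattan(points[v], points[p])
               for v, p in enumerate(parent) if p is not None)


# Hypothetical three-terminal net; terminal 0 is the source.
points = [(0, 0), (2, 2), (0, 3)]
for alpha in (0.0, 1.0):
    parent, pathlen = prim_dijkstra(points, alpha)
    print(alpha, wirelength(points, parent), max(pathlen))
# prints "0.0 6 6" then "1.0 7 4"
```

On this toy net the Prim-like tree has lower WL (6 versus 7) but a longer worst source-sink pathlength (6 versus 4), which is the WL/PL conflict the abstract describes. Intermediate values of `alpha` move between the two extremes.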

asia and south pacific design automation conference | 2016

Delay uncertainty and signal criticality driven routing channel optimization for advanced DRAM products

Sam-Young Bang; Kwangsoo Han; Andrew B. Kahng; Mulong Luo

Signal delay uncertainty induced by crosstalk is a critical challenge to the physical design of long interconnect channels in DRAM products at the 2× and 1× technology nodes. Due to severe cost challenges in a high-volume, commodity market, layout resources including channel width, buffers, and number of metal routing layers are extremely scarce. We describe a new channel optimizer that reduces crosstalk-induced delay uncertainty, weighted by signal criticality and aware of signal activity correlations (e.g., to reduce delay uncertainty by mutual shielding). Instead of the typical signal net permutation strategy, we apply (pessimistic) timing-driven swizzling to minimize the delay uncertainty cost function. Contributions of this work include (1) an accurate and efficient analytical crosstalk delay calculator, (2) scalability up to hundreds of signals and tracks in the routing channel through use of greedy and decomposition strategies as well as a pair-swapping approach, and (3) experimental studies that demonstrate up to 24% reduction of the worst-case criticality-weighted delay uncertainty (or, 34ps of absolute delay uncertainty reduction) compared with the typical signal permutation approach.

system level interconnect prediction | 2015

Clock clustering and IO optimization for 3D integration

Sam-Young Bang; Kwangsoo Han; Andrew B. Kahng; Vaishnav Srinivas

3D interconnect between two dies can span a wide range of bandwidths and region areas, depending on the application, partitioning of the dies, die size, and floorplan. We explore the concept of dividing such an interconnect into local clusters, each with a cluster clock. We combine such clustering with a choice of three clock synchronization schemes (synchronous, source-synchronous, asynchronous) and study impacts on power, area and timing of the clock tree, data path and 3DIO. We build a model for the power, area and timing as a function of key system requirements and constraints: total bandwidth, region area, number of clusters, clock synchronization scheme, and 3DIO frequency. Such a model enables architects to perform pathfinding exploration of clocking and IO power, area and bandwidth optimization for 3D integration.

system level interconnect prediction | 2018

A study of optimal cost-skew tradeoff and remaining suboptimality in interconnect tree constructions

Kwangsoo Han; Andrew B. Kahng; Christopher Moyes; Alexander Zelikovsky

Cost and skew are among the most fundamental objectives for interconnect tree synthesis. The cost-skew tradeoff is particularly important in buffered clock tree construction, where clock subnets are an important “sweet spot” for balancing on-chip variation-aware analysis, skew, power and other factors. In advanced nodes, where both performance and power are critical to IC products, there is a renewed challenge of minimizing wirelength while controlling skew. In this work, we formulate the minimum-cost bounded skew spanning and Steiner tree problems as flow-based integer linear programs, and give the first-ever study of optimal cost-skew tradeoffs. We also assess heuristics (notably, Bounded-Skew DME (BST-DME), Steiner shallow-light tree (SALT), and Prim-Dijkstra (PD)) that are currently available for trading off cost and skew. Experimental results demonstrate that BST-DME has suboptimality of ~10% in cost at iso-skew and ~50% in skew at iso-cost. In addition, SALT and PD show suboptimality in terms of skew by up to ~3×.
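The two objectives being traded off here can be made concrete with a short sketch. It assumes a linear (pathlength) delay model, which the abstract does not state: cost is the total Manhattan wirelength of the tree, and skew is the difference between the longest and shortest source-to-sink pathlengths. The tree encoding (a `parent` list) and the function names are hypothetical, and the integer linear programs themselves are not reproduced.

```python
def manhattan(a, b):
    """Rectilinear distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def path_length(points, parent, v):
    """Source-to-v pathlength, found by walking parent pointers up to the root."""
    total = 0
    while parent[v] is not None:
        total += manhattan(points[v], points[parent[v]])
        v = parent[v]
    return total


def tree_cost(points, parent):
    """Cost: total wirelength of the tree."""
    return sum(manhattan(points[v], points[p])
               for v, p in enumerate(parent) if p is not None)


def tree_skew(points, parent, sinks):
    """Skew under the assumed linear delay model: max minus min sink pathlength."""
    lengths = [path_length(points, parent, s) for s in sinks]
    return max(lengths) - min(lengths)


def meets_skew_bound(points, parent, sinks, bound):
    """Feasibility check for a bounded-skew tree with skew bound `bound`."""
    return tree_skew(points, parent, sinks) <= bound


# Hypothetical net: terminal 0 is the source, terminals 1 and 2 are sinks.
points = [(0, 0), (2, 2), (0, 3)]
sinks = [1, 2]
star = [None, 0, 0]    # both sinks wired directly to the source
chain = [None, 2, 0]   # sink 1 reached through sink 2

print(tree_cost(points, star), tree_skew(points, star, sinks))    # 7 1
print(tree_cost(points, chain), tree_skew(points, chain, sinks))  # 6 3
```

With a skew bound of 1 only the costlier star tree is feasible, while a bound of 3 admits the cheaper chain. A minimum-cost bounded-skew formulation searches for the cheapest tree under each such bound.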

international symposium on quality electronic design | 2017

Performance- and energy-aware optimization of BEOL interconnect stack geometry in advanced technology nodes

Kwangsoo Han; Andrew B. Kahng; Hyein Lee; Lutong Wang

In advanced technology nodes, BEOL interconnect stack geometry has become a key lever for design enablement. The rapid increase of interconnect RC leads to not only performance loss from interconnect delay increase, but circuit power and area degradation as well. Thus, optimization of BEOL dimensions (i.e., wire width, spacing and thickness subject to a given layer's pitch constraint) is crucial to achieve better product performance, power and area. However, it is not obvious how to optimize BEOL dimensions, especially in sub-10nm nodes. In this work, we study BEOL interconnect stack geometry by exploring wire aspect ratio (AR) and wire line-space duty cycle (DC). We perform SPICE-based analyses of timing path delays to find delay- or power-optimal (AR,DC) combinations, and also perform block-level studies with placed and routed designs. Based on our experimental results, we provide various insights on BEOL stack geometry: (i) optimal (AR,DC) for a given wire pitch with respect to power and delay; (ii) sensitivities of optimal (AR,DC) to circuit parameters (e.g., driver strength, input slew, output load, wirelength); (iii) optimal (AR,DC) when multiple interconnect layers are considered; and (iv) potential impacts of BEOL stack optimizations within future design-aware manufacturing and/or manufacturing-aware design methodologies.