Junhyung Um | Researchain


Publication

Featured researches published by Junhyung Um.

IEEE Transactions on Computers | 2001

An optimal allocation of carry-save-adders in arithmetic circuits

Junhyung Um; Taewhan Kim

Carry-save-adder (CSA) is one of the most widely used components for fast arithmetic in industry. This paper provides a solution to the problem of finding an optimal-timing allocation of CSAs in arithmetic circuits. Namely, we present a polynomial time algorithm which finds an optimal-timing CSA allocation for a given arithmetic expression. We then extend our result for CSA allocation to the problem of optimizing arithmetic expressions across the boundary of design hierarchy by introducing a new concept, called auxiliary ports. Our algorithm can be used to carry out the CSA allocation step optimally and automatically and this can be done within the context of a standard RTL synthesis environment.
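For readers unfamiliar with the component, the following is a minimal sketch of word-level carry-save addition. It is not taken from the paper; the function names `carry_save_add` and `add_many` are hypothetical, and the sketch assumes non-negative integers. It shows only the basic identity that a CSA compresses three operands into a sum word and a carry word without propagating carries, so that a single conventional addition is needed at the end.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """Compress three non-negative integers into a (sum, carry) pair.

    Each bit position is handled independently, so no carry ripples
    across the word: a + b + c == partial_sum + carry.
    """
    partial_sum = a ^ b ^ c                      # bitwise sum without carries
    carry = ((a & b) | (b & c) | (a & c)) << 1   # majority bits, shifted left
    return partial_sum, carry


def add_many(operands):
    """Add a list of non-negative integers using CSAs plus one final addition."""
    ops = list(operands)
    # Each CSA step replaces three operands with two, so the loop terminates.
    while len(ops) > 2:
        a, b, c = ops.pop(), ops.pop(), ops.pop()
        ops.extend(carry_save_add(a, b, c))
    # The final carry-propagating addition (an ordinary adder in hardware).
    return sum(ops)
```

The order in which operands are fed to the CSAs does not change the result, only the timing, which is why the allocation of CSAs can be treated as an optimization problem.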

international conference on computer aided design | 1999

Optimal allocation of carry-save-adders in arithmetic optimization

Junhyung Um; Taewhan Kim; C. L. Liu

Carry-save-adder (CSA) is one of the most widely used schemes for fast arithmetic in industry. This paper provides a solution to the problem of finding an optimal-timing allocation of CSAs. Specifically, we present a polynomial time algorithm which finds an optimal-timing CSA allocation for a given arithmetic expression. In addition, we extend our result for CSA allocation to the problem of optimizing arithmetic expressions across the boundary of design hierarchy by introducing a new concept, called auxiliary ports. Our algorithm can be used to carry out the CSA allocation step optimally and automatically, and this can be done within the context of a standard HDL synthesis environment.

design, automation, and test in europe | 2009

In-network reorder buffer to improve overall NoC performance while resolving the in-order requirement problem

Woo-Cheol Kwon; Sungjoo Yoo; Junhyung Um; Seh-Woong Jeong

Data-intensive functions on chip, e.g., codec, 3D graphics, pixel processing, etc. need to make best use of the increased bandwidth of multiple memories enabled by 3D die stacking via accessing multiple memories in parallel. Parallel memory accesses with originally in-order requirements necessitate reorder buffers to avoid deadlock. Reorder buffers are expensive in terms of area and power consumption. In addition, conventional reorder buffers suffer from a problem of low resource utilization. In our work, we present a novel idea, called in-network reorder buffer, to increase the utilization of reorder buffer resource. In our method, we move the reorder buffer resource and related functions from network entry/exit points to network routers. Thus, the in-network reorder buffers can be better utilized in two ways. First, they can be utilized by other packets without in-order requirements while there are no in-order packets. Second, even in-order packets can benefit from in-network reorder buffers by enjoying more shares of reorder buffers than before. Such an increase in reorder buffer utilization enables NoC performance improvement while supporting the original in-order requirements. Experimental results with an industrial strength DTV SoC example show that the presented idea improves the total execution cycle by 16.9%.

international conference on computer aided design | 2002

Layout-driven resource sharing in high-level synthesis

Junhyung Um; Jae-Hoon Kim; Taewhan Kim

In deep submicron (DSM) technology, the interconnects are as important as, or more important than, the logic gates. In particular, to achieve timing closure in DSM technology, it is critical to consider the interconnect delay at an early stage of the synthesis process. It has been known that resource sharing in high-level synthesis is one of the major synthesis tasks which greatly affect the final synthesis/layout results. In this paper, we propose a new layout-driven resource sharing approach to overcome some of the limitations of the previous works in which the effects of layout on the synthesis have never been taken into account or considered in local and limited ways, or whose computation time is excessively large. The proposed approach consists of two steps: (Step 1) We relax the integrated resource sharing and placement into an efficient linear programming (LP) formulation based on the concept of discretizing placement space; (Step 2) We derive a feasible solution from the solution obtained in Step 1. Then, we employ an iterative mechanism based on the two steps to tightly integrate resource sharing and placement tasks so that the slack time violation due to interconnect delay (determined by placement) as well as logic delay (determined by resource sharing) is minimized. From experiments using a set of benchmark designs, it is shown that the approach is effective and efficient, completely removing the slack time violation produced by conventional methods.

design automation conference | 2002

Layout-aware synthesis of arithmetic circuits

Junhyung Um; Taewhan Kim

In deep sub-micron (DSM) technology, wires are as important as, or more important than, logic components, since wire-related problems such as crosstalk and noise are much more critical in system-on-chip (SoC) design. Recently, a method [12] was proposed for generating a partial product reduction tree (PPRT) with optimal timing using bit-level adders to implement arithmetic circuits, which outperforms the current best designs. However, in the conventional approaches including [12], interconnects are not primary components to be optimized in the synthesis of arithmetic circuits, mainly due to their high integration complexity or unpredictable wire effects, thereby resulting in unsatisfactory layout results with long and messy wire connections. To overcome the limitation, we propose a new module generation/synthesis algorithm for arithmetic circuits utilizing carry-save-adder (CSA) modules, which not only optimizes the circuit timing but also generates a much more regular interconnect topology of the final circuits. Specifically, we propose a two-step algorithm: (Phase 1: CSA module generation) we propose an optimal-timing CSA module generation algorithm for an arithmetic expression under a general CSA timing model; (Phase 2: Bit-level interconnect refinements) we optimally refine the interconnects between the CSA modules while retaining the global CSA-tree structure produced by Phase 1. It is shown that the timing of the circuits produced by our approach is equal or very close to that by [12] in most testcases (even without including the interconnect delay), and at the same time, the interconnects in layout are significantly shorter and more regular.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2000

A practical approach to the synthesis of arithmetic circuits using carry-save-adders

Taewhan Kim; Junhyung Um

Carry-save-adder (CSA) is one of the most widely used types of operation in implementing a fast computation of arithmetics. An inherent limitation of the conventional CSA applications is that the applications are confined to the sections of arithmetic circuit that can be directly translated into addition expressions. To overcome this limitation, from the analysis of the structures of arithmetic circuits found in industry, we derive a set of simple, but effective CSA transformation techniques other than the existing ones. These are 1) optimization across multiplexers, 2) optimization across design boundaries, and 3) optimization across multiplications. Based on the techniques, we develop a new timing-driven CSA transformation algorithm that is able to utilize CSAs extensively throughout all circuits. Experimental data for practical testcases are provided to show the effectiveness of our algorithm.

design automation conference | 2000

A fine-grained arithmetic optimization technique for high-performance/low-power data path synthesis

Junhyung Um; Taewhan Kim; C. L. Liu

Wallace-tree compressor style has been widely recognized as one of the most effective implementation schemes for arithmetic computations in VLSI design. However, the scheme has been applied only in a rather restrictive way, that is, for implementing fast multipliers and for generating fixed structures without considering the characteristics of the input signals. The contributions of our work are (1) to extend the applicability of the Wallace scheme to any arithmetic circuit which consists of additions/subtractions/multiplications globally (instead of applying it to each operation) to produce a globally efficient architecture of the circuit; (2) to optimize the timing of the circuit for uneven signal arrival profiles (specifically, we present an efficient algorithm for generating a delay-optimal (bit-level) carry-save addition structure of an arithmetic circuit); and (3) to provide a comprehensive analysis of the switching activity of a (bit-level) carry-save addition structure, based on which we derive an effective algorithm for synthesizing low-power circuits. Putting these arithmetic optimization solutions together, a circuit designer will be able to fully understand the synthesis of arithmetic circuits based on the bit-level carry-save addition.
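To illustrate why uneven signal arrival profiles matter, here is a simplified sketch of a greedy, timing-driven way to build a carry-save tree. It is an illustration under stated assumptions, not the algorithm from the paper: the function name `csa_tree_delay` and the parameter `csa_delay` are hypothetical, operands are treated as whole words rather than individual bits, and the sum and carry outputs of a CSA are assumed to have the same delay. At each step the three earliest-arriving signals are combined, so late inputs enter the tree as close to the output as possible.

```python
import heapq


def csa_tree_delay(arrival_times, csa_delay=1.0):
    """Return the time at which the last two operands are ready for the final adder.

    Assumption (not from the paper): one CSA takes `csa_delay` from its
    latest input to both of its outputs. Requires at least one operand.
    """
    heap = list(arrival_times)
    heapq.heapify(heap)
    while len(heap) > 2:
        # Combine the three signals that are available earliest.
        inputs = [heapq.heappop(heap) for _ in range(3)]
        ready = max(inputs) + csa_delay
        # A CSA produces two outputs (sum and carry), both ready at `ready`.
        heapq.heappush(heap, ready)
        heapq.heappush(heap, ready)
    return max(heap)
```

For example, with four operands arriving at time 0 and one arriving at time 5, the early operands are compressed first and the late operand passes through only one CSA.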

design, automation, and test in europe | 2006

A systematic IP and bus subsystem modeling for platform-based system design

Junhyung Um; Woo-Cheol Kwon; Sungpack Hong; Young-Taek Kim; Kyu-Myung Choi; Jeong-Taek Kong; Soo-Kwan Eo; Taewhan Kim

The topic of platform-based system modeling has received a great deal of attention. One of the important tasks that significantly affect the effectiveness and efficiency of the system modeling is the modeling of IP components and communication between IPs. To be effective, it is generally accepted that the system modeling should be performed in two steps. In the first step, a fast but somewhat inaccurate system modeling is considered to facilitate the simultaneous development of software and hardware. The second step then refines the models of the software and hardware blocks (i.e., IPs) to increase the simulation accuracy for the system performance analysis. Here, one critical factor required for a successful system modeling is a systematic modeling of the IP blocks and bus subsystem connecting the IPs. In this respect, this work addresses the problem of systematic modeling of the IPs and bus subsystem in different levels of refinements. In the experiments, we found that by applying our proposed IP and bus modeling methods to the MPEG-4 application, we are able to achieve a 4 times performance improvement and, at the same time, reduce the software development time by 35%, compared to conventional modeling methods.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2003

Synthesis of arithmetic circuits considering layout effects

Junhyung Um; Taewhan Kim

In deep submicron technology, wires are as important as, or more important than, logic components, since wire-related problems such as crosstalk noise are much more critical in system-on-chip design. Recently, a method for generating a partial product reduction tree with optimal timing using bit-level adders to implement arithmetic circuits has been proposed, which outperforms the current best designs. However, in the conventional approaches, interconnects are not primary components to be optimized in the synthesis of arithmetic circuits, mainly due to their integration complexity or unpredictable wire effects, thereby resulting in unsatisfactory layout results with long and messy wire connections. To overcome the limitation, we propose a new module generation/synthesis algorithm for arithmetic circuits utilizing carry-save-adder (CSA) modules, which not only optimizes the circuit timing but also generates a much more regular interconnect topology of the final circuits. Specifically, we propose a two-step algorithm: (Phase 1: CSA module generation) we propose an optimal-timing CSA module generation algorithm for an arithmetic expression under a general CSA timing model; then (Phase 2: Bit-level interconnect refinements) we optimally refine the interconnects between the CSA modules while retaining the global CSA-tree structure produced by Phase 1. We show that the timing of the circuits produced by our approach is equal or very close to that of the prior method in most test cases (even without including the interconnect delay), and at the same time, the interconnects in layout are short and regular.

signal processing systems | 2006

Resource Sharing Combined with Layout Effects in High-Level Synthesis

Junhyung Um; Taewhan Kim

In deep-submicron designs, the interconnects are equally as or more important than the logic gates. In particular, to achieve timing closure, it is necessary and critical to consider the interconnect delay at an early stage of the synthesis process. It has been known that resource sharing in high-level synthesis is one of the major synthesis tasks which greatly affect the final synthesis/layout results. In this paper, we propose a new layout-aware resource sharing approach to overcome some of the limitations of the previous works in which the effects of layout on the synthesis have never been taken into account or considered in local and limited ways, or whose computation time is excessively large. The proposed approach consists of two steps: (Step 1) We relax the integrated resource sharing and placement into an efficient linear programming (LP) formulation based on the concept of
