Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Kiyoung Choi is active.

Publication


Featured research published by Kiyoung Choi.


design automation conference | 1999

Power conscious fixed priority scheduling for hard real-time systems

Youngsoo Shin; Kiyoung Choi

Power efficient design of real-time systems based on programmable processors becomes more important as system functionality is increasingly realized through software. This paper presents a power efficient version of a widely used fixed priority scheduling method. The method yields a power reduction by exploiting slack times, both those inherent in the system schedule and those arising from variations of execution times. The proposed run-time mechanism is simple enough to be implemented in most kernels. Experimental results show that the proposed scheduling method obtains a significant power reduction across several kinds of applications.
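The run-time idea can be illustrated with a toy sketch. The function names, the simple idle prediction, and the break-even threshold below are assumptions for illustration, not the paper's exact mechanism: slack from the schedule and from early completions extends the predicted idle interval, and the processor is powered down only when that interval repays the wake-up cost.

```python
def predicted_idle(now, next_release, remaining_work):
    """Idle interval before the next job arrives, after accounting for
    work still pending. Early completions shrink remaining_work, which
    is where run-time slack comes from."""
    return max(0.0, next_release - now - remaining_work)

def power_mode(idle, breakeven=0.5):
    """Enter power-down only if the idle interval repays the wake-up
    cost (break-even time); otherwise stay active."""
    return "power-down" if idle >= breakeven else "stay-active"
```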


international conference on computer aided design | 2000

Power optimization of real-time embedded systems on variable speed processors

Youngsoo Shin; Kiyoung Choi; Takayasu Sakurai

Power efficient design of real-time embedded systems based on programmable processors becomes more important as system functionality is increasingly realized through software. This paper presents a power optimization method for real-time embedded applications on a variable speed processor. The method combines off-line and on-line components. The off-line component determines the lowest possible maximum processor speed while guaranteeing the deadlines of all tasks. The on-line component dynamically varies the processor speed or brings the processor into a power-down mode according to the status of the task set, in order to exploit execution time variations and idle intervals. Experimental results show that the proposed method obtains a significant power reduction across several kinds of applications.
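The off-line component can be approximated by standard fixed-priority response-time analysis combined with a binary search over constant speeds. This sketch assumes rate-monotonic priorities, deadlines equal to periods, and normalized speeds in (0, 1]; it is an illustration of the idea, not the paper's exact formulation.

```python
import math

def schedulable(tasks, speed):
    """Fixed-priority response-time analysis at a constant normalized
    speed. tasks = [(wcet_at_full_speed, period), ...], sorted from
    highest priority (shortest period) to lowest; deadline = period."""
    for i, (c_i, t_i) in enumerate(tasks):
        r = c_i / speed
        while True:
            # Interference from higher-priority tasks released in [0, r).
            r_next = c_i / speed + sum(
                math.ceil(r / t_j) * (c_j / speed) for c_j, t_j in tasks[:i])
            if r_next > t_i:
                return False           # response time exceeds the deadline
            if abs(r_next - r) < 1e-9:
                break                  # fixed point reached within deadline
            r = r_next
    return True

def lowest_speed(tasks, eps=1e-4):
    """Binary-search the lowest constant speed that still meets every
    deadline (the off-line component, in spirit)."""
    lo, hi = eps, 1.0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if schedulable(tasks, mid):
            hi = mid
        else:
            lo = mid
    return hi
```

For two tasks with unit WCETs at periods 4 and 8, the search settles just above 0.375, i.e. well below full speed.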


international symposium on computer architecture | 2015

A scalable processing-in-memory accelerator for parallel graph processing

Junwhan Ahn; Sungpack Hong; Sungjoo Yoo; Onur Mutlu; Kiyoung Choi

The explosion of digital data and the ever-growing need for fast data analysis have made in-memory big-data processing in computer systems increasingly important. In particular, large-scale graph processing is gaining attention due to its broad applicability from social science to machine learning. However, scalable hardware design that can efficiently process large graphs in main memory is still an open problem. Ideally, cost-effective and scalable graph processing systems can be realized by building a system whose performance increases proportionally with the sizes of graphs that can be stored in the system, which is extremely challenging in conventional systems due to severe memory bandwidth limitations. In this work, we argue that the conventional concept of processing-in-memory (PIM) can be a viable solution to achieve such an objective. The key modern enabler for PIM is the recent advancement of the 3D integration technology that facilitates stacking logic and memory dies in a single package, which was not available when the PIM concept was originally examined. In order to take advantage of such a new technology to enable memory-capacity-proportional performance, we design a programmable PIM accelerator for large-scale graph processing called Tesseract. Tesseract is composed of (1) a new hardware architecture that fully utilizes the available memory bandwidth, (2) an efficient method of communication between different memory partitions, and (3) a programming interface that reflects and exploits the unique hardware design. It also includes two hardware prefetchers specialized for memory access patterns of graph processing, which operate based on the hints provided by our programming model. Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems.


international symposium on computer architecture | 2015

PIM-enabled instructions: a low-overhead, locality-aware processing-in-memory architecture

Junwhan Ahn; Sungjoo Yoo; Onur Mutlu; Kiyoung Choi

Processing-in-memory (PIM) is rapidly rising as a viable solution for the memory wall crisis, rebounding from its unsuccessful attempts in the 1990s due to practicality concerns, which are alleviated with recent advances in 3D stacking technologies. However, it is still challenging to integrate PIM architectures with existing systems in a seamless manner due to two common characteristics: unconventional programming models for in-memory computation units and the lack of ability to utilize large on-chip caches. In this paper, we propose a new PIM architecture that (1) does not change the existing sequential programming models and (2) automatically decides whether to execute PIM operations in memory or on processors depending on the locality of data. The key idea is to implement simple in-memory computation using compute-capable memory commands and use specialized instructions, which we call PIM-enabled instructions, to invoke in-memory computation. This allows PIM operations to be interoperable with existing programming models, cache coherence protocols, and virtual memory mechanisms with no modification. In addition, we introduce a simple hardware structure that monitors the locality of data accessed by a PIM-enabled instruction at runtime to adaptively execute the instruction at the host processor (instead of in memory) when the instruction can benefit from large on-chip caches. Consequently, our architecture provides the illusion that PIM operations are executed as if they were host processor instructions. We provide a case study of how ten emerging data-intensive workloads can benefit from our new PIM abstraction and its hardware implementation. Evaluations show that our architecture significantly improves system performance and, more importantly, combines the best parts of conventional and PIM architectures by adapting to data locality of applications.
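The locality-monitoring idea can be caricatured in a few lines. The class name, the LRU tag store, and its capacity below are illustrative assumptions, not the paper's hardware: a small set of recently touched cache-line tags stands in for the locality monitor, and each PIM-enabled instruction is dispatched to the host when its line looks cache-resident, otherwise to memory.

```python
from collections import OrderedDict

class LocalityAwareDispatch:
    """Toy model of locality-aware dispatch for PIM-enabled
    instructions (names and sizes are illustrative)."""
    def __init__(self, tracked_lines=4):
        self.tags = OrderedDict()      # recently touched line tags, LRU order
        self.capacity = tracked_lines

    def dispatch(self, addr, line_size=64):
        tag = addr // line_size
        if tag in self.tags:
            self.tags.move_to_end(tag)
            return "host"              # likely cache hit: use on-chip caches
        self.tags[tag] = True
        if len(self.tags) > self.capacity:
            self.tags.popitem(last=False)   # evict least recently used tag
        return "memory"                # low locality: execute in memory
```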


IEEE Design & Test of Computers | 2003

Compilation approach for coarse-grained reconfigurable architectures

Jongeun Lee; Kiyoung Choi; Nikil D. Dutt

Coarse-grained reconfigurable architectures can enhance the performance of critical loops and computation-intensive functions. Such architectures need efficient compilation techniques to map algorithms onto customized architectural configurations. A new compilation approach uses a generic reconfigurable architecture to tackle the memory bottleneck that typically limits the performance of many applications.


IEEE Transactions on Very Large Scale Integration Systems | 2001

Partial bus-invert coding for power optimization of application-specific systems

Youngsoo Shin; Soo-Ik Chae; Kiyoung Choi

This paper presents two bus coding schemes for power optimization of application-specific systems: partial bus-invert coding and its extension to multiway partial bus-invert coding. In the first scheme, only a selected subgroup of bus lines is encoded to avoid unnecessary inversion of relatively inactive and/or uncorrelated bus lines which are not included in the subgroup. In the extended scheme, we partition a bus into multiple subbuses by clustering highly correlated bus lines and then encode each subbus independently. We describe a heuristic algorithm for partitioning a bus into subbuses for each encoding scheme. Experimental results for various examples indicate that both encoding schemes are highly efficient for application-specific systems.
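The first scheme follows the classic bus-invert rule restricted to a chosen subgroup of lines: invert the subgroup when more than half of its lines would otherwise toggle. A minimal sketch (the subgroup selection itself is the paper's heuristic and is taken as given here):

```python
def partial_bus_invert(prev_bus, data, subgroup):
    """Encode one bus word. `subgroup` is the list of line indices
    selected for encoding (assumed precomputed); all other lines pass
    through unchanged. Returns (encoded word, invert-signal)."""
    # Count transitions the subgroup lines would cause as-is.
    toggles = sum(1 for i in subgroup
                  if (prev_bus >> i & 1) != (data >> i & 1))
    invert = toggles > len(subgroup) // 2
    if invert:
        mask = 0
        for i in subgroup:
            mask |= 1 << i
        data ^= mask                   # flip only the subgroup lines
    return data, invert
```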


international symposium on low power electronics and design | 2000

Power minimization of functional units by partially guarded computation

Junghwan Choi; Jinhwan Jeon; Kiyoung Choi

This paper deals with the power minimization problem for data-dominated applications based on a novel concept called partially guarded computation. We divide a functional unit into two parts - MSP (Most Significant Part) and LSP (Least Significant Part) - and allow the functional unit to perform only the LSP computation if the range of output data can be covered by the LSP. We dynamically disable the MSP computation to remove unnecessary transitions, thereby reducing power consumption. We also propose a systematic approach for determining the optimal location of the boundary between the two parts during high-level synthesis. Experimental results show about 10-44% power reduction with about 30-36% area overhead and less than 3% delay overhead in functional units.
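A behavioral sketch of a guarded functional unit may clarify the idea; the operand-based guard below is a simplifying assumption (the paper determines the MSP/LSP boundary systematically during high-level synthesis).

```python
def guarded_add(a, b, lsp_bits=8, width=16):
    """Behavioral model of a partially guarded adder. If both operands
    are small enough that the sum provably fits in the least
    significant part (LSP), the most significant part (MSP) stays
    idle; the boolean in the result reports whether the MSP switched."""
    lsp_mask = (1 << lsp_bits) - 1
    if a <= lsp_mask >> 1 and b <= lsp_mask >> 1:
        return (a + b) & lsp_mask, False          # MSP disabled: less switching
    return (a + b) & ((1 << width) - 1), True     # full-width computation
```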


international symposium on low power electronics and design | 1998

Partial bus-invert coding for power optimization of system level bus

Youngsoo Shin; Soo-Ik Chae; Kiyoung Choi

We present a partial bus-invert coding scheme for power optimization of system level bus. In the proposed scheme, we select a sub-group of bus lines involved in bus encoding to avoid unnecessary inversion of bus lines not in the sub-group, thereby reducing the total number of bus transitions. We propose a heuristic algorithm that selects the sub-group of bus lines for bus encoding. Experiments on benchmark examples indicate that the partial bus-invert coding reduces the total bus transitions by 62.6% on the average, compared to that of the unencoded patterns.


design, automation, and test in europe | 2005

Resource Sharing and Pipelining in Coarse-Grained Reconfigurable Architecture for Domain-Specific Optimization

Yoon-Jin Kim; Mary Kiemb; Chulsoo Park; Jinyong Jung; Kiyoung Choi

Coarse-grained reconfigurable architectures aim to achieve both high performance and flexibility. However, existing reconfigurable array architectures require many resources without considering the specific application domain. Functional resources that take long latency and/or large area can be pipelined and/or shared among the processing elements. Therefore, the hardware cost and the delay can be effectively reduced without any performance degradation for some application domains. We suggest such a reconfigurable array architecture template and a design space exploration flow for domain-specific optimization. Experimental results show that our approach is much more efficient, in both performance and area, compared to existing reconfigurable architectures.


international conference on computer aided design | 2002

Efficient instruction encoding for automatic instruction set design of configurable ASIPs

Jongeun Lee; Kiyoung Choi; Nikil D. Dutt

Application-specific instructions can significantly improve the performance, energy, and code size of configurable processors. A common approach used in the design of such instructions is to convert application-specific operation patterns into new complex instructions. However, processors with a fixed instruction bitwidth cannot accommodate all the potentially interesting operation patterns, due to the limited code space afforded by the fixed instruction bitwidth. We present a novel instruction set synthesis technique that employs an efficient instruction encoding method to achieve maximal performance improvement. We build a library of complex instructions with various encoding alternatives and select the best set of complex instructions while satisfying the instruction bitwidth constraint. We formulate the problem using integer linear programming and also present an effective heuristic algorithm. Experimental results show that our technique generates instruction sets with improvements of up to 38% over the native instruction set for several realistic benchmark applications running on a typical embedded RISC processor.
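A greedy stand-in conveys the flavor of the selection problem; the paper actually formulates it as an ILP plus a heuristic, and the candidate tuples and the savings-per-opcode-space ratio below are illustrative assumptions.

```python
def select_instructions(candidates, budget):
    """Greedy sketch: pick complex-instruction encodings by cycle
    savings per unit of opcode space, until the instruction-word
    encoding budget is exhausted. candidates = [(savings, opcode_cost,
    name), ...] with hypothetical names and numbers."""
    chosen, used = [], 0
    for savings, cost, name in sorted(
            candidates, key=lambda c: c[0] / c[1], reverse=True):
        if used + cost <= budget:      # fits in remaining code space
            chosen.append(name)
            used += cost
    return chosen
```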

Collaboration


Dive into Kiyoung Choi's collaborations.

Top Co-Authors


Sungjoo Yoo

Seoul National University


Jongeun Lee

Ulsan National Institute of Science and Technology


Junwhan Ahn

Seoul National University


Jinho Lee

Seoul National University


Nikil D. Dutt

University of California, Irvine


Ganghee Lee

Seoul National University


Kyuseung Han

Seoul National University


Kee Hoon Kim

Seoul National University
