Sungjoo Yoo | Researchain

Publication

Featured research published by Sungjoo Yoo.

design automation conference | 2002

Component-based design approach for multicore SoCs

W. Cesário; Amer Baghdadi; Lovic Gauthier; Damien Lyonnard; Gabriela Nicolescu; Yanick Paviot; Sungjoo Yoo; Ahmed Amine Jerraya; Mario Diaz-Nava

This paper presents a high-level component-based methodology and design environment for application-specific multicore SoC architectures. Component-based design provides primitives to build complex architectures from basic components. This bottom-up approach allows design architects to explore efficient custom solutions with the best performance. The system specifications are represented as a virtual architecture described in a SystemC-like model and annotated with a set of configuration parameters. Our component-based design environment provides automatic wrapper-generation tools able to synthesize hardware interfaces, device drivers, and operating systems that implement a high-level interconnect API. This approach, evaluated on a VDSL system, shows a drastic design-time reduction without any significant efficiency loss in the final circuit.

international symposium on computer architecture | 2015

A scalable processing-in-memory accelerator for parallel graph processing

Junwhan Ahn; Sungpack Hong; Sungjoo Yoo; Onur Mutlu; Kiyoung Choi

The explosion of digital data and the ever-growing need for fast data analysis have made in-memory big-data processing in computer systems increasingly important. In particular, large-scale graph processing is gaining attention due to its broad applicability from social science to machine learning. However, scalable hardware design that can efficiently process large graphs in main memory is still an open problem. Ideally, cost-effective and scalable graph processing systems can be realized by building a system whose performance increases proportionally with the sizes of graphs that can be stored in the system, which is extremely challenging in conventional systems due to severe memory bandwidth limitations. In this work, we argue that the conventional concept of processing-in-memory (PIM) can be a viable solution to achieve such an objective. The key modern enabler for PIM is the recent advancement of the 3D integration technology that facilitates stacking logic and memory dies in a single package, which was not available when the PIM concept was originally examined. In order to take advantage of such a new technology to enable memory-capacity-proportional performance, we design a programmable PIM accelerator for large-scale graph processing called Tesseract. Tesseract is composed of (1) a new hardware architecture that fully utilizes the available memory bandwidth, (2) an efficient method of communication between different memory partitions, and (3) a programming interface that reflects and exploits the unique hardware design. It also includes two hardware prefetchers specialized for memory access patterns of graph processing, which operate based on the hints provided by our programming model. Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems.

design automation conference | 2001

Automatic generation of application-specific architectures for heterogeneous multiprocessor system-on-chip

Damien Lyonnard; Sungjoo Yoo; Amer Baghdadi; Ahmed Amine Jerraya

We present a design flow for the generation of application-specific multiprocessor architectures. In the flow, architectural parameters are first extracted from a high-level system specification. Parameters are used to instantiate architectural components, such as processors, memory modules and communication networks. The flow includes the automatic generation of a communication coprocessor that adapts the processor to the communication network in an application-specific way. Experiments with two system examples show the effectiveness of the presented design flow.

IEEE Design & Test of Computers | 2002

Multiprocessor SoC platforms: a component-based design approach

Wander O. Cesário; Damien Lyonnard; Gabriela Nicolescu; Yanick Paviot; Sungjoo Yoo; Ahmed Amine Jerraya; Lovic Gauthier; Mario Diaz-Nava

A high-level, component-based methodology and design environment for multiprocessor SoC architectures reduces design time without significant efficiency loss in the final circuit. This design environment provides tools for automatic wrapper generation that synthesize hardware interfaces, device drivers, and operating systems implementing high-level interconnect APIs.

international symposium on computer architecture | 2015

PIM-enabled instructions: a low-overhead, locality-aware processing-in-memory architecture

Junwhan Ahn; Sungjoo Yoo; Onur Mutlu; Kiyoung Choi

Processing-in-memory (PIM) is rapidly rising as a viable solution for the memory wall crisis, rebounding from its unsuccessful attempts in the 1990s due to practicality concerns, which are alleviated with recent advances in 3D stacking technologies. However, it is still challenging to integrate the PIM architectures with existing systems in a seamless manner due to two common characteristics: unconventional programming models for in-memory computation units and lack of ability to utilize large on-chip caches. In this paper, we propose a new PIM architecture that (1) does not change the existing sequential programming models and (2) automatically decides whether to execute PIM operations in memory or processors depending on the locality of data. The key idea is to implement simple in-memory computation using compute-capable memory commands and use specialized instructions, which we call PIM-enabled instructions, to invoke in-memory computation. This allows PIM operations to be interoperable with existing programming models, cache coherence protocols, and virtual memory mechanisms with no modification. In addition, we introduce a simple hardware structure that monitors the locality of data accessed by a PIM-enabled instruction at runtime to adaptively execute the instruction at the host processor (instead of in memory) when the instruction can benefit from large on-chip caches. Consequently, our architecture provides the illusion that PIM operations are executed as if they were host processor instructions. We provide a case study of how ten emerging data-intensive workloads can benefit from our new PIM abstraction and its hardware implementation. Evaluations show that our architecture significantly improves system performance and, more importantly, combines the best parts of conventional and PIM architectures by adapting to data locality of applications.
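The locality-aware choice between host and memory execution described in this abstract can be pictured with a toy model. The sketch below is an illustration only and not the hardware structure from the paper: the `LocalityMonitor` class, the `dispatch` function, the fully associative LRU policy, the 64-byte block size, and the two-block capacity are all assumptions made for this example.

```python
from collections import OrderedDict

BLOCK_BYTES = 64  # assumed cache-block size (not taken from the abstract)


class LocalityMonitor:
    """Toy model: tracks recently touched cache blocks with LRU replacement."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()

    def access(self, address):
        """Record an access; return True if the block was already tracked."""
        block = address // BLOCK_BYTES
        hit = block in self.blocks
        if hit:
            self.blocks.move_to_end(block)  # mark as most recently used
        else:
            self.blocks[block] = True
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used
        return hit


def dispatch(monitor, address):
    """Pick where an operation would run in this toy model:
    recently touched data goes to the host, everything else to memory."""
    return "host" if monitor.access(address) else "memory"


monitor = LocalityMonitor(capacity_blocks=2)
trace = [0, 8, 4096, 0, 8192, 4096]
print([dispatch(monitor, address) for address in trace])
```

In this model, addresses 0 and 8 fall in the same block, so the second access is sent to the host, while blocks that have been evicted or never seen are sent to memory.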

design, automation, and test in europe | 2001

Automatic generation and targeting of application specific operating systems and embedded systems software

Lovic Gauthier; Sungjoo Yoo; Ahmed Amine Jerraya

We propose a method of automatic generation of application-specific operating systems (OSs) and automatic targeting of application software. OS generation starts from a very small but flexible OS kernel. OS services, which are specific to the application and deduced from dependencies between services, are added to the kernel to construct the whole OS. Communication and synchronization functions in the application code are adapted to the generated OS. As a preliminary experiment, we applied the proposed method to an example system called the token ring system.
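Deducing OS services from the dependencies between them can be pictured as a transitive closure over a dependency table. The sketch below is an illustration only, not the authors' tool: the service names, the `DEPENDS_ON` table, and the `select_services` function are hypothetical, and it assumes the kernel is extended with every service the application requires plus everything those services depend on.

```python
# Hypothetical service dependency table; the names are illustrative only.
DEPENDS_ON = {
    "fifo_write": ["semaphore", "scheduler"],
    "fifo_read": ["semaphore", "scheduler"],
    "semaphore": ["scheduler"],
    "scheduler": ["context_switch"],
    "context_switch": [],
    "timer": ["interrupt"],
    "interrupt": [],
}


def select_services(required, depends_on):
    """Return the required services plus all of their transitive dependencies."""
    selected = set()
    stack = list(required)
    while stack:
        service = stack.pop()
        if service in selected:
            continue  # already handled, also guards against cycles
        selected.add(service)
        stack.extend(depends_on.get(service, []))
    return selected


services = select_services(["fifo_write"], DEPENDS_ON)
print(sorted(services))
```

Services the application never reaches, such as the hypothetical `timer` here, are left out of the generated set, which is what keeps the resulting OS small.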

asia and south pacific design automation conference | 2006

PowerViP: SoC power estimation framework at transaction level

Ikhwan Lee; Hyun-Suk Kim; Peng Yang; Sungjoo Yoo; Eui-Young Chung; Kyu-Myung Choi; Jeong-Taek Kong; Soo-Kwan Eo

In this work, we propose a SoC power estimation framework built on our system-level simulation environment. Our framework provides designers with the system-level power profile in a cycle-accurate manner. We target the framework to run fast and accurately, which is enabled by adopting different modeling techniques depending on the power characteristics of various IP blocks. The framework can be applied to any target SoC design

design automation conference | 2011

Power management of hybrid DRAM/PRAM-based main memory

Hyunsun Park; Sungjoo Yoo; Sunggu Lee

Hybrid main memory consisting of DRAM and non-volatile memory is attractive since the non-volatile memory can give the advantage of low standby power while DRAM provides high performance and better active power. In this work, we address the power management of such a hybrid main memory consisting of DRAM and phase-change RAM (PRAM). In order to reduce DRAM refresh energy which occupies a significant portion of total memory energy, we present a runtime-adaptive method of DRAM decay. In addition, we present two methods, DRAM bypass and dirty data keeping, for further reduction in refresh energy and memory access latency, respectively. The experiments show that by reducing DRAM refreshes, we can obtain 23.5%∼94.7% reduction in the energy consumption with negligible performance overhead compared with the conventional DRAM-only main memory.

IEEE Transactions on Circuits and Systems for Video Technology | 2010

Dual Motion Estimation for Frame Rate Up-Conversion

Suk-Ju Kang; Sungjoo Yoo; Young Hwan Kim

In this letter, we present a new motion estimation algorithm for frame rate up-conversion. The proposed dual motion estimation algorithm enhances the estimation accuracy of motion vectors by using the unidirectional and bidirectional matching ratios of blocks in the previous and current frames. In addition, the proposed motion estimation approach uses motion vector validity to evaluate the accuracy of motion vectors, thereby avoiding false motion vectors. In experiments using benchmark image sequences, the proposed motion estimation algorithm improved the average peak signal-to-noise ratio of interpolated frames by up to 2.272 dB compared to conventional motion estimation algorithms. In a comparison of perceptual image quality using structural similarity, the average value of the proposed dual motion estimation was up to 0.062 higher than those of the conventional algorithms.
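For readers unfamiliar with bidirectional matching, the sketch below shows plain bidirectional block matching with a sum-of-absolute-differences (SAD) cost. It is a generic textbook-style illustration and not the dual algorithm of the letter: it does not compute matching ratios or motion vector validity, and the function names, the synthetic `pattern` frames, the block size, and the search radius are all assumptions.

```python
def sad(prev, curr, y0, x0, y1, x1, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for i in range(size):
        for j in range(size):
            total += abs(prev[y0 + i][x0 + j] - curr[y1 + i][x1 + j])
    return total


def bidirectional_search(prev, curr, y, x, size, radius):
    """For the block at (y, x) of the frame to be interpolated, find the
    symmetric vector (dy, dx) that minimises SAD between the block at
    (y - dy, x - dx) in prev and the block at (y + dy, x + dx) in curr."""
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            py, px = y - dy, x - dx  # block position in the previous frame
            cy, cx = y + dy, x + dx  # block position in the current frame
            if min(py, px, cy, cx) < 0:
                continue  # candidate falls outside the frame
            if max(py, cy) + size > height or max(px, cx) + size > width:
                continue
            cost = sad(prev, curr, py, px, cy, cx, size)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best


def pattern(y, x):
    """Synthetic texture used only to build test frames."""
    return (7 * y + 13 * x) % 251


# The current frame is the previous frame shifted right by two pixels.
prev_frame = [[pattern(y, x) for x in range(12)] for y in range(12)]
curr_frame = [[pattern(y, x - 2) for x in range(12)] for y in range(12)]

print(bidirectional_search(prev_frame, curr_frame, y=4, x=4, size=4, radius=2))
```

Because the vector is applied symmetrically to both frames, a two-pixel shift between the frames is recovered as a one-pixel vector for the interpolated frame halfway between them.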

Ninth International Symposium on Hardware/Software Codesign. CODES 2001 (IEEE Cat. No.01TH8571) | 2001

A generic wrapper architecture for multi-processor SoC cosimulation and design

Sungjoo Yoo; Gabriela Nicolescu; Damien Lyonnard; Amer Baghdadi; Ahmed Amine Jerraya

In communication refinement with multiple communication protocols and abstraction levels, the system specification is described by components that are heterogeneous in terms of communication protocols and abstraction levels. To adapt each heterogeneous component to the rest of the system, we present a generic wrapper architecture that can adapt different protocols or different abstraction levels, or both. In this paper, we give a detailed explanation of applying the generic wrapper architecture to mixed-level cosimulation. As a preliminary experiment, we applied it to mixed-level cosimulation of an IS-95 CDMA cellular phone system.
