Sungchan Kim | Researchain

Publication

Featured research published by Sungchan Kim.

ACM Transactions on Design Automation of Electronic Systems | 2007

PeaCE: A hardware-software codesign environment for multimedia embedded systems

Soonhoi Ha; Sungchan Kim; Choonseung Lee; Youngmin Yi; Seongnam Kwon; Young-Pyo Joo

Existing hardware-software (HW-SW) codesign tools mainly focus on HW-SW cosimulation to build a virtual prototyping environment that enables software design and system verification without the need to build a hardware prototype. Beyond HW-SW cosimulation, however, HW-SW codesign methodology also involves system specification, functional simulation, design-space exploration, and hardware-software cosynthesis. The PeaCE codesign environment is the first full-fledged HW-SW codesign environment that provides a seamless codesign flow from functional simulation to system synthesis. Targeting multimedia applications with real-time constraints, PeaCE specifies the system behavior with a heterogeneous composition of three models of computation and makes maximal use of the features of these formal models throughout the design process. It is also a reconfigurable framework in the sense that third-party design tools can be integrated to build a customized tool chain. Experiments with industry-strength examples demonstrate the viability of the proposed technique.

International Conference on Hardware/Software Codesign and System Synthesis | 2010

A task remapping technique for reliable multi-core embedded systems

Chanhee Lee; Hokeun Kim; Hae-woo Park; Sungchan Kim; Hyunok Oh; Soonhoi Ha

With the continuous scaling of semiconductor technology, circuit lifetime is decreasing, so processor failure has become an important issue in MPSoC design. One software solution for tolerating run-time processor failure is to migrate tasks from the failed processors to the live processors when a failure occurs. Previous work on run-time task migration usually aims to minimize the migration overhead, with or without a given latency constraint. For streaming applications, however, minimizing the throughput degradation is more important than minimizing the migration overhead or the latency. Hence, we propose a task remapping technique that minimizes the throughput degradation, assuming that the migration overhead can be safely amortized. The target multi-core system assumed in this paper consists of processor pools, each of which consists of homogeneous processors. The proposed technique is based on an intensive compile-time analysis of all possible failure scenarios. It involves the following steps: 1) determine the static mapping of tasks onto the live processors, aiming to minimize the throughput degradation; 2) find an optimal processor-to-processor mapping to minimize the task migration overhead; and 3) store the resulting task remapping information, which includes the task mapping and processor-to-processor mapping results. Since the task remapping information is pre-computed at compile time for all possible failure scenarios, it must be represented and stored efficiently. At run time, the tasks are simply remapped following the compile-time decision. We examine the scalability of the proposed technique in terms of both the space overhead and the run-time overhead of the compile-time analysis as the number of failed processors varies. Through intensive experiments, we show that the proposed technique outperforms previous work with respect to application throughput.
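
The compile-time/run-time split described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes a single pool of homogeneous processors and independent tasks with known execution times, uses a greedy longest-processing-time (LPT) heuristic as a stand-in for the throughput-oriented mapping step (the throughput of a pipelined streaming application is bounded by its most loaded processor), and binds the resulting task groups to live processors by brute force so that as few tasks as possible migrate. The function names `lpt_map`, `bind_to_live`, and `remap_table` are hypothetical.

```python
from itertools import combinations, permutations

def lpt_map(exec_times, n_procs):
    """Greedy LPT: give each task, longest first, to the least-loaded processor.
    Returns one set of task indices per logical processor."""
    loads = [0.0] * n_procs
    groups = [set() for _ in range(n_procs)]
    for t in sorted(range(len(exec_times)), key=lambda i: -exec_times[i]):
        p = min(range(n_procs), key=lambda k: loads[k])
        loads[p] += exec_times[t]
        groups[p].add(t)
    return groups

def bind_to_live(groups, live, current):
    """Assign logical groups to live processors so that the fewest tasks migrate
    (brute force over permutations; only viable for small pools)."""
    best, best_moves = None, None
    for perm in permutations(live):
        moves = sum(current[t] != p for g, p in zip(groups, perm) for t in g)
        if best_moves is None or moves < best_moves:
            best, best_moves = perm, moves
    return {t: p for g, p in zip(groups, best) for t in g}

def remap_table(exec_times, procs, max_failures):
    """Compile time: precompute a task -> processor mapping for every failure set."""
    if max_failures >= len(procs):
        raise ValueError("at least one processor must survive")
    initial = {t: procs[k] for k, g in enumerate(lpt_map(exec_times, len(procs))) for t in g}
    table = {frozenset(): initial}
    for f in range(1, max_failures + 1):
        for failed in combinations(procs, f):
            live = [p for p in procs if p not in failed]
            # Migration cost is measured against the fault-free mapping.
            table[frozenset(failed)] = bind_to_live(lpt_map(exec_times, len(live)), live, initial)
    return table

table = remap_table([4, 3, 3, 2, 2], procs=[0, 1, 2], max_failures=1)
print(table[frozenset({1})])  # run time: processor 1 failed -> table lookup
```

At run time, handling a failure reduces to a dictionary lookup. Brute-force binding only scales to small pools; a practical version would solve the assignment problem directly (for example with the Hungarian algorithm) and compress the table, which is the storage concern the abstract raises.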

International Conference on Supercomputing | 2013

Active disk meets flash: a case for intelligent SSDs

Sangyeun Cho; Chanik Park; Hyunok Oh; Sungchan Kim; Youngmin Yi; Gregory R. Ganger

Intelligent solid-state drives (iSSDs) allow execution of limited application functions (e.g., data filtering or aggregation) on their internal hardware resources, exploiting SSD characteristics and trends to provide large and growing performance and energy efficiency benefits. Most notably, internal flash media bandwidth can be significantly (2-4x or more) higher than the external bandwidth with which the SSD is connected to a host system, and this higher internal bandwidth can be exploited within an iSSD. Also, SSD bandwidth is projected to increase rapidly over time, creating a substantial energy cost for streaming data to an external CPU for processing, a cost that iSSD processing avoids. This paper makes a case for iSSDs by detailing these trends, quantifying the potential benefits across a range of application activities, describing how SSD architectures could be extended cost-effectively, and demonstrating the concept with measurements of a prototype iSSD running simple data scan functions. Our analyses indicate that, with less than a 2% increase in hardware cost over a traditional SSD, an iSSD can provide 2-4x performance increases and 5-27x energy efficiency gains for a range of data-intensive computations.
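
The bandwidth argument can be made concrete with a back-of-the-envelope calculation. The sketch below assumes a purely bandwidth-bound scan, a pipelined iSSD whose embedded cores keep up with the flash, and illustrative numbers (a 64 GB data set, a 1 GB/s host link, and 4x internal bandwidth, the upper end of the range quoted above). `scan_times` is a hypothetical helper, and its output is not a measurement from the paper.

```python
def scan_times(data_bytes, ext_bw, int_bw, selectivity):
    """Bandwidth-bound scan time (seconds) on the host versus inside an iSSD.

    data_bytes:  size of the data set to scan
    ext_bw:      host-link bandwidth in bytes/s
    int_bw:      aggregate internal flash bandwidth in bytes/s
    selectivity: fraction of the data that survives the in-SSD filter
    """
    host = data_bytes / ext_bw  # every byte crosses the host link
    # In-SSD: flash reads and result transfer are assumed to overlap (pipelined),
    # and the embedded cores are assumed fast enough to keep up with the flash.
    issd = max(data_bytes / int_bw, selectivity * data_bytes / ext_bw)
    return host, issd

host, issd = scan_times(64e9, ext_bw=1e9, int_bw=4e9, selectivity=0.05)
print(host, issd, host / issd)  # 64.0 16.0 4.0
```

With a highly selective filter the iSSD is limited by internal bandwidth alone, so the speedup equals the internal-to-external bandwidth ratio; as selectivity approaches 1, the result transfer dominates and the advantage disappears.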

Signal Processing Systems | 2010

A Systematic Design Space Exploration of MPSoC Based on Synchronous Data Flow Specification

Choonseung Lee; Sungchan Kim; Soonhoi Ha

The design space exploration (DSE) problem addressed in this paper is to find Multi-Processor System-on-Chip (MPSoC) architectures for a given multi-task signal processing application that minimize the system cost while satisfying the real-time constraints. It involves three sub-problems: selecting processing elements (PEs), mapping the application onto the PEs, and determining the communication architecture. The proposed approach consists of two inner design loops: a cosynthesis loop that selects the PEs and maps the application onto them, and a communication architecture synthesis loop that finds a hierarchical shared bus architecture. We specify the application with a synchronous data flow (SDF) model of computation, whose semantics match the algorithmic function flow of DSP applications well. To solve the problem, we must compare the estimated performance of design points and choose the best ones. Simulation-based performance estimation, the common method, is too time-consuming for exploring a wide design space. Thanks to the analytical properties of the SDF model, performance can be estimated without HW/SW cosimulation in both loops. A global feedback path from the communication architecture synthesis step to the cosynthesis step completes the proposed DSE framework. We use a real-life multi-task application, a 4-channel Digital Video Recorder (DVR), as well as randomly generated graphs to show the viability of the proposed approach.
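
One of the analytical properties of SDF that makes simulation-free estimation possible is that the number of firings of each actor per graph iteration (the repetition vector) follows directly from the balance equations r[src] * produced = r[dst] * consumed on every edge. The sketch below is the textbook computation, not the paper's DSE algorithm; `repetition_vector` and the edge-tuple format are hypothetical. It also rejects inconsistent graphs, for which no bounded periodic schedule exists.

```python
from collections import defaultdict, deque
from fractions import Fraction
from math import gcd, lcm  # multi-argument gcd/lcm need Python 3.9+

def repetition_vector(actors, edges):
    """Smallest positive integer firing counts per SDF iteration.

    edges: (src, dst, produced_per_src_firing, consumed_per_dst_firing)
    """
    adj = defaultdict(list)
    for src, dst, prod, cons in edges:
        adj[src].append((dst, Fraction(prod, cons)))  # r[dst] = r[src] * prod / cons
        adj[dst].append((src, Fraction(cons, prod)))  # r[src] = r[dst] * cons / prod
    # Propagate relative rates from the first actor by breadth-first search.
    rate = {actors[0]: Fraction(1)}
    queue = deque([actors[0]])
    while queue:
        a = queue.popleft()
        for b, ratio in adj[a]:
            if b not in rate:
                rate[b] = rate[a] * ratio
                queue.append(b)
    if set(rate) != set(actors):
        raise ValueError("graph is not connected")
    # Every balance equation must hold; otherwise the graph is inconsistent.
    for src, dst, prod, cons in edges:
        if rate[src] * prod != rate[dst] * cons:
            raise ValueError("inconsistent SDF graph")
    scale = lcm(*(r.denominator for r in rate.values()))
    counts = {a: int(r * scale) for a, r in rate.items()}
    g = gcd(*counts.values())
    return {a: c // g for a, c in counts.items()}

# A produces 2 tokens per firing, B consumes 3; B produces 1, C consumes 2.
print(repetition_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)]))
# {'A': 3, 'B': 2, 'C': 1}
```

Once the repetition vector is known, per-iteration workload and communication volume follow from per-firing costs, which is the kind of static information an estimator can use instead of cosimulation.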

International Conference on Hardware/Software Codesign and System Synthesis | 2003

Schedule-aware performance estimation of communication architecture for efficient design space exploration

Sungchan Kim; Chaeseok Im; Soonhoi Ha

In this paper, we are concerned with performance estimation of bus-based communication architectures, assuming that task partitioning and scheduling on the processing elements are already determined. Since communication overhead is dynamic and unpredictable due to bus contention, a simulation-based approach seems inevitable for accurate performance estimation. However, such an approach is too time-consuming for exploring the wide design space of bus architectures. We propose a static performance-estimation technique based on queueing analysis, assuming that the memory traces and the task schedule information are given. We use this static estimation technique as the first step of our design space exploration framework to prune the design space drastically before applying a simulation-based approach to the reduced design space. Experimental results show that the proposed technique is several orders of magnitude faster than trace-driven simulation while consistently keeping the estimation error within 10% across various communication architecture configurations.
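
As a rough illustration of how queueing theory can stand in for simulation, the sketch below models a shared bus as an M/D/1 queue: requests arrive as a Poisson process (an assumption; the paper derives its estimate from memory traces and the task schedule, which this ignores) and each transfer occupies the bus for a fixed number of cycles. The Pollaczek-Khinchine formula then gives the mean waiting time W = ρS / (2(1 - ρ)), where S is the service time and ρ = λS is the bus utilization. The helper names and request rates are hypothetical.

```python
import math

def md1_wait(arrival_rate, service_time):
    """Mean queueing delay (cycles) of an M/D/1 server via Pollaczek-Khinchine."""
    rho = arrival_rate * service_time          # bus utilization
    if rho >= 1.0:
        return math.inf                        # saturated bus: queue grows without bound
    return rho * service_time / (2.0 * (1.0 - rho))

def bus_waits(buses, rates, service_time):
    """Estimated mean wait per request on each bus; `buses` lists the masters on each bus."""
    return [md1_wait(sum(rates[m] for m in bus), service_time) for bus in buses]

# Hypothetical request rates (requests/cycle) and a 2-cycle transfer per request.
rates = {"cpu": 0.05, "dsp": 0.10, "dma": 0.15}
print(bus_waits([["cpu", "dsp", "dma"]], rates, 2))    # single shared bus
print(bus_waits([["cpu", "dsp"], ["dma"]], rates, 2))  # two buses
```

Splitting the masters across two buses cuts the estimated wait per request from 1.5 cycles to about 0.43 cycles in this toy case; ranking candidates by such closed-form numbers is what makes pruning before simulation cheap.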

Design Automation Conference | 2012

Executing synchronous dataflow graphs on a SPM-based multicore architecture

Junchul Choi; Hyunok Oh; Sungchan Kim; Soonhoi Ha

In this paper, we are concerned with executing synchronous dataflow (SDF) applications on a multicore architecture in which each core has a scratchpad memory (SPM) of limited size. Unlike traditional multiprocessor scheduling of SDF graphs, we consider the SPM size limitation, which incurs code and data overlay overhead. Since the scheduling problem is intractable, we propose a technique based on an evolutionary algorithm (EA). To hide memory latency, the proposed technique performs prefetching aggressively. The experimental results show that our approach significantly reduces the overlay overhead compared to both a non-optimized approach and the previous approach.
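
Why aggressive prefetching helps can be seen with a simple cost model. The sketch assumes one core executing a fixed sequence of tasks, each of which needs its code and data overlay loaded into the SPM; if the current and next overlays fit in the SPM together, the next one is loaded while the current task runs (double buffering), otherwise the core stalls for the full load time. The model, the `overlay_stall` name, and the cycle and size numbers are hypothetical; the paper's EA-based scheduler, which decides the mapping and schedule, is not shown.

```python
def overlay_stall(tasks, spm_size, prefetch=True):
    """Cycles a core stalls waiting for overlays.

    tasks: (exec_cycles, load_cycles, footprint_kb) per task, in execution order.
    spm_size: scratchpad capacity in KB.
    """
    if not tasks:
        return 0
    stall = tasks[0][1]                        # the first overlay cannot be hidden
    for prev, cur in zip(tasks, tasks[1:]):
        fits = prev[2] + cur[2] <= spm_size    # room for current and next overlay?
        if prefetch and fits:
            stall += max(0, cur[1] - prev[0])  # load overlaps the previous task's run
        else:
            stall += cur[1]                    # load after the previous task finishes
    return stall

tasks = [(100, 40, 8), (50, 60, 8), (80, 30, 12)]  # hypothetical numbers
print(overlay_stall(tasks, spm_size=16, prefetch=False))  # 130
print(overlay_stall(tasks, spm_size=16))                  # 70
print(overlay_stall(tasks, spm_size=32))                  # 40
```

The middle case shows the interaction the abstract points to: prefetching only pays off when the SPM can hold two overlays at once, so SPM size and the schedule have to be considered together.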

International Conference on Hardware/Software Codesign and System Synthesis | 2006

Efficient exploration of bus-based system-on-chip architectures

Sungchan Kim; Soonhoi Ha

Separating computation from communication in system design allows system designers to explore the communication architecture independently once component selection and mapping decisions are made. In this paper, we present an iterative two-step exploration methodology for bus-based on-chip communication architectures for multitask applications. We assume that the memory traces from the processing components are given. The proposed methodology uses a static performance estimation technique, extended for multitask applications, to reduce the design space quickly and drastically, and then applies trace-driven simulation to the reduced set of design candidates for accurate performance estimation. For the case in which local memory traffic, as well as shared memory traffic, is involved in bus contention, our technique treats memory allocation as an important axis of the design space. Experimental results show that the proposed methodology achieves a significant performance gain, up to almost 100% compared with an initial single shared bus architecture, by optimizing on-chip communication alone, in two real-life examples: a four-channel digital video recorder and an equalizer for an OFDM DVB-T receiver.
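
The two-step structure (cheap static pruning followed by expensive simulation of the survivors) can be expressed generically. In this sketch, `static_estimate` and `simulate` are hypothetical callables standing in for the extended static estimator and the trace-driven simulator, lower values are better, and `keep` is an assumed shortlist size; the iteration over memory allocation choices described above is left out.

```python
def two_step_explore(candidates, static_estimate, simulate, keep=3):
    """Rank all candidates with a cheap static estimate (lower is better), then
    simulate only the `keep` most promising ones and return the best of those."""
    shortlist = sorted(candidates, key=static_estimate)[:keep]
    simulated = {c: simulate(c) for c in shortlist}
    best = min(simulated, key=simulated.get)
    return best, simulated

# Hypothetical numbers: estimated vs. simulated execution time (ms) per bus configuration.
estimated = {"a": 10, "b": 7, "c": 12, "d": 8, "e": 30}
measured = {"a": 11, "b": 9, "c": 11.5, "d": 7.5, "e": 25}
best, simulated = two_step_explore(estimated, estimated.get, measured.get, keep=3)
print(best, sorted(simulated))  # d ['a', 'b', 'd']
```

Only three of the five candidates reach the simulator here, and the final choice ("d") differs from the static front-runner ("b"), which is why the second, accurate step is kept rather than trusting the estimate alone.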

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2012

A Parallel Simulation Technique for Multicore Embedded Systems and Its Performance Analysis

Dukyoung Yun; Sungchan Kim; Soonhoi Ha

A virtual prototyping system is constructed by replacing real processing components with component simulators running concurrently. The performance of such a distributed simulation decreases drastically as the number of component simulators increases. We therefore propose a novel parallel simulation technique to boost simulation speed. In the proposed technique, a simulator wrapper performs time synchronization with the simulation backplane on behalf of its associated component simulator. Component simulators periodically send null messages to the backplane, enabling parallel simulation without causality problems. Since excessive communication may degrade simulation performance, we also propose a novel performance analysis technique that determines the optimal period for null message transfer, considering both the characteristics of the target application and the configuration of the simulation host. Through intensive experiments, we show that the proposed parallel simulation achieves almost linear speedup with the number of processor cores if the frequency of null message transfer is chosen optimally. The proposed analysis technique predicted simulation performance with more than 90% accuracy, even in the worst case, for the various target applications and simulation environments used in our experiments.
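
The synchronization idea can be illustrated with a lockstep abstraction of conservative (Chandy-Misra-Bryant-style) null-message synchronization, a textbook technique swapped in here rather than the paper's wrapper/backplane protocol. Each simulator's null message promises that it will send nothing earlier than its local clock plus a lookahead, and the backplane lets each simulator advance to the earliest promise made by the others. `conservative_rounds` and the lookahead values are hypothetical; the sketch only shows that a larger guaranteed interval per null message means fewer synchronization rounds, which is one side of the trade-off the paper's analysis optimizes.

```python
def conservative_rounds(lookaheads, end_time):
    """Number of synchronization rounds until every simulator reaches end_time."""
    if len(lookaheads) < 2 or min(lookaheads) <= 0:
        raise ValueError("need >= 2 simulators with positive lookahead")
    clocks = [0.0] * len(lookaheads)
    rounds = 0
    while min(clocks) < end_time:
        # Null message from simulator j: "nothing from me before clocks[j] + lookaheads[j]".
        promises = [c + la for c, la in zip(clocks, lookaheads)]
        # The backplane lets simulator i advance to the earliest promise of the others.
        clocks = [
            min(end_time, max(clocks[i], min(p for j, p in enumerate(promises) if j != i)))
            for i in range(len(clocks))
        ]
        rounds += 1
    return rounds

print(conservative_rounds([10, 10], 100))  # 10
print(conservative_rounds([50, 50], 100))  # 2
```

Each round costs one null message per simulator, so in this abstraction the message count scales with end_time divided by the guaranteed interval; what the sketch omits is the cost of choosing that interval too large, which the paper's performance analysis accounts for.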

ACM Transactions on Embedded Computing Systems | 2014

Dynamic Behavior Specification and Dynamic Mapping for Real-Time Embedded Systems: HOPES Approach

Hanwoong Jung; Chanhee Lee; Shin-Haeng Kang; Sungchan Kim; Hyunok Oh; Soonhoi Ha

As the number of processors on a chip increases and more functions are integrated, the system status changes dynamically due to various factors such as workload variation, QoS requirements, and unexpected component failures. A typical method of dealing with such system dynamics is to make mapping decisions at runtime based on local information about the system status. It is very challenging to guarantee the real-time performance of a given application in such a dynamically varying system. To solve this problem, we propose a hybrid specification of dataflow and FSM models that specifies the dynamic behavior of a system, distinguishing inter- and intra-application dynamism. At the top level, each application is specified by a dataflow task, and the dynamic behavior is modeled as a control task that supervises the execution of applications. Inside a dataflow task, we specify the dynamic behavior in a way similar to FSM-based SADF, in which an application is specified by a synchronous dataflow graph for each mode of operation. This enables us to perform compile-time scheduling of each graph to maximize throughput while varying the number of allocated processors, and to store the scheduling information. When a change in system state is detected at runtime, the number of processors allocated to the active tasks is determined dynamically, using the stored scheduling information of those tasks, in order to meet the real-time requirements. The proposed technique is implemented in the HOPES design environment. Through preliminary experiments with a simple smartphone example, we show the viability of the proposed methodology.
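
The run-time step can be sketched as a lookup into the stored scheduling information. The sketch assumes that, for every active task, compile time produced a table mapping a processor count to the throughput its schedule achieves, and it gives each task the fewest processors that meet its requirement; the names, units (frames per second), and numbers are hypothetical, and the HOPES runtime may use a different policy.

```python
def allocate(active, schedule_table, requirement, available):
    """Give each active task the fewest processors whose precomputed schedule meets
    its throughput requirement; return None if that is infeasible."""
    allocation = {}
    for task in active:
        feasible = [n for n, thr in schedule_table[task].items() if thr >= requirement[task]]
        if not feasible:
            return None                      # no stored schedule is fast enough
        allocation[task] = min(feasible)
    return allocation if sum(allocation.values()) <= available else None

# Hypothetical compile-time results: processors -> throughput (frames/s).
table = {"video": {1: 10, 2: 19, 3: 27, 4: 33}, "audio": {1: 40, 2: 70}}
need = {"video": 25, "audio": 30}
print(allocate(["video", "audio"], table, need, available=4))  # {'video': 3, 'audio': 1}
print(allocate(["video", "audio"], table, need, available=3))  # None
```

Because every task already receives its minimum feasible count, exceeding the available processors means no allocation from the stored schedules can meet all requirements at once; a real system would then fall back to a mode change or degraded QoS.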

Design, Automation, and Test in Europe | 2008

Architecture exploration of NAND flash-based multimedia card

Sungchan Kim; Chanik Park; Soonhoi Ha

In this paper, we present an architecture exploration methodology for low-end embedded systems in which cost reduction is a primary design concern. Architecture exploration for such systems must cover a wide design space spanned by detailed architecture parameters, using cycle-accurate performance estimation. For fast exploration, the proposed methodology combines an efficient evolutionary algorithm, called QEA, with trace-driven simulation to evaluate architecture candidates quickly. We applied the proposed methodology to a NAND flash-based multimedia card as a case study, considering the following design parameters: buffer size, flash memory configuration, clock, communication architecture, and memory allocation. The experimental results validate the proposed methodology by showing the optimal architecture configurations under varying performance constraints and design parameters.
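
QEA is a quantum-inspired evolutionary algorithm in which each individual is a string of Q-bits; a Q-bit with angle θ is observed as 1 with probability sin²θ, and after each generation the angles are rotated toward the best solution found so far. The sketch below is a simplified version (a fixed rotation step instead of the original rotation lookup table, and angles clamped away from 0 and π/2 so that no bit freezes). The toy objective, which counts matches against a target bit pattern, is a hypothetical stand-in for evaluating an encoded architecture configuration with trace-driven simulation.

```python
import math
import random

def qea(objective, n_bits, pop_size=10, generations=100, delta=0.01 * math.pi, seed=0):
    """Simplified quantum-inspired EA that maximizes objective(bits)."""
    rng = random.Random(seed)
    eps = 0.01 * math.pi  # keep every bit observable as 0 or 1
    # Each individual is a list of Q-bit angles; pi/4 means equal superposition.
    pop = [[math.pi / 4] * n_bits for _ in range(pop_size)]
    best_bits, best_fit = None, -math.inf
    for _ in range(generations):
        # Observation: collapse each Q-bit to 0/1 with P(1) = sin(theta)^2.
        for thetas in pop:
            bits = [1 if rng.random() < math.sin(t) ** 2 else 0 for t in thetas]
            fit = objective(bits)
            if fit > best_fit:
                best_bits, best_fit = bits, fit
        # Update: rotate every angle a fixed step toward the best solution's bit.
        for thetas in pop:
            for i, b in enumerate(best_bits):
                step = delta if b else -delta
                thetas[i] = min(math.pi / 2 - eps, max(eps, thetas[i] + step))
    return best_bits, best_fit

# Hypothetical stand-in for a simulated architecture score.
target = [1, 0, 1, 1, 0, 0, 1, 0]

def score(bits):
    return sum(x == y for x, y in zip(bits, target))

print(qea(score, n_bits=8, seed=1))
```

In an exploration setting, each bit group would encode one design parameter (for example, a buffer-size index), and `objective` would run the trace-driven simulator, so the number of objective calls (pop_size × generations) is the main cost to budget.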
