Doosan Cho | Researchain

Publication

Featured researches published by Doosan Cho.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2009

Adaptive Scratch Pad Memory Management for Dynamic Behavior of Multimedia Applications

Doosan Cho; Sudeep Pasricha; Ilya Issenin; Nikil D. Dutt; Minwook Ahn; Yunheung Paek

Exploiting runtime memory access traces can be a complementary approach to compiler optimizations for the energy reduction in memory hierarchy. This is particularly important for emerging multimedia applications since they usually have input-sensitive runtime behavior which results in dynamic and/or irregular memory access patterns. These types of applications are normally hard to optimize by static compiler optimizations. The reason is that their behavior stays unknown until runtime and may even change during computation. To tackle this problem, we propose an integrated approach of software [compiler and operating system (OS)] and hardware (data access record table) techniques to exploit data reusability of multimedia applications in Multiprocessor Systems on Chip. Guided by compiler analysis for generating scratch pad data layouts and hardware components for tracking dynamic memory accesses, the scratch pad data layout adapts to an input data pattern with the help of a runtime scratch pad memory manager incorporated in the OS. The runtime data placement strategy presented in this paper provides efficient scratch pad utilization for the dynamic applications. The goal is to minimize the amount of accesses to the main memory over the entire runtime of the system, which leads to a reduction in the energy consumption of the system. Our experimental results show that our approach is able to significantly improve the energy consumption of multimedia applications with dynamic memory access behavior over an existing compiler technique and an alternative hardware technique.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2011

High Throughput Data Mapping for Coarse-Grained Reconfigurable Architectures

Yongjoo Kim; Jongeun Lee; Aviral Shrivastava; Jonghee W. Yoon; Doosan Cho; Yunheung Paek

Coarse-grained reconfigurable arrays (CGRAs) are a very promising platform, providing both up to 10-100 MOps/mW of power efficiency and software programmability. However, this promise of CGRAs critically hinges on the effectiveness of application mapping onto CGRA platforms. While previous solutions have greatly improved the computation speed, they have largely ignored the impact of the local memory architecture on the achievable power and performance. This paper motivates the need for memory-aware application mapping for CGRAs, and proposes an effective solution for application mapping that considers the effects of various memory architecture parameters including the number of banks, local memory size, and the communication bandwidth between the local memory and the external main memory. Further we propose efficient methods to handle dependent data on a double-buffering local memory, which is necessary for recurrent loops. Our proposed solution achieves 59% reduction in the energy-delay product, which factors into about 47% and 22% reduction in the energy consumption and runtime, respectively, as compared to memory-unaware mapping for realistic local memory architectures. We also show that our scheme scales across a range of applications and memory parameters, and the runtime overhead of handling recurrent loops by our proposed methods can be less than 1%.
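
As a quick consistency check on the figures quoted in this abstract: the energy-delay product is energy multiplied by runtime, so the two reductions combine multiplicatively rather than additively. The sketch below is only illustrative arithmetic; the function name `edp_reduction` is an assumption and does not come from the paper.

```python
def edp_reduction(energy_reduction: float, runtime_reduction: float) -> float:
    """Fractional reduction in energy-delay product (EDP = energy * delay).

    Both arguments are fractional reductions, e.g. 0.47 for 47%.
    """
    remaining_energy = 1.0 - energy_reduction   # fraction of energy still used
    remaining_delay = 1.0 - runtime_reduction   # fraction of runtime still needed
    return 1.0 - remaining_energy * remaining_delay


# Figures quoted in the abstract: about 47% energy and 22% runtime reduction.
reduction = edp_reduction(0.47, 0.22)
print(f"{reduction:.1%}")  # 58.7%
```

The result, roughly 58.7%, agrees with the reported 59% reduction once the rounding of the "about 47%" and "about 22%" figures is taken into account.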

compilers, architecture, and synthesis for embedded systems | 2007

Software controlled memory layout reorganization for irregular array access patterns

Doosan Cho; Ilya Issenin; Nikil D. Dutt; Jonghee W. Yoon; Yunheung Paek

Many embedded array-intensive applications have irregular access patterns that are not amenable to static analysis for extraction of access patterns, and thus prevent efficient use of a Scratch Pad Memory (SPM) hierarchy for performance and power improvement. We present a profiling-based strategy that generates a memory access trace which can be used to identify data elements with fine granularity that can profitably be placed in the SPMs to maximize performance and energy gains. We developed an entire toolchain that allows incorporation of the code required to profitably move data to SPMs; visualization of the extracted access pattern after profiling; and evaluation/exploration of the generated application code to steer mapping of data to the SPM to yield performance and energy benefits. We present a heuristic approach that efficiently exploits the SPM using the profiler-driven access pattern behaviors. Experimental results on EEMBC and other industrial codes obtained with our framework show that we are able to achieve 36% energy reduction and reduce execution time by up to 22% compared to a cache-based system.
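
To make the idea of trace-driven SPM placement concrete, here is a minimal sketch. It is not the heuristic from the paper, whose details the abstract does not give; it swaps in a simple greedy rule (most accesses per byte first) as an assumption. The names `select_spm_elements`, `trace`, `sizes`, and `spm_capacity` are all hypothetical.

```python
from collections import Counter


def select_spm_elements(trace, sizes, spm_capacity):
    """Pick data elements for the SPM from a profiled access trace.

    Assumed greedy rule (not the paper's heuristic): rank elements by
    accesses per byte and take each one that still fits in the SPM.
    """
    counts = Counter(trace)  # how often each element appears in the trace
    # Highest access density first; ties broken by name for determinism.
    ranked = sorted(counts, key=lambda e: (-counts[e] / sizes[e], e))
    chosen, used = [], 0
    for elem in ranked:
        if used + sizes[elem] <= spm_capacity:
            chosen.append(elem)
            used += sizes[elem]
    return chosen


# Hypothetical profile: element names in access order, and sizes in bytes.
trace = ["a", "b", "a", "c", "a", "b", "d"]
sizes = {"a": 4, "b": 4, "c": 8, "d": 4}

chosen = select_spm_elements(trace, sizes, spm_capacity=8)
spm_hits = sum(1 for elem in trace if elem in chosen)
print(chosen, spm_hits, len(trace))  # ['a', 'b'] 5 7
```

With an 8-byte SPM, the two most densely accessed elements fill the capacity and serve 5 of the 7 profiled accesses; the remaining 2 would go to main memory.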

languages, compilers, and tools for embedded systems | 2008

Compiler driven data layout optimization for regular/irregular array access patterns

Doosan Cho; Sudeep Pasricha; Ilya Issenin; Nikil D. Dutt; Yunheung Paek; SunJun Ko

Embedded multimedia applications consist of regular and irregular memory access patterns. In particular, irregular patterns are not amenable to static analysis for extraction of access patterns, and thus prevent efficient use of a Scratch Pad Memory (SPM) hierarchy for performance and energy improvements. To resolve this, we present a compiler strategy to optimize data layout in regular/irregular multimedia applications running on embedded multiprocessor environments. The goal is to maximize the number of accesses to the SPM over the entire system, which leads to a reduction in the energy consumption of the system. This is achieved by optimizing data placement of application-wide reused data so that it resides in the SPMs of processing elements. Specifically, our scheme is based on profiling that generates a memory access footprint. The memory access footprint is used to identify data elements with fine granularity that can profitably be placed in the SPMs to maximize performance and energy gains. We present a heuristic approach that efficiently exploits the SPMs using the memory access footprint. Our experimental results show that our approach is able to reduce energy consumption by 30% and improve performance by 18% over cache-based memory subsystems for various multimedia applications.

digital systems design | 2009

Iterative Algorithm for Compound Instruction Selection with Register Coalescing

Minwook Ahn; Jonghee M. Youn; Youngkyu Choi; Doosan Cho; Yunheung Paek

A compound instruction, encoding several ALU or memory operations within an instruction word, has been regarded as an efficient way of improving performance. In the compiler for embedded processors, the code generation algorithm for compound instructions has been built by dealing mainly with instruction selection which is a crucial phase of code generation. In this paper, we propose an iterative code generation algorithm for minimizing the detrimental impact of register coalescing that is applied to the code with compound instructions generated earlier from the instruction selection phase.

ACM Transactions on Design Automation of Electronic Systems | 2013

Reducing instruction bit-width for low-power VLIW architectures

Jongwon Lee; Jonghee M. Youn; Doosan Cho; Yunheung Paek

VLIW (very long instruction word) architectures have proven to be useful for embedded applications with abundant instruction-level parallelism. However, because of their wide instruction bus, they often consume more power and memory space than necessary. One way to lessen this problem is to adopt a reduced bit-width instruction set architecture (ISA) that has a narrower instruction word length. This facilitates a more efficient hardware implementation in terms of area and power by decreasing bus-bandwidth requirements and the power dissipation associated with instruction fetches. In practice, however, it is impossible to convert a given ISA fully into an equivalent reduced bit-width one because the narrow instruction word, due to bit-width restrictions, can encode only a small subset of normal instructions in the original ISA. Consequently, existing processors provide narrow instructions in very limited cases along with severe restrictions on register accessibility. The objective of this work is to explore the possibility of complete conversion, as a case study, of an existing 32-bit VLIW ISA into a 16-bit one that supports effectively all 32-bit instructions. To this end, we attempt to circumvent the bit-width restrictions by dynamically extending the effective instruction word length of the converted 16-bit operations. Further, we show that our proposed ISA conversion can create a synergy effect with a VLES (variable length execution set) architecture that is adopted in most recent VLIW processors. According to our experiment, the code size becomes significantly smaller after the conversion to 16-bit VLIW code. Also, at a slight runtime cost, the machine with the 16-bit ISA consumes much less energy than the original machine.

design, automation, and test in europe | 2011

I²CRF: Incremental interconnect customization for embedded reconfigurable fabrics

Jonghee W. Yoon; Jongeun Lee; Jaewan Jung; Sang-Hyun Park; Yongjoo Kim; Yunheung Paek; Doosan Cho

Integrating coarse-grained reconfigurable architectures (CGRAs) into a System-on-a-Chip (SoC) presents many benefits as well as important challenges. One of the challenges is how to customize the architecture for the target applications efficiently and effectively without explicit design space exploration. In this paper we present a novel methodology for incremental interconnect customization of CGRAs that can suggest a new interconnection architecture that can maximize the performance for a given set of application kernels while minimizing the hardware cost. Applying the inexact graph matching analogy, we translate our problem into graph matching taking into account the cost of various graph edit operations, which we solve using the A* search algorithm with a heuristic tailored to our problem. Our experimental results demonstrate that our customization method can quickly find application-optimized interconnections that exhibit 70% higher performance on average compared to the base architecture, with relatively little hardware increase in interconnections and muxes.

compiler construction | 2007

Preprocessing strategy for effective modulo scheduling on multi-issue digital signal processors

Doosan Cho; Ravi Ayyagari; Gang-Ryung Uh; Yunheung Paek

To achieve high resource utilization for multi-issue Digital Signal Processors (DSPs), production compilers commonly include variants of the iterative modulo scheduling algorithm. However, excessive cyclic data dependences, which exist in communication and media processing loops, often prevent the modulo scheduler from achieving ideal loop initiation intervals. As a result, replicated functional units in multi-issue DSPs are frequently underutilized. In response to this resource underutilization problem, this paper describes a compiler preprocessing strategy that capitalizes on two techniques for effective modulo scheduling, referred to as cloning1 and cloning2. The core of the proposed techniques lies in the direct relaxation of cyclic data dependences by exploiting functional units which are otherwise left unused. Since our preprocessing strategy requires neither code duplication nor additional hardware support, it is relatively easy to implement in DSP compilers. The strategy proposed has been validated by an implementation for a StarCore SC140 optimizing C compiler.
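
For context on why cyclic dependences matter here: in modulo scheduling, the minimum initiation interval (MII) is conventionally the larger of a resource bound (ResMII) and a recurrence bound (RecMII). The sketch below computes both bounds for a made-up loop. It illustrates the standard bounds only, not the cloning1 or cloning2 techniques, and every name and number in it is an assumption.

```python
import math


def res_mii(op_counts, unit_counts):
    """Resource bound: each unit class must issue all its ops within one II."""
    return max(math.ceil(op_counts[kind] / unit_counts[kind]) for kind in op_counts)


def rec_mii(cycles):
    """Recurrence bound over dependence cycles.

    Each cycle is a (total_latency, total_iteration_distance) pair.
    """
    return max((math.ceil(lat / dist) for lat, dist in cycles), default=1)


def min_ii(op_counts, unit_counts, cycles):
    """Lower bound on the initiation interval of a modulo-scheduled loop."""
    return max(1, res_mii(op_counts, unit_counts), rec_mii(cycles))


# Hypothetical loop body: 6 ALU ops and 2 memory ops on a 4-ALU, 2-memory-unit DSP.
ops = {"alu": 6, "mem": 2}
units = {"alu": 4, "mem": 2}

tight = min_ii(ops, units, [(5, 1)])    # a 5-cycle recurrence carried every iteration
relaxed = min_ii(ops, units, [(2, 1)])  # the same loop if that cycle took only 2 cycles
print(tight, relaxed)  # 5 2
```

In the first case the recurrence forces an interval of 5 even though the functional units could sustain 2, which is the kind of underutilization the abstract describes; relaxing the cyclic dependence lets the loop reach the resource bound.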

KIPS Transactions on Computer and Communication Systems | 2013

SorMob: Computation Offloading Framework based on AOP

Yeongpil Cho; Doosan Cho; Yunheung Paek

As smartphones spread rapidly and widely, their applications demand ever more computing power. On personal computers, the computing power of hardware has at times exceeded the performance requirements of software. Smartphone computing power, however, will not grow at the same pace as application demand, because of form-factor constraints that favor thinner devices and power limits imposed by the relatively slow progress of battery technology. Computation offloading is attracting considerable attention as one solution to this problem. Despite its advantages in performance and power consumption, it is not yet widely used, because existing offloading frameworks are difficult for application developers to use. This paper presents an application-developer-friendly offloading framework named SorMob. Based on the Aspect-Oriented Programming (AOP) model, SorMob provides a convenient environment for application development, and its performance was verified by comparison with an existing offloading framework.

Keywords: Computation Offloading, AOP
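
The appeal of an aspect-oriented approach is that the offloading decision is woven around a method rather than written into it. The abstract does not describe SorMob's API, so the sketch below is only a loose analogy using a Python decorator; `offloadable`, `fake_remote_call`, `REMOTE_REGISTRY`, and the size threshold are all hypothetical, and the "remote" side is simulated in-process.

```python
import functools

REMOTE_REGISTRY = {}   # stand-in for code deployed on a server (assumption)
remote_calls = []      # records which functions were "offloaded"


def fake_remote_call(name, args, kwargs):
    """Simulated remote execution: runs the registered function in-process."""
    remote_calls.append(name)
    return REMOTE_REGISTRY[name](*args, **kwargs)


def offloadable(should_offload, remote_call):
    """Wrap a function so that each call is routed locally or remotely."""
    def decorate(func):
        REMOTE_REGISTRY[func.__name__] = func  # pretend the server has the same code

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if should_offload(func.__name__, args):
                return remote_call(func.__name__, args, kwargs)
            return func(*args, **kwargs)
        return wrapper
    return decorate


# Hypothetical policy: offload only when the first argument is large.
@offloadable(lambda name, args: args[0] > 1000, fake_remote_call)
def sum_squares(n):
    return sum(i * i for i in range(n))


local_result = sum_squares(10)      # small input: runs locally
remote_result = sum_squares(2000)   # large input: routed to the simulated remote
print(local_result, remote_calls)   # 285 ['sum_squares']
```

The body of `sum_squares` contains no offloading logic at all; the cross-cutting concern lives entirely in the wrapper, which is the property that makes this style convenient for application developers.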

ACM Transactions on Design Automation of Electronic Systems | 2013

Architecture customization of on-chip reconfigurable accelerators

Jonghee W. Yoon; Jongeun Lee; Sang-Hyun Park; Yongjoo Kim; Jinyong Lee; Yunheung Paek; Doosan Cho

Integrating coarse-grained reconfigurable architectures (CGRAs) into a System-on-a-Chip (SoC) presents many benefits as well as important challenges. One of the challenges is how to customize the architecture for the target applications efficiently and effectively without performing explicit design space exploration. In this article we present a novel methodology for incremental interconnect customization of CGRAs that can suggest a new interconnection architecture which is able to maximize the performance for a given set of application kernels while minimizing the hardware cost. In our methodology, we translate the problem of interconnect customization into that of inexact graph matching, and we devise a heuristic for the A* search algorithm to efficiently solve the inexact graph matching problem. Our experimental results demonstrate that our customization method can quickly find application-optimized interconnections that exhibit 80% higher performance on average compared to the base architecture, which has mesh interconnections, with little energy and hardware increase in interconnections and muxes.