Yoonseo Choi | Researchain

Publication

Featured research published by Yoonseo Choi.

ACM Transactions on Architecture and Code Optimization | 2013

Fast modulo scheduler utilizing patternized routes for coarse-grained reconfigurable architectures

Won-Sub Kim; Yoonseo Choi; Haewoo Park

Coarse-Grained Reconfigurable Architectures (CGRAs) offer the potential for high compute throughput with energy efficiency. A CGRA consists of an array of Functional Units (FUs), which communicate with each other through an interconnect network containing transmission nodes and register files. To achieve high performance from the software mapped onto CGRAs, modulo scheduling of loops is generally employed. One of the key challenges in modulo scheduling for CGRAs is to explicitly handle the routing of operands from a source operation to a destination operation through various routing resources. Existing modulo schedulers for CGRAs are slow because finding a valid routing is generally a search problem over a large space, even with the guidance of well-defined cost metrics. Applications in traditional embedded multimedia domains are regarded as relatively tolerant of slow compile times in exchange for a high-quality solution. However, many rapidly growing application domains, such as 3D graphics, require fast compilation. The entry of CGRAs into these domains has been blocked mainly by their long compile times. We attack this problem by utilizing patternized routes, for which the resources and time slots needed for a successful routing can be estimated in advance when a source operation is placed. By conservatively reserving predefined resources at predefined time slots, future routings originating from the source operation are guaranteed. Experiments on a real-world 3D graphics benchmark suite show that our scheduler improves compile time by up to 6,000 times while achieving, on average, 70% of the throughput of the state-of-the-art CGRA modulo scheduler, the Edge-centric Modulo Scheduler (EMS).
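
The idea of reserving predefined resources at predefined time slots can be illustrated with a modulo reservation table, a standard bookkeeping structure in modulo scheduling. The sketch below is not the paper's scheduler; the class name, the `pattern` representation as `(resource, time_offset)` pairs, and the resource names are assumptions made for illustration only. It shows that a route pattern is accepted only if all of its resource slots, taken modulo the initiation interval (II), are free.

```python
class ModuloReservationTable:
    """Tracks which (resource, time mod II) slots are occupied.

    Hypothetical illustration; not the data structure used in the paper.
    """

    def __init__(self, ii):
        self.ii = ii          # initiation interval of the loop
        self.busy = set()     # occupied (resource, slot) pairs

    def is_free(self, resource, time):
        return (resource, time % self.ii) not in self.busy

    def try_reserve_route(self, start_time, pattern):
        """Reserve every (resource, time_offset) in `pattern`, or nothing.

        Returns True if the whole pattern was reserved, False otherwise.
        """
        slots = [(res, (start_time + off) % self.ii) for res, off in pattern]
        if len(set(slots)) != len(slots):
            return False      # the pattern collides with itself at this II
        if any(slot in self.busy for slot in slots):
            return False      # some slot is already taken
        self.busy.update(slots)
        return True


# Usage: a two-hop pattern (assumed resource names) under II = 2.
mrt = ModuloReservationTable(ii=2)
pattern = [("rf0", 1), ("bus0", 2)]
first = mrt.try_reserve_route(0, pattern)    # both slots free
second = mrt.try_reserve_route(2, pattern)   # same slots modulo II
third = mrt.try_reserve_route(1, pattern)    # shifted by one cycle
```

Because the pattern is fixed in advance, the check is a handful of set lookups rather than a search over the routing space, which is the intuition behind trading some throughput for compile time.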

International Conference on Consumer Electronics | 2014

JTS-based static branch prediction

Tai-song Jin; Jin-Seok Lee; Min-wook Ahn; Yoonseo Choi; Do Hyung Kim; Shihwa Lee

VLIW architectures are popular design choices in the embedded computing market because of their capability of delivering performance at low power. Branch prediction plays a key role in minimizing pipeline stalls due to control hazards. Though a hardware branch predictor can produce good predictions, its hardware cost often prevents it from being used in low-power VLIW architectures. On the other hand, software branch prediction by the compiler can achieve comparable prediction quality by utilizing delay slots intelligently, without the hardware cost. In this paper, we propose a novel static branch prediction technique using jump target setting (JTS) instructions. The JTS-enabled VLIW architecture has been successfully shipped in several commercial consumer electronic devices from Samsung. In our experiment using multimedia applications, the proposed branch prediction scheme outperforms conventional static branch prediction with delay slots by 9%.

International Conference on Consumer Electronics | 2014

Nop compression scheme for high speed DSPs based on VLIW architecture

Tai-song Jin; Min-wook Ahn; Dong-hoon Yoo; Dong-kwan Suh; Yoonseo Choi; Do-Hyung Kim; Shihwa Lee

VLIW (Very Long Instruction Word) is one of the most popular architectures in embedded systems because it features low power consumption and low hardware cost. Because of characteristics of the VLIW architecture such as bundled instructions and large register files, VLIW processors run with large instruction code at a relatively low clock frequency. However, compact instruction size and high clock frequency are the most important requirements of modern embedded consumer electronics. In this paper, we propose a novel instruction compression scheme to address this problem. The experiment shows that the proposed scheme can reduce instruction size by 23% and improve clock frequency by 25% on average compared with conventional compression schemes.
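
The abstract does not describe the proposed scheme itself. As background, the sketch below shows a generic mask-based NOP compression, one commonly described conventional approach: each bundle is stored as a bitmask of occupied slots plus only the non-NOP operations. The function names, the `"nop"` marker, and the string representation of operations are assumptions for illustration, not the paper's encoding.

```python
NOP = "nop"  # assumed marker for an empty issue slot


def compress_bundle(bundle):
    """Return (mask, ops): bit i of mask is set if slot i holds a real op."""
    mask = 0
    ops = []
    for slot, op in enumerate(bundle):
        if op != NOP:
            mask |= 1 << slot
            ops.append(op)
    return mask, ops


def decompress_bundle(mask, ops, width):
    """Rebuild the full-width bundle, re-inserting NOPs where mask bits are 0."""
    remaining = iter(ops)
    return [next(remaining) if mask & (1 << slot) else NOP
            for slot in range(width)]


# Usage: a 4-wide bundle with two empty slots.
bundle = ["add", NOP, NOP, "ld"]
mask, ops = compress_bundle(bundle)
restored = decompress_bundle(mask, ops, width=len(bundle))
```

In a scheme like this, the stored size shrinks with the number of NOPs, but the decoder must expand the mask before issue, which is the kind of decode cost that can affect clock frequency.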

Archive | 2015