Haewoo Park | Researchain

Publication

Featured research published by Haewoo Park.

field-programmable technology | 2012

SCC based modulo scheduling for coarse-grained reconfigurable processors

Won-Sub Kim; Dong-hoon Yoo; Haewoo Park; Min-wook Ahn

Coarse-grained reconfigurable array (CGRA) architectures aim to offer high performance at low power consumption, especially for digital signal processing and streaming applications. To fully exploit the computing capability of a CGRA, it is essential to develop a scheduling algorithm that maps operations onto the processing elements of the CGRA. Modulo scheduling [1] is known as the state-of-the-art algorithm for CGRA scheduling, and there are many variants [2][3][4]. However, they struggle with inter-iteration dependences, called recurrences, which form cyclic dependences. Hence we propose a new scheduling technique that efficiently handles cyclic dependences. The key techniques are grouping all mutually dependent recurrence cycles into a strongly connected component (SCC) and scheduling the input data flow graph (DFG) based on SCCs. Since grouping removes all recurrence cycles from the DFG, the resulting SCC graph is a directed acyclic graph (DAG) in which the scheduler can track the total order of SCCs. While processing SCCs one by one, our intra-SCC scheduler analyzes the dependences between every pair of distinct operations inside an SCC and produces a schedule for them. Thanks to the well-structured form of the SCC-based graph, we obtain a more efficient schedule than the previous CGRA scheduling algorithm [2]. The experimental results show that the proposed technique improves the performance of recurrence-dominant loops by up to 3.5X and raises the success rate of modulo scheduling compared to the previous CGRA scheduling algorithm [2].
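
The grouping step described in this abstract can be illustrated with a minimal sketch. It is not the authors' scheduler: it only shows how a DFG with recurrence cycles collapses into SCCs whose order is acyclic. It assumes the DFG is given as a plain adjacency dictionary and uses Tarjan's algorithm, which the abstract does not name; the function names are hypothetical.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. `graph` maps each node to a list of successors."""
    index, low = {}, {}
    stack, on_stack = [], set()
    sccs = []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of an SCC: pop its members off the stack.
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            sccs.append(frozenset(comp))

    nodes = set(graph) | {w for succs in graph.values() for w in succs}
    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return sccs


def scc_schedule_order(graph):
    """Return SCCs so that every producer SCC precedes its consumers.

    Tarjan's algorithm emits SCCs in reverse topological order of the
    condensed (acyclic) graph, so reversing the list is sufficient.
    """
    return list(reversed(strongly_connected_components(graph)))


# Hypothetical DFG: b and c form a recurrence cycle (b -> c -> b).
dfg = {"a": ["b"], "b": ["c"], "c": ["b", "d"], "d": []}
order = scc_schedule_order(dfg)
print([sorted(c) for c in order])  # [['a'], ['b', 'c'], ['d']]
```

In this sketch a scheduler would visit the groups in the returned order and handle the operations inside each multi-node group together, since those are the ones tied by a recurrence.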

ACM Transactions on Architecture and Code Optimization | 2013

Fast modulo scheduler utilizing patternized routes for coarse-grained reconfigurable architectures

Won-Sub Kim; Yoonseo Choi; Haewoo Park

Coarse-Grained Reconfigurable Architectures (CGRAs) offer the potential for high compute throughput with energy efficiency. A CGRA consists of an array of Functional Units (FUs), which communicate with each other through an interconnect network containing transmission nodes and register files. To achieve high performance from the software solutions mapped onto CGRAs, modulo scheduling of loops is generally employed. One of the key challenges in modulo scheduling for CGRAs is to explicitly handle the routing of operands from a source operation to a destination operation through various routing resources. Existing modulo schedulers for CGRAs are slow because finding a valid routing is generally a search problem over a large space, even with the guidance of well-defined cost metrics. Applications in traditional embedded multimedia domains are regarded as relatively tolerant of a slow compile time in exchange for a high-quality solution. However, many rapidly growing application domains, such as 3D graphics, require fast compilation. The entry of CGRAs into these domains has been blocked mainly by their long compile time. We attack this problem by utilizing patternized routes, for which the resources and time slots needed for success can be estimated in advance when a source operation is placed. By conservatively reserving predefined resources at predefined time slots, future routings originating from the source operation are guaranteed. Experiments on a real-world 3D graphics benchmark suite show that our scheduler improves compile time by up to 6,000 times while achieving on average 70% of the throughput of the state-of-the-art CGRA modulo scheduler, the Edge-centric Modulo Scheduler (EMS).
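
The idea of conservatively reserving predefined resources at predefined time slots can be sketched with a modulo reservation table, the standard bookkeeping structure in modulo scheduling. This is an illustrative sketch only, not the scheduler from the paper: the class, the function, the resource names, and the representation of a pattern as a list of (resource, offset) pairs are all assumptions.

```python
class ModuloReservationTable:
    """Tracks which (resource, time slot mod II) pairs are taken."""

    def __init__(self, ii):
        self.ii = ii          # initiation interval of the loop
        self.busy = set()     # reserved (resource, slot) pairs

    def _slots(self, claims):
        return [(res, t % self.ii) for res, t in claims]

    def can_reserve(self, claims):
        slots = self._slots(claims)
        no_self_conflict = len(set(slots)) == len(slots)
        return no_self_conflict and not any(s in self.busy for s in slots)

    def reserve(self, claims):
        """Reserve all claims atomically; return False and change nothing on conflict."""
        if not self.can_reserve(claims):
            return False
        self.busy.update(self._slots(claims))
        return True


def place_with_pattern(mrt, fu, t, pattern):
    """Place an operation on `fu` at time `t` and reserve its route pattern up front.

    `pattern` is a hypothetical list of (routing_resource, cycle_offset) pairs
    that the operation's result may need after issue.
    """
    claims = [(fu, t)] + [(res, t + off) for res, off in pattern]
    return mrt.reserve(claims)


mrt = ModuloReservationTable(ii=2)
pattern = [("rf0", 1)]  # hold register file rf0 one cycle after issue
print(place_with_pattern(mrt, "fu0", 0, pattern))  # True
print(place_with_pattern(mrt, "fu1", 2, pattern))  # False: rf0 slot 1 is taken
print(place_with_pattern(mrt, "fu1", 1, pattern))  # True
```

Because the whole pattern is reserved when the source operation is placed, no search over routes is needed later; the cost in this sketch is that reserved slots may go unused, which is consistent with the throughput trade-off the abstract reports.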

compilers, architecture, and synthesis for embedded systems | 2014

Retargetable automatic generation of compound instructions for CGRA based reconfigurable processor applications

Narasinga Rao Miniskar; Soma Kohli; Haewoo Park; Dong-hoon Yoo

Reconfigurable processors such as SRP (Samsung Reconfigurable Processors) have become increasingly important because they offer just enough flexibility to accept software solutions while providing application-specific hardware configurability, giving faster time-to-market, lower development cost, and higher performance while maintaining lower energy consumption and area. The reconfigurable processor compilation framework supports a wide range of architectures through an architecture description template for different application domains such as image processing, multimedia, video, and graphics. These architectures support several domain-specific compound instructions (also called intrinsics), which are computationally efficient compared to the set of general instructions in the processor. Application developers have to use these intrinsics in their programs according to the architecture, which can result in inefficient usage and is tedious and error-prone. Moreover, using the intrinsics provided by the architecture requires constant reference to the intrinsics file during development. In this paper, we propose a novel retargetable methodology for the automatic generation of compound instructions for a given architecture and application source code at compile time. Our approach is able to consider ~75% of the total intrinsics in the architectures, with a success rate of > 90% in identifying the intrinsics in benchmarks such as AVC OpenGL Full Engine and OpenGL Vector benchmarks.

international conference on consumer electronics | 2013

A scalable scheduling algorithm for coarse-grained reconfigurable architecture

Haewoo Park; Won-Sub Kim; Dong-hoon Yoo; Soojung Ryu; Jeongwook Kim

Coarse-grained reconfigurable architectures (CGRAs) are introduced as flexible architectures that can efficiently execute various types of applications in a single device. A CGRA often achieves high IPC by utilizing tens or hundreds of functional units (FUs). The key technique in exploiting a CGRA is to find an optimal mapping of operations onto FUs. The modulo scheduling algorithm is known as the state-of-the-art technique for finding a fairly efficient solution; however, it often takes too much time and occasionally fails as the number of FUs increases. In this paper, we propose a novel two-stage scheduling algorithm that finds a solution within a reasonable amount of time. The experimental results show that the proposed algorithm reduces the scheduling time by 92% and finds schedules that are as efficient as the solutions given by the previous modulo scheduler.

Archive | 2015