Minwook Ahn | Researchain

Publication

Featured research published by Minwook Ahn.

Design, Automation, and Test in Europe | 2006

A spatial mapping algorithm for heterogeneous coarse-grained reconfigurable architectures

Minwook Ahn; Jonghee W. Yoon; Yunheung Paek; Yoon-Jin Kim; Mary Kiemb; Kiyoung Choi

In this work, we investigate the problem of automatically mapping applications onto a coarse-grained reconfigurable architecture and propose an efficient algorithm to solve the problem. We formalize the mapping problem and show that it is NP-complete. To solve the problem within a reasonable amount of time, we divide it into three subproblems: covering, partitioning, and layout. Our empirical results demonstrate that our technique achieves performance nearly as good as hand-optimized outputs for many kernels.
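To make the mapping problem concrete, here is a minimal sketch. It is not the covering/partitioning/layout algorithm from the paper. It assumes a hypothetical homogeneous mesh in which each processing element (PE) connects only to its four nearest neighbours, and it brute-forces a placement of a small dataflow graph so that every edge joins adjacent PEs. The names `mesh_neighbors` and `map_kernel` are illustrative.

```python
from itertools import permutations

def mesh_neighbors(rows, cols):
    """Return {pe: set of adjacent PEs} for a rows x cols mesh (assumed topology)."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            nbrs = set()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    nbrs.add((rr, cc))
            adj[(r, c)] = nbrs
    return adj

def map_kernel(nodes, edges, rows, cols):
    """Exhaustively search for a placement of nodes onto distinct PEs.

    Returns {node: (row, col)} if every edge links adjacent PEs, else None.
    """
    adj = mesh_neighbors(rows, cols)
    pes = sorted(adj)
    if len(nodes) > len(pes):
        return None
    # Try every assignment of distinct PEs to nodes (factorial growth).
    for placement in permutations(pes, len(nodes)):
        where = dict(zip(nodes, placement))
        if all(where[v] in adj[where[u]] for u, v in edges):
            return where
    return None

# A four-operation chain fits on a 2x2 mesh; a triangle of dependencies does not.
chain = map_kernel(list("abcd"), [("a", "b"), ("b", "c"), ("c", "d")], 2, 2)
triangle = map_kernel(["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")], 2, 2)
```

The exhaustive search is only usable for tiny graphs, which is consistent with the abstract's point that the problem needs to be decomposed to be solved in reasonable time.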

Asia and South Pacific Design Automation Conference | 2008

SPKM: a novel graph drawing based algorithm for application mapping onto coarse-grained reconfigurable architectures

Jonghee W. Yoon; Aviral Shrivastava; Sang-Hyun Park; Minwook Ahn; Yunheung Paek

IEEE Transactions on Very Large Scale Integration Systems | 2009

A Graph Drawing Based Spatial Mapping Algorithm for Coarse-Grained Reconfigurable Architectures

Jonghee W. Yoon; Aviral Shrivastava; Sang-Hyun Park; Minwook Ahn; Yunheung Paek

Recently, coarse-grained reconfigurable architectures (CGRAs) have drawn increasing attention due to their efficiency and flexibility. While many CGRAs have demonstrated impressive performance improvements, the effectiveness of CGRA platforms ultimately hinges on the compiler. Existing CGRA compilers do not model the details of the CGRA, and thus they i) fail to map applications even when a mapping exists and ii) use too many processing elements (PEs) to map an application. In this paper, we model several CGRA details, e.g., irregular CGRA topologies, shared resources, and routing PEs, in our compiler and develop a graph drawing based approach, split-push kernel mapping (SPKM), for mapping applications onto CGRAs. On randomly generated graphs, our technique maps on average 4.5 times more applications than the previous approach, while generating mappings of better quality in terms of utilized CGRA resources. Utilizing fewer resources translates directly into increased opportunities for novel power and performance optimization techniques. Our technique shows lower power consumption in 71 cases and shorter execution cycles in 66 cases out of 100 synthetic applications, with minimal mapping time overhead. We observe similar results on a suite of benchmarks collected from Livermore loops, Mediabench, Multimedia, Wavelet, and DSPStone benchmarks. SPKM is not tailored to a specific CGRA template, which we demonstrate by exploring various PE interconnection topologies and shared resource configurations with SPKM.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2009

Adaptive Scratch Pad Memory Management for Dynamic Behavior of Multimedia Applications

Doosan Cho; Sudeep Pasricha; Ilya Issenin; Nikil D. Dutt; Minwook Ahn; Yunheung Paek

Exploiting runtime memory access traces can be a complementary approach to compiler optimizations for reducing energy in the memory hierarchy. This is particularly important for emerging multimedia applications, since they usually have input-sensitive runtime behavior that results in dynamic and/or irregular memory access patterns. These types of applications are normally hard to optimize with static compiler optimizations, because their behavior stays unknown until runtime and may even change during computation. To tackle this problem, we propose an integrated approach of software [compiler and operating system (OS)] and hardware (data access record table) techniques to exploit data reusability of multimedia applications in Multiprocessor Systems on Chip. Guided by compiler analysis for generating scratch pad data layouts and hardware components for tracking dynamic memory accesses, the scratch pad data layout adapts to an input data pattern with the help of a runtime scratch pad memory manager incorporated in the OS. The runtime data placement strategy presented in this paper provides efficient scratch pad utilization for dynamic applications. The goal is to minimize the number of accesses to the main memory over the entire runtime of the system, which leads to a reduction in the energy consumption of the system. Our experimental results show that our approach significantly reduces the energy consumption of multimedia applications with dynamic memory access behavior compared with an existing compiler technique and an alternative hardware technique.

High Performance Embedded Architectures and Compilers | 2009

Fast Code Generation for Embedded Processors with Aliased Heterogeneous Registers

Minwook Ahn; Yunheung Paek

Many embedded processors have complex, irregular architectures resulting from customization for maximum performance and energy efficiency of target applications. One such example is the heterogeneous register architecture, which has fast, small register files, each for a specific use, distributed over the data paths between different functional units. Although this architectural design may be good at achieving the hardware design goals of high speed, small area, and low power, it requires highly expensive algorithms for optimal code generation. This is primarily because the multiple registers contained in each file come with many different constraints depending on their design purposes, and often their names are aliased with each other; thus the final code quality is very sensitive to how properly such aliased, heterogeneous registers are utilized in every instruction. In this work, we propose a code generation approach to attack this complex problem. The experiments reveal that our approach is fast, running in polynomial time in practice. In comparison with related work, it achieves approximately 13% code size reduction and a 16% speed increase.

Symposium on Application Specific Processors | 2009

A new addressing mode for the encoding space problem on embedded processors

Jonghee M. Youn; Minwook Ahn; Daeho Kim; Jonghee W. Yoon; Yunheung Paek; Sechul Shin; Hochang Chae; Jeonghun Cho

The complexity of today's applications increases with various requirements such as execution time, code size, and power consumption. To satisfy these requirements, efficient instruction set design is an important issue, because an instruction customized for specific applications can deliver better performance than multiple instructions in terms of execution time, code size, and power consumption. Limited encoding space, however, does not allow application-specific and complex instructions to be added freely to an instruction set architecture. To resolve this problem, conventional architectures increase the free space for encoding by trimming excessive bits required beyond the fixed word length. This approach, however, shows weaknesses in terms of compiler complexity, code size, and execution time. In this paper, we propose a new instruction encoding scheme based on the dynamic implied addressing mode (DIAM) to resolve the limited encoding space and the side effects of trimming. Our DIAM-based approach uses a special program memory to store extra encoding information. We also suggest a code generation algorithm to fully utilize the DIAM. In our experiment, the architecture augmented with DIAMs shows about 10% code size reduction and speedup on average, as compared to the base architecture without DIAMs.

High Performance Computing and Communications | 2009

Orthogonal Instruction Encoding for a 16-bit Embedded Processor with Dynamic Implied Addressing Mode

Jonghee M. Youn; Daeho Kim; Minwook Ahn; Yongjoo Kim; Yunheung Paek

Although 32-bit architectures are becoming the norm for modern microprocessors, 16-bit ones are still employed by many low-end processors, for which small size and low power consumption are of high priority. However, 16-bit architectures have a critical disadvantage for embedded processors in that they do not provide enough encoding space to add special instructions tailored to certain applications. To overcome this, many existing architectures adopt non-orthogonal, irregular instruction sets to accommodate a variety of unusual addressing modes through which more opcodes and operands are densely encoded within the narrow instruction word. In general, these non-orthogonal architectures are regarded as compiler-unfriendly because they tend to require extremely sophisticated compiler techniques for optimal code generation. To address this issue, we propose a compiler-friendly processor with a new addressing mode, called the dynamic implied addressing mode (DIAM). In this paper, we demonstrate that the DIAM provides more encoding space for our 16-bit processor so that we are able to support more instructions specially customized for our applications. And yet, the processor maintains a RISC-style orthogonal architecture, thereby allowing us to use traditional code generation algorithms. In our experiment, the architecture augmented with DIAMs shows 6.2% code size reduction and 3.5% performance increase on average, as compared to the basic architecture without DIAMs.
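The encoding-space pressure on a 16-bit instruction word can be illustrated with simple arithmetic. The sketch below assumes a hypothetical fixed-field format with 16 registers (4 bits per operand); these field widths are illustrative and not taken from the paper. Dropping one explicit operand field only gestures at the general idea behind implied addressing and does not model the DIAM mechanism itself.

```python
def opcode_slots(word_bits, operand_bits, explicit_operands):
    """Number of distinct opcodes left after reserving explicit operand fields."""
    free_bits = word_bits - operand_bits * explicit_operands
    if free_bits < 0:
        raise ValueError("operands do not fit in the instruction word")
    return 2 ** free_bits

# Assumed format: 4-bit register fields.
three_operand_16 = opcode_slots(16, 4, 3)   # 4 bits left for the opcode
two_operand_16 = opcode_slots(16, 4, 2)     # one operand implied: 8 bits left
three_operand_32 = opcode_slots(32, 4, 3)   # same format in a 32-bit word
```

Under these assumed widths, a three-operand 16-bit format leaves room for only 16 opcodes, while implying one operand raises that to 256.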

Digital Systems Design | 2009

Iterative Algorithm for Compound Instruction Selection with Register Coalescing

Minwook Ahn; Jonghee M. Youn; Youngkyu Choi; Doosan Cho; Yunheung Paek

A compound instruction, which encodes several ALU or memory operations within one instruction word, has been regarded as an efficient way of improving performance. In compilers for embedded processors, code generation for compound instructions has dealt mainly with instruction selection, which is a crucial phase of code generation. In this paper, we propose an iterative code generation algorithm that minimizes the detrimental impact of register coalescing when it is applied to code with compound instructions generated earlier in the instruction selection phase.

Languages, Compilers, and Tools for Embedded Systems | 2007

Optimistic coalescing for heterogeneous register architectures

Minwook Ahn; Jooyeon Lee; Yunheung Paek

Optimistic coalescing has been proven to be an elegant and effective technique that provides better chances of safely coloring more registers in register allocation than other coalescing techniques. Its algorithm originally assumes homogeneous registers that are all gathered in the same register file. Although this register architecture is still common in most general-purpose processors, embedded processors often contain heterogeneous registers that are scattered across physically different register files, each dedicated to a different purpose and use. In this work, we developed a modified algorithm for optimistic coalescing that helps a register allocator for an embedded processor better handle such heterogeneity of the register architecture. In the experiment, an existing register allocator was able to achieve up to 10% reduction in code size through our coalescing, and avoid many spills that would have been generated without our scheme.
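As a rough illustration of why heterogeneity complicates coalescing, the sketch below shows only a legality check, not the optimistic coalescing algorithm or the paper's modification of it. It assumes that each virtual register carries a set of physical registers it is allowed to use, and that a copy can be coalesced only if its two ends do not interfere and share at least one allowed register. The function name, data layout, and register names are hypothetical.

```python
def coalesce_candidates(moves, interference, allowed):
    """Yield (src, dst, shared_registers) for copies that are legal to coalesce.

    moves:        iterable of (src, dst) virtual-register copies
    interference: dict mapping a virtual register to the set it interferes with
    allowed:      dict mapping a virtual register to the physical registers it may use
    """
    for src, dst in moves:
        interferes = (dst in interference.get(src, set())
                      or src in interference.get(dst, set()))
        # With heterogeneous register files, the merged node must satisfy
        # the constraints of both ends, so intersect the allowed sets.
        shared = allowed[src] & allowed[dst]
        if not interferes and shared:
            yield src, dst, shared

# Hypothetical example: A* and X* stand for registers in different files.
moves = [("v1", "v2"), ("v2", "v3"), ("v1", "v3"), ("v1", "v4")]
interference = {"v1": {"v3"}}
allowed = {"v1": {"A0", "A1"}, "v2": {"A1", "X0"}, "v3": {"X0"}, "v4": {"Y0"}}
candidates = list(coalesce_candidates(moves, interference, allowed))
```

With homogeneous registers every `allowed` set would be identical, so the intersection test would never reject a copy; here it rejects `v1`/`v4` even though the two do not interfere.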

Software and Compilers for Embedded Systems | 2003

Case Studies on Automatic Extraction of Target-Specific Architectural Parameters in Complex Code Generation

Yunheung Paek; Minwook Ahn; Soonho Lee

To cope with highly complex and irregular embedded processor architectures, we employ two code generation methods traditionally known to be among the most aggressive and computationally expensive. One is integrated code generation, where the two main subproblems of code generation, instruction selection and register allocation, are solved simultaneously. The other is directed acyclic graph (DAG) covering, rather than tree covering, for code generation. In principle, unifying these two expensive methods may increase compilation time prohibitively. In practice, however, we have often observed that the overall time can be kept manageably short without degrading the code quality by adding a few heuristics that fully capitalize on specific characteristics of target processor models.
