Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Taewook Oh is active.

Publication


Featured research published by Taewook Oh.


International Conference on Parallel Architectures and Compilation Techniques | 2008

Edge-centric modulo scheduling for coarse-grained reconfigurable architectures

Hyunchul Park; Kevin Fan; Scott A. Mahlke; Taewook Oh; Hee-seok Kim; Hong-seok Kim

Coarse-grained reconfigurable architectures (CGRAs) present an appealing hardware platform by providing the potential for high computation throughput, scalability, low cost, and energy efficiency. CGRAs consist of an array of function units and register files often organized as a two-dimensional grid. The most difficult challenge in deploying CGRAs is compiler scheduling technology that can efficiently map software implementations of compute-intensive loops onto the array. Traditional schedulers focus on the placement of operations in time and space. With CGRAs, the challenge of placement is compounded by the need to explicitly route operands from producers to consumers. To systematically attack this problem, we take an edge-centric approach to modulo scheduling that focuses on the routing problem as its primary objective. With edge-centric modulo scheduling (EMS), placement is a by-product of the routing process, and the schedule is developed by routing each edge in the dataflow graph. Routing cost metrics provide the scheduler with a global perspective to guide selection. Experiments on a wide variety of compute-intensive loops from the multimedia domain show that EMS improves throughput by 25% over traditional iterative modulo scheduling, and achieves 98% of the throughput of simulated annealing techniques at a fraction of the compilation time.
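The central idea, that placement falls out of routing, can be illustrated with a toy sketch. The 2x2 grid, the Manhattan-distance cost model, and the dataflow graph below are invented for illustration; this is not the EMS algorithm itself, just its edge-first flavor.

```python
from itertools import product

GRID = list(product(range(2), range(2)))          # 2x2 array of function units

def route_cost(src, dst):
    """Manhattan distance as a stand-in for routing resources consumed."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])

def schedule(edges, entry):
    """Walk dataflow edges; each routing decision places the consumer."""
    placement = {entry: GRID[0]}                  # pin the first operation
    for producer, consumer in edges:              # edges in dependence order
        if consumer in placement:
            continue
        src = placement[producer]
        free = [s for s in GRID if s not in placement.values()]
        # the router picks the cheapest free slot for the consumer,
        # so placement is a by-product of routing the edge
        placement[consumer] = min(free, key=lambda s: route_cost(src, s))
    return placement

dfg_edges = [("load", "mul"), ("mul", "add"), ("add", "store")]
placement = schedule(dfg_edges, "load")
```

A node-centric scheduler would place all four operations first and hope the operands could be routed afterwards; here every placement is justified by the edge that reaches it.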


Programming Language Design and Implementation | 2011

Parallelism orchestration using DoPE: the degree of parallelism executive

Arun Raman; Hanjun Kim; Taewook Oh; Jae W. Lee; David I. August

In writing parallel programs, programmers expose parallelism and optimize it to meet a particular performance goal on a single platform under an assumed set of workload characteristics. In the field, changing workload characteristics, new parallel platforms, and deployments with different performance goals make the programmers' development-time choices suboptimal. To address this problem, this paper presents the Degree of Parallelism Executive (DoPE), an API and run-time system that separates the concern of exposing parallelism from that of optimizing it. Using the DoPE API, the application developer expresses parallelism options. During program execution, DoPE's run-time system uses this information to dynamically optimize the parallelism options in response to the facts on the ground. We easily port several emerging parallel applications to DoPE's API and demonstrate the DoPE run-time system's effectiveness in dynamically optimizing the parallelism for a variety of performance goals.
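The separation of concerns DoPE argues for can be sketched in a few lines: the developer only declares the parallelism options, and a runtime chooses the degree of parallelism against a measured goal. The class names, the `choose_dop` method, and the saturating cost model below are all invented for illustration and are not the DoPE API.

```python
class ParallelRegion:
    """Developer side: declare parallelism options, nothing else."""
    def __init__(self, name, dop_options):
        self.name = name
        self.dop_options = dop_options            # candidate worker counts

class Runtime:
    """Runtime side: pick the degree of parallelism for a goal."""
    def __init__(self, goal):
        self.goal = goal                          # metric to minimize

    def choose_dop(self, region, measure):
        # try each declared option and keep the best under the goal
        scores = {d: measure(d) for d in region.dop_options}
        return min(scores, key=scores.get)

# a made-up workload: speedup saturates past 4 workers, then overhead wins
def simulated_latency(dop):
    work, overhead = 100.0 / min(dop, 4), 2.0 * dop
    return work + overhead

region = ParallelRegion("outer_loop", dop_options=[1, 2, 4, 8])
best = Runtime(goal="latency").choose_dop(region, simulated_latency)
```

If the platform or the goal changes, only the measurement changes; the developer's declaration of options stays fixed, which is the point of the separation.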


IEEE International Conference on High Performance Computing, Data, and Analytics | 2011

A survey of the practice of computational science

Prakash Prabhu; Hanjun Kim; Taewook Oh; Thomas B. Jablin; Nick P. Johnson; Matthew Zoufaly; Arun Raman; Feng Liu; David Walker; Yun Zhang; Soumyadeep Ghosh; David I. August; Jialu Huang; Stephen R. Beard

Computing plays an indispensable role in scientific research. Presently, researchers in science have different problems, needs, and beliefs about computation than professional programmers. In order to accelerate the progress of science, computer scientists must understand these problems, needs, and beliefs. To this end, this paper presents a survey of scientists from diverse disciplines, practicing computational science at a doctoral-granting university with very high research activity. The survey covers, among other topics, prevalent programming practices within this scientific community, the importance of computational power in different fields, the use of tools to enhance performance and software productivity, the computational resources leveraged, and the prevalence of parallel computation. The results reveal several patterns that suggest interesting avenues to bridge the gap between scientific researchers and programming tools developers.


Languages, Compilers, and Tools for Embedded Systems | 2009

Recurrence cycle aware modulo scheduling for coarse-grained reconfigurable architectures

Taewook Oh; Bernhard Egger; Hyunchul Park; Scott A. Mahlke

In high-end embedded systems, coarse-grained reconfigurable architectures (CGRAs) continue to replace traditional ASIC designs. CGRAs offer high performance at low power consumption, yet provide flexibility through programmability. In this paper we introduce a recurrence cycle-aware scheduling technique for CGRAs. Our modulo scheduler groups operations belonging to a recurrence cycle into a clustered node and then computes a scheduling order for those clustered nodes. Deadlocks that arise when two or more recurrence cycles depend on each other are resolved by using heuristics that favor recurrence cycles with long recurrence delays. Whereas previous approaches forced a choice between fast compilation and good-quality results, the proposed recurrence cycle-aware scheduling technique delivers both. We have implemented the proposed method in our in-house CGRA chip and compiler solution and show that the technique achieves better quality schedules than schedulers based on simulated annealing at a 170-fold speed increase.
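The deadlock-resolution heuristic described above can be sketched directly: recurrence cycles are pre-clustered, and when clusters mutually depend on each other, the one with the longest recurrence delay is scheduled first. The cluster names, delays, and dependences below are illustrative, not from the paper.

```python
def order_clusters(delays, deps):
    """deps[c] = set of clusters that must be scheduled before c."""
    remaining, order = dict(deps), []
    while remaining:
        ready = [c for c, pre in remaining.items()
                 if not (pre & remaining.keys())]
        if ready:
            pick = ready[0]                       # normal topological step
        else:
            # deadlock: recurrence cycles depend on each other, so favor
            # the cluster with the longest recurrence delay
            pick = max(remaining, key=lambda c: delays[c])
        order.append(pick)
        del remaining[pick]
    return order

delays = {"A": 6, "B": 2, "C": 1}                 # recurrence delay per cluster
deps = {"A": {"B"}, "B": {"A"}, "C": {"A"}}       # A and B deadlock
order = order_clusters(delays, deps)
```

Scheduling the long-delay cycle first makes sense because it is the hardest to fit: its latency bounds the achievable initiation interval, so it should face the least-constrained array.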


Asia and South Pacific Design Automation Conference | 2006

Conversion of reference C code to dataflow model: H.264 encoder case study

Hyeyoung Hwang; Taewook Oh; Hyunuk Jung; Soonhoi Ha

Model-based design is widely accepted in developing complex embedded systems under intense time-to-market pressure. While it promises improved design productivity, the main bottleneck lies not in the design methodology but in constructing the initial algorithm representation in the specified model. This is particularly true if a complicated multimedia application is given in the form of a sequential reference C code. In this paper we propose a systematic procedure for converting a sequential C code to a dataflow specification that has been widely used in many design environments for DSP systems. The proposed technique is successfully applied to an H.264 encoder algorithm as a case study.
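The target specification style is worth seeing concretely: a sequential computation re-expressed as dataflow actors connected by FIFO channels, each firing when its input tokens are available. The three-stage pipeline below is invented; the paper's conversion procedure is systematic, and this only shows the spirit of its output.

```python
from collections import deque

def actor_source(out):                            # produces sample values
    for v in [3, 1, 4, 1, 5]:
        out.append(v)

def actor_transform(inp, out):                    # e.g. a DCT-like stage
    while inp:                                    # fire while tokens remain
        out.append(inp.popleft() * 2)

def actor_sink(inp, result):                      # collects encoded output
    while inp:
        result.append(inp.popleft())

ch1, ch2, result = deque(), deque(), []
actor_source(ch1)                                 # static schedule: fire each
actor_transform(ch1, ch2)                         # actor once, in topological
actor_sink(ch2, result)                           # order of the dataflow graph
```

Because each actor only touches its channels, the same specification can be retargeted, simulated, or mapped to parallel hardware, which is what makes the conversion from sequential C worthwhile.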


Programming Language Design and Implementation | 2013

Fast condensation of the program dependence graph

Nick P. Johnson; Taewook Oh; Ayal Zaks; David I. August

Aggressive compiler optimizations are formulated around the Program Dependence Graph (PDG). Many techniques, including loop fission and parallelization, are concerned primarily with dependence cycles in the PDG. The Directed Acyclic Graph of Strongly Connected Components (DAGSCC) represents these cycles directly. The naive method to construct the DAGSCC first computes the full PDG. This approach limits adoption of aggressive optimizations because the number of analysis queries grows quadratically with program size, making DAGSCC construction expensive. Consequently, compilers optimize small scopes with weaker but faster analyses. We observe that many PDG edges do not affect the DAGSCC and that ignoring them cannot affect clients of the DAGSCC. Exploiting this insight, we present an algorithm to omit those analysis queries to compute the DAGSCC efficiently. Across 366 hot loops from 20 SPEC2006 benchmarks, this method computes the DAGSCC in half of the time using half as many queries.
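The insight can be sketched as follows: an edge whose endpoints already sit in the same strongly connected component cannot change the DAG of SCCs, so the expensive analysis query for it can be skipped. The toy graph, the "cheap" register-dependence edges, and the candidate memory queries below are invented, and this is not the paper's algorithm, just the query-skipping idea.

```python
from collections import defaultdict

def sccs(nodes, edges):
    """Tarjan's algorithm; returns a map node -> SCC id."""
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
    index, low, comp, stack, on = {}, {}, {}, [], set()
    counter = [0]

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v); on.add(v)
        for w in adj[v]:
            if w not in index:
                visit(w); low[v] = min(low[v], low[w])
            elif w in on:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:                    # v is the root of an SCC
            while True:
                w = stack.pop(); on.discard(w)
                comp[w] = counter[0]
                if w == v:
                    break
            counter[0] += 1

    for v in nodes:
        if v not in index:
            visit(v)
    return comp

nodes = ["a", "b", "c", "d"]
cheap_edges = [("a", "b"), ("b", "a"), ("c", "d")]    # e.g. register deps
comp = sccs(nodes, cheap_edges)

# candidate memory dependences that would need an expensive analysis query;
# skip any whose endpoints are already in one SCC
candidates = [("a", "b"), ("b", "c"), ("d", "c")]
issued = [e for e in candidates if comp[e[0]] != comp[e[1]]]
skipped = len(candidates) - len(issued)
```

Here the a-b query is skipped outright because cheap edges already put a and b in one component; in a real PDG with quadratically many potential queries, this pruning is where the reported 2x savings comes from.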


Architectural Support for Programming Languages and Operating Systems | 2013

Practical automatic loop specialization

Taewook Oh; Hanjun Kim; Nick P. Johnson; Jae W. Lee; David I. August

Program specialization optimizes a program with respect to program invariants, including known, fixed inputs. These invariants can be used to enable optimizations that are otherwise unsound. In many applications, a program input induces predictable patterns of values across loop iterations, yet existing specializers cannot fully capitalize on this opportunity. To address this limitation, we present Invariant-induced Pattern based Loop Specialization (IPLS), the first fully-automatic specialization technique designed for everyday use on real applications. Using dynamic information-flow tracking, IPLS profiles the values of instructions that depend solely on invariants and recognizes repeating patterns across multiple iterations of hot loops. IPLS then specializes these loops, using those patterns to predict values across a large window of loop iterations. This enables aggressive optimization of the loop; conceptually, this optimization reconstructs recurring patterns induced by the input as concrete loops in the specialized binary. IPLS specializes real-world programs that prior techniques fail to specialize, without requiring hints from the user. Experiments demonstrate a geomean speedup of 14.1% with a maximum speedup of 138% over the original code when evaluated on three script interpreters and eleven scripts each.
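The profiling step can be sketched in miniature: record the values that depend only on invariants across iterations, detect the minimal repeating period, and then predict the value from the pattern instead of recomputing it. The interpreter-style `dispatch` loop and the three-opcode script are invented for illustration.

```python
def minimal_period(trace):
    """Smallest p such that the trace repeats with period p."""
    for p in range(1, len(trace) + 1):
        if all(trace[i] == trace[i % p] for i in range(len(trace))):
            return p
    return len(trace)

# an invariant input (the script) induces a repeating opcode pattern
def dispatch(i, script):                          # expensive in the original
    return script[i % len(script)]

script = ["LOAD", "ADD", "STORE"]
trace = [dispatch(i, script) for i in range(9)]   # profiling run
period = minimal_period(trace)
pattern = trace[:period]

# "specialized" loop: predict the opcode from the pattern, no dispatch call
specialized = [pattern[i % period] for i in range(9)]
```

In the real system the predicted values feed constant propagation and related optimizations over a large window of iterations; the sketch only shows why a repeating, invariant-induced pattern makes such prediction sound.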


International Symposium on Computer Architecture | 2015

DynaSpAM: dynamic spatial architecture mapping using out of order instruction schedules

Feng Liu; Heejin Ahn; Stephen R. Beard; Taewook Oh; David I. August

Spatial architectures are more efficient than traditional Out-of-Order (OOO) processors for computationally intensive programs. However, spatial architectures require mapping a program, either statically or dynamically, onto the spatial fabric. Static methods can generate efficient mappings, but they cannot adapt to changing workloads and are not compatible across hardware generations. Current dynamic methods are adaptive and compatible, but do not optimize as well due to their limited use of speculation and small mapping scopes. To overcome the limitations of existing dynamic mapping methods for spatial architectures, while minimizing the inefficiencies inherent in OOO superscalar processors, this paper presents DynaSpAM (Dynamic Spatial Architecture Mapping), a framework that tightly couples a spatial fabric with an OOO pipeline. DynaSpAM coaxes the OOO processor into producing an optimized mapping with a simple modification to the processor's scheduler. The insight behind DynaSpAM is that today's powerful OOO processors do for themselves most of the work necessary to produce a highly optimized mapping for a spatial architecture, including aggressively speculating control and memory dependences, and scheduling instructions using a large window. Evaluation of DynaSpAM shows a geomean speedup of 1.42× for 11 benchmarks from the Rodinia benchmark suite with a geomean 23.9% reduction in energy consumption compared to an 8-issue OOO pipeline.
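The insight can be sketched with a toy scheduler: a dynamic scheduler already assigns each instruction a (cycle, issue slot), and that assignment can be read off as a placement on a spatial fabric. The four-instruction dependence graph and the issue width below are invented; a real OOO scheduler also speculates and renames, which this sketch omits.

```python
def ooo_schedule(deps, width):
    """deps[i] = instructions i waits on; returns instr -> (cycle, slot)."""
    done, mapping, cycle = set(), {}, 0
    while len(done) < len(deps):
        # each cycle, issue up to `width` ready instructions
        ready = [i for i in deps if i not in done and deps[i] <= done]
        for slot, instr in enumerate(ready[:width]):
            mapping[instr] = (cycle, slot)        # reused as fabric placement
        done.update(ready[:width])
        cycle += 1
    return mapping

deps = {"ld": set(), "add": {"ld"}, "mul": {"ld"}, "st": {"add", "mul"}}
mapping = ooo_schedule(deps, width=2)
```

The schedule the scheduler produces anyway, `ld` alone, then `add` and `mul` side by side, then `st`, is exactly the kind of time/space assignment a spatial mapper would have to compute from scratch.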


Symposium on Code Generation and Optimization | 2017

A collaborative dependence analysis framework

Nick P. Johnson; Jordan Fix; Stephen R. Beard; Taewook Oh; Thomas B. Jablin; David I. August

Compiler optimizations discover facts about program behavior by querying static analysis. However, developing or extending precise analysis is difficult. Some prior works implement analysis with a single algorithm, but the algorithm becomes more complex as it is extended for greater precision. Other works achieve modularity by implementing several simple algorithms and trivially composing them to report the best result from among them. Such a modular approach has limited precision because it employs only one algorithm in response to one query, without synergy between algorithms. This paper presents a framework for dependence analysis algorithms to collaborate and achieve precision greater than the trivial combination of those algorithms. With this framework, developers can achieve the high precision of complex analysis algorithms through collaboration of simple and orthogonal algorithms, without sacrificing the ease of implementation of the modular approach. Results demonstrate that collaboration of simple analyses enables advanced compiler optimizations.
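The difference between trivial composition and collaboration can be shown in a small sketch: simple analyses are registered with an ensemble, and each may answer a query by posing premise sub-queries back to the whole ensemble rather than being consulted in isolation. The two analyses and the dictionary-based pointer model below are invented for illustration.

```python
class Ensemble:
    """Dispatches a no-alias query to every registered analysis."""
    def __init__(self):
        self.analyses = []

    def no_alias(self, p, q):
        return any(a(p, q, self) for a in self.analyses)

# Analysis 1: pointers to distinct allocation sites never alias.
def distinct_allocs(p, q, ens):
    return (p.get("alloc") is not None and q.get("alloc") is not None
            and p["alloc"] != q["alloc"])

# Analysis 2: field accesses don't alias if their *base* pointers don't.
# It delegates the base question back to the ensemble -- collaboration.
def field_offsets(p, q, ens):
    if "base" in p and "base" in q:
        return ens.no_alias(p["base"], q["base"])
    return False

ens = Ensemble()
ens.analyses = [distinct_allocs, field_offsets]

a = {"alloc": "mallocA"}
b = {"alloc": "mallocB"}
pa = {"base": a}                                  # field of a
pb = {"base": b}                                  # field of b
```

Neither analysis alone proves `pa` and `pb` disjoint: the allocation-site analysis does not understand fields, and the field analysis cannot judge bases. Their composition with delegation does, which is precision beyond the trivial best-of-N combination.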


International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation | 2007

Communication architecture simulation on the virtual synchronization framework

Taewook Oh; Youngmin Yi; Soonhoi Ha

As multi-processor system-on-chip (MPSoC) has become an effective solution to the ever-increasing design complexity of modern embedded systems, fast and accurate HW/SW cosimulation of such systems becomes more important to explore the wide design space of communication architectures. Recently we have proposed the trace-driven virtual synchronization technique to boost cosimulation speed while almost fully preserving accuracy, in which simulation of communication architectures is separated from simulation of the processing components. This paper proposes two methods of simulation modeling of communication architectures in the trace-driven virtual synchronization framework: SystemC modeling and C modeling. SystemC modeling gives better extensibility and accuracy but lower performance than C modeling, as confirmed by experimental results. Fast reconfiguration of the communication architecture is available in both methods to enable efficient design space exploration.
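The separation described above can be sketched: each processor is simulated once to produce a trace of compute delays and bus requests, and a shared-bus model then replays the traces, adding contention. The traces, the fixed cost per bus access, and the single-segment processors below are invented; a full framework would also feed bus delays back into later compute segments.

```python
import heapq

def replay(traces, bus_cycles_per_access):
    """traces[p] = list of (compute_delay, n_bus_accesses) segments."""
    events = []                                   # (request_time, proc, seg)
    for p, segs in traces.items():
        t = 0
        for i, (compute, _) in enumerate(segs):
            t += compute                          # compute time from trace
            heapq.heappush(events, (t, p, i))
    bus_free, finish = 0, {}
    while events:                                 # serve requests in time order
        req, p, i = heapq.heappop(events)
        start = max(req, bus_free)                # wait if the bus is busy
        bus_free = start + traces[p][i][1] * bus_cycles_per_access
        finish[(p, i)] = bus_free
    return finish

traces = {"cpu0": [(10, 2)], "cpu1": [(10, 3)]}   # both request at t=10
finish = replay(traces, bus_cycles_per_access=5)
```

Because the processors were simulated only once to produce the traces, swapping in a different bus model means re-running only this cheap replay, which is what makes the design space exploration fast.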

Collaboration


Dive into Taewook Oh's collaboration.

Top Co-Authors

Feng Liu

Princeton University

Jae W. Lee

Sungkyunkwan University