Gang-Ryung Uh | Researchain

Publication

Featured researches published by Gang-Ryung Uh.

programming language design and implementation | 1998

Improving performance by branch reordering

Minghui Yang; Gang-Ryung Uh; David B. Whalley

The conditional branch has long been considered an expensive operation. The relative cost of conditional branches has increased as recently designed machines are now relying on deeper pipelines and higher multiple issue. Reducing the number of conditional branches executed can often result in a substantial performance benefit. This paper describes a code-improving transformation to reorder sequences of conditional branches. First, sequences of branches that can be reordered are detected in the control flow. Second, profiling information is collected to predict the probability that each branch will transfer control out of the sequence. Third, the cost of performing each conditional branch is estimated. Fourth, the most beneficial ordering of the branches based on the estimated probability and cost is selected. The most beneficial ordering often included the insertion of additional conditional branches that did not previously exist in the sequence. Finally, the control flow is restructured to reflect the new ordering. The results of applying the transformation were significant reductions in the dynamic number of instructions and branches, as well as decreases in execution time.

real time technology and applications symposium | 2004

Timing the WCET of embedded applications

Wankang Zhao; Prasad A. Kulkarni; David B. Whalley; Christopher A. Healy; Frank Mueller; Gang-Ryung Uh

It is advantageous to not only calculate the WCET of an application, but to also perform transformations to reduce the WCET since an application with a lower WCET is less likely to violate its timing constraints. In this paper we describe an environment consisting of an interactive compilation system and a timing analyzer, where a user can interactively tune the WCET of an application. After each optimization phase is applied, the timing analyzer is automatically invoked to calculate the WCET of the function being tuned. Thus, a user can easily gauge the progress of reducing the WCET. In addition, the user can apply a genetic algorithm to search for an effective optimization sequence that best reduces the WCET. Using the genetic algorithm, we show that the WCET for a number of applications can be reduced by 7% on average as compared to the default batch optimization sequence.

languages, compilers, and tools for embedded systems | 1999

Effective exploitation of a zero overhead loop buffer

Gang-Ryung Uh; Yuhong Wang; David B. Whalley; Sanjay Jinturkar; Chris Burns; Vincent Cao

A Zero Overhead Loop Buffer (ZOLB) is an architectural feature that is commonly found in DSP processors. This buffer can be viewed as a compiler managed cache that contains a sequence of instructions that will be executed a specified number of times. Unlike techniques such as loop unrolling, a loop buffer is a hardware technique that can be used to minimize loop overhead without the penalty of increasing code size. In addition, a ZOLB also requires relatively little space and power, which are both important considerations for most DSP applications. This paper describes strategies for generating code to effectively use a ZOLB. The authors have found that many common code-improving transformations used by optimizing compilers on conventional architectures can be exploited (1) to allow more loops to be placed in a ZOLB and (2) to further reduce the loop overhead of the loops placed in a ZOLB. The results given in this paper demonstrate that this architectural feature can often be exploited with substantial improvements in execution time and slight reductions in code size.

ACM Transactions on Programming Languages and Systems | 2002

Efficient and effective branch reordering using profile data

Minghui Yang; Gang-Ryung Uh; David B. Whalley

The conditional branch has long been considered an expensive operation. The relative cost of conditional branches has increased as recently designed machines are now relying on deeper pipelines and higher multiple issue. Reducing the number of conditional branches executed often results in a substantial performance benefit. This paper describes a code-improving transformation to reorder sequences of conditional branches that compare a common variable to constants. The goal is to obtain an ordering where the fewest average number of branches in the sequence will be executed. First, sequences of branches that can be reordered are detected in the control flow. Second, profiling information is collected to predict the probability that each branch will transfer control out of the sequence. Third, the cost of performing each conditional branch is estimated. Fourth, the most beneficial ordering of the branches based on the estimated probability and cost is selected. The most beneficial ordering often includes the insertion of additional conditional branches that did not previously exist in the sequence. Finally, the control flow is restructured to reflect the new ordering. On average, applying the transformation results in about 8% fewer instructions executed and 13% fewer branches performed, as well as about a 4% decrease in execution time.
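As a rough illustration of the fourth step (selecting an ordering), the Python sketch below models a sequence of branches that test one variable against distinct constants, so at most one of them transfers control out of the sequence. The tuple layout, the function names `expected_cost`, `reorder` and `brute_force_best`, and the sample probabilities and costs are all assumptions made for this example and are not taken from the paper. Under this simplified model, which ignores the additional inserted branches mentioned in the abstract, sorting by exit probability per unit cost minimizes the expected cost.

```python
from itertools import permutations

# Each branch is (label, exit_probability, cost). All values are hypothetical.
# Exit probabilities are mutually exclusive because the branches compare one
# variable against distinct constants, so they sum to at most 1.

def expected_cost(order):
    """Expected cost of executing the branches in the given order."""
    reach = 1.0   # probability that control reaches the current branch
    total = 0.0
    for _label, prob, cost in order:
        total += reach * cost
        reach -= prob  # control left the sequence with probability prob
    return total

def reorder(branches):
    """Order branches by descending exit probability per unit cost."""
    return sorted(branches, key=lambda b: b[1] / b[2], reverse=True)

def brute_force_best(branches):
    """Lowest expected cost over every permutation (for checking only)."""
    return min(expected_cost(p) for p in permutations(branches))

branches = [("x == 1", 0.1, 2.0), ("x == 2", 0.6, 2.0), ("x == 3", 0.2, 1.0)]
best = reorder(branches)
print([label for label, _, _ in best])
print(expected_cost(branches), expected_cost(best))
```

The brute-force helper is there only to confirm, on small inputs, that the sorted order matches the cheapest permutation; a compiler pass would not enumerate orderings this way.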

Software - Practice and Experience | 2005

Branch elimination by condition merging

William C. Kreahling; David B. Whalley; Mark W. Bailey; Xin Yuan; Gang-Ryung Uh; Robert van Engelen

Conditional branches are expensive. Branches require a significant percentage of execution cycles since they occur frequently and cause pipeline flushes when mispredicted. In addition, branches result in forks in the control flow, which can prevent other code‐improving transformations from being applied. In this paper we describe profile‐based techniques for replacing the execution of a set of two or more branches with a single branch on a conventional scalar processor. These sets of branches can include tests of multiple variables. For instance, the test if (p1 != 0 && p2 != 0), which is testing for NULL pointers, can be replaced with if (p1 & p2 != 0). Program profiling is performed to target condition merging along frequently executed paths. The results show that eliminating branches by merging conditions can significantly reduce the number of conditional branches executed in non‐numerical applications.

Software - Practice and Experience | 1999

Effectively exploiting indirect jumps

Gang-Ryung Uh; David B. Whalley

This paper describes a general code‐improving transformation that can coalesce conditional branches into an indirect jump from a table. Applying this transformation allows an optimizer to exploit indirect jumps for many other coalescing opportunities besides the translation of multiway branch statements. First, dataflow analysis is performed to detect a set of coalescent conditional branches, which are often separated by blocks of intervening instructions. Secondly, several techniques are applied to reduce the cost of performing an indirect jump operation, often requiring the execution of only two instructions on a SPARC. Finally, the control flow is restructured using code duplication to replace the set of branches with an indirect jump. Thus, the transformation essentially provides early resolution of conditional branches that may originally have been some distance from the point where the indirect jump is inserted. The transformation can be frequently applied with often significant reductions in the number of instructions executed, total cache work, and execution time. In addition, we show that with branch target buffer support, indirect jumps improve branch prediction since they cause fewer mispredictions than the set of branches they replaced.
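The core idea of replacing a chain of conditional branches with a table lookup can be sketched in Python as follows. This is only an analogy: Python has no indirect jump instruction, so a list of functions stands in for the jump table, and the handler names, `dispatch_branches`, `dispatch_table` and `JUMP_TABLE` are hypothetical names chosen for this example. The dataflow analysis and code duplication described in the abstract are not modeled.

```python
# Targets of the original branches, modeled as plain functions (hypothetical).
def on_zero():
    return "zero"

def on_one():
    return "one"

def on_two():
    return "two"

def on_other():
    return "other"

def dispatch_branches(v):
    """Original form: a chain of conditional branches on v."""
    if v == 0:
        return on_zero()
    if v == 1:
        return on_one()
    if v == 2:
        return on_two()
    return on_other()

# One table entry per constant that the branch chain tested.
JUMP_TABLE = [on_zero, on_one, on_two]

def dispatch_table(v):
    """Coalesced form: one bounds check, then an indexed lookup and call."""
    if 0 <= v < len(JUMP_TABLE):
        return JUMP_TABLE[v]()
    return on_other()

print(dispatch_branches(2), dispatch_table(2))
```

Both functions return the same result for every input; the table version performs a fixed amount of work regardless of which constant matches, whereas the chain executes more comparisons for later cases.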

compilers, architecture, and synthesis for embedded systems | 2002

Experience with a retargetable compiler for a commercial network processor

Jinhwan Kim; Sungjoon Jung; Yunheung Paek; Gang-Ryung Uh

The Paion PPII network processor is designed to meet the growing need for new high bandwidth network equipment. In order to rapidly reconfigure the processor for frequently varying internet services and technologies, a high performance compiler is urgently needed. Although various code generation techniques have been proposed for DSPs and ASIPs, we found that these techniques are not easily tailored to the target Paion PPII processor due to striking architectural differences. First, we will show the architectural challenges posed by the target processor. Second, novel compiler techniques will be described that effectively exploit non-orthogonal architectural features. The techniques include virtual data path, compiler intrinsics, and interprocedural register allocation. Third, intermediate benchmark results will be presented to demonstrate the effectiveness of our techniques.

compiler construction | 2000

Techniques for Effectively Exploiting a Zero Overhead Loop Buffer

Gang-Ryung Uh; Yuhong Wang; David B. Whalley; Sanjay Jinturkar; Chris Burns; Vincent Cao

A Zero Overhead Loop Buffer (ZOLB) is an architectural feature that is commonly found in DSP processors. This buffer can be viewed as a compiler managed cache that contains a sequence of instructions that will be executed a specified number of times. Unlike loop unrolling, a loop buffer can be used to minimize loop overhead without the penalty of increasing code size. In addition, a ZOLB requires relatively little space and power, which are both important considerations for most DSP applications. This paper describes strategies for generating code to effectively use a ZOLB. The authors have found that many common improving transformations used by optimizing compilers to improve code on conventional architectures can be exploited (1) to allow more loops to be placed in a ZOLB, (2) to further reduce loop overhead of the loops placed in a ZOLB, and (3) to avoid redundant loading of ZOLB loops. The results given in this paper demonstrate that this architectural feature can often be exploited with substantial improvements in execution time and slight reductions in code size.

IEEE Computer Architecture Letters | 2012

An Overview of Static Pipelining

Ian Finlayson; Gang-Ryung Uh; David B. Whalley; Gary S. Tyson

A new generation of mobile applications requires reduced energy consumption without sacrificing execution performance. In this paper, we propose to respond to these conflicting demands with an innovative statically pipelined processor supported by an optimizing compiler. The central idea of the approach is that the control during each cycle for each portion of the processor is explicitly represented in each instruction. Thus the pipelining is in effect statically determined by the compiler. The benefits of this approach include simpler hardware and that it allows the compiler to perform optimizations that are not possible on traditional architectures. The initial results indicate that static pipelining can significantly reduce power consumption without adversely affecting performance.

international conference on human-computer interaction | 2011

Improving Low Power Processor Efficiency with Static Pipelining

Ian Finlayson; Gang-Ryung Uh; David B. Whalley; Gary S. Tyson

A new generation of mobile applications requires reduced energy consumption without sacrificing execution performance. In this paper, we propose to respond to these conflicting demands with an innovative statically pipelined processor supported by an optimizing compiler. The central idea of the approach is that the control during each cycle for each portion of the processor is explicitly represented in each instruction. Thus the pipelining is in effect statically determined by the compiler. The benefits of this approach include simpler hardware and that it allows the compiler to perform optimizations that are not possible on traditional architectures. The initial results indicate that static pipelining can significantly reduce power consumption without adversely affecting performance.
