Publication


Featured research published by Pohua P. Chang.


international symposium on computer architecture | 1991

IMPACT: an architectural framework for multiple-instruction-issue processors

Pohua P. Chang; Scott A. Mahlke; William Y. Chen; Nancy J. Warter; Wen-mei W. Hwu

The performance of multiple-instruction-issue processors can be severely limited by the compiler's ability to generate efficient code for concurrent hardware. In the IMPACT project, we have developed IMPACT-I, a highly optimizing C compiler to exploit instruction level concurrency. The optimization capabilities of the IMPACT-I C compiler are summarized in this paper. Using the IMPACT-I C compiler, we ran experiments to analyze the performance of multiple-instruction-issue processors executing some important non-numerical programs.


Software - Practice and Experience | 1991

Using profile information to assist classic code optimizations

Pohua P. Chang; Scott A. Mahlke; Wen-mei W. Hwu

This paper describes the design and implementation of an optimizing compiler that automatically generates profile information to assist classic code optimizations. This compiler contains two new components, an execution profiler and a profile‐based code optimizer, which are not commonly found in traditional optimizing compilers. The execution profiler inserts probes into the input program, executes the input program for several inputs, accumulates profile information and supplies this information to the optimizer. The profile‐based code optimizer uses the profile information to expose new optimization opportunities that are not visible to traditional global optimization methods. Experimental results show that the profile‐based code optimizer significantly improves the performance of production C programs that have already been optimized by a high‐quality global code optimizer.
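The profile-then-optimize flow described in this abstract can be sketched in a few lines. This is an illustrative toy, not the IMPACT-I implementation: the "program" below is a generator that emits branch events, and the "optimizer" merely picks a likely direction per branch for later passes; all names are hypothetical.

```python
# A minimal sketch of the profile-then-optimize flow: an execution
# profiler accumulates branch outcomes over several inputs, and a
# profile-based optimizer consumes the counts.
from collections import Counter

def profile(program, inputs):
    """Execution profiler: run the instrumented program on several
    inputs and accumulate taken/not-taken counts per branch."""
    counts = Counter()
    for data in inputs:
        for branch_id, taken in program(data):
            counts[(branch_id, taken)] += 1
    return counts

def optimize(branch_ids, counts):
    """Profile-based optimizer: pick the likely direction of each
    branch so later passes can treat it as the fall-through path."""
    decisions = {}
    for b in branch_ids:
        taken, not_taken = counts[(b, True)], counts[(b, False)]
        decisions[b] = "predict-taken" if taken > not_taken else "predict-not-taken"
    return decisions

# Toy "program": yields (branch_id, outcome) events as it executes.
def toy_program(x):
    yield ("loop-exit", x > 10)
    yield ("error-check", x < 0)

counts = profile(toy_program, inputs=[1, 20, 42, 15, -3])
print(optimize(["loop-exit", "error-check"], counts))
```

The key point the paper makes is the last step: decisions like these expose optimization opportunities (e.g. straightening the hot path) that a purely static global optimizer cannot see.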


international symposium on computer architecture | 1989

Achieving High Instruction Cache Performance With An Optimizing Compiler

W.-W. Hwu; Pohua P. Chang

Increasing execution power requires high instruction issue bandwidth, while less dense instruction encodings and code-improving transformations cause code expansion. Therefore, the performance of the instruction memory hierarchy has become an important factor in overall system performance. An instruction placement algorithm has been implemented in the IMPACT-I (Illinois Microarchitecture Project using Advanced Compiler Technology - Stage I) C compiler to maximize the sequential and spatial localities, and to minimize mapping conflicts. This approach achieves low cache miss ratios and low memory traffic ratios for small, fast instruction caches with little hardware overhead. For ten realistic UNIX programs, we report low miss ratios (average 0.5%) and low memory traffic ratios (average 8%) for a 2048-byte, direct-mapped instruction cache using 64-byte blocks. This result compares favorably with the fully associative cache results reported by other researchers. We also present the effect of cache size, block size, block sectoring, and partial loading on the cache performance. The code performance with instruction placement optimization is shown to be stable across architectures with different instruction encoding density.
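The core idea of frequency-driven placement can be illustrated with a greedy chain-merging toy in the style of Pettis-Hansen layout. This is a hedged sketch, not the actual IMPACT-I pass (which works at trace and function granularity); block names and edge weights are made up.

```python
# Greedy basic-block layout: keep each block's hottest successor
# adjacent to it, improving sequential locality in the i-cache.

def place_blocks(blocks, edges):
    """edges: {(src, dst): profiled transition count}. Returns a
    linear order built by merging chains along the hottest edges."""
    chain_of = {b: [b] for b in blocks}
    for (src, dst), _w in sorted(edges.items(), key=lambda e: -e[1]):
        cs, cd = chain_of[src], chain_of[dst]
        # Merge only if src ends one chain and dst starts another.
        if cs is not cd and cs[-1] == src and cd[0] == dst:
            cs.extend(cd)
            for b in cd:
                chain_of[b] = cs
    seen, order = set(), []
    for b in blocks:
        key = id(chain_of[b])
        if key not in seen:
            seen.add(key)
            order.extend(chain_of[b])
    return order

# Hypothetical profile: A mostly falls through to C, then B, then D.
edges = {("A", "C"): 90, ("C", "B"): 80, ("B", "D"): 70}
print(place_blocks(["A", "B", "C", "D"], edges))
```

With the hypothetical weights above, the hot path A→C→B→D is laid out contiguously even though the original order interleaved it, which is exactly the sequential-locality effect the paper measures.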


Software - Practice and Experience | 1992

Profile-guided automatic inline expansion for C programs

Pohua P. Chang; Scott A. Mahlke; William Y. Chen; Wen-mei W. Hwu

This paper describes critical implementation issues that must be addressed to develop a fully automatic inliner. These issues are: integration into a compiler, program representation, hazard prevention, expansion sequence control, and program modification. An automatic inter‐file inliner that uses profile information has been implemented and integrated into an optimizing C compiler. The experimental results show that this inliner achieves significant speedups for production C programs.


programming language design and implementation | 1989

Inline function expansion for compiling C programs

Pohua P. Chang; W.-W. Hwu

Inline function expansion replaces a function call with the function body. With automatic inline function expansion, programs can be constructed with many small functions to handle complexity and then rely on the compiler to eliminate most of the function calls. Therefore, inline expansion serves as a tool for satisfying two conflicting goals: minimizing the complexity of program development and minimizing the function call overhead of program execution. A simple inline expansion procedure is presented which uses profile information to address three critical issues: code expansion, stack expansion, and unavailable function bodies. Experiments show that a large percentage of function calls/returns (about 59%) can be eliminated with a modest code expansion cost (about 17%) for twelve UNIX programs.
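The selection step of such a procedure can be sketched as a greedy choice under a code-growth budget. This toy only hints at two of the three issues named above (code expansion and unavailable function bodies); the cost model, thresholds, and call-site names are all illustrative assumptions, not the paper's actual heuristics.

```python
# Profile-guided inlining sketch: expand the most frequently executed
# call sites first, subject to a total code-expansion budget.

def choose_inline_sites(call_sites, budget):
    """call_sites: list of (site, call_count, callee_size, body_known).
    Returns the sites to expand, greedily by call frequency."""
    chosen, growth = [], 0
    for site, count, size, body_known in sorted(
            call_sites, key=lambda s: -s[1]):
        if not body_known:          # unavailable function body: skip
            continue
        if growth + size > budget:  # code-expansion control
            continue
        chosen.append(site)
        growth += size
    return chosen

# Hypothetical profile data: (site, calls, callee size, body available).
sites = [
    ("main->parse",  900_000, 120, True),
    ("parse->lex",   850_000,  40, True),
    ("main->report",     200, 300, True),
    ("lex->extern",  500_000,  50, False),  # body in another file
]
print(choose_inline_sites(sites, budget=200))
```

The greedy order is what lets a small expansion budget capture most dynamic calls: the two hottest sites fit the budget, while the cold 300-byte callee and the externally defined body are skipped.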


international symposium on microarchitecture | 1991

Data access microarchitectures for superscalar processors with compiler-assisted data prefetching

William Y. Chen; Scott A. Mahlke; Pohua P. Chang; Wen-mei W. Hwu

The performance of superscalar processors is more sensitive to the memory system delay than their single-issue predecessors. This paper examines alternative data access microarchitectures that effectively support compiler-assisted data prefetching in superscalar processors. In particular, a prefetch buffer is shown to be more effective than increasing the cache dimension in solving the cache pollution problem. All in all, we show that a small data cache with compiler-assisted data prefetching can achieve a performance level close to that of an ideal cache.


international symposium on microarchitecture | 1988

Trace selection for compiling large C application programs to microcode

Pohua P. Chang; Wen-mei W. Hwu

Microcode optimization techniques such as code scheduling and resource allocation can benefit significantly by reducing uncertainties in program control flow. A trace selection algorithm with profiling information reduces the uncertainties in program control flow by identifying sequences of frequently invoked basic blocks as traces. These traces are treated as sequential codes for optimization purposes. Optimization based on traces is especially useful when the code size is large and the control structure is complicated enough to defeat hand optimizations. However, most of the experimental results reported to date are based on small benchmarks with simple control structures. For different trace selection algorithms, we report the distribution of control transfers categorized according to their potential impact on the microcode optimizations. The experimental results are based on ten C application programs which exhibit large code size and complicated control structure. The measured data for each program is accumulated across a large number of input files to ensure the reliability of the result. All experiments are performed automatically using our IMPACT C compiler which contains integrated profiling and analysis tools.
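The trace selection idea described above can be sketched directly: seed each trace with the hottest unselected block and grow it forward along the most frequent successor edges. This is a hedged toy; the actual algorithms compared in the paper include growth thresholds and backward growth that are omitted here, and the block names are hypothetical.

```python
# Profile-driven trace selection sketch.

def select_traces(block_counts, succ_counts):
    """block_counts: {block: execution count};
    succ_counts: {block: {successor: transition count}}.
    Returns a list of traces (lists of blocks)."""
    unselected = set(block_counts)
    traces = []
    while unselected:
        # Seed with the hottest remaining block.
        seed = max(unselected, key=lambda b: block_counts[b])
        trace = [seed]
        unselected.discard(seed)
        cur = seed
        while True:
            nexts = {s: c for s, c in succ_counts.get(cur, {}).items()
                     if s in unselected}
            if not nexts:
                break
            cur = max(nexts, key=nexts.get)  # most likely successor
            trace.append(cur)
            unselected.discard(cur)
        traces.append(trace)
    return traces

# Hypothetical profile of a small loop nest.
counts = {"entry": 100, "loop": 1000, "body": 950, "exit": 90}
succs = {"entry": {"loop": 100},
         "loop": {"body": 950, "exit": 100},
         "body": {"loop": 900}}
print(select_traces(counts, succs))
```

The hot loop/body pair ends up in one trace that later phases can schedule as straight-line code, which is the property the paper's control-transfer measurements quantify.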


international symposium on computer architecture | 1989

Comparing Software And Hardware Schemes For Reducing The Cost Of Branches

Wen-mei W. Hwu; Thomas M. Conte; Pohua P. Chang

Pipelining has become a common technique to increase throughput of the instruction fetch, instruction decode, and instruction execution portions of modern computers. Branch instructions disrupt the flow of instructions through the pipeline, increasing the overall execution cost of branch instructions. Three schemes to reduce the cost of branches are presented in the context of a general pipeline model. Ten realistic Unix domain programs are used to directly compare the cost and performance of the three schemes and the results are in favor of the software-based scheme. For example, the software-based scheme has a cost of 1.65 cycles/branch vs. a cost of 1.68 cycles/branch of the best hardware scheme for a highly pipelined processor (11-stage pipeline). The results are 1.19 (software scheme) vs. 1.23 cycles/branch (best hardware scheme) for a moderately pipelined processor (5-stage pipeline).
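Cycles-per-branch figures like those quoted above follow from a simple expected-cost model. The sketch below is a generic illustration, not the paper's exact pipeline model: the accuracy and penalty numbers are hypothetical, chosen only to show how a deeper pipeline inflates the average branch cost.

```python
# Expected branch cost: 1 cycle when the prediction is right,
# 1 + flush penalty cycles when it is wrong.

def cycles_per_branch(accuracy, penalty):
    """Average cost of one branch under a simple two-outcome model."""
    return 1.0 + (1.0 - accuracy) * penalty

# Hypothetical 90% prediction accuracy; deeper pipelines pay a
# larger flush penalty on a misprediction.
print(round(cycles_per_branch(0.90, penalty=2), 2))  # shallow pipeline
print(round(cycles_per_branch(0.90, penalty=7), 2))  # deep pipeline
```

Under these assumed numbers the cost grows from 1.2 to 1.7 cycles/branch as the penalty rises, which is why the paper evaluates its schemes separately for 5-stage and 11-stage pipelines.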


IEEE Transactions on Computers | 1993

The effect of code expanding optimizations on instruction cache design

William Y. Chen; Pohua P. Chang; Thomas M. Conte; Wen-mei W. Hwu

This paper shows that code expanding optimizations have strong and nonintuitive implications on instruction cache design. Three types of code expanding optimizations are studied in this paper: instruction placement, function inline expansion, and superscalar optimizations. Overall, instruction placement reduces the miss ratio of small caches. Function inline expansion improves the performance for small cache sizes, but degrades the performance of medium caches. Superscalar optimizations increase the miss ratio for all cache sizes. However, they also increase the sequentiality of instruction access so that a simple load forwarding scheme effectively cancels the negative effects. Overall, the authors show that with load forwarding, the three types of code expanding optimizations jointly improve the performance of small caches and have little effect on large caches.


IEEE Transactions on Computers | 1995

The importance of prepass code scheduling for superscalar and superpipelined processors

Pohua P. Chang; Daniel M. Lavery; Scott A. Mahlke; William Y. Chen; Wen-mei W. Hwu

Superscalar and superpipelined processors utilize parallelism to achieve peak performance that can be several times higher than that of conventional scalar processors. In order for this potential to be translated into speedups of real programs, the compiler must be able to schedule instructions so that the parallel hardware is effectively utilized. Previous work has shown that prepass code scheduling helps to produce a better schedule for scientific programs, but the importance of prescheduling has never been demonstrated for control-intensive non-numeric programs. These programs are significantly different from the scientific programs because they contain frequent branches. The compiler must do global scheduling in order to find enough independent instructions. In this paper, the code optimizer and scheduler of the IMPACT-I C compiler are described. Within this framework, we study the importance of prepass code scheduling for a set of production C programs. It is shown that, in contrast to the results previously obtained for scientific programs, prescheduling is not important for compiling control-intensive programs to the current generation of superscalar and superpipelined processors. However, if some of the current restrictions on upward code motion can be removed in future architectures, prescheduling would substantially improve the execution time of this class of programs on both superscalar and superpipelined processors.
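The core of such a scheduler is list scheduling, which can be sketched as follows. This toy is far simpler than the IMPACT-I scheduler (no register pressure, no global code motion, a made-up machine model with a fixed issue width); the operation names and latencies are illustrative.

```python
# List-scheduling sketch: each cycle, issue up to issue_width ready
# operations whose dependences have completed.

def list_schedule(latency, deps, issue_width):
    """latency: {op: cycles}; deps: {op: set of ops it must follow}.
    Returns {op: issue cycle}."""
    scheduled = {}
    cycle = 0
    while len(scheduled) < len(latency):
        issued = 0
        for op in latency:
            if op in scheduled or issued == issue_width:
                continue
            preds = deps.get(op, set())
            if all(p in scheduled for p in preds):
                # Earliest cycle at which every input value is ready.
                start = max((scheduled[p] + latency[p] for p in preds),
                            default=0)
                if start <= cycle:
                    scheduled[op] = cycle
                    issued += 1
        cycle += 1
    return scheduled

lat = {"load": 2, "add": 1, "mul": 3, "store": 1}
deps = {"add": {"load"}, "mul": {"load"}, "store": {"add", "mul"}}
print(list_schedule(lat, deps, issue_width=2))
```

The add and mul issue together once the load completes, illustrating how the scheduler fills issue slots; prepass scheduling runs this kind of pass before register allocation, while the constraints it faces on upward code motion are what the paper's conclusions hinge on.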

Collaboration


Dive into Pohua P. Chang's collaborations.

Top Co-Authors


Wen-mei W. Hwu

University of Illinois at Urbana–Champaign
