Guei-Yuan Lueh | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Guei-Yuan Lueh is active.

Explore More

Publication

Featured researches published by Guei-Yuan Lueh.

ACM Transactions on Programming Languages and Systems | 2000

Fusion-based register allocation

Guei-Yuan Lueh; Thomas R. Gross; Ali-Reza Adl-Tabatabai

The register allocation phase of a compiler maps live ranges of a program to registers. If there are more candidates than there are physical registers, the register allocator must spill a live range (the home location is in memory) or split a live range (the live range occupies multiple locations). One of the challenges for a register allocator is to deal with spilling and splitting together. Fusion-based register allocation uses the structure of the program to make splitting and spilling decisions, with the goal to move overhead operations to infrequently executed parts of a program. The basic idea of fusion-based register allocation is to build up the interference graph. Starting with some base region (e.g., a basic block, a loop), the register allocator adds basic blocks to the region and incrementallly builds the interference graph. When there are more live ranges than registers, the register allocator selects live ranges to split; these live ranges are split along the edge that was most recently added to the region. This article describes fusion-based register allocation in detail and compares it with other approaches to register allocation. For programs from the SPEC92 suite, fusion-based register allocation can improve the execution time (of optimized programs, for the MIPS architecture) by up to 8.4% over Chaitin-style register allocation.

conference on object-oriented programming systems, languages, and applications | 1996

Code reuse in an optimizing compiler

Ali-Reza Adl-Tabatabai; Thomas R. Gross; Guei-Yuan Lueh

This paper describes how the cmcc compiler reuses code---both internally (reuse between different modules) and externally (reuse between versions for different target machines). The key to reuse are the application frameworks developed for global data-flow analysis, code generation, instruction scheduling, and register allocation.The code produced by cmcc is as good as the code produced by the native compilers for the MIPS and SPARC, although significantly less resources have been spent on cmcc (overall, about 6 man years by 2.5 persons). cmcc is implemented in C++, which allowed for a compact expression of the frameworks as class hierarchies. The results support the claim that suitable frameworks facilitate reuse and thereby significantly improve developer effectiveness.

programming language design and implementation | 1997

Call-cost directed register allocation

Guei-Yuan Lueh; Thomas R. Gross

Choosing the right kind of register for a live range plays a major role in eliminating the register-allocation overhead when the compiled function is frequently executed or function tails are on the most frequently executed paths. Picking the wrong kind of register for a live range incurs a high penalty that may dominate the total overhead of register allocation. In this paper, we present three improvements, storage-class analysis, benefit-driven simplification, and preference decision that are effective in selecting the right kind of register for a live range. Then we compare an enhanced Chaitin-style register allocator (with these three improvements) with priority-based and optimistic coloring.

languages and compilers for parallel computing | 1996

Global Register Allocation Based on Graph Fusion

Guei-Yuan Lueh; Thomas R. Gross; Ali-Reza Adl-Tabatabai

A register allocator must effectively deal with three issues: live range splitting, live range spilling, and register assignment. This paper presents a new coloring-based global register allocation algorithm that addresses all three issues in an integrated way: the algorithm starts with an interference graph for each region of the program, where a region can be a basic block, a loop nest, a superblock, a trace, or another combination of basic blocks. Region formation is orthogonal to register allocation in this framework. Then the interference graphs for adjacent regions are fused to build up the complete interference graph. The algorithm delays decisions on splitting, spilling, and register assignment, and therefore, the register allocation may be better than what is obtained by a Chatin-style allocator. This algorithm uses execution probabilities, derived from either profiles or static estimates, to guide fusing interference graphs, allowing an easy integration of this register allocator into a region-based compiler.

international conference on parallel architectures and compilation techniques | 2002

Just-in-time Java compilation for the Itanium/spl reg/ processor

Tatiana Shpeisman; Guei-Yuan Lueh; Ali-Reza Adl-Tabatabai

This paper describes a just-in-time (JIT) Java compiler for the Intel/spl reg/ Itanium/spl reg/ processor. The Itanium processor is an example of an Explicitly Parallel Instruction Computing (EPIC) architecture and thus relies on aggressive and expensive compiler optimizations for performance. Static compilers for Itanium use aggressive global scheduling algorithms to extract instruction-level parallelism. In a JIT compiler, however, the additional overhead of such expensive optimizations may offset any gains from the improved code. In this paper, we describe lightweight code generation techniques for generating efficient Itanium code. Our compiler relies on two basic methods to generate efficient code. First, the compiler uses inexpensive scheduling heuristics to model the Itanium microarchitecture. Second, the compiler uses the semantics of the Java virtual machine to extract instruction-level parallelism.

Archive | 2001