Fred C. Chow
MIPS Technologies
Publication
Featured research published by Fred C. Chow.
Programming Language Design and Implementation | 1997
Fred C. Chow; Sun Chan; Robert Kennedy; Shin-Ming Liu; Raymond Lo; Peng Tu
A new algorithm, SSAPRE, for performing partial redundancy elimination based entirely on SSA form is presented. It achieves optimal code motion similar to lazy code motion [KRS94a, DS93], but is formulated independently and does not involve iterative data flow analysis and bit vectors in its solution. It not only exhibits the characteristics common to other sparse approaches, but also inherits the advantages shared by other SSA-based optimization techniques. SSAPRE also maintains its output in the same SSA form as its input. In describing the algorithm, we state theorems, with proofs, that substantiate our claims about SSAPRE. We also describe our practical implementation of SSAPRE in further detail, and analyze and compare its performance with a bit-vector-based implementation of PRE. We conclude with some discussion of the implications of this work.
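The transformation targeted by the abstract can be shown with a minimal C sketch (an illustrative example of partial redundancy elimination in general, not code from the paper): a + b is computed on one path and recomputed at the join; PRE inserts an evaluation on the other path so the recomputation becomes a reuse of a temporary.

```c
/* Before: a + b is partially redundant at the final statement
   (it is recomputed only along the path where c is true). */
int use_before(int a, int b, int c)
{
    int x = 0, y;
    if (c)
        x = a + b;
    y = a + b;              /* partially redundant */
    return x + y;
}

/* After PRE-style code motion: an evaluation is inserted on the other
   path to make the redundancy full, and the recomputation is replaced
   by a reuse of the temporary t. */
int use_after(int a, int b, int c)
{
    int x = 0, y, t;
    if (c) {
        t = a + b;
        x = t;
    } else {
        t = a + b;          /* inserted computation */
    }
    y = t;                  /* redundant evaluation eliminated */
    return x + y;
}
```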
Programming Language Design and Implementation | 1988
Fred C. Chow
Inter-procedural register allocation can minimize the register usage penalty at procedure calls by reducing the saving and restoring of registers at procedure boundaries. A one-pass inter-procedural register allocation scheme based on processing the procedures in a depth-first traversal of the call graph is presented. This scheme can be overlaid on top of intra-procedural register allocation via a simple extension to the priority-based coloring algorithm. Using two different usage conventions for the registers, the scheme can distribute register saves/restores throughout the call graph even in the presence of recursion, indirect calls or separate compilation. A natural and efficient way to pass parameters emerges from this scheme. A separate technique uses data flow analysis to optimize the placement of the save/restore code for registers within individual procedures. The techniques described have been implemented in a production compiler suite. Measurements of the effects of these techniques on a set of practical programs are presented and the results analysed.
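A hypothetical C fragment illustrating the idea (the procedure names and register numbers are made up, not taken from the paper): because procedures are allocated in a depth-first walk of the call graph, a caller can be compiled with knowledge of which registers its callee already uses, keeping its own live values out of them so no saves and restores are needed around the call.

```c
/* leaf() is allocated first in the depth-first traversal of the call
   graph; suppose its values end up in, say, registers r4 and r5
   (hypothetical numbering). */
int leaf(int a, int b)
{
    return a * b + a;
}

/* caller() is allocated afterwards.  Knowing leaf()'s register usage,
   the allocator keeps sum and i out of r4/r5, so leaf() needs no
   save/restore code and nothing is spilled around the call. */
int caller(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += leaf(i, n);
    return sum;
}
```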
Compiler Construction | 1996
Fred C. Chow; Sun Chan; Shin-Ming Liu; Raymond Lo; Mark Streich
This paper addresses the problems of representing aliases and indirect memory operations in SSA form. We propose a method that prevents explosion in the number of SSA variable versions in the presence of aliases. We also present a technique that allows indirect memory operations to be globally commonized. The result is a precise and compact SSA representation based on global value numbering, called HSSA, that uniformly handles both scalar variables and indirect memory operations. We discuss the capabilities of the HSSA representation and present measurements that show the effects of implementing our techniques in a production global optimizer.
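The aliasing problem the paper addresses can be seen in a small C fragment (our sketch; the chi/mu annotations follow HSSA terminology, but the example itself is illustrative): a store through a pointer that may alias a scalar forces the SSA form to account for a possible redefinition of that scalar.

```c
int x;                  /* x may be aliased by *p */

int f(int *p)
{
    x = 1;              /* x_1 = 1                                          */
    *p = 2;             /* may-def: x_2 = chi(x_1), since *p may alias x    */
    int y = x;          /* uses x_2, the version that reflects the store    */
    int z = *p;         /* indirect load, annotated with mu(x_2) (may-use)  */
    return y + z;
}
```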
Programming Language Design and Implementation | 1998
Raymond Lo; Fred C. Chow; Robert Kennedy; Shin-Ming Liu; Peng Tu
An algorithm for register promotion is presented based on the observation that the circumstances for promoting a memory location's value to a register coincide with situations where the program exhibits partial redundancy between accesses to the memory location. The recent SSAPRE algorithm for eliminating partial redundancy using a sparse SSA representation forms the foundation for the present algorithm to eliminate redundancy among memory accesses, enabling us to achieve both computational and live range optimality in our register promotion results. We discuss how to effect speculative code motion in the SSAPRE framework. We present two different algorithms for performing speculative code motion: the conservative speculation algorithm used in the absence of profile data, and the profile-driven speculation algorithm used when profile data are available. We define the static single use (SSU) form and develop the dual of the SSAPRE algorithm, called SSUPRE, to perform the partial redundancy elimination of stores. We provide measurement data on the SPECint95 benchmark suite to demonstrate the effectiveness of our register promotion approach in removing loads and stores. We also study the relative performance of the different speculative code motion strategies when applied to scalar loads and stores.
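A minimal C sketch of the load/store pattern register promotion removes (ours, not the paper's): the global g is loaded and stored on every iteration; after promotion, the load is treated like a redundant expression and hoisted, and the store is sunk past the loop.

```c
int g;

/* Before: g is loaded and stored on every iteration. */
void sum_before(const int *a, int n)
{
    for (int i = 0; i < n; i++)
        g = g + a[i];
}

/* After register promotion: g lives in a register (t) inside the loop;
   one load before the loop and one store after it remain. */
void sum_after(const int *a, int n)
{
    int t = g;
    for (int i = 0; i < n; i++)
        t = t + a[i];
    g = t;
}
```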
ACM Transactions on Programming Languages and Systems | 1999
Robert Kennedy; Sun Chan; Shin-Ming Liu; Raymond Lo; Peng Tu; Fred C. Chow
The SSAPRE algorithm for performing partial redundancy elimination based entirely on SSA form is presented. The algorithm is formulated based on a new conceptual framework, the factored redundancy graph, for analyzing redundancy, and represents the first sparse approach to this classical problem and its solution. With the algorithm description, theorems and their proofs are given showing that the algorithm produces the best possible code by the criteria of computational optimality and lifetime optimality of the introduced temporaries. In addition to the base algorithm, a practical implementation of SSAPRE that exhibits additional compile-time efficiencies is described. In closing, measurement statistics are provided that characterize the instances of the partial redundancy problem from a set of benchmark programs and compare optimization time spent by an implementation of SSAPRE against a classical partial redundancy elimination implementation. The data lend insight into the nature of partial redundancy elimination and demonstrate the expediency of this new approach.
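The two optimality criteria can be seen in a small C example (our sketch, not from the paper). Suppose a + b originally appears only in the else arm and again after the join; both rewrites below evaluate a + b at most once per path, so both are computationally optimal, but only the second keeps the introduced temporary's live range as short as possible.

```c
extern void f(void);    /* long computation; assumed not to modify a or b */

/* Early insertion: computationally optimal, but t is live across f(). */
int early_insertion(int a, int b, int c)
{
    int x = 0, y, t;
    if (c) {
        t = a + b;
        f();
    } else {
        x = t = a + b;
    }
    y = t;
    return x + y;
}

/* Late insertion: computationally and lifetime optimal, since the
   inserted evaluation is placed as late as possible. */
int late_insertion(int a, int b, int c)
{
    int x = 0, y, t;
    if (c) {
        f();
        t = a + b;
    } else {
        x = t = a + b;
    }
    y = t;
    return x + y;
}
```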
Compiler Construction | 1998
Robert Kennedy; Fred C. Chow; Peter Dahl; Shin-Ming Liu; Raymond Lo; Mark Streich
We present techniques that allow strength reduction to be performed concurrently with partial redundancy elimination in the SSAPRE framework. By sharing the characteristics inherent to SSAPRE, the resulting strength reduction algorithm exhibits many interesting attributes. We compare various aspects of the new strength reduction algorithm with previous strength reduction algorithms. We also outline and discuss our implementation of the closely related linear function test replacement optimization under the same framework.
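A classic C illustration of the two transformations (ours, assuming 4-byte ints; not code from the paper): the induction expression i * 4 is reduced to an addition, and linear function test replacement rewrites the loop-exit test in terms of the new variable so i can disappear from the loop.

```c
/* Before: the induction expression i * 4 is recomputed every iteration. */
int sum_before(const int *a, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += *(const int *)((const char *)a + i * 4);
    return sum;
}

/* After strength reduction (i * 4 becomes a pointer advanced by 4) and
   linear function test replacement (the exit test is rewritten in terms
   of the reduced variable, so i is no longer needed at all). */
int sum_after(const int *a, int n)
{
    int sum = 0;
    for (const char *t = (const char *)a; t < (const char *)a + n * 4; t += 4)
        sum += *(const int *)t;
    return sum;
}
```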
International Conference on Parallel Architectures and Compilation Techniques | 1996
Shin-Ming Liu; Raymond Lo; Fred C. Chow
Loop induction variable canonicalization transforms a loop to use a single primary induction variable that is incremented by one at the end of each iteration. In this process, other induction variables in the loop, called secondary induction variables, are removed from the loop, with their original references expressed in terms of the primary induction variable, and their loop exit values assigned to them after the loop exit. This paper presents a simple and powerful approach to loop induction variable canonicalization in which the problem is broken up into a sequence of steps. The whole canonicalization process is integrated into a global optimizer that builds an SSA representation of the program, performs global optimizations and hands the optimized code off to other components in the compiler that perform loop and instruction level parallelization. By taking advantage of optimizations that will be performed later, each step in the canonicalization process can be made small and efficient. By letting generic parts of the optimizer share in the work, the approach infuses generality into the solution and provides automatic coverage of a wide variety of forms of induction variable occurrences. A number of examples are given to show the effectiveness of the approach. We also provide a set of measurements that shows that the additional compile-time cost incurred due to implementing our approach is small.
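A small C example of the transformation (our sketch): the loop below has two induction variables, i stepping by 4 and p stepping by one element; after canonicalization, a single primary induction variable k is incremented by one, the secondary variables are expressed in terms of k, and their exit values are assigned after the loop.

```c
void fill_before(int *a, int n)
{
    int i, *p;
    for (i = 0, p = a; i < 4 * n; i += 4, p++)
        *p = i;
}

void fill_after(int *a, int n)
{
    int k;
    for (k = 0; k < n; k++)   /* single primary IV, incremented by 1      */
        a[k] = 4 * k;         /* secondary IVs rewritten in terms of k;
                                 the reintroduced 4 * k is left for later
                                 strength reduction to clean up           */
    /* Loop-exit values of the removed induction variables would be
       assigned here (i = 4 * n; p = a + n;) if they were still live. */
}
```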
International Conference on Parallel Architectures and Compilation Techniques | 2002
Jack Liu; Timothy Kong; Fred C. Chow
Traditional compilers perform their code generation tasks based on a fixed, pre-determined instruction set. This paper describes the implementation of a compiler that determines the best instruction set to use for a given program and generates efficient code sequences based on it. We first give an overview of the VISC Architecture pioneered at Cognigine that exemplifies a Variable Instruction Set Architecture. We then present three compilation techniques that, when combined, enable us to provide effective compilation and optimization support for such an architecture. The first technique involves the use of an abstract operation representation that enables the code generator to optimize towards the core architecture of the processor without committing to any specific instruction format. The second technique uses an enumeration approach to scheduling that yields near-optimal instruction schedules while still adhering to the irregular constraints imposed by the architecture. We then derive the dictionary and the instruction output based on this schedule. The third technique superimposes dictionary re-use on the enumeration algorithm to provide a trade-off between program performance and dictionary budget. This enables us to make maximal use of the dictionary space without exceeding its limit. Finally, we provide measurements to show the effectiveness of these techniques.
Compilers, Architecture, and Synthesis for Embedded Systems | 2002
Jack Liu; Fred C. Chow
This paper describes the implementation of an instruction scheduler that produces near-optimal results by efficiently enumerating all possible schedules. The scheduler is implemented in the context of an embedded processor that allows compile-time configuration of the instruction set to use for each program, where the need to adhere to the tight and irregular hardware constraints was the original motivation for using this approach. We present techniques that speed up the enumeration process by coming up with good initial lower bound estimates for the schedules and by pruning off large areas of the search space through implicit enumeration. In the process, we found that taking the hardware constraints into account actually helps in speeding up the enumeration. We provide performance data that show that the enumeration approach is superior to heuristics-based approaches like list scheduling for our target processor. The data also show that the compile-time overhead associated with enumeration can be effectively contained. This justifies the deployment of enumeration-based instruction schedulers in commercial compilers targeting embedded processors with similar tight constraints.
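The enumeration-with-pruning idea can be sketched in a few lines of C under a deliberately simplified, hypothetical machine model (single issue, fixed operation latencies, no other resource constraints); the paper's scheduler handles far richer, irregular constraints. The branch-and-bound search cuts off any partial schedule whose lower bound cannot beat the best complete schedule found so far.

```c
#include <limits.h>
#include <stdio.h>

#define N 5                         /* number of instructions            */
static const int lat[N] = {3, 1, 3, 1, 1};   /* result latencies          */
static const int deps[N][N] = {     /* deps[i][j] = 1 if i must precede j */
    {0, 1, 0, 0, 0},
    {0, 0, 0, 0, 1},
    {0, 0, 0, 1, 0},
    {0, 0, 0, 0, 1},
    {0, 0, 0, 0, 0},
};

static int best = INT_MAX;          /* shortest schedule found so far     */

/* start[j] = issue cycle of j, or -1 if not yet scheduled; cycle is the
   earliest cycle at which the next instruction may issue. */
static void enumerate(int start[N], int placed, int cycle)
{
    /* Lower bound: every remaining instruction needs at least one cycle. */
    if (cycle + (N - placed) >= best)
        return;                     /* prune this part of the search space */
    if (placed == N) {
        best = cycle;               /* length up to the last issue          */
        return;
    }
    for (int j = 0; j < N; j++) {
        if (start[j] >= 0)
            continue;               /* already scheduled                    */
        int t = cycle, ok = 1;
        for (int i = 0; i < N; i++) {
            if (!deps[i][j])
                continue;
            if (start[i] < 0) { ok = 0; break; }   /* predecessor pending   */
            if (start[i] + lat[i] > t)
                t = start[i] + lat[i];             /* wait for its result   */
        }
        if (!ok)
            continue;
        start[j] = t;
        enumerate(start, placed + 1, t + 1);
        start[j] = -1;              /* backtrack                            */
    }
}

int main(void)
{
    int start[N] = {-1, -1, -1, -1, -1};
    enumerate(start, 0, 0);
    printf("best schedule length: %d cycles\n", best);
    return 0;
}
```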
International Symposium on Microarchitecture | 1994
Raymond Lo; Sun Chan; Fred C. Chow; Shin-Ming Liu
The paper presents a technique called Global Instruction Distribution (GID) that globally fine-tunes the code produced for a superscalar processor. The fine-tuning is effected by distributing instructions from one block to other blocks according to the control flow graph of the program. The method does not involve instruction scheduling, but models resource usage to find the best insertion points in the target basic block. We present our implementation of GID in a production compiler, and show how the GID framework allows incorporation of additional functions targeting different optimizations. Performance measurements on the MIPS R8000 are presented to demonstrate the practicality and efficacy of this approach.
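A hedged C-level sketch of the kind of motion involved (our illustration, not code or an example from the paper): an instruction that is independent of a branch is distributed from an over-subscribed block into an earlier block whose issue slots would otherwise go unused in that cycle; no reordering within a block is implied.

```c
/* Before: the multiply sits in the join block, which needs more issue
   slots than the machine provides in one cycle, while the block before
   the branch has idle slots. */
int before(int p, int q, int r, int x, int y)
{
    int t, u, v;
    t = p + q;
    if (t > 0) u = r << 1; else u = r >> 1;
    v = x * y;              /* independent of the branch above            */
    return u + v;
}

/* After GID-style distribution: the multiply is inserted in the block
   before the branch, where it can issue alongside t = p + q. */
int after(int p, int q, int r, int x, int y)
{
    int t, u, v;
    t = p + q;
    v = x * y;              /* distributed into the earlier block         */
    if (t > 0) u = r << 1; else u = r >> 1;
    return u + v;
}
```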