
Publication


Featured research published by Timothy J. Harvey.


Software: Practice and Experience | 1998

Practical improvements to the construction and destruction of static single assignment form

Preston Briggs; Keith D. Cooper; Timothy J. Harvey; L. Taylor Simpson

Static Single Assignment (SSA) form is a program representation that is becoming increasingly popular for compiler‐based code optimization. In this paper, we address three problems that have arisen in our use of SSA form. Two are variations to the SSA construction algorithms presented by Cytron et al. [1]. The first variation is a version of SSA form that we call ‘semi‐pruned’ SSA. It offers an attractive trade‐off between the cost of global data‐flow analysis required to build ‘pruned’ SSA and the large number of unused ϕ‐functions found in minimal SSA. The second variation speeds up the program renaming process by efficiently manipulating the stacks of names used during renaming. Our improvement reduces the number of pushes performed, in addition to more efficiently locating the stacks that should be popped. To convert code in SSA form back into an executable form, the compiler must use an algorithm that replaces ϕ‐functions with appropriately‐placed copy instructions. The algorithm given by Cytron et al. for inserting copies produces incorrect results in some situations; particularly in cases like instruction scheduling, where the compiler may not be able to split ‘critical edges’, and in the aftermath of optimizations that aggressively rewrite the name space, like some forms of global value numbering [2]. We present a new algorithm for inserting copy instructions to replace ϕ‐functions. It fixes the problems that we have encountered with the original copy insertion algorithm. We present experimental results that demonstrate the effectiveness of the first two improvements not only during the construction of SSA form, but also in the time saved by subsequent optimization passes that use a smaller representation of the program.
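The distinction between pruned, semi‐pruned, and minimal SSA turns on which names receive ϕ‐functions. Semi‐pruned SSA places them only for names that are live across a block boundary, which can be detected in a single pass without global liveness analysis. Below is a minimal sketch of that "global names" computation, assuming a toy IR in which each instruction is a (defs, uses) pair; it is an illustration, not the paper's implementation.

```python
def global_names(blocks):
    """Collect names that are live across block boundaries.

    A name used in a block before any definition in that same block
    must be live on entry to the block, so it may need a phi-function;
    purely block-local names never do.  `blocks` is an iterable of
    basic blocks, each a list of (defs, uses) pairs -- a hypothetical
    IR shape assumed for this sketch.
    """
    global_set = set()
    for block in blocks:
        defined = set()                  # names defined so far in this block
        for defs, uses in block:
            for name in uses:
                if name not in defined:  # upward-exposed use
                    global_set.add(name)
            defined.update(defs)
    return global_set
```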


Languages, Compilers, and Tools for Embedded Systems | 2005

ACME: adaptive compilation made efficient

Keith D. Cooper; Alexander Grosul; Timothy J. Harvey; Steven W. Reeves; Devika Subramanian; Linda Torczon; Todd Waterman

Research over the past five years has shown significant performance improvements using a technique called adaptive compilation. An adaptive compiler uses a compile-execute-analyze feedback loop to find the combination of optimizations and parameters that minimizes some performance goal, such as code size or execution time. Despite its ability to improve performance, adaptive compilation has not seen widespread use because of two obstacles: the large amounts of time that such systems have used to perform the many compilations and executions prohibits most users from adopting these systems, and the complexity inherent in a feedback-driven adaptive system has made it difficult to build and hard to use. A significant portion of the adaptive compilation process is devoted to multiple executions of the code being compiled. We have developed a technique called virtual execution to address this problem. Virtual execution runs the program a single time and preserves information that allows us to accurately predict the performance of different optimization sequences without running the code again. Our prototype implementation of this technique significantly reduces the time required by our adaptive compiler. In conjunction with this performance boost, we have developed a graphical user interface (GUI) that provides a controlled view of the compilation process. By providing appropriate defaults, the interface limits the amount of information that the user must provide to get started. At the same time, it lets the experienced user exert fine-grained control over the parameters that control the system.
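The feedback loop itself is simple to state even though an efficient system is hard to build. The sketch below shows the basic compile-execute-analyze cycle under stated assumptions: compile_with(seq) and measure(binary) are hypothetical stand-ins for a compiler driver and a performance harness, and the random sampling here is a placeholder for ACME's actual search strategies and virtual-execution machinery.

```python
import random

def adapt(passes, compile_with, measure, seq_len=10, trials=100):
    """Minimal compile-execute-analyze loop: sample candidate
    optimization sequences, compile and measure each one, and keep
    the best.  ACME avoids the repeated executions modeled here by
    predicting performance from a single instrumented run."""
    best_seq, best_cost = None, float("inf")
    for _ in range(trials):
        seq = [random.choice(passes) for _ in range(seq_len)]
        cost = measure(compile_with(seq))   # e.g. run time or code size
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq, best_cost
```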


Architectural Support for Programming Languages and Operating Systems | 1998

Compiler-controlled memory

Keith D. Cooper; Timothy J. Harvey

Optimizations aimed at reducing the impact of memory operations on execution speed have long concentrated on improving cache performance. These efforts achieve a reasonable level of success. The primary limit on the compiler's ability to improve memory behavior is its imperfect knowledge about the run-time behavior of the program. The compiler cannot completely predict runtime access patterns. There is an exception to this rule. During the register allocation phase, the compiler often must insert substantial amounts of spill code; that is, instructions that move values from registers to memory and back again. Because the compiler itself inserts these memory instructions, it has more knowledge about them than other memory operations in the program. Spill-code operations are disjoint from the memory manipulations required by the semantics of the program being compiled, and, indeed, the two can interfere in the cache. This paper proposes a hardware solution to the problem of increased spill costs: a small compiler-controlled memory (CCM) to hold spilled values. This small random-access memory can (and should) be placed in a distinct address space from the main memory hierarchy. The compiler can target spill instructions to use the CCM, moving most compiler-inserted memory traffic out of the pathway to main memory and eliminating any impact that those spill instructions would have on the state of the main memory hierarchy. Such memories already exist on some DSP microprocessors. Our techniques can be applied directly on those chips. This paper presents two compiler-based methods to exploit such a memory, along with experimental results showing that speedups from using CCM may be sizable. It shows that using the register allocator's coloring paradigm to assign spilled values to memory can greatly reduce the amount of memory required by a program.
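The last point, reusing the coloring paradigm to pack spilled values into memory, can be illustrated with a short sketch. Assuming an interference oracle derived from liveness information (the interferes function below is hypothetical, as is the rest of the setup), a greedy coloring lets non-interfering spilled values share a CCM slot:

```python
def assign_ccm_slots(spilled, interferes):
    """Greedy coloring of spilled values onto CCM slots: interfering
    values get distinct slots, while values whose live ranges do not
    overlap can share one, shrinking the CCM footprint."""
    slot_of = {}
    for v in spilled:
        taken = {slot_of[u] for u in slot_of if interferes(v, u)}
        slot = 0
        while slot in taken:          # lowest slot free of interference
            slot += 1
        slot_of[v] = slot
    return slot_of                    # slot index maps to a CCM offset
```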


Programming Language Design and Implementation | 2002

Fast copy coalescing and live-range identification

Zoran Budimlic; Keith D. Cooper; Timothy J. Harvey; Ken Kennedy; Timothy S. Oberg; Steven W. Reeves

This paper presents a fast new algorithm for modeling and reasoning about interferences for variables in a program without constructing an interference graph. It then describes how to use this information to minimize copy insertion for φ-node instantiation during the conversion of the static single assignment (SSA) form into the control-flow graph (CFG), effectively yielding a new, very fast copy coalescing and live-range identification algorithm. This paper proves some properties of the SSA form that enable construction of data structures to compute interference information for variables that are considered for folding. The asymptotic complexity of our SSA-to-CFG conversion algorithm is O(n α(n)), where n is the number of instructions in the program. Performing copy folding during the SSA-to-CFG conversion eliminates the need for a separate coalescing phase while simplifying the intermediate code. This may make graph-coloring register allocation more practical in just-in-time (JIT) and other time-critical compilers. For example, Sun's HotSpot Server Compiler already employs a graph-coloring register allocator [10]. This paper also presents an improvement to the classical interference-graph-based coalescing optimization that shows a decrease in memory usage of up to three orders of magnitude and a decrease of a factor of two in compilation time, while providing the exact same results. We present experimental results that demonstrate that our algorithm is almost as precise (within one percent on average) as the improved interference-graph-based coalescing algorithm, while requiring three times less compilation time.
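The key SSA property the paper exploits is that in strict SSA form, two values can interfere only if one definition dominates the other. A hedged sketch of the resulting interference test follows; dominates and live_at_def are assumed oracles standing in for the paper's dominance-based data structures:

```python
def ssa_interferes(x, y, dominates, live_at_def):
    """Interference test for two values under strict SSA form.

    If neither definition dominates the other, the live ranges cannot
    overlap.  Otherwise the values interfere exactly when the
    dominating value is still live at the dominated definition point.
    dominates(a, b) and live_at_def(v, w) ('is v live at w's def?')
    are hypothetical helpers, not the paper's actual structures."""
    if dominates(x.def_point, y.def_point):
        return live_at_def(x, y)
    if dominates(y.def_point, x.def_point):
        return live_at_def(y, x)
    return False
```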


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 1993

Experiences using the ParaScope Editor: an interactive parallel programming tool

Mary W. Hall; Timothy J. Harvey; Ken Kennedy; Nathaniel McIntosh; Kathryn S. McKinley; Jeffrey D. Oldham; Michael H. Paleczny; Gerald Roth

The ParaScope project is building an integrated collection of tools to help scientific programmers develop correct and efficient parallel programs. The centerpiece of this collection is the ParaScope Editor, an intelligent interactive editor for parallel FORTRAN programs. The ParaScope Editor displays data dependencies, which correspond to potential data races among the iterations of a parallel loop, to assist the user in determining the correctness of a proposed parallelization. In addition, it uses dependencies to support a variety of program transformations selectable by the programmer. The eventual goal for the ParaScope Editor is to support arbitrary editing changes by performing full incremental data dependence analysis in response to program changes. In addition, it will understand and recognize when synchronization correctly prevents race conditions. The ParaScope Editor is a new kind of program construction tool: one that not only manages text, but also presents the user with insights into the semantic structure of the program being constructed.
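To make the notion of a loop-carried dependence concrete, here is a toy example (in Python rather than the FORTRAN the editor targets, purely for illustration): the first loop carries a dependence across iterations and cannot safely be parallelized as written, while the second has independent iterations.

```python
def smooth(a):
    """Loop-carried dependence: iteration i reads a[i - 1], which
    iteration i - 1 wrote, so the iterations must run in order."""
    for i in range(1, len(a)):
        a[i] = (a[i] + a[i - 1]) / 2.0
    return a

def scale(a, k):
    """No cross-iteration dependence: each iteration touches only
    a[i], so the loop is a safe candidate for parallelization."""
    for i in range(len(a)):
        a[i] = a[i] * k
    return a
```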


The Journal of Supercomputing | 2006

Exploring the structure of the space of compilation sequences using randomized search algorithms

Keith D. Cooper; Alexander Grosul; Timothy J. Harvey; Steven W. Reeves; Devika Subramanian; Linda Torczon; Todd Waterman

Modern optimizing compilers apply a fixed sequence of optimizations, which we call a compilation sequence, to each program that they compile. These compilers let the user modify their behavior in a small number of specified ways, using command-line flags (e.g., -O1, -O2, ...). For five years, we have been working with compilers that automatically select an appropriate compilation sequence for each input program. These adaptive compilers discover a good compilation sequence tailored to the input program, the target machine, and a user-chosen objective function. We have shown, as have others, that program-specific sequences can produce better results than any single universal sequence [1, 7, 10, 21, 23]. Our adaptive compiler looks for compilation sequences in a large and complex search space. Its typical compilation sequence includes 10 passes (with possible repeats) chosen from the 16 available; there are 16^10, or 1,099,511,627,776, such sequences. To learn about the properties of such spaces, we have studied subspaces that consist of 10 passes drawn from a set of 5 (5^10, or 9,765,625, sequences). These 10-of-5 subspaces are small enough that we can analyze them thoroughly but large enough to reflect important properties of the full spaces. This paper reports, in detail, on our analysis of several of these subspaces and on the consequences of those observed properties for the design of search algorithms.
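The 10-of-5 subspaces are small enough to enumerate exhaustively, while searching the full 16^10 space requires randomized methods. The sketch below shows both ideas under stated assumptions: evaluate(seq) is a hypothetical objective function (run time, code size, or another user-chosen metric), and the hill climber is just one simple example of the search algorithms that such space properties inform.

```python
import itertools
import random

def enumerate_subspace(passes, length=10):
    """Exhaustive enumeration is feasible for a 10-of-5 subspace:
    5**10 = 9,765,625 sequences."""
    return itertools.product(passes, repeat=length)

def hill_climb(passes, evaluate, length=10, steps=1000):
    """Randomized local search over compilation sequences: mutate one
    pass position at a time and keep any improvement."""
    seq = [random.choice(passes) for _ in range(length)]
    cost = evaluate(seq)
    for _ in range(steps):
        cand = list(seq)
        cand[random.randrange(length)] = random.choice(passes)
        cand_cost = evaluate(cand)
        if cand_cost < cost:            # greedy downhill move
            seq, cost = cand, cand_cost
    return seq, cost
```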


Software: Practice and Experience | 1998

How to build an interference graph

Keith D. Cooper; Timothy J. Harvey; Linda Torczon

The design and implementation of an interference graph is critical to the performance of a graph‐coloring register allocator. The cost of constructing and manipulating the interference graph dominates the overall cost of allocation. The literature on graph‐coloring register allocation suggests the use of a bit matrix coupled with lists of edges to represent the graph [1–3]. Recently, George and Appel [4] claimed that their tests show better results using a hash table. This paper examines the trade‐offs between these two approaches. Our experiments were conducted with an optimistic, Chaitin‐style register allocator [5]. We believe, however, that the lessons learned in the experiment are applicable to any program that needs to build and manipulate large graphs. For most graphs, we obtained our best results, in terms of both time and space, using a modification of the data structures suggested by both Chaitin and Briggs that we call the split bit‐matrix method. On a few large graphs, we found that a closed hash‐table with the universal hash function suggested by Cormen et al. [6] ran faster than the split bit‐matrix method. We found one case where it used less space. This suggests that the split bit‐matrix technique should be the method of choice, unless the compiler regularly encounters large interference graphs. In that case, the best strategy might be to implement both data structures behind a common interface, and switch between them based on graph size.
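For reference, the baseline Chaitin/Briggs representation pairs a triangular bit matrix (constant-time membership tests) with per-node edge lists (fast neighbor iteration). The sketch below shows that baseline layout only; it does not reproduce the paper's split bit‐matrix refinement or its closed hash‐table alternative.

```python
class InterferenceGraph:
    """Bit matrix plus edge lists, the classic two-part representation.

    The lower-triangular bit matrix answers 'do a and b interfere?' in
    constant time; the adjacency lists make iterating over a node's
    neighbors cheap during simplification and coloring."""

    def __init__(self, n):
        self.bits = bytearray((n * (n + 1) // 2 + 7) // 8)
        self.adj = [[] for _ in range(n)]

    def _index(self, a, b):
        if a < b:
            a, b = b, a                  # store only the lower triangle
        return a * (a + 1) // 2 + b

    def add_edge(self, a, b):
        i = self._index(a, b)
        if not (self.bits[i >> 3] >> (i & 7)) & 1:
            self.bits[i >> 3] |= 1 << (i & 7)
            self.adj[a].append(b)
            self.adj[b].append(a)

    def interferes(self, a, b):
        i = self._index(a, b)
        return bool((self.bits[i >> 3] >> (i & 7)) & 1)
```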


compiler construction | 2008

An adaptive strategy for inline substitution

Keith D. Cooper; Timothy J. Harvey; Todd Waterman

Inline substitution is an optimization that replaces a procedure call with the body of the procedure that it calls. Inlining has the immediate benefit of reducing the overhead associated with the call, including register saves and restores, parameter evaluation, and activation record setup and teardown. It has secondary benefits that arise from providing greater context for global optimizations. These benefits can be offset by the effects of increased code size, and by deleterious interactions with other optimizations, such as register allocation. The difficult aspect of inline substitution is choosing which calls to inline. Previous work has focused on static, one-size-fits-all heuristics. This paper presents a feedback-driven adaptive scheme that derives a program-specific inlining heuristic. The key contributions of this work are: (1) a novel parameterization scheme for the inliner that makes it susceptible to fine-grained external control, (2) a scheme for discretizing large integer parameter spaces, and (3) effective search techniques for the resulting search space. This work provides a proof of concept that can provide insight into the design of adaptive controllers for other optimizations with complex decision heuristics. Our goal in this work is not to exhibit the world's best inliner. Instead, we present evidence to suggest that a program-specific, adaptive scheme is needed to achieve the best results.
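A parameterized inliner exposes its decision heuristic as a vector of knobs that an external controller can tune. The sketch below is hypothetical throughout: these particular parameters and the call fields are illustrative, not the paper's actual parameterization, but they show the shape of a decision procedure an adaptive search could drive.

```python
from dataclasses import dataclass

@dataclass
class InlineParams:
    """Illustrative knobs for an externally controlled inliner."""
    max_callee_size: int = 50     # upper bound on callee body size
    max_growth: float = 1.2       # allowed whole-program code growth
    tiny_callee_size: int = 5     # always inline bodies this small

def should_inline(call, params, original_size, current_size):
    """One inlining decision under a given parameter setting.  An
    adaptive controller searches over InlineParams values, re-running
    the inliner and measuring the result for each setting."""
    if current_size > params.max_growth * original_size:
        return False                          # growth budget exhausted
    if call.callee_size <= params.tiny_callee_size:
        return True                           # trivial wrapper
    return call.callee_size <= params.max_callee_size
```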


Languages, Compilers, and Tools for Embedded Systems | 2004

Finding effective compilation sequences

L. Almagor; Keith D. Cooper; Alexander Grosul; Timothy J. Harvey; Steven W. Reeves; Devika Subramanian; Linda Torczon; Todd Waterman


Software: Practice and Experience | 1999

A Simple, Fast Dominance Algorithm

Keith D. Cooper; Timothy J. Harvey; Ken Kennedy
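No abstract accompanies this entry. For context, the paper is widely known for an engineered iterative dominance computation built around a two-finger "intersect" walk over reverse-postorder numbers; the sketch below is a hedged reconstruction of that scheme from general knowledge, not text from this page.

```python
def dominators(blocks, preds, rpo, start):
    """Iteratively compute immediate dominators.

    blocks: all basic blocks in reverse postorder, start first.
    preds:  maps each block to its list of CFG predecessors.
    rpo:    maps each block to its reverse-postorder number.
    """
    idom = {start: start}

    def intersect(b1, b2):
        # Walk the two dominator chains upward until they meet.
        while b1 != b2:
            while rpo[b1] > rpo[b2]:
                b1 = idom[b1]
            while rpo[b2] > rpo[b1]:
                b2 = idom[b2]
        return b1

    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b is start:
                continue
            ready = [p for p in preds[b] if p in idom]
            new_idom = ready[0]
            for p in ready[1:]:
                new_idom = intersect(p, new_idom)
            if idom.get(b) != new_idom:
                idom[b] = new_idom
                changed = True
    return idom
```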
