David W. Wall | Researchain

Publication

Featured research published by David W. Wall.

compiler construction | 1986

Global register allocation at link time

David W. Wall

In previous work in global register allocation, the compiler colors a conflict graph constructed from liveness dataflow information, in order to allocate the same register to many variables that are not simultaneously live. If two procedures are in separately compiled modules, however, the compiler must do this allocation separately for each procedure. As a result, the two procedures might use different registers for the same global, or the same register for different locals.

We can remove these problems if we delay the register allocation until link time. Our compiler produces object modules that can be linked and run without global register allocation, but includes with each object module a body of information describing how the module uses variables and procedures. A link-time register allocator then decides which variables are used most frequently, selects registers for them, and rewrites the code to reflect the decision that these variables reside in registers rather than in memory. Construction of the call graph allows us to use the same register for locals of procedures that are not simultaneously active, giving us most of the advantages of a full-scale coloring without the expense.

When we use our method for 52 registers, our benchmarks speed up by 10 to 25 percent. Even with only 8 registers, the speedup can be nearly that large if we use previously collected profile information to guide the allocation. We cannot do much better, because programs whose variables all fit in registers rarely speed up by more than 30 percent. Moreover, profiling shows us that we usually remove 60 to 90 percent of the loads and stores of scalar variables that the program performs during its execution, and often much more.
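The call-graph idea in this abstract can be illustrated with a small sketch. It is not the paper's allocator: the function name `assign_local_slots`, the slot-range representation, and the assumption of an acyclic call graph (no recursion) are all choices made here for illustration. Each procedure's locals get a range of candidate slots that starts above every range used by anything it can call, so procedures on the same call chain never collide, while procedures that cannot be active at the same time may reuse the same slots.

```python
from functools import lru_cache


def assign_local_slots(call_graph, local_counts):
    """Map each procedure to a half-open slot range (start, end) for its locals.

    call_graph:   dict of procedure name -> list of procedures it calls.
    local_counts: dict of procedure name -> number of local variables.
    Assumes the call graph is acyclic (hypothetical simplification).
    """

    @lru_cache(maxsize=None)
    def start(proc):
        # A caller's range begins just above the highest range end
        # among its direct callees; leaves begin at slot 0.
        callees = call_graph.get(proc, ())
        return max((start(c) + local_counts[c] for c in callees), default=0)

    return {p: (start(p), start(p) + local_counts[p]) for p in local_counts}


# Hypothetical program: main calls a and b, and a calls c.
call_graph = {"main": ["a", "b"], "a": ["c"]}
local_counts = {"main": 2, "a": 3, "b": 1, "c": 2}
slots = assign_local_slots(call_graph, local_counts)
```

Here `b` and `c` both start at slot 0 because neither can be active while the other is, whereas `a` sits above `c` and `main` sits above everything it can reach. A real allocator would then pick which slots actually get registers based on usage frequency, as the abstract describes.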

programming language design and implementation | 1991

Predicting program behavior using real or estimated profiles

David W. Wall

Code Generation | 1992

Systems for Late Code Modification

David W. Wall

Modifying code after the compiler has generated it can be useful for both optimization and instrumentation. Several years ago we designed the Mahler system, which uses link-time code modification for a variety of tools on our experimental Titan workstations. Killian’s Pixie tool works even later, translating a fully-linked MIPS executable file into a new version with instrumentation added. Recently we wanted to develop a hybrid of the two, that would let us experiment with both optimization and instrumentation on a standard workstation, preferably without requiring us to modify the normal compilers and linker. This paper describes prototypes of two hybrid systems, closely related to Mahler and Pixie. We implemented basic-block counting in both, and compare the resulting time and space expansion to those of Mahler and Pixie.
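As a rough illustration of basic-block counting by code modification, the sketch below instruments a toy instruction list. Everything in it is assumed for the example: the tuple-based instruction format, the opcode names in `BRANCHES`, and the `("count", n)` pseudo-instruction. It does not reflect how Mahler, Pixie, or the prototypes described in the paper represent or rewrite code.

```python
# Hypothetical toy opcodes that end a basic block.
BRANCHES = {"jmp", "beq", "bne"}


def instrument(code):
    """Insert a ("count", block_id) pseudo-instruction at each basic-block start.

    code is a list of tuples whose first element is an opcode; ("label", name)
    marks a branch target. A non-label instruction starts a block if it is the
    first instruction, follows a label, or follows a branch.
    Returns (instrumented_code, number_of_blocks).
    """
    out = []
    block_id = 0
    prev = None
    for ins in code:
        if ins[0] != "label":
            starts_block = (
                prev is None or prev[0] == "label" or prev[0] in BRANCHES
            )
            if starts_block:
                # Counter goes after any label so that jumps to it are counted.
                out.append(("count", block_id))
                block_id += 1
        out.append(ins)
        prev = ins
    return out, block_id


# Hypothetical three-block program: setup, loop body, exit.
program = [
    ("li", "r1", 3),
    ("label", "loop"),
    ("sub", "r1", 1),
    ("bne", "r1", "loop"),
    ("ret",),
]
instrumented, num_blocks = instrument(program)
```

The inserted instructions are the source of the time and space expansion that the abstract compares across systems: every block grows by one counter update.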

international symposium on computer architecture | 1990

Generation and analysis of very long address traces

Anita Borg; Richard E. Kessler; David W. Wall

Existing methods of generating and analyzing traces suffer from a variety of limitations including complexity, inaccuracy, short length, inflexibility, or applicability only to CISC machines. We use a trace generation mechanism based on link-time code modification which is simple to use, generates accurate long traces of multi-user programs, runs on a RISC machine, and can be flexibly controlled. On-the-fly analysis of the traces allows us to get accurate performance data for large second-level caches. We compare the performance of systems with 512K to 16M second-level caches, and show that for today's large programs, second-level caches of more than 4MB may be unnecessary. We also show that set associativity in second-level caches of more than 1MB does not significantly improve system performance. Our experiments also provide insights into first-level and second-level cache line size.
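The kind of on-the-fly trace analysis mentioned here can be sketched as a small cache simulator that consumes addresses as they arrive instead of storing the trace. This is a minimal sketch under assumptions made for the example: the function name `count_misses`, the LRU replacement policy, and the parameters are illustrative and are not taken from the paper's tools.

```python
from collections import OrderedDict


def count_misses(trace, cache_bytes, line_bytes, ways):
    """Count misses of an LRU set-associative cache over an address trace.

    trace may be any iterable of integer byte addresses (including a
    generator), so the trace never has to be held in memory.
    """
    num_sets = cache_bytes // (line_bytes * ways)
    # One OrderedDict per set; insertion order tracks recency of use.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        line = addr // line_bytes
        cache_set = sets[line % num_sets]
        if line in cache_set:
            cache_set.move_to_end(line)  # mark as most recently used
        else:
            misses += 1
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)  # evict least recently used
            cache_set[line] = True
    return misses


# Two addresses that conflict in a direct-mapped cache of two 16-byte lines.
trace = [0, 32, 0, 32]
direct_mapped = count_misses(trace, cache_bytes=32, line_bytes=16, ways=1)
two_way = count_misses(trace, cache_bytes=32, line_bytes=16, ways=2)
```

Running the same trace through several configurations, as in the last two lines, is the basic shape of the size and associativity comparisons the abstract reports.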

programming language design and implementation | 1988

Register windows vs. register allocation

David W. Wall

A large register set can be exploited by keeping variables and constants in registers instead of in memory. Hardware register windows and compile-time or link-time global register allocation are ways to do this. A measure of the effectiveness of any of these register management schemes is how thoroughly they remove loads and stores. This measure must also count extra loads and stores executed because of window overflow or conflicts between procedures.

By combining profiling, instrumentation, and in-line simulation, we measured the effectiveness of several register management schemes. These included compile-time and link-time schemes for allocating registers, and register window schemes using fixed-size or variable-sized windows. Link-time allocation based on profile information was the clear winner in some cases and did about as well as windows in the rest. Even link-time allocation based on an estimated profile was about as good as windows. Variable-sized windows sometimes did better than fixed-sized windows, but the difference was usually small.

Register windows require extra logic in the data path, which may slow the machine cycle slightly, and often use more chip real estate for additional registers. Proponents of windows suppose that they trade these drawbacks for a reduction in the number of memory references they must make. Our results show that this tradeoff should be made the other way. Keep the hardware simple, because a link-time register allocator can nearly duplicate the improvement in memory reference frequency. Then the cycle time can be as small as possible, resulting in faster programs overall.

programming language design and implementation | 1994

Link-time optimization of address calculation on a 64-bit architecture

Amitabh Srivastava; David W. Wall

Compilers for new machines with 64-bit addresses must generate code that works when the memory used by the program is large. Procedures and global variables are accessed indirectly via global address tables, and calling conventions include code to establish the addressability of the appropriate tables. In the common case of a program that does not require a lot of memory, all of this can be simplified considerably, with a corresponding reduction in program size and execution time. We have used our link-time code modification system OM to perform program transformations related to global address use on the Alpha AXP. Though simple, many of these are whole-program optimizations that can be done only when we can see the entire program at once, so link-time is an ideal occasion to perform them. This paper describes the optimizations performed and shows their effects on program size and performance. Relatively modest transformations, possible without moving code, improve the performance of SPEC benchmarks by an average of 1.5%. More ambitious transformations, requiring an understanding of program structure that is thorough but not difficult at link-time, can do even better, reducing program size by 10% or more, and improving performance by an average of 3.8%. Even a program compiled monolithically with interprocedural optimization can benefit nearly as much from this technique, if it contains statically-linked pre-compiled library code. When the benchmark sources were compiled in this way, we were still able to improve their performance by 1.35% with the modest transformations and 3.4% with the ambitious transformations.

Archive | 1989

Link-Time Code Modification

David W. Wall

Archive | 1986

Fast Printed Circuit Board Routing

David W. Wall

Archive | 1989

Two Papers on Test Pattern Generation

Norman P. Jouppi; David W. Wall

architectural support for programming languages and operating systems | 1989

A unified vector/scalar floating-point architecture

Norman P. Jouppi; Jonathan Bertoni; David W. Wall

Collaboration

Dive into David W. Wall's collaborations.

Top Co-Authors

Norman P. Jouppi
Stanford University

Amitabh Srivastava
Pennsylvania State University

Michael L. Powell
University of California

Richard E. Kessler
University of Wisconsin-Madison
