
Publication


Featured research published by Jennifer-Ann M. Anderson.


IEEE Computer | 1996

Maximizing multiprocessor performance with the SUIF compiler

Mary W. Hall; Jennifer-Ann M. Anderson; Saman P. Amarasinghe; Brian R. Murphy; Shih-Wei Liao; Edouard Bugnion; Monica S. Lam

This article describes automatic parallelization techniques in the SUIF (Stanford University Intermediate Format) compiler that result in good multiprocessor performance for array-based numerical programs. Parallelizing compilers for multiprocessors face many hurdles. However, SUIF's robust analysis and memory optimization techniques enabled speedups on three-fourths of the NAS and SPECfp95 benchmark programs.
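To make "array-based numerical programs" concrete, here is a minimal sketch (ours, not taken from the article; the names are hypothetical) of the kind of loop nest an automatic parallelizer such as SUIF can distribute across processors, because no iteration depends on any other:

    /* Illustrative only: a dense array loop with no cross-iteration
     * dependences, so the outer loop can run as a parallel (DOALL)
     * loop. Function and array names are hypothetical. */
    #define N 1024

    void scale(double a[N][N], const double b[N][N], double alpha)
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                a[i][j] = alpha * b[i][j];   /* each (i, j) is independent */
    }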


SIGPLAN Notices | 1994

SUIF: an infrastructure for research on parallelizing and optimizing compilers

Robert P. Wilson; Robert S. French; Christopher S. Wilson; Saman P. Amarasinghe; Jennifer-Ann M. Anderson; Steven W. K. Tjiang; Shih-Wei Liao; Chau-Wen Tseng; Mary W. Hall; Monica S. Lam; John L. Hennessy

Compiler infrastructures that support experimental research are crucial to the advancement of high-performance computing. New compiler technology must be implemented and evaluated in the context of a complete compiler, but developing such an infrastructure requires a huge investment in time and resources. We have spent a number of years building the SUIF compiler into a powerful, flexible system, and we would now like to share the results of our efforts.

SUIF consists of a small, clearly documented kernel and a toolkit of compiler passes built on top of the kernel. The kernel defines the intermediate representation, provides functions to access and manipulate the intermediate representation, and structures the interface between compiler passes. The toolkit currently includes C and Fortran front ends, a loop-level parallelism and locality optimizer, an optimizing MIPS back end, a set of compiler development tools, and support for instructional use.

Although we do not expect SUIF to be suitable for everyone, we think it may be useful for many other researchers. We thus invite you to use SUIF and welcome your contributions to this infrastructure. Directions for obtaining the SUIF software are included at the end of this paper.
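The kernel-plus-passes design described above can be pictured with a small sketch (purely hypothetical; this is not SUIF's actual interface, which is richer and object-based): each pass consumes and produces the shared intermediate representation, and a driver chains passes together.

    /* Hypothetical sketch of a kernel-plus-passes architecture; NOT
     * SUIF's actual API. The kernel owns the intermediate
     * representation (IR); each pass transforms it. */
    typedef struct ir_unit ir_unit;               /* opaque IR handle */
    typedef ir_unit *(*compiler_pass)(ir_unit *); /* one pass = one function */

    ir_unit *run_pipeline(ir_unit *ir, compiler_pass passes[], int npasses)
    {
        for (int i = 0; i < npasses; i++)
            ir = passes[i](ir);   /* e.g. front end, optimizer, back end */
        return ir;
    }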


Programming Language Design and Implementation | 1993

Global optimizations for parallelism and locality on scalable parallel machines

Jennifer-Ann M. Anderson; Monica S. Lam

Data locality is critical to achieving high performance on large-scale parallel machines. Non-local data accesses result in communication that can greatly impact performance. Thus the mapping, or decomposition, of the computation and data onto the processors of a scalable parallel machine is a key issue in compiling programs for these architectures. This paper describes a compiler algorithm that automatically finds computation and data decompositions that optimize both parallelism and locality. This algorithm is designed for use with both distributed and shared address space machines. The scope of our algorithm is dense matrix computations where the array accesses are affine functions of the loop indices. Our algorithm can handle programs with general nestings of parallel and sequential loops. We present a mathematical framework that enables us to systematically derive the decompositions. Our algorithm can exploit parallelism both in fully parallelizable loops and in loops that require explicit synchronization. The algorithm will trade off extra degrees of parallelism to eliminate communication. If communication is needed, the algorithm will try to introduce the least expensive forms of communication into those parts of the program that are least frequently executed.
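As a worked illustration of affine decompositions (our own example, not reproduced from the paper): with P processors and a two-deep loop nest that writes A[i][j] in iteration (i, j), both decompositions are affine maps, and communication disappears when they agree on every access.

    % Illustrative only: affine computation and data decompositions.
    \text{computation: } \mathcal{C}(i, j) = i \bmod P
    \qquad
    \text{data: } \mathcal{D}(A[i][j]) = i \bmod P
    % Iteration (i, j) accesses only A[i][j], and
    % C(i, j) = D(A[i][j]) for all (i, j), so each processor touches
    % only local data and the loop nest runs without communication.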


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 1995

Data and computation transformations for multiprocessors

Jennifer-Ann M. Anderson; Saman P. Amarasinghe; Monica S. Lam

Effective memory hierarchy utilization is critical to the performance of modern multiprocessor architectures. We have developed the first compiler system that fully automatically parallelizes sequential programs and changes the original array layouts to improve memory system performance. Our optimization algorithm consists of two steps. The first step chooses the parallelization and computation assignment such that synchronization and data sharing are minimized. The second step then restructures the layout of the data in the shared address space with an algorithm that is based on a new data transformation framework. We ran our compiler on a set of application programs and measured their performance on the Stanford DASH multiprocessor. Our results show that the compiler can effectively optimize parallelism in conjunction with memory subsystem performance.
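A minimal sketch of the kind of layout restructuring the second step performs (our illustration; sizes and names are hypothetical): if processor p owns the columns j with j mod P == p, packing each processor's columns into a contiguous block keeps processors from sharing cache lines, which a plain row-major layout would cause.

    /* Illustrative only: a data transformation that makes each
     * processor's portion of the array contiguous in memory. */
    #define N 1024
    #define P 8                        /* processor count (hypothetical) */

    double a_original[N][N];           /* row-major: columns owned by
                                          different processors share lines */
    double a_transformed[P][N][N / P]; /* each processor's columns packed
                                          into one contiguous block */

    /* Where the original a[i][j] lives after the transformation:
     * column j belongs to processor j % P as its (j / P)-th column. */
    static inline double *elem(int i, int j)
    {
        return &a_transformed[j % P][i][j / P];
    }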


Architectural Support for Programming Languages and Operating Systems | 1996

Compiler-directed page coloring for multiprocessors

Edouard Bugnion; Jennifer-Ann M. Anderson; Todd C. Mowry; Mendel Rosenblum; Monica S. Lam

This paper presents a new technique, compiler-directed page coloring, that eliminates conflict misses in multiprocessor applications. It enables applications to make better use of the increased aggregate cache size available in a multiprocessor. This technique uses the compiler's knowledge of the access patterns of the parallelized applications to direct the operating system's virtual memory page mapping strategy. We demonstrate that this technique can lead to significant performance improvements over two commonly used page mapping strategies for machines with either direct-mapped or two-way set-associative caches. We also show that it is complementary to latency-hiding techniques such as prefetching.

We implemented compiler-directed page coloring in the SUIF parallelizing compiler and on two commercial operating systems. We applied the technique to the SPEC95fp benchmark suite, a representative set of numeric programs. We used the SimOS machine simulator to analyze the applications and isolate their performance bottlenecks. We also validated these results on a real machine, an eight-processor 350 MHz Digital AlphaServer. Compiler-directed page coloring leads to significant performance improvements for several applications. Overall, our technique improves the SPEC95fp rating for eight processors by 8% over Digital UNIX's page mapping policy and by 20% over page coloring, a standard page mapping policy. The SUIF compiler achieves a SPEC95fp ratio of 57.4, the highest ratio to date.
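For context, the arithmetic behind page colors (a standard definition, with example parameters of our own choosing, not the machines measured in the paper): in a physically indexed cache, the page frames that map onto the same cache region form a color class, and two same-color pages can evict each other's lines.

    /* Illustrative only: computing page colors for an example cache.
     * Parameter values are assumptions, not the paper's hardware. */
    #include <stdio.h>

    int main(void)
    {
        unsigned cache_bytes = 1u << 20;                 /* 1 MB, direct-mapped */
        unsigned page_bytes  = 1u << 13;                 /* 8 KB pages */
        unsigned num_colors  = cache_bytes / page_bytes; /* 128 colors */

        /* Frames 0x1200 and 0x3280 differ, yet share color 0, so pages
         * placed in them contend for the same cache lines. */
        unsigned f1 = 0x1200, f2 = 0x3280;
        printf("color(f1)=%u color(f2)=%u\n", f1 % num_colors, f2 % num_colors);
        return 0;
    }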


Programming Language Design and Implementation | 1997

Data distribution support on distributed shared memory multiprocessors

Rohit Chandra; Ding-Kai Chen; Robert Cox; Dror E. Maydan; Nenad Nedeljkovic; Jennifer-Ann M. Anderson

Cache-coherent multiprocessors with distributed shared memory are becoming increasingly popular for parallel computing. However, obtaining high performance on these machines requires that an application execute with good data locality. In addition to making effective use of caches, it is often necessary to distribute data structures across the local memories of the processing nodes, thereby reducing the latency of cache misses.

We have designed a set of abstractions for performing data distribution in the context of explicitly parallel programs and implemented them within the SGI MIPSpro compiler system. Our system incorporates many unique features to enhance both programmability and performance. We address the former by providing a very simple programming model with extensive support for error detection. Regarding performance, we carefully design the user abstractions with the underlying compiler optimizations in mind, we incorporate several optimization techniques to generate efficient code for accessing distributed data, and we provide a tight integration of these techniques with other optimizations within the compiler. Our initial experience suggests that the directives are easy to use and can yield substantial performance gains, in some cases by as much as a factor of 3 over the same codes without distribution.
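To show the shape of such a directive (a hedged sketch; the spelling and clause names here are our assumption, not verbatim MIPSpro syntax): the programmer annotates an array with a distribution, and the compiler emits the code that places each block in the memory of the node that uses it.

    /* Illustrative only: the directive spelling is an assumption,
     * not verbatim MIPSpro syntax. */
    #define N (1 << 20)
    double a[N];

    /* Ask the compiler to place a[] in contiguous blocks, one per
     * processing node, so each node's misses are served locally. */
    #pragma distribute a[BLOCK]

    void init(int pid, int nprocs)
    {
        /* Iterations are assigned to match the BLOCK distribution,
         * so every access below is to node-local memory. */
        int lo = pid * (N / nprocs), hi = lo + N / nprocs;
        for (int i = lo; i < hi; i++)
            a[i] = 0.0;
    }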


Languages and Compilers for Parallel Computing | 1993

An Overview of a Compiler for Scalable Parallel Machines

Saman P. Amarasinghe; Jennifer-Ann M. Anderson; Monica S. Lam; Amy W. Lim

This paper presents an overview of a parallelizing compiler that automatically generates efficient code for large-scale parallel architectures from sequential input programs. This research focuses on loop-level parallelism in dense matrix computations. We illustrate the basic techniques the compiler uses by describing the entire compilation process for a simple example.
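As a hedged sketch of what such a compilation produces (our example, not the one the paper walks through): a sequential dense-matrix loop becomes SPMD code in which each processor executes a disjoint slice of the iterations.

    /* Illustrative only: sequential input and a hand-written stand-in
     * for the SPMD code a parallelizing compiler would generate. */
    #define N 1024
    double a[N][N], b[N][N];

    void sequential(void)
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                a[i][j] = 2.0 * b[i][j];
    }

    /* Each of nprocs processors calls this with its own pid;
     * assumes nprocs divides N evenly. */
    void spmd(int pid, int nprocs)
    {
        int lo = pid * (N / nprocs), hi = lo + N / nprocs;
        for (int i = lo; i < hi; i++)
            for (int j = 0; j < N; j++)
                a[i][j] = 2.0 * b[i][j];
    }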


International Symposium on Microarchitecture | 1996

Multiprocessors from a software perspective

Saman P. Amarasinghe; Jennifer-Ann M. Anderson; Christopher S. Wilson; Shih-Wei Liao; Brian R. Murphy; Robert S. French; Monica S. Lam; Mary W. Hall

Like many architectural techniques that originated with mainframes, the use of multiple processors in a single computer is becoming popular in workstations and even personal computers. Multiprocessors constitute a significant percentage of recent workstation sales, and highly affordable multiprocessor personal computers are available in local computer stores. Once again, we find ourselves in a familiar situation: hardware is ahead of software. Because of the complexity of parallel programming, multiprocessors today are rarely used to speed up individual applications. Instead, they usually function as cycle-servers that achieve increased system throughput by running multiple tasks simultaneously. Automatic parallelization by a compiler is a particularly attractive approach to software development for multiprocessors, as it enables ordinary sequential programs to take advantage of the multiprocessor hardware without user involvement. This article looks to the future by examining some of the latest research results in automatic parallelization technology.


International Conference on Supercomputing | 1995

Unified compilation techniques for shared and distributed address space machines

Chau-Wen Tseng; Jennifer-Ann M. Anderson; Saman P. Amarasinghe; Monica S. Lam

Parallel machines with shared address spaces are easy to program because they provide hardware support that allows each processor to transparently access non-local data. However, obtaining scalable performance can be difficult due to memory access and synchronization overhead. In this paper, we use profiling and simulation studies to identify the sources of parallel overhead. We demonstrate that compilation techniques for distributed address space machines can be very effective when used in compilers for shared address space machines. Automatic data decomposition can co-locate data and computation to improve locality. Data reorganization transformations can reduce harmful cache effects. Communication analysis can eliminate barrier synchronization. We present a set of unified compilation techniques that exemplify this convergence in compilers for shared and distributed address space machines, and illustrate their effectiveness using two example applications.
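One of the ideas above, eliminating barrier synchronization through communication analysis, can be sketched as follows (our illustration, using C11 atomics as a stand-in for whatever primitives a real runtime provides): when analysis proves that only one producer-consumer ordering is needed between two phases, a global barrier can be replaced by a cheaper point-to-point flag.

    /* Illustrative only: replacing a global barrier with point-to-point
     * synchronization when just one ordering must be enforced. */
    #include <stdatomic.h>

    double x;
    atomic_int x_ready = 0;

    void producer(void)                 /* runs on processor 0 */
    {
        x = 42.0;                       /* phase-1 write */
        atomic_store_explicit(&x_ready, 1, memory_order_release);
    }

    double consumer(void)               /* runs on processor 1 */
    {
        while (!atomic_load_explicit(&x_ready, memory_order_acquire))
            ;                           /* spin instead of a full barrier */
        return x;                       /* phase-2 read */
    }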


Languages and Compilers for Parallel Computing | 1991

Hierarchical Concurrency in Jade

Daniel J. Scales; Martin C. Rinard; Monica S. Lam; Jennifer-Ann M. Anderson

Jade is a data-oriented language for exploiting coarse-grain parallelism. A Jade programmer simply augments a serial program with assertions specifying how the program accesses data. The Jade implementation dynamically interprets these assertions, using them to execute the program concurrently while enforcing the program's data dependence constraints. Jade has been implemented as extensions to C, FORTRAN, and C++, and currently runs on the Encore Multimax, the Silicon Graphics IRIS 4D/240S, and the Stanford DASH multiprocessors. In this paper, we show how Jade programmers can naturally express hierarchical concurrency patterns by specifying how a program uses hierarchically structured data.
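The essence of the model, schematically (plain C of our own, not Jade syntax): each task declares the data it reads and writes, and two tasks may run concurrently only if neither writes anything the other accesses.

    /* Illustrative only: the scheduling test implied by declared
     * read/write sets; not Jade's actual implementation. */
    #include <stdbool.h>
    #include <stddef.h>

    typedef struct {
        const void **reads;  size_t nreads;
        const void **writes; size_t nwrites;
    } access_spec;

    static bool overlaps(const void **a, size_t na, const void **b, size_t nb)
    {
        for (size_t i = 0; i < na; i++)
            for (size_t j = 0; j < nb; j++)
                if (a[i] == b[j]) return true;
        return false;
    }

    /* Concurrency is safe iff there is no write-write, write-read,
     * or read-write conflict between the two tasks' declared sets. */
    bool can_run_concurrently(const access_spec *t1, const access_spec *t2)
    {
        return !overlaps(t1->writes, t1->nwrites, t2->writes, t2->nwrites)
            && !overlaps(t1->writes, t1->nwrites, t2->reads,  t2->nreads)
            && !overlaps(t1->reads,  t1->nreads,  t2->writes, t2->nwrites);
    }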

Collaboration


Dive into Jennifer-Ann M. Anderson's collaborations.

Top Co-Authors

Saman P. Amarasinghe

Massachusetts Institute of Technology
