Carl A. Waldspurger

Massachusetts Institute of Technology

Publication

Featured research published by Carl A. Waldspurger.

symposium on operating systems principles | 1997

Continuous profiling: where have all the cycles gone?

Jennifer-Ann M. Anderson; Lance M. Berc; Jeffrey Dean; Sanjay Ghemawat; Monika Rauch Henzinger; Shun-Tak Leung; Richard L. Sites; Mark T. Vandevoorde; Carl A. Waldspurger; William E. Weihl

This article describes the Digital Continuous Profiling Infrastructure, a sampling-based profiling system designed to run continuously on production systems. The system supports multiprocessors, works on unmodified executables, and collects profiles for entire systems, including user programs, shared libraries, and the operating system kernel. Samples are collected at a high rate (over 5200 samples/sec. per 333MHz processor), yet with low overhead (1–3% slowdown for most workloads). Analysis tools supplied with the profiling system use the sample data to produce a precise and accurate accounting, down to the level of pipeline stalls incurred by individual instructions, of where time is being spent. When instructions incur stalls, the tools identify possible reasons, such as cache misses, branch mispredictions, and functional unit contention. The fine-grained instruction-level analysis guides users and automated optimizers to the causes of performance problems and provides important insights for fixing them.

international symposium on microarchitecture | 1997

ProfileMe: hardware support for instruction-level profiling on out-of-order processors

Jeffrey Dean; James E. Hicks; Carl A. Waldspurger; William E. Weihl; George Z. Chrysos

Profile data is valuable for identifying performance bottlenecks and guiding optimizations. Periodic sampling of a processor's performance monitoring hardware is an effective, unobtrusive way to obtain detailed profiles. Unfortunately, existing hardware simply counts events, such as cache misses and branch mispredictions, and cannot accurately attribute these events to instructions, especially on out-of-order machines. We propose an alternative approach, called ProfileMe, that samples instructions. As a sampled instruction moves through the processor pipeline, a detailed record of all interesting events and pipeline stage latencies is collected. ProfileMe also supports paired sampling, which captures information about the interactions between concurrent instructions, revealing information about useful concurrency and the utilization of various pipeline stages while an instruction is in flight. We describe an inexpensive hardware implementation of ProfileMe, outline a variety of software techniques to extract useful profile information from the hardware, and explain several ways in which this information can provide valuable feedback for programmers and optimizers.

international symposium on computer architecture | 1993

Register relocation: flexible contexts for multithreading

Carl A. Waldspurger; William E. Weihl

Multithreading is an important technique that improves processor utilization by allowing computation to be overlapped with the long latency operations that commonly occur in multiprocessor systems. This paper presents register relocation, a new mechanism that efficiently supports flexible partitioning of the register file into variable-size contexts with minimal hardware support. Since the number of registers required by thread contexts varies, this flexibility permits a better utilization of scarce registers, allowing more contexts to be resident, which in turn allows applications to tolerate shorter run lengths and longer latencies. Our experiments show that compared to fixed-size hardware contexts, register relocation can improve processor utilization by a factor of two for many workloads.
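
As a rough illustration of the idea, the sketch below models a register file partitioned into variable-size contexts. It assumes that each context is an aligned, power-of-two-sized block and that a context-relative register number is combined with a per-thread mask by bitwise OR. The abstract does not spell out this detail, and the names `RegisterFile`, `relocate`, and the example layout are hypothetical.

```python
class RegisterFile:
    """Toy register file shared by several variable-size thread contexts."""

    def __init__(self, size=32):
        self.regs = [0] * size

    @staticmethod
    def relocate(mask, reg):
        # Assumption: contexts are aligned, power-of-two-sized blocks, so the
        # mask (context base) and the context-relative register number share
        # no bits and a bitwise OR behaves like base + offset.
        return mask | reg

    def write(self, mask, reg, value):
        self.regs[self.relocate(mask, reg)] = value

    def read(self, mask, reg):
        return self.regs[self.relocate(mask, reg)]


rf = RegisterFile(size=32)
# Hypothetical layout: one 16-register context and two 8-register contexts.
ctx_a, ctx_b, ctx_c = 0, 16, 24
rf.write(ctx_a, 3, 111)  # lands in physical register 3
rf.write(ctx_b, 3, 222)  # lands in physical register 19
rf.write(ctx_c, 3, 333)  # lands in physical register 27
```

In this toy model, switching threads amounts to changing the mask, and a thread that needs few registers occupies a small block, leaving room for more resident contexts.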

international workshop on object orientation in operating systems | 1996

An object-oriented framework for modular resource management

Carl A. Waldspurger; William E. Weihl

The authors present a flexible object-oriented framework for specifying modular resource management policies in concurrent systems. The framework generalizes the basic abstractions they originally developed for lottery scheduling. It is independent of the underlying proportional-share scheduler; a variety of probabilistic and deterministic algorithms can be used, including a min-funding revocation algorithm that they introduce for space-shared resources. The framework supports diverse resources and policies, including both proportional shares and guaranteed reservations. A repayment mechanism prevents allocation distortions caused by transfers of resource rights. Key framework concepts are analogous to features of object-oriented languages.
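
The lottery scheduling abstraction that the framework generalizes can be sketched in a few lines: each client holds tickets, and a randomized draw picks a winner with probability proportional to its ticket count. This is a minimal sketch of a proportional-share draw only, not the framework itself; `hold_lottery` and the ticket table are hypothetical names chosen for illustration.

```python
import random


def hold_lottery(tickets, rng=random):
    """Pick one client with probability proportional to its ticket count."""
    total = sum(tickets.values())
    if total <= 0:
        raise ValueError("at least one ticket is required")
    winning = rng.randrange(total)  # winning ticket number in [0, total)
    for client, count in tickets.items():
        if winning < count:
            return client
        winning -= count
    raise AssertionError("unreachable: winning ticket was not found")


# Client A holds three times as many tickets as client B.
tickets = {"A": 3, "B": 1}
rng = random.Random(0)
wins = {"A": 0, "B": 0}
for _ in range(10_000):
    wins[hold_lottery(tickets, rng)] += 1
```

Over many draws, client A should win roughly three quarters of the time, matching its 3:1 share of the tickets.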

international conference on parallel architectures and languages europe | 1992

PRELUDE: A System for Portable Parallel Software

William E. Weihl; Eric A. Brewer; Adrian Colbrook; Chrysanthos Dellarocas; Wilson C. Hsieh; Anthony D. Joseph; Carl A. Waldspurger; Paul S. Wang

In this paper we describe PRELUDE, a programming language and accompanying system support for writing portable MIMD parallel programs. PRELUDE supports a methodology for designing and organizing parallel programs that makes them easier to tune for particular architectures and to port to new architectures. It builds on earlier work on Emerald, Amber, and various Fortran extensions to allow the programmer to divide programs into architecture-dependent and architecture-independent parts, and then to change the architecture-dependent parts to port the program to a new machine or to tune its performance on a single machine. The architecture-dependent parts of a program are specified by annotations that describe the mapping of a program onto a machine. PRELUDE provides a variety of mapping mechanisms similar to those in other systems, including remote procedure call, object migration, and data replication and partitioning. In addition, PRELUDE includes novel migration mechanisms for computations based on a form of continuation passing. The implementation of object migration in PRELUDE uses a novel approach based on fixup blocks that is more efficient than previous approaches, and amortizes the cost of each migration so that the cost per migration drops as the frequency of migrations increases.

international parallel processing symposium | 1992

Preventing recursion deadlock in concurrent object-oriented systems

Eric A. Brewer; Carl A. Waldspurger

This paper presents solutions to the problem of deadlock due to recursion in concurrent object-oriented programming languages. Two language-independent, system-level mechanisms are proposed: a novel technique using multi-ported objects, and a named-threads scheme that borrows from previous work in distributed computing. The authors compare the solutions, and present an analysis of their relative merits.
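
The problem itself can be shown with a small single-process sketch: an object whose methods are serialized by a per-object lock blocks when one of its methods calls back into the same object. The code below does not reproduce either mechanism proposed in the paper. It only contrasts a plain lock with Python's `threading.RLock`, which admits a nested call from the thread that already owns the lock, as a loose analogy for recognizing the calling thread. The `Counter` class and its method names are hypothetical.

```python
import threading


class Counter:
    """Object whose methods are serialized by a per-object lock."""

    def __init__(self, lock):
        self._lock = lock
        self.value = 0

    def increment(self):
        # Non-blocking acquire so the demo reports failure instead of hanging.
        if not self._lock.acquire(blocking=False):
            return False  # a blocking acquire would deadlock here
        try:
            self.value += 1
            return True
        finally:
            self._lock.release()

    def increment_nested(self):
        with self._lock:
            # Recursive call back into the same object while holding its lock.
            return self.increment()


plain = Counter(threading.Lock())
reentrant = Counter(threading.RLock())
plain_ok = plain.increment_nested()          # nested call is refused
reentrant_ok = reentrant.increment_nested()  # owner thread is admitted
```

With the plain lock the nested call cannot proceed, whereas the lock that tracks its owning thread lets the recursive call through.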

operating systems design and implementation | 1994