William E. Weihl
Massachusetts Institute of Technology
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by William E. Weihl.
symposium on operating systems principles | 1997
Jennifer-Ann M. Anderson; Lance M. Berc; Jeffrey Dean; Sanjay Ghemawat; Monika Rauch Henzinger; Shun-Tak Leung; Richard L. Sites; Mark T. Vandevoorde; Carl A. Waldspurger; William E. Weihl
This article describes the Digital Continuous Profiling Infrastructure, a sampling-based profiling system designed to run continuously on production systems. The system supports multiprocessors, works on unmodified executables, and collects profiles for entire systems, including user programs, shared libraries, and the operating system kernel. Samples are collected at a high rate (over 5200 samples/sec. per 333MHz processor), yet with low overhead (1–3% slowdown for most workloads). Analysis tools supplied with the profiling system use the sample data to produce a precise and accurate accounting, down to the level of pipeline stalls incurred by individual instructions, of where time is bring spent. When instructions incur stalls, the tools identify possible reasons, such as cache misses, branch mispredictions, and functional unit contention. The fine-grained instruction-level analysis guides users and automated optimizers to the causes of performance problems and provides important insights for fixing them.
Journal of the ACM | 1986
Danny Dolev; Nancy A. Lynch; Shlomit S. Pinter; Eugene W. Stark; William E. Weihl
This paper considers a variant of the Byzantine Generals problem, in which processes start with arbitrary real values rather than Boolean values or values from some bounded range, and in which approximate, rather than exact, agreement is the desired goal. Algorithms are presented to reach approximate agreement in asynchronous, as well as synchronous systems. The asynchronous agreement algorithm is an interesting contrast to a result of Fischer et al, who show that exact agreement with guaranteed termination is not attainable in an asynchronous system with as few as one faulty process. The algorithms work by successive approximation, with a provable convergence rate that depends on the ratio between the number of faulty processes and the total number of processes. Lower bounds on the convergence rate for algorithms of this form are proved, and the algorithms presented are shown to be optimal.
international symposium on microarchitecture | 1997
Jeffrey Dean; James E. Hicks; Carl A. Waldspurger; William E. Weihl; George Z. Chrysos
Profile data is valuable for identifying performance bottlenecks and guiding optimizations. Periodic sampling of a processors performance monitoring hardware is an effective, unobtrusive way to obtain detailed profiles. Unfortunately, existing hardware simply counts events, such as cache misses and branch mispredictions, and cannot accurately attribute these events to instructions, especially on out-of-order machines. We propose an alternative approach, called ProfileMe, that samples instructions. As a sampled instruction moves through the processor pipeline, a detailed record of all interesting events and pipeline stage latencies is collected. ProfileMe also supports paired sampling, which captures information about the interactions between concurrent instructions, revealing information about useful concurrency and the utilization of various pipeline stages while an instruction is in flight. We describe an inexpensive hardware implementation of ProfileMe, outline a variety of software techniques to extract useful profile information from the hardware, and explain several ways in which this information can provide valuable feedback for programmers and optimizers.
IEEE Transactions on Computers | 1988
William E. Weihl
Two novel concurrency control algorithms for abstract data types are presented. The algorithms ensure serializability of transactions by using conflict relations based on the commutativity of operations. It is proved that both algorithms ensure a local atomicity property called dynamic atomicity. This means that the algorithms can be used in combination with any other algorithms that also ensure dynamic atomicity. The algorithms are quite general, permitting operations to be both partial and nondeterministic., They permit the results returned by operations to be used in determining conflicts, thus permitting higher levels of concurrency than is otherwise possible. The descriptions and proofs encompass recovery as well as concurrency control. The two algorithms use different recovery methods: one uses intentions lists, and the other uses undo logs. It is shown that conflict relations that work with one recovery method do not necessarily work with the other. A general correctness condition that must be satisfied by the combination of a recovery method and a conflict relation is identified.<<ETX>>
measurement and modeling of computer systems | 1992
Eric A. Brewer; Chrysanthos Dellarocas; Adrian Colbrook; William E. Weihl
PROTEUS is a high-performance simulator for MIMD multiprocessors. It is fast, accurate, and flexible: it is one to two orders of magnitude faster than comparable simulators, it can reproduce results from real multiprocessors, and it is easily configured to simulate a wide range of architectures. PROTEUS provides a modular structure that simplifies customization and independent replacement of parts of architecture. There are typically multiple implementations of each module that provide different combinations of accuracy and performance; users pay for accuracy only when and where they need it. Finally, PROTEUS provides repeatability, nonintrusive monitoring and debugging, and integrated graphical output, which result in a development environment superior to those available on real multiprocessors
ACM Transactions on Programming Languages and Systems | 1989
William E. Weihl
Atomic actions (or transactions) are useful for coping with concurrency and failures. One way of ensuring atomicity of actions is to implement applications in terms of atomic data types: abstract data types whose objects ensure serializability and recoverability of actions using them. Many atomic types can be implemented to provide high levels of concurrency by taking advantage of algebraic properties of the types operations, for example, that certain operations commute. In this paper we analyze the level of concurrency permitted by an atomic type. We introduce several local constraints on individual objects that suffice to ensure global atomicity of actions; we call these constraints local atomicity properties. We present three local atomicity properties, each of which is optimal: no strictly weaker local constraint on objects suffices to ensure global atomicity for actions. Thus, the local atomicity properties define precise limits on the amount of concurrency that can be permitted by an atomic type.
job scheduling strategies for parallel processing | 1998
Patrick G. Sobalvarro; Scott Pakin; William E. Weihl; Andrew A. Chien
Coscheduling has been shown to be a critical factor in achieving efficient parallel execution in timeshared environments [12, 19, 4]. However, the most common approach, gang scheduling, has limitations in scaling, can compromise good interactive response, and requires that communicating processes be identified in advance.
job scheduling strategies for parallel processing | 1995
Patrick G. Sobalvarro; William E. Weihl
This thesis describes demand-based coscheduling, a new approach to scheduling parallel computations on multiprogrammed multiprocessors. In demand-based coscheduling, rather than making the pessimistic assumption that all the processes constituting a parallel job must be simultaneously scheduled in order to achieve good performance, information about which processes are communicating is used in order to coschedule only these; the resulting scheme is well-suited to implementation on a workstation cluster because it is naturally decentralized. I present an analytical model and simulations of demand-based coscheduling, an implementation on a cluster of workstations connected by a high-speed network, and a set of experimental results. An analysis of the results shows that demand-based coscheduling successfully coschedules parallel processes in a timeshared workstation cluster, significantly reducing the response times of parallel computations. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
IEEE Transactions on Software Engineering | 1987
William E. Weihl
Typical concurrency control protocols for atomic actions, such as two-phase locking, perform poorly for long read-only actions. We present four new concurrency control protocols that eliminate all interference between read-only actions and update actions, and thus offer significantly improved performance for read-only actions. The protocols work by maintaining multiple versions of the system state; read-only actions read old versions, while update actions manipulate the most recent version. We focus on the problem of managing the storage required for old versions in a distributed system. One of the protocols uses relatively little space, but has a potentially significant communication cost. The other protocols use more space, but may be cheaper in terms of communication.
international symposium on computer architecture | 1993
Carl A. Waldspurger; William E. Weihl
Multithreading is an important technique that improves processor utilization by allowing computation to be overlapped with the long latency operations that commonly occur in multiprocessor systems. This paper presents register relocation, a new mechanism that efficiently supports flexible partitioning of the register file into variable-size contexts with minimal hardware support. Since the number of registers required by thread contexts varies, this flexibility permits a better utilization of scarce registers, allowing more contexts to be resident, which in turn allows applications to tolerate shorter run lengths and longer latencies. Our experiments show that compared to fixed-size hardware contexts, register relocation can improve processor utilization by a factor of two for many workloads.