James P. Laudon | Researchain

Publications

Featured research published by James P. Laudon.

International Symposium on Computer Architecture | 1990

Memory consistency and event ordering in scalable shared-memory multiprocessors

Kourosh Gharachorloo; Daniel E. Lenoski; James P. Laudon; Phillip B. Gibbons; Anoop Gupta; John L. Hennessy

Scalable shared-memory multiprocessors distribute memory among the processors and use scalable interconnection networks to provide high bandwidth and low latency communication. In addition, memory accesses are cached, buffered, and pipelined to bridge the gap between the slow shared memory and the fast processors. Unless carefully controlled, such architectural optimizations can cause memory accesses to be executed in an order different from what the programmer expects. The set of allowable memory access orderings forms the memory consistency model or event ordering model for an architecture. This paper introduces a new model of memory consistency, called release consistency, that allows for more buffering and pipelining than previously proposed models. A framework for classifying shared accesses and reasoning about event ordering is developed. The release consistency model is shown to be equivalent to the sequential consistency model for parallel programs with sufficient synchronization. Possible performance gains from the less strict constraints of the release consistency model are explored. Finally, practical implementation issues are discussed, concentrating on issues relevant to scalable architectures.
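The core ordering rule of release consistency can be illustrated with a small producer/consumer model. The following Python sketch is an illustrative toy, not the paper's formal definition. The class `ReleaseConsistentProc` and the helper `acquire_then_read` are hypothetical names. Ordinary writes are modeled as a buffer that may drain in any order. The only rule the model enforces is that a release becomes visible after every earlier write has been performed.

```python
from dataclasses import dataclass, field


@dataclass
class ReleaseConsistentProc:
    """Toy model of one processor under release consistency (hypothetical).

    Ordinary writes are buffered and may be performed in any order; a release
    is performed only after every earlier buffered write has been performed.
    """
    memory: dict                                  # shared memory: address -> value
    pending: list = field(default_factory=list)   # buffered (addr, value) writes

    def write(self, addr, value):
        # Ordinary write: buffered, not yet visible to other processors.
        self.pending.append((addr, value))

    def perform(self, index):
        # Perform one buffered write out of program order (pipelining).
        addr, value = self.pending.pop(index)
        self.memory[addr] = value

    def release(self, addr, value):
        # Release: drain all earlier ordinary writes, then make the release visible.
        while self.pending:
            self.perform(0)
        self.memory[addr] = value


def acquire_then_read(memory, flag, data_addrs):
    """Consumer side: only read the protected data once the release is visible."""
    if memory.get(flag) != 1:
        return None  # acquire has not succeeded yet
    return [memory.get(a) for a in data_addrs]


if __name__ == "__main__":
    mem = {}
    p = ReleaseConsistentProc(mem)
    p.write("x", 10)
    p.write("y", 20)
    p.perform(1)  # y becomes visible before x: allowed for ordinary writes
    print(acquire_then_read(mem, "flag", ["x", "y"]))  # flag not yet released
    p.release("flag", 1)
    print(acquire_then_read(mem, "flag", ["x", "y"]))
```

In this toy, `y` can become visible before `x`, which sequential consistency would forbid. A consumer that first observes the released flag is still guaranteed to see both values. This is the sense in which properly synchronized programs cannot tell the two models apart.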

International Symposium on Computer Architecture | 1997

The SGI Origin: a ccNUMA highly scalable server

James P. Laudon; Daniel E. Lenoski

The SGI Origin 2000 is a cache-coherent non-uniform memory access (ccNUMA) multiprocessor designed and manufactured by Silicon Graphics, Inc. The Origin system was designed from the ground up as a multiprocessor capable of scaling to both small and large processor counts without any bandwidth, latency, or cost cliffs. The Origin system consists of up to 512 nodes interconnected by a scalable Craylink network. Each node consists of one or two R10000 processors, up to 4 GB of coherent memory, and a connection to a portion of the XIO I/O subsystem. This paper discusses the motivation for building the Origin 2000 and then describes its architecture and implementation. In addition, performance results are presented for the NAS Parallel Benchmarks V2.2 and the SPLASH2 applications. Finally, the Origin system is compared to other contemporary commercial ccNUMA systems.
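The per-node limits stated above imply the following upper bounds for a fully populated 512-node system. This is plain arithmetic on the abstract's figures. It assumes every node carries two processors and the maximum memory, and it is not a claim about configurations that were actually shipped.

$$
512\ \text{nodes} \times 2\ \tfrac{\text{processors}}{\text{node}} = 1024\ \text{processors},
\qquad
512\ \text{nodes} \times 4\ \tfrac{\text{GB}}{\text{node}} = 2048\ \text{GB} = 2\ \text{TB}
$$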

International Symposium on Computer Architecture | 1990

The directory-based cache coherence protocol for the DASH multiprocessor

Daniel E. Lenoski; James P. Laudon; Kourosh Gharachorloo; Anoop Gupta; John L. Hennessy

DASH is a scalable shared-memory multiprocessor currently being developed at Stanford's Computer Systems Laboratory. The architecture consists of powerful processing nodes, each with a portion of the shared memory, connected to a scalable interconnection network. A key feature of DASH is its distributed directory-based cache coherence protocol. Unlike traditional snoopy coherence protocols, the DASH protocol does not rely on broadcast; instead it uses point-to-point messages sent between the processors and memories to keep caches consistent. Furthermore, the DASH system does not contain any single serialization or control point. While these features provide the basis for scalability, they also force a reevaluation of many fundamental issues involved in the design of a protocol. These include the issues of correctness, performance, and protocol complexity. In this paper, we present the design of the DASH coherence protocol and discuss how it addresses the above issues. We also discuss our strategy for verifying the correctness of the protocol and briefly compare our protocol to the IEEE Scalable Coherent Interface protocol.
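The contrast with snoopy protocols can be illustrated with a home-node directory. The directory records which nodes hold a line and sends messages only to those nodes. The following Python sketch is hypothetical and heavily simplified. `HomeDirectory`, `DirectoryEntry`, and the message names are illustrative, and the three states are loosely modeled on an uncached/shared/dirty scheme. Acknowledgements, NAKs, races, and the actual DASH message set are all omitted.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    state: str = "UNCACHED"                      # "UNCACHED", "SHARED", or "DIRTY"
    sharers: set = field(default_factory=set)    # node IDs holding a cached copy


class HomeDirectory:
    """Toy directory kept at a line's home node (illustrative, not DASH itself)."""

    def __init__(self):
        self.entries = {}    # address -> DirectoryEntry
        self.messages = []   # point-to-point messages as (src, dst, kind, addr)

    def _entry(self, addr):
        return self.entries.setdefault(addr, DirectoryEntry())

    def _send(self, dst, kind, addr):
        self.messages.append(("home", dst, kind, addr))

    def read_miss(self, node, addr):
        e = self._entry(addr)
        if e.state == "DIRTY":
            owner = next(iter(e.sharers))
            self._send(owner, "forward_read", addr)   # owner supplies the data
        else:
            self._send(node, "data_reply", addr)      # memory copy is up to date
        e.state = "SHARED"
        e.sharers.add(node)

    def write_miss(self, node, addr):
        e = self._entry(addr)
        if e.state == "DIRTY":
            owner = next(iter(e.sharers))
            self._send(owner, "forward_write", addr)  # owner hands over the line
        else:
            # Invalidate only the recorded sharers: no broadcast is needed.
            for other in sorted(e.sharers - {node}):
                self._send(other, "invalidate", addr)
            self._send(node, "exclusive_reply", addr)
        e.state = "DIRTY"
        e.sharers = {node}


if __name__ == "__main__":
    d = HomeDirectory()
    d.read_miss(1, 0x40)
    d.read_miss(2, 0x40)
    d.write_miss(3, 0x40)
    for msg in d.messages:
        print(msg)
```

The message log is the point of the example. A write to a shared line produces one invalidation per recorded sharer rather than a broadcast to every node.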

IEEE Transactions on Parallel and Distributed Systems | 1993

The DASH prototype: Logic overhead and performance

Daniel E. Lenoski; James P. Laudon; Truman Joe; David Nakahira; Luis Stevens; Anoop Gupta; John L. Hennessy

The fundamental premise behind the DASH project is that it is feasible to build large-scale shared-memory multiprocessors with hardware cache coherence. The hardware overhead of directory-based cache coherence in a 48-processor system is examined. The data show that the overhead is only about 10-15%, which appears to be a small cost for the ease of programming offered by coherent caches and the potential for higher performance. The performance of the system is discussed, and the speedups obtained by a variety of parallel applications running on the prototype are shown. Using a sophisticated hardware performance monitor, the effectiveness of coherent caches and the relationship between an application's reference behavior and its speedup are characterized. The optimizations incorporated in the DASH protocol are evaluated in terms of their effectiveness on parallel applications and on atomic tests that stress the memory system.

International Symposium on Computer Architecture | 1992

The DASH prototype: implementation and performance

Daniel E. Lenoski; James P. Laudon; Truman Joe; David Nakahira; Luis Stevens; Anoop Gupta; John L. Hennessy

The fundamental premise behind the DASH project is that it is feasible to build large-scale shared-memory multiprocessors with hardware cache coherence. While paper studies and software simulators are useful for understanding many high-level design trade-offs, prototypes are essential to ensure that no critical details are overlooked. A prototype provides convincing evidence of the feasibility of the design, allows one to accurately estimate both the hardware and the complexity cost of various features, and provides a platform for studying real workloads. A 16-processor prototype of the DASH multiprocessor has been operational for the last six months. In this paper, the hardware overhead of directory-based cache coherence in the prototype is examined. We also discuss the performance of the system, and the speedups obtained by parallel applications running on the prototype. Using a sophisticated hardware performance monitor, we characterize the effectiveness of coherent caches and the relationship between an application's reference behavior and its speedup.

Architectural Support for Programming Languages and Operating Systems | 1994

Interleaving: a multithreading technique targeting multiprocessors and workstations

James P. Laudon; Anoop Gupta; Mark Horowitz

There is an increasing trend to use commodity microprocessors as the compute engines in large-scale multiprocessors. However, given that the majority of the microprocessors are sold in the workstation market, not in the multiprocessor market, it is only natural that architectural features that benefit only multiprocessors are less likely to be adopted in commodity microprocessors. In this paper, we explore multiple-context processors, an architectural technique proposed to hide the large memory latency in multiprocessors. We show that while current multiple-context designs work reasonably well for multiprocessors, they are ineffective in hiding the much shorter uniprocessor latencies using the limited parallelism found in workstation environments. We propose an alternative design that combines the best features of two existing approaches, and present simulation results that show it yields better performance for both multiprogrammed workloads on a workstation and parallel applications on a multiprocessor. By addressing the needs of the workstation environment, our proposal makes multiple contexts more attractive for commodity microprocessors.

IEEE Computer Society International Conference | 1990

Design of scalable shared-memory multiprocessors: the DASH approach

Daniel E. Lenoski; Kourosh Gharachorloo; James P. Laudon; Anoop Gupta; John L. Hennessy; Mark Horowitz; Monica S. Lam

The DASH (directory architecture for shared-memory) multiprocessor, which combines the programmability of shared-memory machines with the scalability of message-passing machines, is described. Hardware-supported coherent caches provide low-latency access to shared data and ease of programming. Caches are kept coherent by means of a distributed directory-based protocol. Shared memory in the machine is distributed among the processing nodes, and scalable memory bandwidth is provided by connecting the nodes through a general interconnection network. The prototype DASH machine will consist of 64 high-performance microprocessors, with an aggregate performance of over 1200 MIPS and 250 scalar MFLOPS. The fundamental premise in DASH is that it is possible to build a scalable shared-memory machine with hardware-supported coherent caches by using a distributed directory-based cache coherence protocol. The mechanisms for providing scalable memory bandwidth, reducing and tolerating memory latency, and supporting efficient synchronization are described. A brief description of the machine's implementation is given.
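Dividing the aggregate figures above by the 64 processors gives a rough per-processor rate. The abstract states only aggregates, so this assumes the work is split evenly across processors. The results are nominal averages, not measured single-processor numbers.

$$
\frac{1200\ \text{MIPS}}{64} = 18.75\ \text{MIPS per processor},
\qquad
\frac{250\ \text{MFLOPS}}{64} \approx 3.9\ \text{MFLOPS per processor}
$$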

Proceedings IEEE COMPCON 97. Digest of Papers | 1997

System overview of the SGI Origin 200/2000 product line

James P. Laudon; Daniel E. Lenoski

The SGI Origin 200/2000 is a cache-coherent non-uniform memory access (ccNUMA) multiprocessor, designed and manufactured by Silicon Graphics Inc. (SGI). The Origin system was designed from the ground up as a multiprocessor that was capable of scaling to both small and large processor counts without any cost, bandwidth or latency cliffs. The Origin system consists of up to 512 nodes interconnected by a highly scalable Craylink network. Each node consists of one or two R10000 processors and up to 4 GBytes of coherent memory. Each node also connects to the scalable XIO I/O subsystem. This paper discusses the motivation for building the Origin 200/2000 and describes its architecture and implementation.

International Symposium on Computer Architecture | 1992

Architectural and implementation tradeoffs in the design of multiple-context processors (abstract)

James P. Laudon; Anoop Gupta; Mark Horowitz

We examine two multiple-context schemes in the context of scalable shared-memory multiprocessors. The blocked scheme switches between contexts at cache misses. The proposed interleaved scheme switches between available contexts on a cycle-by-cycle basis, while providing full pipeline interlocks for good single-context performance. We show the interleaved scheme to have a performance advantage over the blocked scheme due to its ability to hide pipeline dependencies and reduce the context switch cost. We also show that, while the implementation of the interleaved scheme is more complex, this complexity is not overwhelming.
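The difference between the two schemes can be illustrated with a small cycle-level model. The following Python sketch is a hypothetical simplification, not the simulator used in the paper. Each context misses after a fixed `run_length` instructions, and a miss stalls that context for `latency` cycles. The blocked scheme pays `switch_cost` idle cycles whenever it changes contexts, while the interleaved scheme rotates among ready contexts every cycle at no cost. Only the switch-cost effect is captured. Pipeline dependencies, the other advantage the abstract cites, are not modeled.

```python
def utilization(num_contexts, run_length, latency, cycles, scheme, switch_cost=0):
    """Fraction of `cycles` in which an instruction issues (toy model).

    scheme: "blocked" switches only when the running context misses and pays
    `switch_cost` idle cycles per switch; "interleaved" issues from the next
    ready context every cycle with no switch penalty.
    """
    if scheme not in ("blocked", "interleaved"):
        raise ValueError(f"unknown scheme: {scheme!r}")
    ready_at = [0] * num_contexts        # first cycle each context may issue again
    left = [run_length] * num_contexts   # instructions until that context's next miss
    current, busy, cycle = 0, 0, 0

    def next_ready(start):
        # Round-robin search for a context that is not waiting on memory.
        for k in range(num_contexts):
            c = (start + k) % num_contexts
            if ready_at[c] <= cycle:
                return c
        return None

    while cycle < cycles:
        if scheme == "interleaved":
            c = next_ready(current + 1)  # rotate to another context every cycle
        elif ready_at[current] <= cycle:
            c = current                  # blocked: keep running until a miss
        else:
            c = next_ready(current + 1)
        if c is None:
            cycle += 1                   # every context is stalled on memory
            continue
        if scheme == "blocked" and c != current:
            cycle += switch_cost         # pipeline flush/refill on a context switch
            if cycle >= cycles:
                break
        current = c
        busy += 1                        # issue one instruction from context c
        left[c] -= 1
        if left[c] == 0:                 # this instruction is a cache miss
            ready_at[c] = cycle + 1 + latency
            left[c] = run_length
        cycle += 1
    return busy / cycles


if __name__ == "__main__":
    print(utilization(2, 8, 6, 1000, "blocked", switch_cost=3))
    print(utilization(2, 8, 6, 1000, "interleaved"))
```

Consider two contexts with `run_length=8`, `latency=6`, and a 3-cycle switch cost. Over 1000 cycles, this toy reports a utilization of 0.728 for the blocked scheme and 0.765 for the interleaved scheme. With a single context the two schemes coincide, since neither has another context to switch to.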

International Symposium on Computer Architecture | 1998

Retrospective: the DASH prototype: implementation and performance

Daniel E. Lenoski; James P. Laudon

The fundamental premise behind the DASH project is that it is feasible to build large-scale shared-memory multiprocessors with hardware cache coherence. While paper studies and software simulators are useful for understanding many high-level design tradeoffs, prototypes are essential to ensure that no critical details are overlooked. A prototype provides convincing evidence of the feasibility of the design, allows one to accurately estimate both the hardware and the complexity cost of various features, and provides a platform for studying real workloads. A 16-processor prototype of the DASH multiprocessor has been operational for the last six months. In this paper, the hardware overhead of directory-based cache coherence in the prototype is examined. We also discuss the performance of the system, and the speedups obtained by parallel applications running on the prototype. Using a sophisticated hardware performance monitor, we characterize the effectiveness of coherent caches and the relationship between an application’s reference behavior and its speedup.
