Ashley Saulsbury
Sun Microsystems
Publication
Featured research published by Ashley Saulsbury.
International Symposium on Computer Architecture | 1996
Andreas Nowatzyk; Fong Pong; Ashley Saulsbury
Current high-performance computer systems use complex, large superscalar CPUs that interface to the main memory through a hierarchy of caches and interconnect systems. These CPU-centric designs invest a lot of power and chip area to bridge the widening gap between CPU and main memory speeds. Yet, many large applications do not operate well on these systems and are limited by the memory subsystem performance. This paper argues for an integrated system approach that uses less-powerful CPUs that are tightly integrated with advanced memory technologies to build competitive systems with greatly reduced cost and complexity. Based on a design study using the next-generation 0.25µm, 256Mbit dynamic random-access memory (DRAM) process and on the analysis of existing machines, we show that processor-memory integration can be used to build competitive, scalable and cost-effective MP systems. We present results from execution-driven uni- and multi-processor simulations showing that the benefits of lower latency and higher bandwidth can compensate for the restrictions on the size and complexity of the integrated processor. In this system, small direct-mapped instruction caches with long lines are very effective, as are column buffer data caches augmented with a victim cache.
High-Performance Computer Architecture | 1995
Ashley Saulsbury; Tim Wilkinson; John B. Carter; Anders Landin
We present design details and some initial performance results of a novel scalable shared memory multiprocessor architecture. This architecture features the automatic data migration and replication capabilities of cache-only memory architecture (COMA) machines, without the accompanying hardware complexity. A software layer manages cache space allocation at a page granularity, similarly to distributed virtual shared memory (DVSM) systems, leaving simpler hardware to maintain shared memory coherence at a cache line granularity. By reducing the hardware complexity, the machine cost and development time are reduced. We call the resulting hybrid hardware and software multiprocessor architecture Simple COMA. Preliminary results indicate that the performance of Simple COMA is comparable to that of more complex contemporary all-hardware designs.
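The split the abstract describes can be illustrated with a toy sketch (hypothetical, not the paper's implementation): a "software" layer allocates a local page frame on first touch, while a "hardware" layer tracks validity per cache line within that page, so a miss fetches only a line, not the whole page.

```python
# Toy sketch of the Simple COMA division of labor (illustrative assumptions:
# 4 lines per page, two line states "invalid"/"shared", no write protocol).

PAGE_LINES = 4  # lines per page, assumed for illustration


class SimpleComaNode:
    def __init__(self, name):
        self.name = name
        # Software-managed map: virtual page -> per-line coherence state.
        self.pages = {}

    def access(self, page, line):
        # Software fault handler: allocate local cache space at page granularity.
        if page not in self.pages:
            self.pages[page] = ["invalid"] * PAGE_LINES
        # Hardware coherence: fetch the single missing line from a remote node.
        if self.pages[page][line] == "invalid":
            self.pages[page][line] = "shared"
            return "line miss"
        return "hit"


node = SimpleComaNode("n0")
print(node.access(7, 2))  # first touch: page frame allocated, line fetched
print(node.access(7, 2))  # same line again: local hit
print(node.access(7, 3))  # page already resident, different line still misses
```

The point of the sketch is that the expensive, flexible decision (where to place data) happens rarely and in software, while the frequent operation (per-line state checks) stays in simple hardware.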
Hawaii International Conference on System Sciences | 1994
Erik Hagersten; Ashley Saulsbury; Anders Landin
Shared memory architectures often have caches to reduce the number of slow remote memory accesses. The largest possible caches exist in shared memory architectures called Cache-Only Memory Architectures (COMAs). In a COMA all the memory resources are used to implement large caches. Unfortunately, these large caches also have their price. Due to its lack of physically shared memory, COMA may suffer from a longer remote access latency than alternatives. Large COMA caches might also introduce an extra latency for local memory accesses, unless the node architecture is designed with care. The authors examine the implementation of COMAs, and consider how to move much of the complex functionality into software. They introduce the idea of a simple COMA architecture, a hybrid with hardware support only for the functionality frequently used. Such a system is expected to have good performance, and because of its simplicity it should be quick and cheap to develop and engineer.
International Conference on Parallel Architectures and Languages Europe | 1993
Erik Hagersten; Mats Grindal; Anders Landin; Ashley Saulsbury; Bengt Werner; Seif Haridi
Large-scale multiprocessors suffer from long latencies for remote accesses. Caching is by far the most popular technique for hiding such delays. Caching not only hides the delay, but also decreases the network load. Cache-Only Memory Architectures (COMA) have no physically shared memory. Instead, all the memory resources are invested in caches, enabling caches of the largest possible size. A datum has no home, and is moved by a protocol between the caches according to its usage. Furthermore, it might exist in multiple caches. Even though no shared memory exists in the traditional sense, the architecture provides a shared memory view to a processor, and hence also to the programmer. The simulation results of large programs running on up to 128 processors indicate that the COMA adapts well to existing shared memory programs. They also show that an application with poor locality can benefit by adopting the COMA principle of no fixed home for data, resulting in a reduction of execution time by a factor of three.
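The "no fixed home" behavior described above can be sketched in a few lines of hypothetical Python (names and protocol details are illustrative, not the paper's design): reads replicate a datum into the local attraction memory, and writes migrate ownership to the writer, invalidating the previous holder's copy.

```python
# Illustrative sketch of COMA-style data migration and replication: data has
# no home node; it moves to, and is replicated at, the nodes that use it.


class ComaSystem:
    def __init__(self, n_nodes):
        # One attraction memory (a big cache) per node.
        self.attraction = [dict() for _ in range(n_nodes)]
        # Which node currently holds the master copy of each datum.
        self.owner = {}

    def write(self, node, key, value):
        # Ownership migrates to the writer; the old holder's copy is invalidated.
        prev = self.owner.get(key)
        if prev is not None and prev != node:
            self.attraction[prev].pop(key, None)
        self.attraction[node][key] = value
        self.owner[key] = node

    def read(self, node, key):
        if key not in self.attraction[node]:
            # Remote miss: replicate the datum into the local attraction memory.
            value = self.attraction[self.owner[key]][key]
            self.attraction[node][key] = value
        return self.attraction[node][key]


system = ComaSystem(2)
system.write(0, "x", 42)          # node 0 produces "x"; it lives there for now
print(system.read(1, "x"))        # node 1 misses remotely, then caches "x"
print("x" in system.attraction[1])  # the copy now also resides at node 1
```

This is the property the abstract credits for the factor-of-three speedup on a poor-locality application: because nothing is pinned to a home node, repeated use pulls data to wherever it is accessed.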
Archive | 2002
Jeffrey M. Broughton; Liang T. Chen; William K. Lam; Derek E. Pappas; Ihao Chen; Thomas M. McWilliams; Ankur Narang; Jeffrey B. Rubin; Earl T. Cohen; Michael W. Parkin; Ashley Saulsbury; Michael S. Ball
Archive | 2001
Ashley Saulsbury; Nyles Nettleton; Michael W. Parkin
Archive | 1996
Ashley Saulsbury; Andreas Nowatzyk; Fong Pong
Archive | 2001
Ashley Saulsbury
Archive | 2001
Ashley Saulsbury; James E. Kocol; Sandra C. Lee
Archive | 2005
Bruce J. Chang; Ricky C. Hetherington; Brian J. McGee; David M. Kahn; Ashley Saulsbury