Christoph Scheurich
University of Southern California
Publications
Featured research published by Christoph Scheurich.
international symposium on computer architecture | 1986
Michel Dubois; Christoph Scheurich; Faye A. Briggs
In highly pipelined machines, instructions and data are prefetched and buffered in both the processor and the cache. This is done to reduce the average memory access latency and to take advantage of memory interleaving. Lockup-free caches are designed to avoid processor blocking on a cache miss. Write buffers are often included in a pipelined machine to avoid processor waiting on writes. In a shared-memory multiprocessor, buffering memory requests is even more advantageous, since each memory access has to traverse the memory-processor interconnection and compete with requests issued by other processors. Buffering, however, can cause logical problems in multiprocessors. These problems are aggravated if each processor has a private memory in which shared writable data may be present, such as in a cache-based system or in a system with a distributed global memory. In this paper, we analyze the benefits and problems associated with the buffering of memory requests in shared-memory multiprocessors. We show that the logical problem of buffering is directly related to the problem of synchronization. A simple model is presented to evaluate the performance improvement resulting from buffering.
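The "logical problem" of buffering can be made concrete with a small sketch. The following toy model (my own illustration, not from the paper) shows how per-processor write buffers can break a Dekker-style mutual-exclusion handshake: each processor's store sits in its private buffer while loads read main memory directly, so both processors can observe the other's flag as still 0.

```python
# Toy model: write buffering breaking Dekker-style mutual exclusion.
# Stores go into a private write buffer; loads read main memory,
# bypassing the other processor's unflushed buffer.

memory = {"flag1": 0, "flag2": 0}
buffer1, buffer2 = [], []           # per-processor write buffers

def store(buf, var, val):
    buf.append((var, val))          # buffered, not yet globally visible

def load(var):
    return memory[var]              # reads main memory only

def flush(buf):
    for var, val in buf:
        memory[var] = val
    buf.clear()

# P1 and P2 each announce intent, then check the other's flag.
store(buffer1, "flag1", 1)          # P1: flag1 = 1 (sits in buffer)
store(buffer2, "flag2", 1)          # P2: flag2 = 1 (sits in buffer)
p1_sees = load("flag2")             # P1 reads 0: P2's store not visible yet
p2_sees = load("flag1")             # P2 reads 0: P1's store not visible yet
flush(buffer1); flush(buffer2)

# Both processors saw the other's flag as 0, so both would enter the
# critical section together: exactly the synchronization hazard the
# paper ties to buffering.
print(p1_sees, p2_sees)             # 0 0
```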
IEEE Computer | 1988
Michel Dubois; Christoph Scheurich; Faye A. Briggs
The problems addressed apply to both throughput-oriented and speedup-oriented multiprocessor systems, at either the user level or the operating-system level. Basic definitions are provided. Communication and synchronization are briefly explained, and hardware-level and software-level synchronization mechanisms are discussed. The cache coherence problem is examined, and solutions are described. Strong and weak ordering of events is considered. The user interface is discussed.
international symposium on computer architecture | 1987
Christoph Scheurich; Michel Dubois
This paper shows that cache coherence protocols can implement indivisible synchronization primitives reliably and can also enforce sequential consistency. Sequential consistency provides a commonly accepted model of behavior of multiprocessors. We derive a simple set of conditions needed to enforce sequential consistency in multiprocessors. These conditions are easily applied to prove the correctness of existing cache coherence protocols that rely on one or multiple broadcast buses to enforce atomicity of updates; in these protocols, all processing elements must be connected to the broadcast buses. The conditions are also used in this paper to establish new protocols which do not rely on the atomicity of updates and therefore do not require single access buses to propagate invalidations or to perform distributed WRITEs. It is also shown how such protocols can implement atomic READ&MODIFY operations for synchronization purposes.
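Sequential consistency requires that every execution look like some total order of all memory operations that respects each processor's program order. A brute-force checker (an illustrative sketch, not the paper's conditions or protocol) makes the model tangible: enumerate the program-order-respecting interleavings and collect the read outcomes they can produce.

```python
# Brute-force sequential-consistency outcome enumeration (illustration
# only). Each op is ('w', var, val) or ('r', var, name).

def sc_outcomes(programs):
    """Return the set of read-result tuples reachable under some
    interleaving that preserves each processor's program order."""
    n = len(programs)
    results = set()

    def interleavings(indices):
        if all(indices[p] == len(programs[p]) for p in range(n)):
            yield []
            return
        for p in range(n):
            if indices[p] < len(programs[p]):
                nxt = indices.copy()
                nxt[p] += 1
                for rest in interleavings(nxt):
                    yield [programs[p][indices[p]]] + rest

    for order in interleavings([0] * n):
        mem, reads = {}, []
        for op in order:
            if op[0] == 'w':
                mem[op[1]] = op[2]
            else:
                reads.append((op[2], mem.get(op[1], 0)))
        results.add(tuple(sorted(reads)))
    return results

# Dekker fragment: under sequential consistency at least one processor
# must read the other's flag as 1.
p1 = [('w', 'x', 1), ('r', 'y', 'r1')]
p2 = [('w', 'y', 1), ('r', 'x', 'r2')]
outcomes = sc_outcomes([p1, p2])
both_zero = (('r1', 0), ('r2', 0))
print(both_zero in outcomes)   # False: (0, 0) is not SC-reachable
```

The both-zero outcome is exactly what unconstrained buffering (as in the 1986 paper above) can produce, which is why enforcing sequential consistency constrains when buffered accesses may be performed.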
IEEE Transactions on Software Engineering | 1990
Michel Dubois; Christoph Scheurich
The presence of high-performance mechanisms in shared-memory multiprocessors such as private caches, the extensive pipelining of memory access, and combining networks may render a logical concurrency model complex to implement or inefficient. The problem of implementing a given logical concurrency model in such a multiprocessor is addressed. Two concurrency models are considered, and simple rules are introduced to verify that a multiprocessor architecture adheres to the models. The rules are applied to several examples of multiprocessor architectures.
IEEE Transactions on Computers | 1989
Christoph Scheurich; Michel Dubois
A mechanism called the pivot mechanism is introduced and described. It controls the dynamic migration of data pages between neighboring memory modules during program execution to improve the performance and programmability of multiprocessors with distributed global memory. The programmer or compiler is relieved of the data allocation task; moreover, because data allocation is dynamically modified to minimize communication traffic, algorithms with varying and unpredictable data access patterns can run efficiently. Flexible data migration serves the dual purpose of freeing efficient algorithms from machine-specific data placement and making possible the efficient execution of algorithms for which a good static allocation is not possible. Simulation results based on a mesh-connected multiprocessor performing a matrix multiplication are presented.
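The abstract does not spell out the pivot mechanism's policy, so the following is a generic sketch of the idea it describes: on a mesh, a page that draws enough remote traffic migrates one hop toward the module referencing it. The threshold and routing here are illustrative assumptions, not the paper's design.

```python
# Illustrative one-hop page-migration policy on a mesh (assumed policy,
# not the pivot mechanism itself): after THRESHOLD remote accesses, the
# page moves one hop toward the requesting module.

def one_hop_toward(src, dst):
    """Move grid coordinate src one hop toward dst (dimension-order)."""
    (sx, sy), (dx, dy) = src, dst
    if sx != dx:
        return (sx + (1 if dx > sx else -1), sy)
    if sy != dy:
        return (sx, sy + (1 if dy > sy else -1))
    return src

page_home = (0, 0)
accesses = [(3, 2)] * 5          # processor at (3, 2) touches the page
THRESHOLD = 3                    # assumed migration trigger

remote = 0
for proc in accesses:
    if proc != page_home:
        remote += 1
    if remote >= THRESHOLD:      # enough remote traffic: migrate a hop
        page_home = one_hop_toward(page_home, proc)
        remote = 0

print(page_home)                 # page has drifted toward (3, 2)
```

Hop-by-hop migration between neighboring modules keeps each move cheap while still letting placement track shifting access patterns over time.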
Journal of Parallel and Distributed Computing | 1990
Christoph Scheurich; Michel Dubois
The performance of shared-memory multiprocessors can suffer greatly from moderate cache miss rates because of the usually high ratio between memory access and cache access times. In this paper we propose a cache design in which the handling of one or several cache misses occurs concurrently with processor activity. Concurrent miss resolution in multiprocessor caches must function in conjunction with the system's synchronization hardware and cache coherence protocol. Through performance models, we identify system configurations for which concurrent miss resolution is effective. Compiler techniques to take advantage of the proposed design are illustrated at the end of the paper.
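A back-of-the-envelope model shows why concurrent miss resolution pays off. The numbers below (base CPI of 1, 2% miss rate, 50-cycle penalty, 60% of each penalty hidden) are assumptions of mine, not figures from the paper.

```python
# Simple performance model: blocking cache stalls the full miss
# penalty; a lockup-free cache hides a fraction of it behind
# continued processor activity.

def exec_time(instructions, miss_rate, miss_penalty, overlap=0.0):
    """Cycles at a base CPI of 1 plus miss stalls; `overlap` is the
    fraction of each miss penalty hidden behind computation."""
    misses = instructions * miss_rate
    stall = misses * miss_penalty * (1.0 - overlap)
    return instructions + stall

N = 1_000_000
blocking = exec_time(N, miss_rate=0.02, miss_penalty=50)
lockup_free = exec_time(N, miss_rate=0.02, miss_penalty=50, overlap=0.6)

print(round(blocking / lockup_free, 2))   # speedup from overlapping misses
```

With these assumed parameters, miss stalls double the blocking execution time, so hiding even part of each miss yields a substantial speedup; this is the kind of trade-off the paper's performance models explore across system configurations.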
conference on high performance computing (supercomputing) | 1988
Christoph Scheurich; Michel Dubois
The performance of cache-based, shared-memory multiprocessors can suffer greatly from moderate cache miss rates because of the usually high ratio between memory-access and cache-access times. The authors propose a lockup-free cache design in which the handling of one or several cache misses is overlapped with processor activity. In multiprocessors, lockup-free caches aggravate the memory coherence problem. Three different cache architectures relying on different compiler interventions are introduced. A performance model demonstrates the usefulness of lockup-free caches for high-performance processors. The merits and disadvantages of the three schemes are discussed, and compiler techniques to take advantage of the proposed designs are illustrated.
Archive | 1989
Christoph Scheurich; Michel Dubois
international conference on parallel processing | 1988
Christoph Scheurich; Michel Dubois
Archive | 1986
Michel Dubois; Christoph Scheurich; Faye A. Briggs