Truman Joe
Stanford University
Publications
Featured research published by Truman Joe.
IEEE Transactions on Parallel and Distributed Systems | 1993
Daniel E. Lenoski; James P. Laudon; Truman Joe; David Nakahira; Luis Stevens; Anoop Gupta; John L. Hennessy
The fundamental premise behind the DASH project is that it is feasible to build large-scale shared-memory multiprocessors with hardware cache coherence. The hardware overhead of directory-based cache coherence in a 48-processor prototype is examined. The data show that the overhead is only about 10-15%, which appears to be a small cost for the ease of programming offered by coherent caches and the potential for higher performance. The performance of the system is discussed, and the speedups obtained by a variety of parallel applications running on the prototype are shown. Using a sophisticated hardware performance monitor, the effectiveness of coherent caches and the relationship between an application's reference behavior and its speedup are characterized. The optimizations incorporated in the DASH protocol are evaluated in terms of their effectiveness on parallel applications and on atomic tests that stress the memory system.
international symposium on computer architecture | 1992
Daniel E. Lenoski; James P. Laudon; Truman Joe; David Nakahira; Luis Stevens; Anoop Gupta; John L. Hennessy
The fundamental premise behind the DASH project is that it is feasible to build large-scale shared-memory multiprocessors with hardware cache coherence. While paper studies and software simulators are useful for understanding many high-level design trade-offs, prototypes are essential to ensure that no critical details are overlooked. A prototype provides convincing evidence of the feasibility of the design, allows one to accurately estimate both the hardware and complexity costs of various features, and provides a platform for studying real workloads. A 16-processor prototype of the DASH multiprocessor has been operational for the last six months. In this paper, the hardware overhead of directory-based cache coherence in the prototype is examined. We also discuss the performance of the system, and the speedups obtained by parallel applications running on the prototype. Using a sophisticated hardware performance monitor, we characterize the effectiveness of coherent caches and the relationship between an application's reference behavior and its speedup.
international symposium on computer architecture | 1994
Truman Joe; John L. Hennessy
Cache only memory architectures (COMA) have an inherent memory overhead due to the organization of main memory as a large cache called an attraction memory. This overhead consists of memory left unallocated for performance reasons as well as additional physical memory required due to the cache organization of memory. In this work, we examine the effect of data reshuffling and data replication on the memory overhead. Data reshuffling occurs when space needs to be allocated to store a remote memory line in the local memory. Data that is reshuffled is sent between memories via replacement messages. A simple mathematical model predicts the frequency of data reshuffling as a function of the attraction memory parameters. Simulation data shows that the frequency of data reshuffling is sensitive to the allocation policy and associativity of the memory but is relatively unaffected by the block size chosen. The simulation data also shows that data replication in the attraction memory is important for good performance, but most gains can be achieved through replication in the processor caches.
conference on high performance computing (supercomputing) | 1993
Jaswinder Pal Singh; Truman Joe; Anoop Gupta; John L. Hennessy
Two interesting variants of large-scale shared-address-space parallel architectures are cache-coherent non-uniform-memory-access machines (CC-NUMA) and cache-only memory architectures (COMA). Both have distributed main memory and use directory-based cache coherence. While both architectures migrate and replicate data at the cache level automatically under hardware control, COMA machines do this at the main memory level as well. The authors compare the parallel performance of a recent realization of each type of architecture, the Stanford DASH multiprocessor (CC-NUMA) and the Kendall Square Research KSR-1 (COMA). Using a suite of important computational kernels and complete scientific applications, they examine performance differences resulting both from the CC-NUMA/COMA nature of the machines and from specific differences in system implementation.
Archive | 1993
Anoop Gupta; Truman Joe
Archive | 1995
Truman Joe
Archive | 1993
Jaswinder Pal Singh; Truman Joe; John Hennessy; Anoop Gupta
Archive | 1993
Daniel E. Lenoski; James P. Laudon; Truman Joe; David Nakahira; Luis Stevens; Anoop Gupta; John Hennessy
Multiprocessor performance measurement and evaluation | 1995
Daniel E. Lenoski; James P. Laudon; Truman Joe; David Nakahira; Luis Stevens; Anoop Gupta; John L. Hennessy
Multiprocessor performance measurement and evaluation | 1995
Per Stenström; Truman Joe; Anoop Gupta