Caroline D. Benveniste

Publication

Featured research published by Caroline D. Benveniste.

IEEE Transactions on Computers | 2001

Cache-memory interfaces in compressed memory systems

Caroline D. Benveniste; Peter A. Franaszek; John T. Robinson

We consider a number of cache/memory hierarchy design issues in systems with compressed random access memories (C-RAMs), in which compression and decompression occur automatically to and from main memory. Using a C-RAM as main memory, the bulk of main memory contents are stored in a compressed format and dynamically decompressed to handle cache misses at the next higher level of memory. This is the general approach adopted in IBM's memory expansion technology (MXT). The design of the main memory directory structures and storage allocation methods in such systems is described elsewhere; here, we focus on issues related to cache-memory interfaces. In particular, if the cache line size (of the cache or caches to which main memory data is transferred) is different from the size of the unit of compression in main memory, bandwidth and latency problems can occur. Another issue is that of guaranteed forward progress, that is, ensuring that modified lines can be written to the compressed main memory so that the system can continue operation even if overall compression deteriorates. We study several approaches for solving these problems, using trace-driven analysis to evaluate alternatives.
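
The line-size mismatch mentioned in this abstract can be illustrated with a small back-of-the-envelope calculation. The sketch below is not taken from the paper: the function name, the example sizes, and the simplifying assumptions (a whole compression unit must be decompressed to recover any line in it, and lines are aligned to unit boundaries) are all hypothetical.

```python
def decompression_amplification(line_size: int, unit_size: int) -> float:
    """Bytes decompressed per byte requested on a single cache miss.

    Hypothetical model: any access to a compression unit requires
    decompressing the entire unit, and cache lines are aligned so that
    a line never straddles more units than strictly necessary.
    """
    if line_size <= 0 or unit_size <= 0:
        raise ValueError("sizes must be positive")
    # A miss touches at least one unit; a line larger than a unit spans several.
    units_touched = -(-line_size // unit_size)  # ceiling division
    return units_touched * unit_size / line_size


# Assumed example sizes: 128-byte cache line, 1024-byte compression unit.
print(decompression_amplification(128, 1024))
```

Under these assumptions, a ratio well above 1 means most decompressed bytes are not the ones the cache asked for, which is one way the bandwidth and latency concerns in the abstract could arise.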

International Conference on Supercomputing | 1995

Performance evaluation of a parallel I/O architecture

Sandra Johnson Baylor; Caroline D. Benveniste; Yarsun Hsu

Presented are the results of a study conducted to evaluate the performance of parallel I/O on a massively parallel processor (MPP). The network traversal and total processing times are calculated for I/O reads and writes while varying the I/O and non-I/O request rates and the request size. Also studied is the performance impact of I/O and non-I/O traffic on each other. The results show that the system is scalable for the I/O loads considered; however, the scalability is limited by I/O node saturation or considerable network contention.

ACM SIGARCH Computer Architecture News | 1994

Performance evaluation of a massively parallel I/O subsystem

Sandra Johnson Baylor; Caroline D. Benveniste; Yarsun Hsu

Presented are the trace-driven simulation results of a study conducted to evaluate the performance of the internal parallel I/O subsystem of the Vulcan MPP architecture. The system sizes evaluated vary from 16 to 512 nodes. The results show that a compute node to I/O node ratio of four is the most cost-effective for all system sizes, showing high scalability. Also, processor-to-processor communication effects are negligible for small message sizes, and the greater the fraction of I/O reads, the better the I/O performance. Worst-case I/O node placement is within 13% of more efficient placement strategies. Introducing parallelism into the internal I/O subsystem improves I/O performance significantly.

Journal of Parallel and Distributed Computing | 1997

Clock Synchronization on a Multicomputer

Bulent Abali; Craig B. Stunkel; Caroline D. Benveniste

We describe hardware and software schemes for achieving precise clock synchronization on SP2 parallel system nodes. The SP2 multistage interconnection network has an unusual hardware feature, a set of distributed counters that the processor nodes may utilize for synchronizing their time-of-day clocks. We describe an algorithm for synchronizing the counters to within less than 200 nanoseconds of each other in a network of up to 512 processor nodes. This is 4-5 orders of magnitude better than what can be achieved by existing software schemes. We also describe experimental system software, called sptimed, for synchronizing the node clocks to the Internet time of day, utilizing the synchronous counters in the SP2 network. Sptimed synchronizes the node clocks typically within 5 μs of each other, which is up to 2-3 orders of magnitude better than could be achieved by previous methods on the SP2 system. Synchronized clocks are useful in parallel and distributed environments, for example for performance measurement, tuning, tracing, debugging, gang scheduling of parallel processes, and timestamping of transactions. We also measure the performance of a widely used time synchronization utility, the Network Time Protocol, using the synchronous counters of the SP2 interconnection network.
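
For context on the software schemes this abstract compares against, the following is a minimal sketch of the standard two-way timestamp exchange used by NTP-style protocols to estimate a clock offset. It does not represent the SP2 counter algorithm or sptimed; the function name and the example timestamps are illustrative assumptions.

```python
from typing import Tuple


def estimate_offset(t0: float, t1: float, t2: float, t3: float) -> Tuple[float, float]:
    """Estimate (offset, round_trip_delay) from one request/reply exchange.

    t0: client clock when the request is sent
    t1: server clock when the request is received
    t2: server clock when the reply is sent
    t3: client clock when the reply is received

    The offset estimate is exact only if the two network directions
    take equal time; asymmetry shows up directly as offset error.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2.0
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Assumed scenario: server is 5 units ahead, 2 units each way on the wire,
# 1 unit of server processing time.
print(estimate_offset(100.0, 107.0, 108.0, 105.0))
```

Because the estimate depends on symmetric path delays and software timestamping, its accuracy is bounded by network and scheduling jitter, which is the kind of limitation hardware-assisted counters are meant to avoid.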

Annual Simulation Symposium | 1994

A methodology for evaluating parallel I/O performance for massively parallel processors

S. Johnson Baylor; Caroline D. Benveniste; L.J. Boelhouwer

Accurate performance modelling of massively parallel processors (MPPs) is a complex and arduous task. While some work has been done in evaluating the performance of processors and memory systems for MPPs, much less work has been done in evaluating the performance of parallel I/O subsystems. The authors present a hybrid methodology for evaluating the performance of parallel I/O subsystems using PIOS, a simulator for parallel I/O, and an analytical model. The results show a performance accuracy within 13 percent of a full simulation model. This hybrid methodology has a broad application for evaluating the performance of many types of diverse systems.

Custom Integrated Circuits Conference | 1990

Design of VLSI switch for highly parallel multiprocessor system

Yarsun Hsu; Caroline D. Benveniste; J. Ruedinger; C.J. Tan

The design of a large, multistage interconnection network that has been successfully constructed and used in a version of the RP3 system is described. The network hardware is scalable and can be used for systems consisting of anywhere from four to hundreds of processor and memory elements. An overview is given of the switch architecture, followed by the packaging structure. A description of the methodology used for logic design and verification of the large silicon chip is presented.

Archive | 2000