Publication
Featured research published by Caroline D. Benveniste.
IEEE Transactions on Computers | 2001
Caroline D. Benveniste; Peter A. Franaszek; John T. Robinson
We consider a number of cache/memory hierarchy design issues in systems with compressed random access memories (C-RAMs), in which compression and decompression occur automatically to and from main memory. Using a C-RAM as main memory, the bulk of main memory contents are stored in a compressed format and dynamically decompressed to handle cache misses at the next higher level of memory. This is the general approach adopted in IBM's memory expansion technology (MXT). The design of the main memory directory structures and storage allocation methods in such systems is described elsewhere; here, we focus on issues related to cache-memory interfaces. In particular, if the cache line size (of the cache or caches to which main memory data is transferred) is different from the size of the unit of compression in main memory, bandwidth and latency problems can occur. Another issue is that of guaranteed forward progress, that is, ensuring that modified lines can be written to the compressed main memory so that the system can continue operation even if overall compression deteriorates. We study several approaches for solving these problems, using trace-driven analysis to evaluate alternatives.
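The size mismatch mentioned in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes cache lines are aligned to compression-unit boundaries and that a miss must decompress every unit the line touches. The function name and the byte sizes in the usage lines are hypothetical.

```python
def miss_decompression_overhead(compression_unit_bytes: int, cache_line_bytes: int) -> float:
    """Bytes decompressed per byte requested on one cache miss.

    Assumption (not from the paper): lines are aligned to compression-unit
    boundaries, and each touched unit must be decompressed in full.
    """
    if compression_unit_bytes <= 0 or cache_line_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Number of whole compression units the line covers (ceiling division).
    units_touched = -(-cache_line_bytes // compression_unit_bytes)
    return units_touched * compression_unit_bytes / cache_line_bytes


# Hypothetical sizes: a 128-byte line inside a 1024-byte compression unit
# forces 8 bytes of decompression for every byte the cache asked for.
small_line = miss_decompression_overhead(1024, 128)
# When the line is a whole multiple of the unit, nothing extra is decompressed.
matched = miss_decompression_overhead(1024, 1024)
```

Under these assumptions the overhead is 1.0 when the line size is a whole multiple of the compression unit and grows as the line shrinks relative to the unit, which is one way to see where the extra bandwidth and latency come from.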
International Conference on Supercomputing | 1995
Sandra Johnson Baylor; Caroline D. Benveniste; Yarsun Hsu
Presented are the results of a study conducted to evaluate the performance of parallel I/O on a massively parallel processor (MPP). The network traversal and total processing times are calculated for I/O reads and writes while varying the I/O and non-I/O request rates and the request size. Also studied is the performance impact of I/O and non-I/O traffic on each other. The results show that the system is scalable for the I/O loads considered; however, the scalability is limited by I/O node saturation or considerable network contention.
ACM SIGARCH Computer Architecture News | 1994
Sandra Johnson Baylor; Caroline D. Benveniste; Yarsun Hsu
Presented are the trace-driven simulation results of a study conducted to evaluate the performance of the internal parallel I/O subsystem of the Vulcan MPP architecture. The system sizes evaluated vary from 16 to 512 nodes. The results show that a compute node to I/O node ratio of four is the most cost-effective for all system sizes, showing high scalability. Also, processor-to-processor communication effects are negligible for small message sizes, and the greater the fraction of I/O reads, the better the I/O performance. Worst-case I/O node placement is within 13% of more efficient placement strategies. Introducing parallelism into the internal I/O subsystem improves I/O performance significantly.
Journal of Parallel and Distributed Computing | 1997
Bulent Abali; Craig B. Stunkel; Caroline D. Benveniste
We describe hardware and software schemes for achieving precise clock synchronization on SP2 parallel system nodes. The SP2 multistage interconnection network has an unusual hardware feature, a set of distributed counters that the processor nodes may utilize for synchronizing their time-of-day clocks. We describe an algorithm for synchronizing the counters to within less than 200 nanoseconds of each other in a network of up to 512 processor nodes. This is 4-5 orders of magnitude better than what can be achieved by existing software schemes. We also describe experimental system software, called sptimed, for synchronizing the node clocks to the Internet time of day, utilizing the synchronous counters in the SP2 network. Sptimed synchronizes the node clocks typically within 5 μs of each other, which is up to 2-3 orders of magnitude better than could be achieved by previous methods on the SP2 system. Synchronized clocks are useful in parallel and distributed environments, for example for performance measurement, tuning, tracing, debugging, gang scheduling of parallel processes, and timestamping of transactions. We also measure the performance of a widely used time synchronization utility, the Network Time Protocol, using the synchronous counters of the SP2 interconnection network.
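One way to picture how synchronized counters could yield a common time of day is sketched below. This is not the sptimed implementation, whose details the abstract does not give. It assumes that a single reference node pairs one counter reading with an externally obtained time of day and distributes that pair, and that every node then converts its own counter reading locally. The `Reference` type, the function name, and the 25 ns tick period are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """A (counter, time-of-day) pair sampled once on a reference node (assumed scheme)."""
    counter: int   # counter value at the sample
    time_ns: int   # time of day at the sample, in nanoseconds since some epoch


def counter_to_time_ns(counter: int, ref: Reference, tick_ns: int) -> int:
    """Convert a local counter reading to time of day using the shared reference."""
    if tick_ns <= 0:
        raise ValueError("tick_ns must be positive")
    return ref.time_ns + (counter - ref.counter) * tick_ns


# Hypothetical numbers: a 25 ns tick and two nodes whose counters differ by 8 ticks.
ref = Reference(counter=1_000, time_ns=5_000_000)
node_a = counter_to_time_ns(1_500, ref, tick_ns=25)
node_b = counter_to_time_ns(1_508, ref, tick_ns=25)
skew_ns = abs(node_a - node_b)
```

In this sketch any disagreement between two nodes' derived clocks comes only from the disagreement between their counters, so a tight bound on counter skew carries over directly to the derived time of day. Software overhead in reading the counter is ignored here.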
Annual Simulation Symposium | 1994
S. Johnson Baylor; Caroline D. Benveniste; L.J. Boelhouwer
Accurate performance modelling of massively parallel processors (MPPs) is a complex and arduous task. While some work has been done in evaluating the performance of processors and memory systems for MPPs, much less work has been done in evaluating the performance of parallel I/O subsystems. The authors present a hybrid methodology for evaluating the performance of parallel I/O subsystems using PIOS, a simulator for parallel I/O, and an analytical model. The results show a performance accuracy within 13 percent of a full simulation model. This hybrid methodology has a broad application for evaluating the performance of many types of diverse systems.
Custom Integrated Circuits Conference | 1990
Yarsun Hsu; Caroline D. Benveniste; J. Ruedinger; C.J. Tan
The design of a large, multistage interconnection network that has been successfully constructed and used in a version of the RP3 system is described. The network hardware is scalable and can be used for systems consisting of anywhere from four to hundreds of processor and memory elements. An overview is given of the switch architecture, followed by the packaging structure. A description of the methodology used for logic design and verification of the large silicon chip is presented.
Archive | 2000
Caroline D. Benveniste; Peter A. Franaszek; John T. Robinson
Archive | 2002
Caroline D. Benveniste; Vittorio Castelli; Peter A. Franaszek
Archive | 1999
Caroline D. Benveniste; Peter A. Franaszek; John T. Robinson; Charles O. Schulz
Archive | 2011
Caroline D. Benveniste; Vittorio Castelli; Peter A. Franaszek