Margaret L. Simmons
Los Alamos National Laboratory
Publication
Featured research published by Margaret L. Simmons.
conference on high performance computing (supercomputing) | 1991
Margaret L. Simmons; Harvey J. Wasserman; Olaf M. Lubeck; Christopher Eoyang; Raul Mendez; Hiroo Harada; Misako Ishiguro
No abstract available
conference on high performance computing (supercomputing) | 1991
Ingrid Y. Bucher; Margaret L. Simmons
No abstract available
conference on high performance computing (supercomputing) | 1992
Olaf M. Lubeck; Margaret L. Simmons; Harvey J. Wasserman
The authors present the results of an architectural comparison of SIMD (single-instruction multiple-data) massive parallelism, as implemented in the Thinking Machines Corp. CM-2, and vector or concurrent-vector processing, as implemented in the Cray Research, Inc. Y-MP/8. The comparison is based primarily upon three application codes taken from the LANL (Los Alamos National Laboratory) CM-2 workload. Tests were run by porting CM Fortran codes to the Y-MP, so that nearly the same level of optimization was obtained on both machines. The results for fully configured systems, using measured data rather than scaled data from smaller configurations, show that the Y-MP/8 is faster than the 64K CM-2 for all three codes. A simple model that accounts for the relative characteristic computational speeds of the two machines, and reduction in overall CM-2 performance due to communication or SIMD conditional execution, accurately predicts the performance of two of the three codes. The authors show the similarity of the CM-2 and Y-MP programming models and comment on selected future massively parallel processor designs.
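The abstract does not give the form of the "simple model" it mentions. A minimal sketch of one plausible version, in which the SIMD machine's peak rate is discounted by the fraction of arithmetic slots idled by conditional masking and by the fraction of time spent communicating (all names and numbers here are hypothetical, not taken from the paper):

```python
def effective_speed(peak_mflops, frac_comm, frac_masked):
    """Hypothetical effective rate: peak arithmetic speed discounted by
    SIMD lanes masked off under conditionals, then by time lost to
    interprocessor communication."""
    useful = peak_mflops * (1.0 - frac_masked)   # lanes doing real work
    return useful * (1.0 - frac_comm)            # time left for arithmetic

# Hypothetical inputs: 3500 Mflop/s peak, 30% communication, 20% masked
print(effective_speed(3500, 0.30, 0.20))  # → 1960.0
```

A code with little communication and few conditionals runs near peak; the model predicts how quickly that advantage erodes as either fraction grows.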
measurement and modeling of computer systems | 1987
Ingrid Y. Bucher; Margaret L. Simmons
Vector computers have dominated the field of high-speed scientific computation over the past decade because of their effectiveness in performing repetitive floating-point operations. The basic design concept of their arithmetic vector units is pipelining. The units are subdivided into several stages, each of which can work on a pair of operands independently and in parallel. Partially processed work progresses from stage to stage each clock period. The whole unit functions in a fashion similar to an assembly line with a conveyor belt, and although a floating-point operation requires several clock cycles, results can be produced each clock cycle after an initial startup period of the pipe.
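The timing behavior described above, one startup period followed by one result per clock, can be sketched as a small model; the startup count and clock period below are hypothetical, chosen only to illustrate why long vectors approach the asymptotic rate while short vectors fall well below it:

```python
def pipeline_time_ns(n, startup_cycles, clock_ns):
    """Time for a length-n vector operation on a pipelined unit:
    one startup period, then one result per clock."""
    return (startup_cycles + n) * clock_ns

def rate_mflops(n, startup_cycles, clock_ns):
    """Achieved rate in Mflop/s: n results over the total time."""
    return n / pipeline_time_ns(n, startup_cycles, clock_ns) * 1e3

# Hypothetical 10-cycle startup, 10 ns clock (100 Mflop/s asymptotic):
print(rate_mflops(1000, 10, 10))  # long vector, close to 100 Mflop/s
print(rate_mflops(10, 10, 10))    # short vector, only 50 Mflop/s
```

The startup cost is amortized over the vector length, which is why vector machines reward long inner loops.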
conference on high performance computing (supercomputing) | 1990
Margaret L. Simmons; Harvey J. Wasserman
The authors report the performance of the 6000-series computers as measured using a set of portable, standard-Fortran, computationally intensive benchmark codes that represent the scientific workload at the Los Alamos National Laboratory. On all but three of the benchmark codes, the 40-ns RISC (reduced instruction set computer) system was able to perform as well as a single Convex C-240 processor, a vector processor that also has a 40-ns clock cycle, and, on these same codes, it performed as well as the FPS-500, a vector processor with a 30-ns clock cycle.
The Journal of Supercomputing | 1990
O. M. Lubeck; Margaret L. Simmons
Although considerable technology has been developed for debugging and developing sequential programs, producing verifiably correct parallel code is a much harder task. In view of the large number of possible scheduling sequences, exhaustive testing is not a feasible method for determining whether a given parallel program is correct; nor have there been sufficient theoretical developments to allow the automatic verification of parallel programs. PTOOL, a tool being developed at Rice University in collaboration with users at Los Alamos National Laboratory, provides an alternative mechanism for producing correct parallel code. PTOOL is a semi-automatic tool for detecting implicit parallelism in sequential Fortran code. It uses vectorizing compiler techniques to identify dependences preventing the parallelization of sequential regions. According to the model supported by PTOOL, a programmer should first implement and test his program using traditional sequential debugging techniques. Then, using PTOOL, he can select loop bodies that can be safely executed in parallel. At Los Alamos, we have been interested in examining the role of dependence-analysis tools in the parallel programming process. Therefore, we have used PTOOL as a static debugging tool to analyze parallel Fortran programs. Our experiences using PTOOL lead us to conclude that dependence-analysis tools are useful to today's parallel programmers. Dependence analysis is particularly useful in the development of asynchronous parallel code. With a tool like PTOOL, a programmer can guarantee that processor scheduling cannot affect the results of his parallel program. If a programmer wishes to implement a partially parallelized region through the use of synchronization primitives, however, he will find that dependence analysis is less useful. While a dependence-analysis tool can greatly simplify the task of writing synchronization code, the ultimate responsibility of correctness is left to the programmer.
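The kind of dependence such a tool detects can be illustrated with two small loops (a hypothetical sketch in Python, not PTOOL's Fortran input):

```python
# Loop A: each iteration reads a value written by the previous one,
# a loop-carried flow dependence, so the iterations cannot safely
# run in parallel and a dependence analyzer would flag the loop.
def loop_a(a):
    for i in range(1, len(a)):
        a[i] = a[i - 1] + 1.0   # reads a[i-1], written last iteration
    return a

# Loop B: each iteration touches only its own element, so there is no
# loop-carried dependence and the iterations may execute in parallel
# in any order.
def loop_b(a):
    for i in range(len(a)):
        a[i] = a[i] * 2.0
    return a

print(loop_a([0.0] * 4))   # → [0.0, 1.0, 2.0, 3.0]
print(loop_b([1.0, 2.0]))  # → [2.0, 4.0]
```

In Loop A, any scheduling of iterations other than sequential order changes the result; in Loop B, no scheduling can, which is exactly the guarantee dependence analysis provides.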
conference on high performance computing (supercomputing) | 1988
Margaret L. Simmons; Harvey J. Wasserman
The serial and parallel performance of one of the world's fastest general-purpose computers, the CRAY-2, is analyzed using the standard Los Alamos Benchmark Set plus codes adapted for parallel processing. For comparison, architectural and performance data are also given for the CRAY X-MP/416. Factors affecting performance, such as memory bandwidth, size and access speed of memory, and software exploitation of hardware, are examined. The parallel processing environments of both machines are evaluated, and speedup measurements for the parallel codes are given.
parallel computing | 1988
Harvey J. Wasserman; Margaret L. Simmons; Olaf M. Lubeck
The architectures and performance of the Alliant FX, Convex C-1, and SCS-40 minisupercomputers are compared using a set of standard FORTRAN benchmark codes.
conference on high performance computing (supercomputing) | 1988
Margaret L. Simmons; Harvey J. Wasserman
The serial and parallel performance of the Cray-2 is analyzed using the standard Los Alamos benchmark set plus codes adapted for parallel processing. For comparison, architectural and performance data are given for the Cray X-MP/416. Factors affecting performance, such as memory bandwidth, size and access speed of memory, and software exploitation of hardware, are examined. The parallel-processing environments of both machines are evaluated, and speedup measurements for the parallel codes are given.
Archive | 1990
Margaret L. Simmons; Rebecca Koskela