Robert E. Benner
Sandia National Laboratories
Publication
Featured research published by Robert E. Benner.
SIAM Journal on Scientific and Statistical Computing | 1988
John L. Gustafson; Gary R. Montry; Robert E. Benner
We have developed highly efficient parallel solutions for three practical, full-scale scientific problems: wave mechanics, fluid dynamics, and structural analysis. Several algorithmic techniques are used to keep communication and serial overhead small as both problem size and number of processors are varied. A new parameter, operation efficiency, is introduced that quantifies the tradeoff between communication and redundant computation. A 1024-processor MIMD ensemble is measured to be 502 to 637 times as fast as a single processor when problem size for the ensemble is fixed, and 1009 to 1020 times as fast as a single processor when problem size per processor is fixed. The latter measure, denoted scaled speedup, is developed and contrasted with the traditional measure of parallel speedup. The scaled-problem paradigm better reveals the capabilities of large ensembles, and permits detection of subtle hardware-induced load imbalances (such as error correction and data-dependent MFLOPS rates) that may become increasingly important as parallel processors increase in node count. Sustained performance for the applications is 70 to 130 MFLOPS, validating the massively parallel ensemble approach as a practical alternative to more conventional processing methods. The techniques presented appear extensible to even higher levels of parallelism than the 1024-processor level explored here.
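The contrast between fixed-size speedup and the paper's scaled speedup can be illustrated with a small calculation. This is a generic sketch of the two formulas (Amdahl's law versus what later became known as Gustafson's law), not code from the paper; the 0.4% serial fraction is a hypothetical value chosen for illustration.

```python
def amdahl_speedup(serial_frac, p):
    # Fixed-size speedup: the problem stays the same size as the
    # processor count p grows, so the serial fraction dominates.
    return 1.0 / (serial_frac + (1.0 - serial_frac) / p)

def gustafson_speedup(serial_frac, p):
    # Scaled speedup: problem size per processor is held fixed, so the
    # parallel part grows with p while the serial part stays constant.
    return serial_frac + (1.0 - serial_frac) * p

p = 1024
s = 0.004  # hypothetical 0.4% serial fraction
print(round(amdahl_speedup(s, p), 1))     # bounded above by 1/s = 250
print(round(gustafson_speedup(s, p), 1))  # approaches p as s shrinks
```

With the same tiny serial fraction, the fixed-size measure saturates near 200 while the scaled measure stays close to 1020 on 1024 processors, which is why the scaled-problem paradigm better reveals the capability of large ensembles.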
Conference on High Performance Computing (Supercomputing) | 1989
John L. Gustafson; Robert E. Benner; Mark P. Sears; Thomas D. Sullivan
We have developed a fast parallel version of an existing synthetic aperture radar (SAR) simulation program, SRIM. On a 1024-processor NCUBE hypercube it runs an order of magnitude faster than on a CRAY X-MP or CRAY Y-MP processor. This speed advantage is coupled with an order of magnitude advantage in machine acquisition cost. SRIM is a somewhat large (30,000 lines of Fortran 77) program designed for uniprocessors; its restructuring for a hypercube provides new lessons in the task of altering older serial programs to run well on modern parallel architectures. We describe the techniques used for parallelization, and the performance obtained. Several novel parallel approaches to problems of task distribution, data distribution, and direct output were required. These techniques increase performance and appear to have general applicability for massive parallelism. We describe the hierarchy necessary to dynamically manage (i.e., load balance) a large ensemble. The ensemble is used in a heterogeneous manner, with different programs on different parts of the hypercube. The heterogeneous approach takes advantage of the independent instruction streams possible on MIMD machines.
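The dynamic load balancing described above can be sketched as a generic manager/worker pattern: work units go into a shared queue, and faster processors naturally claim more of them. This is a minimal illustration in Python threads, not SRIM's actual hypercube scheme; the squaring step is a stand-in for the real ray-tracing computation.

```python
import queue
import threading

def worker(tasks, results):
    # Each worker repeatedly pulls the next available work unit, so
    # load balances dynamically across processors of varying speed.
    while True:
        item = tasks.get()
        if item is None:  # poison pill: no more work
            break
        results.put((item, item * item))  # stand-in for real computation

tasks, results = queue.Queue(), queue.Queue()
workers = [threading.Thread(target=worker, args=(tasks, results))
           for _ in range(4)]
for w in workers:
    w.start()
for i in range(100):          # enqueue 100 work units
    tasks.put(i)
for _ in workers:             # one shutdown signal per worker
    tasks.put(None)
for w in workers:
    w.join()
print(results.qsize())  # 100 completed work units
```

In a heterogeneous MIMD setting, different node groups could run different worker programs against the same task stream, which is the independence of instruction streams the abstract exploits.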
Conference on High Performance Computing (Supercomputing) | 1994
David E. Womble; David S. Greenberg; Stephen R. Wheat; Robert E. Benner; Marc S. Ingber; Greg Henry; Satya Gupta
This paper describes three applications of the boundary element method and their implementations on the Intel Paragon supercomputer. Each of these applications sustains over 99 Gflop/s based on wall-clock time for the entire application and an actual count of flops executed; one application sustains over 140 Gflop/s. Each application accepts the description of an arbitrary geometry and computes the solution to a problem of commercial and research interest. The common kernel for these applications is a dense equation solver based on LU factorization. It is generally accepted that good performance can be achieved by dense matrix algorithms, but achieving the excellent performance demonstrated here required the development of a variety of special techniques to take full advantage of the power of the Intel Paragon.
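A sustained-rate figure like those above combines an operation count with wall-clock time: LU factorization of a dense n-by-n matrix takes roughly 2n³/3 floating-point operations. The sketch below shows that arithmetic; the matrix size and solve time are hypothetical numbers for illustration, not measurements from the paper.

```python
def lu_flops(n):
    # Leading-order operation count for LU factorization of a dense
    # n x n matrix: (2/3) * n^3 floating-point operations.
    return 2 * n**3 / 3

def sustained_gflops(n, seconds):
    # Sustained rate based on wall-clock time for the whole solve.
    return lu_flops(n) / seconds / 1e9

# Hypothetical example: a 50,000-unknown dense system factored
# in 900 seconds of wall-clock time.
print(round(sustained_gflops(50_000, 900), 1))  # ≈ 92.6 Gflop/s
```

Basing the rate on wall-clock time for the entire application, as the paper does, is the conservative choice: it charges I/O, setup, and communication against the flop count rather than reporting only the kernel's peak.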
2013 International Conference on Computing, Networking and Communications (ICNC) | 2013
Robert E. Benner; Victor T. E. Echeverria; Uzoma Onunkwo; Jay S. Patel; David John Zage
Many-core processors have become the mainstay of today's computing systems, and their ease of accessibility is broadening the horizons of computational advances. In this work, we demonstrate the use of many-core processing platforms to provide scalable, efficient, and easily configurable firewall implementations. Our work provides, to the best of our knowledge, the first pipelined and scalable implementation of a stateful firewall on many-core processors. We discuss the results of our work and highlight areas for future consideration and improvement. Although this work focuses on the firewall as an exemplar network protection tool, the ideas developed apply to other network processing applications such as network intrusion detection systems.
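The pipelined structure of such a firewall can be sketched as a chain of stages, where on a many-core part each stage would run on its own core with queues between them. This is a generic illustration, not the paper's implementation; the addresses, rule table, and field names are invented for the example.

```python
# Hypothetical rule table: (destination, port) -> verdict.
RULES = {("10.0.0.1", 80): "ALLOW"}

def parse(packets):
    # Stage 1: header extraction.
    for src, dst, port in packets:
        yield {"src": src, "dst": dst, "port": port}

def track_state(pkts, table):
    # Stage 2: stateful connection lookup; record new flows.
    for p in pkts:
        key = (p["src"], p["dst"], p["port"])
        p["established"] = key in table
        table.add(key)
        yield p

def filter_rules(pkts):
    # Stage 3: pass established flows, else consult the rule table.
    for p in pkts:
        verdict = ("ALLOW" if p["established"]
                   else RULES.get((p["dst"], p["port"]), "DROP"))
        yield (p["src"], verdict)

stream = [("1.2.3.4", "10.0.0.1", 80),
          ("1.2.3.4", "10.0.0.1", 80),   # established on second sight
          ("5.6.7.8", "10.0.0.2", 22)]   # no matching rule: dropped
print(list(filter_rules(track_state(parse(stream), set()))))
```

Because each stage only touches its own queue endpoints and (for the state stage) its own table partition, stages can be replicated across cores to scale throughput, which is the essence of the pipelined approach.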
Archive | 2011
Kyle Bruce Wheeler; John Hunt Naegle; Brian J. Wright; Robert E. Benner; Jeffrey Scott Shelburg; David Benjamin Pearson; Joshua Alan Johnson; Uzoma Onunkwo; David John Zage; Jay S. Patel
This report documents our first-year efforts to address the use of many-core processors for high-performance cyber protection. As demands grow for higher bandwidth (beyond 1 Gbit/s) on network connections, the need for faster and more efficient solutions to cyber security grows. Fortunately, in recent years, interest in the development of many-core network processors has increased. Prior working experience with many-core processors led us to investigate their effectiveness for cyber protection tools, with particular emphasis on high-performance firewalls. Although advanced algorithms for smarter cyber protection of high-speed network traffic are being developed, these advanced analysis techniques require significantly more computational capability than static techniques. Moreover, many locations where cyber protections are deployed have limited power, space, and cooling resources. This makes the use of traditionally large computing systems impractical for the front-end systems that process large network streams; hence the drive for this study, which could potentially yield a highly reconfigurable and rapidly scalable solution.
IEEE Computer Society International Conference | 1988
Robert E. Benner
Direct solution methods based on Gauss elimination and its variants are considered. These are of particular interest because they readily expose the memory limitations and communication bottlenecks of parallel architectures, are integral components of other, more highly parallel matrix algorithms, and can solve poorly conditioned problems on which other methods fail. The parallel implementation and performance of a class of direct methods is examined on three architectures: the CRAY X-MP/48, the ELXSI 6400, and the NCUBE/ten.
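The serial kernel underlying these direct methods is Gauss elimination with partial pivoting followed by back substitution. The sketch below is a textbook serial version for illustration only; the parallel variants studied in the paper distribute the rows and pivot search across processors.

```python
def solve(A, b):
    # Gauss elimination with partial pivoting on the augmented
    # matrix [A | b], then back substitution. Illustrative serial code.
    n = len(A)
    A = [row[:] + [bi] for row, bi in zip(A, b)]  # augment with b
    for k in range(n):
        # Partial pivoting: bring the largest entry in column k to row k.
        piv = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[piv] = A[piv], A[k]
        # Eliminate column k below the pivot.
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n + 1):
                A[i][j] -= m * A[k][j]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (A[i][n] - sum(A[i][j] * x[j]
                              for j in range(i + 1, n))) / A[i][i]
    return x

print(solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))  # → [0.8, 1.4]
```

The pivot search is one of the communication bottlenecks the abstract alludes to: on a distributed-memory machine, finding the largest entry in a column requires a reduction across all processors holding rows of that column.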
Archive | 1989
Robert E. Benner; John L. Gustafson; Gary R. Montry
Archive | 1991
Robert E. Benner; John L. Gustafson; Gary R. Montry
Archive | 1986
Robert E. Benner; Gary R. Montry
AT&T Technical Journal | 1991
Robert E. Benner; Joseph Harris