Gary R. Montry | Researchain

Publication

Featured research published by Gary R. Montry.

SIAM Journal on Scientific and Statistical Computing | 1988

Development of Parallel Methods for a 1024-Processor Hypercube

John L. Gustafson; Gary R. Montry; Robert E. Benner

We have developed highly efficient parallel solutions for three practical, full-scale scientific problems: wave mechanics, fluid dynamics, and structural analysis. Several algorithmic techniques are used to keep communication and serial overhead small as both problem size and number of processors are varied. A new parameter, operation efficiency, is introduced that quantifies the tradeoff between communication and redundant computation. A 1024-processor MIMD ensemble is measured to be 502 to 637 times as fast as a single processor when problem size for the ensemble is fixed, and 1009 to 1020 times as fast as a single processor when problem size per processor is fixed. The latter measure, denoted scaled speedup, is developed and contrasted with the traditional measure of parallel speedup. The scaled-problem paradigm better reveals the capabilities of large ensembles, and permits detection of subtle hardware-induced load imbalances (such as error correction and data-dependent MFLOPS rates) that may become increasingly important as parallel processors increase in node count. Sustained performance for the applications is 70 to 130 MFLOPS, validating the massively parallel ensemble approach as a practical alternative to more conventional processing methods. The techniques presented appear extensible to even higher levels of parallelism than the 1024-processor level explored here.
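The contrast between fixed-size speedup and scaled speedup drawn in this abstract can be illustrated with a short calculation. The sketch below uses the standard textbook forms of the two measures (commonly called Amdahl's law and Gustafson's law), which the abstract itself does not spell out. The function names and the serial fractions are illustrative assumptions, not values measured in the paper.

```python
def fixed_size_speedup(serial_fraction, processors):
    """Speedup when total problem size is fixed.

    serial_fraction is the serial share of the ONE-processor run time.
    """
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def scaled_speedup(serial_fraction, processors):
    """Speedup when problem size per processor is fixed.

    serial_fraction is the serial share of the PARALLEL run time.
    """
    return processors - (processors - 1) * serial_fraction


if __name__ == "__main__":
    processors = 1024
    # Illustrative serial fractions only (assumed, not from the paper).
    for s in (0.001, 0.005, 0.01):
        print(
            f"s={s}: fixed-size {fixed_size_speedup(s, processors):.0f}, "
            f"scaled {scaled_speedup(s, processors):.0f}"
        )
```

Note that the two serial fractions are measured against different baselines, so the same numeric value of `serial_fraction` does not describe the same program in both functions. The point of the sketch is only that a small serial share limits fixed-size speedup far more severely than scaled speedup at 1024 processors.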

IEEE International Conference on High Performance Computing Data and Analytics | 1987

Concurrent Multifrontal Methods: Shared Memory, Cache, and Frontwidth Issues

R.E. Benner; Gary R. Montry; G.G. Weigand; Iain S. Duff

Frontal methods are an efficient and popular means of Gauss elimination of matrix equations that arise in finite element analysis. Nested dissection of a computational domain makes possible high-level parallelism in a widely used frontal algorithm for unsymmetric systems. A concurrent, highly vectorized, multifrontal, finite element analysis of axisymmetric liquid drop oscillations with 2,210 equations runs on the CRAY X-MP/48 with factors of 1.9 and 2.9 reduction in elapsed time on two and four processors, respectively. On an ELXSI 6400 (which has an additional memory level, local processor cache, ignored in the algorithm design for the CRAY), implementation of the same problem initially achieved a speedup of only 1.4 on four processors. Modification of the concurrent algorithm, to take advantage of the cache and frontwidth reduction by element reordering, doubled the concurrent speedup on the ELXSI to 2.8 on four processors.
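The speedups quoted in this abstract are easier to compare as parallel efficiencies, that is, speedup divided by processor count. The sketch below does that arithmetic for the four reported figures. The function name and the table layout are assumptions made for illustration; the speedups and processor counts are taken from the abstract.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup / processors


# (machine, processors, speedup) as quoted in the abstract.
reported = [
    ("CRAY X-MP/48", 2, 1.9),
    ("CRAY X-MP/48", 4, 2.9),
    ("ELXSI 6400, initial algorithm", 4, 1.4),
    ("ELXSI 6400, modified algorithm", 4, 2.8),
]

if __name__ == "__main__":
    for machine, processors, speedup in reported:
        efficiency = parallel_efficiency(speedup, processors)
        print(f"{machine}: {processors} processors, efficiency {efficiency:.1%}")
```

Seen this way, the cache-aware modification brings the four-processor ELXSI run to roughly the same efficiency as the four-processor CRAY run.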

IEEE Computer Society International Conference | 1988

Programming and performance on a cube-connected architecture

John L. Gustafson; Gary R. Montry

The strengths and weaknesses of the first generation of commercial hypercube multiprocessors and their effects on software development and implementation are examined. Program loading, language, debugging, communications, algorithms, and load balance are considered. These issues are addressed with respect to hypercubes of a thousand or more processing elements.

IEEE International Conference on High Performance Computing Data and Analytics | 1989

Massively Parallel Mathematical Sieves

Gary R. Montry

The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
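As a rough illustration of a scattered decomposition for the Sieve of Eratosthenes, the sketch below assigns integer k to worker k mod P and lets each worker cross out composites only within its own residue class. It is a sequential simulation under stated assumptions, not the authors' hypercube implementation: the function names, the choice to compute the small primes serially on every worker, and the absence of any message passing are all simplifications introduced here.

```python
from math import gcd, isqrt


def small_primes(limit):
    """Plain serial sieve for the primes up to limit (needed by every worker)."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, isqrt(limit) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def sieve_on_worker(rank, nprocs, n, primes):
    """Sieve the integers rank, rank + nprocs, rank + 2*nprocs, ... up to n."""
    count = len(range(rank, n + 1, nprocs))
    is_prime = [True] * count
    for p in primes:
        # Local indices of multiples of p repeat with this period.
        step = p // gcd(p, nprocs)
        first = next(
            (j for j in range(min(step, count)) if (rank + j * nprocs) % p == 0),
            None,
        )
        if first is None:
            continue  # this worker owns no multiple of p
        for j in range(first, count, step):
            if rank + j * nprocs != p:  # keep the prime itself
                is_prime[j] = False
    values = (rank + j * nprocs for j in range(count))
    return [v for v, flag in zip(values, is_prime) if flag and v >= 2]


def parallel_sieve(n, nprocs):
    """Simulate nprocs workers one after another and merge their results."""
    primes = small_primes(isqrt(n))
    found = []
    for rank in range(nprocs):
        found.extend(sieve_on_worker(rank, nprocs, n, primes))
    return sorted(found)


if __name__ == "__main__":
    print(parallel_sieve(50, 4))
```

Because each worker owns every P-th integer rather than one contiguous block, the crossing-out work for each small prime is spread across the workers instead of being concentrated where the multiples are densest relative to the block boundaries. On a real ensemble machine each call to `sieve_on_worker` would run on its own node, with only the final merge requiring communication.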

Hypercube Concurrent Computers and Applications | 1989

David Walker; Geoffrey C. Fox; Gary R. Montry

This work describes the implementation of a finite-difference algorithm, incorporating the flux-corrected transport technique, on the NCUBE hypercube. The algorithm is used to study two-dimensional, convectively-dominated fluid flows, and as a sample problem the onset and growth of the Kelvin-Helmholtz instability is investigated. Timing results are presented for a number of different sized problems on hypercubes of dimension up to 9. These results are interpreted by means of a simple performance model. The extension of the algorithm to the three-dimensional case is also discussed.

Applied Mathematics and Computation | 1988

Melvin R. Scott; Gary R. Montry

In the next six years, Sandia National Laboratories plans to acquire several of the most powerful supercomputers available at that particular time. We anticipate that these machines will have multitasking capabilities. To ensure that the Department of Energy programmatic goals will be met, it is essential that these machines be used efficiently. For this purpose, the Applied Mathematics Division and the Parallel Processing Division at Sandia have underway research programs to investigate the interaction between algorithms, the software environment, and advanced computer architecture for numerical-analysis applications in both mathematical-software libraries and large production codes. This report presents some preliminary results of running a code for solving two-point boundary-value problems in a multitasking environment on the ELXSI 6400.

Archive | 1989