
Publications


Featured research published by Greg Henry.


Archive | 1997

ScaLAPACK Users' Guide

L. S. Blackford; Jaeyoung Choi; Andrew J. Cleary; Eduardo F. D'Azevedo; James Demmel; Inderjit S. Dhillon; Jack J. Dongarra; Sven Hammarling; Greg Henry; Antoine Petitet; K. Stanley; David Walker; R. C. Whaley

The ScaLAPACK Users' Guide is the standard reference for ScaLAPACK: it introduces the library's design, walks through installation and use of the routines, and documents the conventions, such as the block-cyclic data distribution and array descriptors, that calling programs must follow.


ACM Transactions on Mathematical Software | 2002

An Updated Set of Basic Linear Algebra Subprograms (BLAS)

L. Susan Blackford; Antoine Petitet; Roldan Pozo; Karin A. Remington; R. Clint Whaley; James Demmel; Jack J. Dongarra; Iain S. Duff; Sven Hammarling; Greg Henry; Michael A. Heroux; Linda Kaufman; Andrew Lumsdaine

Author affiliations: L. Susan Blackford (Myricom, Inc.); James Demmel (University of California, Berkeley); Jack Dongarra (The University of Tennessee); Iain Duff (Rutherford Appleton Laboratory and CERFACS); Sven Hammarling (Numerical Algorithms Group, Ltd.); Greg Henry (Intel Corporation); Michael Heroux (Sandia National Laboratories); Linda Kaufman (William Patterson University); Andrew Lumsdaine (Indiana University); Antoine Petitet (Sun Microsystems); Roldan Pozo (National Institute of Standards and Technology); Karin Remington (The Center for Advancement of Genomics); R. Clint Whaley (Florida State University)
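One concrete product of this update was a standardized C binding, the CBLAS. A minimal sketch of a Level-3 call, assuming a CBLAS implementation (reference BLAS, OpenBLAS, Intel MKL, and others provide one) is installed and linked:

    #include <stdio.h>
    #include <cblas.h>

    int main(void)
    {
        /* C := alpha*A*B + beta*C for 2x2 row-major matrices. */
        double A[4] = {1.0, 2.0,
                       3.0, 4.0};
        double B[4] = {5.0, 6.0,
                       7.0, 8.0};
        double C[4] = {0.0, 0.0,
                       0.0, 0.0};

        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    2, 2, 2,        /* M, N, K       */
                    1.0, A, 2,      /* alpha, A, lda */
                    B, 2,           /* B, ldb        */
                    0.0, C, 2);     /* beta, C, ldc  */

        printf("%g %g\n%g %g\n", C[0], C[1], C[2], C[3]);  /* 19 22 / 43 50 */
        return 0;
    }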


conference on high performance computing (supercomputing) | 1996

ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance

L. S. Blackford; Jaeyoung Choi; A. Cleary; Antoine Petitet; R. C. Whaley; James Demmel; Inderjit S. Dhillon; K. Stanley; Jack J. Dongarra; S. Hammarling; Greg Henry; David W. Walker

This paper outlines the content and performance of ScaLAPACK, a collection of mathematical software for linear algebra computations on distributed memory computers. The importance of developing standards for computational and message passing interfaces is discussed. We present the different components and building blocks of ScaLAPACK, and indicate the difficulties inherent in producing correct codes for networks of heterogeneous processors. Finally, this paper briefly describes future directions for the ScaLAPACK library and concludes by suggesting alternative approaches to mathematical libraries, explaining how ScaLAPACK could be integrated into efficient and user-friendly distributed systems.
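Among the building blocks described in the paper are the BLACS, the communication layer on which the PBLAS and the ScaLAPACK drivers sit. A minimal sketch of the usual setup, assuming the C interface to the BLACS that ships with ScaLAPACK over MPI (the 2 x 2 grid shape is arbitrary):

    #include <stdio.h>

    /* C BLACS interface, declared here to keep the sketch self-contained;
     * in practice link against a ScaLAPACK/BLACS built on MPI. */
    void Cblacs_pinfo(int *mypnum, int *nprocs);
    void Cblacs_get(int icontxt, int what, int *val);
    void Cblacs_gridinit(int *icontxt, char *order, int nprow, int npcol);
    void Cblacs_gridinfo(int icontxt, int *nprow, int *npcol,
                         int *myprow, int *mypcol);
    void Cblacs_gridexit(int icontxt);
    void Cblacs_exit(int doneflag);

    int main(void)
    {
        int iam, nprocs, ctxt, myrow, mycol;
        int nprow = 2, npcol = 2;            /* arbitrary 2 x 2 grid */

        Cblacs_pinfo(&iam, &nprocs);         /* my rank, total processes */
        Cblacs_get(-1, 0, &ctxt);            /* default system context   */
        Cblacs_gridinit(&ctxt, "Row", nprow, npcol);
        Cblacs_gridinfo(ctxt, &nprow, &npcol, &myrow, &mycol);

        if (myrow >= 0) {                    /* processes inside the grid */
            printf("process %d is grid entry (%d,%d)\n", iam, myrow, mycol);
            /* ... distribute matrices block-cyclically and call a
             * ScaLAPACK driver (e.g. pdgesv_) here ... */
            Cblacs_gridexit(ctxt);
        }
        Cblacs_exit(0);
        return 0;
    }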


ACM Transactions on Mathematical Software | 2001

FLAME: Formal Linear Algebra Methods Environment

John A. Gunnels; Fred G. Gustavson; Greg Henry; Robert A. van de Geijn

Since the advent of high-performance distributed-memory parallel computing, the need for intelligible code has become ever greater. The development and maintenance of libraries for these architectures is simply too complex to be amenable to conventional approaches to implementation. Attempts to employ traditional methodology have led, in our opinion, to the production of an abundance of anfractuous code that is difficult to maintain and almost impossible to upgrade. Having struggled with these issues for more than a decade, we have concluded that a solution is to apply a technique from theoretical computer science, formal derivation, to the development of high-performance linear algebra libraries. We think the resulting approach results in aesthetically pleasing, coherent code that greatly facilitates intelligent modularity and high performance while enhancing confidence in its correctness. Since the technique is language-independent, it lends itself equally well to a wide spectrum of programming languages (and paradigms) ranging from C and Fortran to C++ and Java. In this paper, we illustrate our observations by looking at the Formal Linear Algebra Methods Environment (FLAME), a framework that facilitates the derivation and implementation of linear algebra algorithms on sequential architectures. This environment demonstrates that lessons learned in the distributed-memory world can guide us toward better approaches even in the sequential world. We present performance experiments on the Intel® Pentium® III processor that demonstrate that high performance can be attained by coding at a high level of abstraction.
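As a loose illustration of the kind of algorithm FLAME derives (plain C, not the FLAME API; Cholesky factorization is a running example in the FLAME literature), the loop below is the unblocked variant obtained by repartitioning the matrix at each step:

    #include <math.h>

    /* Unblocked right-looking Cholesky, A = L*L^T; the lower triangle of
     * the n x n row-major array A is overwritten with L. At step k the
     * trailing matrix is repartitioned as
     *     [ alpha11   *  ]
     *     [ a21      A22 ]
     * and the two updates below are what the formal derivation produces
     * from the loop invariant. Returns 0, or k+1 if A is not positive
     * definite. */
    int chol_unb(int n, double *A)
    {
        for (int k = 0; k < n; ++k) {
            double alpha = A[k*n + k];                 /* alpha11 */
            if (alpha <= 0.0) return k + 1;
            alpha = sqrt(alpha);
            A[k*n + k] = alpha;
            for (int i = k + 1; i < n; ++i)            /* a21 := a21/alpha11 */
                A[i*n + k] /= alpha;
            for (int j = k + 1; j < n; ++j)            /* A22 -= a21*a21^T */
                for (int i = j; i < n; ++i)
                    A[i*n + j] -= A[i*n + k] * A[j*n + k];
        }
        return 0;
    }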


international parallel and distributed processing symposium | 2013

Design and Implementation of the Linpack Benchmark for Single and Multi-node Systems Based on Intel® Xeon Phi Coprocessor

Alexander Heinecke; Karthikeyan Vaidyanathan; Mikhail Smelyanskiy; Alexander Kobotov; Roman Dubtsov; Greg Henry; Aniruddha G. Shet; George Z. Chrysos; Pradeep Dubey

Dense linear algebra has been traditionally used to evaluate the performance and efficiency of new architectures. This trend has continued for the past half decade with the advent of multi-core processors and hardware accelerators. In this paper we describe how several flavors of the Linpack benchmark are accelerated on Intel's recently released Intel® Xeon Phi™ co-processor (code-named Knights Corner) in both native and hybrid configurations. Our native DGEMM implementation takes full advantage of Knights Corner's salient architectural features and successfully utilizes close to 90% of its peak compute capability. Our native Linpack implementation running entirely on Knights Corner employs novel dynamic scheduling and achieves close to 80% efficiency - the highest published co-processor efficiency. As with the native version, our single-node hybrid implementation of Linpack also achieves nearly 80% efficiency. Using dynamic scheduling and an enhanced look-ahead scheme, this implementation scales well to a 100-node cluster, on which it achieves over 76% efficiency while delivering the total performance of 107 TFLOPS.
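A quick sanity check on the cluster figures quoted above; the peak numbers computed here are merely implied by 107 TFLOPS at 76% efficiency on 100 nodes and are not taken from the paper:

    #include <stdio.h>

    int main(void)
    {
        double measured_tflops = 107.0;  /* quoted in the abstract */
        double efficiency      = 0.76;   /* "over 76%"             */
        int    nodes           = 100;

        double peak = measured_tflops / efficiency;
        printf("implied cluster peak : %.1f TFLOPS\n", peak);          /* ~140.8 */
        printf("implied per-node peak: %.2f TFLOPS\n", peak / nodes);  /* ~1.41  */
        return 0;
    }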


SIAM Journal on Scientific Computing | 2002

A Parallel Implementation of the Nonsymmetric QR Algorithm for Distributed Memory Architectures

Greg Henry; David S. Watkins; Jack J. Dongarra

One approach to solving the nonsymmetric eigenvalue problem in parallel is to parallelize the QR algorithm. Not long ago, this was widely considered to be a hopeless task. Recent efforts have led to significant advances, although the methods proposed up to now have suffered from scalability problems. This paper discusses an approach to parallelizing the QR algorithm that greatly improves scalability. A theoretical analysis indicates that the algorithm is ultimately not scalable, but the nonscalability does not become evident until the matrix dimension is enormous. Experiments on the Intel Paragon system, the IBM SP2 supercomputer, the SGI Origin 2000, and the Intel ASCI Option Red supercomputer are reported.


SIAM Journal on Scientific Computing | 1996

Parallelizing the QR algorithm for the unsymmetric algebraic eigenvalue problem: myths and reality

Greg Henry; Robert A. van de Geijn

Over the last few years, it has been suggested that the popular QR algorithm for the unsymmetric Schur decomposition does not parallelize. In this paper, we present both positive and negative results on this subject. In theory, asymptotically perfect speedup can be obtained. In practice, reasonable speedup can be obtained on an MIMD distributed memory computer for a relatively small number of processors. However, we also show theoretically that it is impossible for the standard QR algorithm to be scalable. Performance of a parallel implementation of the LAPACK DLAHQR routine on the Intel Paragon™ system is reported.


conference on high performance computing (supercomputing) | 2000

High-Performance Reactive Fluid Flow Simulations Using Adaptive Mesh Refinement on Thousands of Processors

Alan Clark Calder; B. C. Curtis; L. J. Dursi; Bruce Fryxell; P. MacNeice; K. Olson; Paul M. Ricker; R. Rosner; F. X. Timmes; Henry M. Tufo; J. W. Truran; Michael Zingale; Greg Henry

We present simulations and performance results of nuclear burning fronts in supernovae on the largest domain and at the finest spatial resolution studied to date. These simulations were performed on the Intel ASCI-Red machine at Sandia National Laboratories using FLASH, a code developed at the Center for Astrophysical Thermonuclear Flashes at the University of Chicago. FLASH is a modular, adaptive mesh, parallel simulation code capable of handling compressible, reactive fluid flows in astrophysical environments. FLASH is written primarily in Fortran 90, uses the Message-Passing Interface library for inter-processor communication and portability, and employs the PARAMESH package to manage a block-structured adaptive mesh that places blocks only where resolution is required and tracks rapidly changing flow features, such as detonation fronts, with ease. We describe the key algorithms and their implementation as well as the optimizations required to achieve sustained performance of 238 GFLOPS on 6420 processors of ASCI-Red in 64 bit arithmetic.


parallel computing | 2004

A family of high-performance matrix multiplication algorithms

John A. Gunnels; Fred G. Gustavson; Greg Henry; Robert A. van de Geijn

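As a rough sketch of the idea in the title: matrix multiplication is three nested loops, and blocking and reordering them in different ways, with one blocking per level of the memory hierarchy, generates a whole family of algorithms. One member of such a family in C (NB is an illustrative block size, not a value from the paper):

    enum { NB = 64 };   /* illustrative cache block size */

    /* C += A*B for n x n row-major matrices, blocked on k and j so that
     * NB-wide panels of B stay in cache across iterations of i. Other
     * orderings and blockings of the same loops give the other members
     * of the family. */
    void gemm_blocked(int n, const double *A, const double *B, double *C)
    {
        for (int kk = 0; kk < n; kk += NB)
            for (int jj = 0; jj < n; jj += NB)
                for (int i = 0; i < n; ++i)
                    for (int k = kk; k < kk + NB && k < n; ++k) {
                        double aik = A[i*n + k];
                        for (int j = jj; j < jj + NB && j < n; ++j)
                            C[i*n + j] += aik * B[k*n + j];
                    }
    }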


international conference on computational science | 2001

A Family of High-Performance Matrix Multiplication Algorithms

John A. Gunnels; Greg Henry; Robert A. van de Geijn


Collaboration


Dive into Greg Henry's collaboration.

Top Co-Authors

James Demmel, University of California
Andrew J. Cleary, Lawrence Livermore National Laboratory
K. Stanley, University of California
Inderjit S. Dhillon, University of Texas at Austin
Sven Hammarling, Numerical Algorithms Group