T. J. C. Ward
IBM
Publications
Featured research published by T. J. C. Ward.
IBM Journal of Research and Development | 2005
Maria Eleftheriou; Blake G. Fitch; Aleksandr Rayshubskiy; T. J. C. Ward; Robert S. Germain
This paper presents results on a communications-intensive kernel, the three-dimensional fast Fourier transform (3D FFT), running on the 2,048-node Blue Gene®/L (BG/L) prototype. Two implementations of the volumetric FFT algorithm were characterized, one built on the Message Passing Interface library and another built on an active packet Application Program Interface supported by the hardware bring-up environment, the BG/L advanced diagnostics environment. Preliminary performance experiments on the BG/L prototype indicate that both of our implementations scale well up to 1,024 nodes for 3D FFTs of size 128 × 128 × 128. The performance of the volumetric FFT is also compared with that of the Fastest Fourier Transform in the West (FFTW) library. In general, the volumetric FFT outperforms a port of the FFTW Version 2.1.5 library on large-node-count partitions.
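The volumetric approach rests on the fact that a 3D DFT factors into independent 1D transforms along each axis. The single-node sketch below illustrates only that factorization; the parallel implementations described in the paper distribute the 1D line transforms over the BG/L nodes and redistribute data between the three passes, none of which is reproduced here. Naive O(N) DFTs stand in for real FFTs so the example stays self-contained, and the helper name `dft_line` is illustrative.

```cpp
// Single-node sketch: a 3D DFT as three passes of 1D transforms (x, then y, then z).
#include <cstddef>
#include <complex>
#include <vector>
#include <cmath>
#include <cstdio>

using cplx = std::complex<double>;

// Naive 1D DFT over a strided line of length n inside the flat array a.
static void dft_line(std::vector<cplx>& a, std::size_t start, std::size_t stride, std::size_t n) {
    const double pi = std::acos(-1.0);
    std::vector<cplx> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cplx sum(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            double angle = -2.0 * pi * double(k * j) / double(n);
            sum += a[start + j * stride] * cplx(std::cos(angle), std::sin(angle));
        }
        out[k] = sum;
    }
    for (std::size_t k = 0; k < n; ++k) a[start + k * stride] = out[k];
}

int main() {
    const std::size_t n = 8;                       // tiny cube; the paper uses 128 x 128 x 128
    std::vector<cplx> grid(n * n * n, cplx(0, 0));
    grid[1 + n * (2 + n * 3)] = cplx(1, 0);        // single impulse as a test input

    // Pass 1: lines along x (unit stride), one per (y, z) pair.
    for (std::size_t z = 0; z < n; ++z)
        for (std::size_t y = 0; y < n; ++y)
            dft_line(grid, n * (y + n * z), 1, n);
    // Pass 2: lines along y (stride n), one per (x, z) pair.
    for (std::size_t z = 0; z < n; ++z)
        for (std::size_t x = 0; x < n; ++x)
            dft_line(grid, x + n * n * z, n, n);
    // Pass 3: lines along z (stride n*n), one per (x, y) pair.
    for (std::size_t y = 0; y < n; ++y)
        for (std::size_t x = 0; x < n; ++x)
            dft_line(grid, x + n * y, n * n, n);

    std::printf("|F(0,0,0)| = %f\n", std::abs(grid[0]));  // 1.0 for a unit impulse
    return 0;
}
```

In the parallel setting, each of the three passes is a set of independent line transforms, which is what allows the volumetric decomposition to spread the work across large node counts at the cost of data redistribution between passes.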
IBM Journal of Research and Development | 2005
Robert S. Germain; Yuriy Zhestkov; Maria Eleftheriou; Aleksandr Rayshubskiy; Frank Suits; T. J. C. Ward; Blake G. Fitch
Blue Matter is the application framework being developed in conjunction with the scientific portion of the IBM Blue Gene® project. We describe the parallel decomposition currently being used to target the Blue Gene/L machine and discuss the application-based trace tools used to analyze the performance of the application. We also present the results of early performance studies, including a comparison of the performance of the Ewald and the particle-particle particle-mesh Ewald (P3ME) methods, compare the measured performance of some key collective operations with the limitations imposed by the hardware, and discuss some future directions for research.
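Both methods compared above rest on the same splitting of the Coulomb kernel: 1/r is divided into a rapidly decaying real-space term, erfc(αr)/r, summed pairwise inside a cutoff, and a smooth term, erf(αr)/r, summed in reciprocal space (via 3D FFTs in P3ME). The sketch below shows only that splitting and checks that the two parts recombine to 1/r; the reciprocal-space machinery and the parallel decomposition discussed in the paper are omitted, and the value of `alpha` is illustrative rather than taken from the paper.

```cpp
// Hedged sketch of the Ewald/P3ME splitting of the Coulomb kernel 1/r.
#include <cmath>
#include <cstdio>

int main() {
    const double alpha = 0.35;  // Ewald splitting parameter (illustrative)
    for (double r = 1.0; r <= 10.0; r += 3.0) {
        double shortRange = std::erfc(alpha * r) / r;  // pairwise, cutoff-limited real-space sum
        double longRange  = std::erf(alpha * r) / r;   // smooth part handled in reciprocal space
        std::printf("r=%4.1f  short=%.6f  long=%.6f  sum=%.6f  1/r=%.6f\n",
                    r, shortRange, longRange, shortRange + longRange, 1.0 / r);
    }
    return 0;
}
```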
IBM Journal of Research and Development | 2008
Blake G. Fitch; Aleksandr Rayshubskiy; Maria Eleftheriou; T. J. C. Ward; Mark E. Giampapa; Mike Pitman; Jed W. Pitera; William C. Swope; Robert S. Germain
N-body simulations present some of the most interesting challenges in the area of massively parallel computing, especially when the object is to improve the time to solution for a fixed-size problem. The Blue Matter molecular simulation framework was developed specifically to address these challenges, to explore programming models for massively parallel machine architectures in a concrete context, and to support the scientific goals of the IBM Blue Gene® Project. This paper reviews the key issues involved in achieving ultrastrong scaling of methodologically correct biomolecular simulations, particularly the treatment of the long-range electrostatic forces present in simulations of proteins in water and membranes. Blue Matter computes these forces using the particle-particle particle-mesh Ewald (P3ME) method, which breaks the problem up into two pieces, one that requires the use of three-dimensional fast Fourier transforms with global data dependencies and another that involves computing interactions between pairs of particles within a cutoff distance. We summarize our exploration of the parallel decompositions used to compute these finite-ranged interactions, describe some of the implementation details involved in these decompositions, and present the evolution of strong-scaling performance achieved over the course of this exploration, along with evidence for the quality of simulation achieved.
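The finite-ranged piece mentioned above reduces, in serial form, to accumulating interactions only for particle pairs closer than the cutoff. The minimal sketch below shows that loop in isolation; in Blue Matter it is precisely this work that is decomposed across nodes, and the periodic boundary handling, neighbor lists, and the FFT-based reciprocal-space part of P3ME are all omitted. The Lennard-Jones form and the parameter values are illustrative placeholders, not taken from the paper.

```cpp
// Serial sketch: cutoff-limited pair interactions over randomly placed particles.
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

struct Vec3 { double x, y, z; };

int main() {
    const int n = 200;
    const double cutoff = 2.5, cutoff2 = cutoff * cutoff;
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> uni(0.0, 10.0);
    std::vector<Vec3> pos(n);
    for (auto& p : pos) p = {uni(rng), uni(rng), uni(rng)};

    double energy = 0.0;
    long pairsInRange = 0;
    for (int i = 0; i < n; ++i) {
        for (int j = i + 1; j < n; ++j) {
            double dx = pos[i].x - pos[j].x;
            double dy = pos[i].y - pos[j].y;
            double dz = pos[i].z - pos[j].z;
            double r2 = dx * dx + dy * dy + dz * dz;
            if (r2 >= cutoff2) continue;           // outside the cutoff: skip the pair
            ++pairsInRange;
            double inv2 = 1.0 / r2;                // Lennard-Jones 12-6 with sigma = epsilon = 1
            double inv6 = inv2 * inv2 * inv2;
            energy += 4.0 * (inv6 * inv6 - inv6);
        }
    }
    std::printf("pairs within cutoff: %ld, energy: %f\n", pairsInRange, energy);
    return 0;
}
```

The strong-scaling question the paper addresses is how to assign these pair computations (and the associated particle data) to processors so that the time per simulation step keeps shrinking as node counts grow.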
IBM Journal of Research and Development | 2005
Robert F. Enenkel; Blake G. Fitch; Robert S. Germain; Fred G. Gustavson; Andrew K. Martin; Mark P. Mendell; Jed W. Pitera; Mike Pitman; Aleksandr Rayshubskiy; Frank Suits; William C. Swope; T. J. C. Ward
While developing the protein folding application for the IBM Blue Gene®/L supercomputer, some frequently executed computational kernels were encountered. These were significantly more complex than the linear algebra kernels that are normally provided as tuned libraries with modern machines. Using regular library functions for these would have resulted in an application that exploited only 5-10% of the potential floating-point throughput of the machine. This paper is a tour of the functions encountered; they have been expressed in C++ (and could be expressed in other languages such as Fortran or C). With the help of a good optimizing compiler, floating-point efficiency is much closer to 100%. The protein folding application was initially run by the life science researchers on IBM POWER3™ machines while the computer science researchers were designing and bringing up the Blue Gene/L hardware. Some of the work discussed resulted in enhanced compiler optimizations, which now improve the performance of floating-point-intensive applications compiled by the IBM VisualAge® series of compilers for POWER3, POWER4™, POWER4+™, and POWER5™. The implementations are offered in the hope that they may help in other implementations of molecular dynamics or in other fields of endeavor, and in the hope that others may adapt the ideas presented here to deliver additional mathematical functions at high throughput.
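As an illustration of the kind of restructuring the paper describes (this is not code from the paper), the example below writes a hot loop so that a good optimizing compiler can keep several fused multiply-adds in flight instead of serializing on a single accumulator. The kernels in the paper are more elaborate special-purpose functions for molecular dynamics, but the idea of exposing independent floating-point work to the optimizer is the same.

```cpp
// Illustrative C++ kernel restructuring: one accumulator vs. four independent ones.
#include <cstddef>
#include <cstdio>
#include <vector>

// A single running sum creates a chain of dependent adds; multiply-adds cannot overlap.
double dot_naive(const double* a, const double* b, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
}

// Four independent accumulators let the compiler schedule several FMAs per iteration.
double dot_unrolled(const double* a, const double* b, std::size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i + 0] * b[i + 0];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; ++i) s0 += a[i] * b[i];  // remainder loop
    return (s0 + s1) + (s2 + s3);
}

int main() {
    std::vector<double> a(1 << 16, 1.5), b(1 << 16, 2.0);
    std::printf("naive=%f unrolled=%f\n",
                dot_naive(a.data(), b.data(), a.size()),
                dot_unrolled(a.data(), b.data(), a.size()));
    return 0;
}
```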
International Conference on Supercomputing | 2014
Felix Schürmann; Fabien Delalondre; Pramod S. Kumbhar; John Biddiscombe; Miguel Gila; Davide Tacchella; Alessandro Curioni; Bernard Metzler; Peter Morjan; Joachim Fenkes; Michele M. Franceschini; Robert S. Germain; Lars Schneidenbach; T. J. C. Ward; Blake G. Fitch
Storage class memory is receiving increasing attention for use in HPC systems for the acceleration of intensive IO operations. We report a particular instance using SLC FLASH memory integrated with an IBM BlueGene/Q supercomputer at scale (Blue Gene Active Storage, BGAS). We describe two principal modes of operation of the non-volatile memory: (1) a block device; (2) direct storage access (DSA). The block device layer, built on the DSA layer, provides compatibility with IO layers common to existing HPC IO systems (POSIX, MPIO, HDF5) and is expected to provide high performance in bandwidth-critical use cases. The novel DSA strategy enables a low-overhead, byte-addressable, asynchronous, kernel-bypass access method for very high user-space IOPs in multithreaded application environments. Here, we expose DSA through HDF5 using a custom file driver. Benchmark results for the different modes are presented, and scale-out to full system size showcases the capabilities of this technology.
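As a rough analogy only (this is ordinary POSIX code, not the BGAS or DSA interface), the distinction between the two access modes resembles the difference between block-style writes issued through the kernel and byte-addressable access to a mapped region. The actual DSA path described in the paper is an asynchronous, kernel-bypass user-space mechanism exposed through a custom HDF5 file driver, none of which is reproduced below; the file name and sizes are placeholders.

```cpp
// Analogy: block-style write() vs. byte-addressable mmap() access to the same file.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

int main() {
    const char* path = "/tmp/bgas_analogy.dat";   // placeholder path
    const size_t size = 4096;

    int fd = open(path, O_CREAT | O_RDWR, 0644);
    if (fd < 0) { std::perror("open"); return 1; }
    if (ftruncate(fd, (off_t)size) != 0) { std::perror("ftruncate"); close(fd); return 1; }

    // "Block device" style: a whole buffer pushed through the kernel in one call.
    char block[512];
    std::memset(block, 'A', sizeof(block));
    if (pwrite(fd, block, sizeof(block), 0) != (ssize_t)sizeof(block)) std::perror("pwrite");

    // "Byte addressable" style: map the file and update individual bytes in place.
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { std::perror("mmap"); close(fd); return 1; }
    static_cast<char*>(p)[100] = 'Z';             // single-byte update, no explicit write call
    msync(p, size, MS_SYNC);
    munmap(p, size);
    close(fd);
    std::printf("wrote one 512-byte block and one in-place byte to %s\n", path);
    return 0;
}
```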
Journal of Physical Chemistry B | 2004
William C. Swope; Jed W. Pitera; Frank Suits; Mike Pitman; Maria Eleftheriou; Blake G. Fitch; Robert S. Germain; Aleksandr Rayshubskiy; T. J. C. Ward; Yuriy Zhestkov; Ruhong Zhou
Archive | 1989
T. J. C. Ward
Archive | 2006
Bryan L. Behrmann; Michael D. Dunagan; Eric Pyle; T. J. C. Ward; Terell White
Archive | 2006
Rohini Nair; T. J. C. Ward
Archive | 2005
Blake G. Fitch; Robert S. Germain; T. J. C. Ward; Aleksandr Rayshubskiy