Publication


Featured research published by T. J. Christopher Ward.


conference on high performance computing (supercomputing) | 2006

Blue Matter: approaching the limits of concurrency for classical molecular dynamics

Blake G. Fitch; Aleksandr Rayshubskiy; Maria Eleftheriou; T. J. Christopher Ward; Mark E. Giampapa; Michael C. Pitman; Robert S. Germain

This paper describes a novel spatial-force decomposition for N-body simulations for which we observe O(sqrt(p)) communication scaling. This has enabled Blue Matter to approach the effective limits of concurrency for molecular dynamics using particle-mesh (FFT-based) methods for handling electrostatic interactions. Using this decomposition, Blue Matter running on Blue Gene/L has achieved simulation rates in excess of 1000 time steps per second and demonstrated significant speed-ups down to O(1) atoms per node. Blue Matter employs a communicating sequential process (CSP) style model with application communication state machines compiled to hardware interfaces. The scalability achieved has enabled methodologically rigorous biomolecular simulations on biologically interesting systems, such as membrane-bound proteins, whose time scales dwarf previous work on those systems. Major scaling improvements require exploration of alternative algorithms for treating the long-range electrostatics.
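As rough intuition for where a sqrt(p) term can arise in combined spatial/force decompositions, the textbook force-decomposition argument is sketched below; this is illustrative background only, not necessarily the derivation used in Blue Matter.

```latex
% Textbook force-decomposition estimate (illustrative only):
% split N particles into \sqrt{p} blocks of N/\sqrt{p} particles each and
% assign the block pair (i,j) to node (i,j) of a \sqrt{p} \times \sqrt{p} grid.
% Each node then needs the positions of only two blocks, so its communication
% volume per step is
\[
  V_{\text{node}} \;\sim\; 2\,\frac{N}{\sqrt{p}}
  \qquad\text{rather than}\qquad
  V_{\text{node}} \;\sim\; N
\]
% for a replicated-data approach: per-node volume shrinks like 1/\sqrt{p},
% while each position is shared with the O(\sqrt{p}) nodes of one row and column.
```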


international conference on computational science | 2006

Blue Matter: strong scaling of molecular dynamics on Blue Gene/L

Blake G. Fitch; Aleksandr Rayshubskiy; Maria Eleftheriou; T. J. Christopher Ward; Mark E. Giampapa; Yuriy Zhestkov; Michael C. Pitman; Frank Suits; Alan Grossfield; Jed W. Pitera; William C. Swope; Ruhong Zhou; Scott E. Feller; Robert S. Germain

This paper presents strong scaling performance data for the Blue Matter molecular dynamics framework using a novel n-body spatial decomposition and a collective communications technique implemented on both MPI and low level hardware interfaces. Using Blue Matter on Blue Gene/L, we have measured scalability through 16,384 nodes with measured time per time-step of under 2.3 milliseconds for a 43,222 atom protein/lipid system. This is equivalent to a simulation rate of over 76 nanoseconds per day and represents an unprecedented time-to-solution for biomolecular simulation as well as continued speed-up to fewer than three atoms per node. On a smaller, solvated lipid system with 13,758 atoms, we have achieved continued speedups through fewer than one atom per node and less than 2 milliseconds/time-step. On a 92,224 atom system, we have achieved floating point performance of over 1.8 TeraFlops/second on 16,384 nodes. Strong scaling of fixed-size classical molecular dynamics of biological systems to large numbers of nodes is necessary to extend the simulation time to the scale required to make contact with experimental data and derive biologically relevant insights.
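As a quick cross-check of the headline numbers, here is a minimal sketch of the conversion from wall-clock time per step to simulated nanoseconds per day, assuming a 2 fs integration timestep (a typical value for all-atom MD; the abstract does not state the timestep used):

```python
# Convert measured wall-clock time per MD step into simulated nanoseconds per day.
# Assumption: a 2 fs integration timestep (typical for all-atom MD; not stated above).

SECONDS_PER_DAY = 86_400
TIMESTEP_FS = 2.0  # assumed integration timestep in femtoseconds

def ns_per_day(ms_per_step: float, timestep_fs: float = TIMESTEP_FS) -> float:
    steps_per_day = SECONDS_PER_DAY / (ms_per_step * 1e-3)
    return steps_per_day * timestep_fs * 1e-6  # 1 fs = 1e-6 ns

print(ns_per_day(2.3))  # ~75 ns/day, consistent with the "over 76 ns/day" figure
```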


european conference on parallel processing | 2005

Performance measurements of the 3D FFT on the Blue Gene/L supercomputer

Maria Eleftheriou; Blake G. Fitch; Aleksandr Rayshubskiy; T. J. Christopher Ward; Robert S. Germain

This paper presents performance characteristics of a communications-intensive kernel, the complex data 3D FFT, running on the Blue Gene/L architecture. Two implementations of the volumetric FFT algorithm were characterized, one built on the MPI library using an optimized collective all-to-all operation [2] and another built on a low-level System Programming Interface (SPI) of the Blue Gene/L Advanced Diagnostics Environment (BG/L ADE) [17]. We compare the current results to those obtained using a reference MPI implementation (MPICH2 ported to BG/L with unoptimized collectives) and to a port of version 2.1.5 of the FFTW library [14]. Performance experiments on the Blue Gene/L prototype indicate that both of our implementations scale well; the current MPI-based implementation shows a speedup of 730 on 2048 nodes for 3D FFTs of size 128 × 128 × 128. Moreover, the volumetric FFT outperforms the FFTW port by a factor of 8 for a 128 × 128 × 128 complex FFT on 2048 nodes.
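The volumetric algorithm exploits the separability of the 3D DFT: 1D transforms are applied along each axis in turn, with the distributed implementations redistributing the array (all-to-all exchanges) between phases so that the axis being transformed is always local. A serial numpy sketch of that structure, for illustration only and not the paper's code:

```python
import numpy as np

# The 3D DFT is separable: 1D FFTs along each axis in turn reproduce the full
# 3D FFT. A distributed "volumetric" implementation performs each 1D phase
# locally and redistributes the array between phases. A 64^3 grid is used here
# to keep the example light; the paper's measurements used 128^3.
rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64, 64)) + 1j * rng.standard_normal((64, 64, 64))

phase1 = np.fft.fft(a, axis=0)       # 1D FFTs along x
phase2 = np.fft.fft(phase1, axis=1)  # (redistribute, then) 1D FFTs along y
phase3 = np.fft.fft(phase2, axis=2)  # (redistribute, then) 1D FFTs along z

assert np.allclose(phase3, np.fft.fftn(a))
```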


international conference on parallel architectures and compilation techniques | 2004

A High-Performance SIMD Floating Point Unit for BlueGene/L: Architecture, Compilation, and Algorithm Design

Leonardo R. Bachega; Siddhartha Chatterjee; Kenneth Dockser; John A. Gunnels; Manish Gupta; Fred G. Gustavson; Christopher A. Lapkowski; Gary K. Liu; Mark P. Mendell; Charles D. Wait; T. J. Christopher Ward

We describe the design, implementation, and evaluation of a dual-issue SIMD-like extension of the PowerPC 440 floating-point unit (FPU) core. This extended FPU is targeted at both IBM's massively parallel BlueGene/L machine and more pervasive embedded platforms. It has several novel features, such as a computational crossbar and cross-load/store instructions, which enhance the performance of numerical codes. We further discuss the hardware-software co-design that was essential to fully realize the performance benefits of the FPU when constrained by the memory bandwidth limitations and high penalties for misaligned data access imposed by the memory hierarchy on a BlueGene/L node. We describe several novel compiler and algorithmic techniques to take advantage of this architecture. Using both hand-optimized and compiled code for key linear algebraic kernels, we validate the architectural design choices, evaluate the success of the compiler, and quantify the effectiveness of the novel algorithm design techniques. Preliminary performance data shows that the algorithm-compiler-hardware combination delivers a significant fraction of peak floating-point performance for compute-bound kernels such as matrix multiplication, and delivers a significant fraction of peak memory bandwidth for memory-bound kernels such as daxpy, while being largely insensitive to data alignment.
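To see why matrix multiplication can approach peak floating-point performance while daxpy is limited by memory bandwidth, here is a small sketch comparing arithmetic intensities; the 700 MHz clock and 4 flops per cycle per core are commonly cited BlueGene/L figures assumed for illustration, not taken from the abstract:

```python
# Compare arithmetic intensity (flops per byte moved) of daxpy and dgemm.
# Assumptions: 700 MHz clock, and one 2-wide fused multiply-add per cycle
# (2 multiplies + 2 adds = 4 flops/cycle), commonly cited for BG/L cores.

CLOCK_HZ = 700e6
FLOPS_PER_CYCLE = 4
peak_flops = CLOCK_HZ * FLOPS_PER_CYCLE  # ~2.8 GFlop/s per core

def daxpy_intensity(n: int) -> float:
    flops = 2 * n             # one multiply + one add per element
    bytes_moved = 3 * 8 * n   # read x, read y, write y (8-byte doubles)
    return flops / bytes_moved  # 1/12 flop per byte -> memory bound

def dgemm_intensity(n: int) -> float:
    flops = 2 * n**3
    bytes_moved = 4 * 8 * n**2  # read A, B, C and write C once (ideal blocking)
    return flops / bytes_moved  # grows with n -> can be compute bound

print(peak_flops / 1e9, daxpy_intensity(10**6), dgemm_intensity(1000))
```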


petascale data storage workshop | 2009

Using the Active Storage Fabrics model to address petascale storage challenges

Blake G. Fitch; Aleksandr Rayshubskiy; Michael C. Pitman; T. J. Christopher Ward; Robert S. Germain

We present the Active Storage Fabrics (ASF) model for storage-embedded parallel processing as a way to address petascale data-intensive challenges. ASF is aimed at emerging scalable system-on-a-chip and storage-class memory architectures, but may be realized in prototype form on current parallel systems. ASF can be used to transparently accelerate host workloads through close integration at the middleware data/storage boundary, or directly by data-intensive applications. We provide an overview of the major components involved in accelerating a parallel file system and a relational database management system, describe some early results, and outline our current research directions.


international parallel and distributed processing symposium | 2016

Key/Value-Enabled Flash Memory for Complex Scientific Workflows with On-Line Analysis and Visualization

Stefan Eilemann; Fabien Delalondre; Jon Bernard; Judit Planas; Felix Schuermann; John Biddiscombe; Costas Bekas; Alessandro Curioni; Bernard Metzler; Peter Kaltstein; Peter Morjan; Joachim Fenkes; Ralph Bellofatto; Lars Schneidenbach; T. J. Christopher Ward; Blake G. Fitch

Scientific workflows are often composed of compute-intensive simulations and data-intensive analysis and visualization, both equally important for productivity. High-performance computers run the compute-intensive phases efficiently, but data-intensive processing still receives less attention. Dense non-volatile memory integrated into supercomputers can help address this problem. In addition to density, it offers significantly finer-grained I/O than disk-based I/O systems. We present a way to exploit the fundamental capabilities of Storage-Class Memories (SCM), such as Flash, by using scalable key-value (KV) I/O methods instead of the traditional file I/O calls commonly used in HPC systems. Our objective is to enable higher performance for on-line and near-line storage for analysis and visualization of very high resolution, but correspondingly transient, simulation results. In this paper, we describe 1) the adaptation of a scalable key-value store to a BlueGene/Q system with integrated Flash memory, 2) a novel key-value aggregation module which implements coalesced, function-shipped calls between the clients and the servers, and 3) the refactoring of a scientific workflow to use application-relevant keys for fine-grained data subsets. The resulting implementation is analogous to function-shipping of POSIX I/O calls but shows an order-of-magnitude increase in read IOPS and a 2.5x increase in write IOPS (11 million read IOPS and 2.5 million write IOPS from 4096 compute nodes) when compared to a classical file system on the same system. It represents an innovative approach to the integration of SCM within an HPC system at scale.
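A minimal sketch of the idea of addressing fine-grained data subsets by application-relevant keys rather than byte offsets in a shared file. The `KVStore` class and the key layout below are hypothetical illustrations, not the paper's API:

```python
# Hypothetical illustration: store per-rank, per-field simulation output under
# application-relevant keys instead of computing byte offsets into a shared file.

class KVStore:
    """Toy in-memory stand-in for a scalable key-value store (hypothetical API)."""
    def __init__(self):
        self._data = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes) -> bytes:
        return self._data[key]

def frame_key(step: int, rank: int, field: str) -> bytes:
    # The key encodes what the data *is*, so analysis and visualization tasks
    # can fetch exactly the subset they need without parsing a file layout.
    return f"step={step}/rank={rank}/field={field}".encode()

store = KVStore()
store.put(frame_key(1000, 42, "positions"), b"...binary particle positions...")
positions = store.get(frame_key(1000, 42, "positions"))
```

In the paper's setting such puts and gets are coalesced by the aggregation module and function-shipped to servers co-located with the Flash, which is where the reported IOPS gains come from.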


european conference on parallel processing | 2006

Progress in scaling biomolecular simulations to petaflop scale platforms

Blake G. Fitch; Aleksandr Rayshubskiy; Maria Eleftheriou; T. J. Christopher Ward; Mark E. Giampapa; Michael C. Pitman; Robert S. Germain

This paper describes some of the issues involved in scaling biomolecular simulations onto massively parallel machines, drawing on the Blue Matter application team's experiences with Blue Gene/L. Our experience in scaling biomolecular simulation to one atom per node on BG/L should be relevant to larger peta-scale platforms: the path to increased performance is through the exploitation of increased concurrency, so even larger systems will have to operate in the extreme strong-scaling regime. Petascale platforms also present challenges with regard to the correctness of biomolecular simulations, since longer time-scale simulations are more likely to encounter significant energy drift. Total energy drift data for a microsecond-scale simulation are presented, along with the measured scalability of various components of a molecular dynamics time-step.
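As an aside on quantifying drift, here is a minimal sketch of one common approach, a linear fit of total energy against simulated time; this is generic practice, not necessarily the exact metric used in the paper:

```python
import numpy as np

# One common way to quantify energy drift in a constant-energy MD run:
# fit a line to total energy vs. simulated time and report the slope.

def energy_drift_per_ns(time_ns: np.ndarray, total_energy: np.ndarray) -> float:
    slope, _intercept = np.polyfit(time_ns, total_energy, 1)
    return slope  # energy units per simulated nanosecond

# toy data: a 1 microsecond trajectory sampled every 10 ns with a tiny drift
t = np.arange(0.0, 1000.0, 10.0)
e = -52_000.0 + 1e-3 * t + np.random.default_rng(1).normal(0.0, 0.5, t.size)
print(energy_drift_per_ns(t, e))  # ~1e-3 energy units per ns
```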


Current Topics in Membranes | 2008

Chapter 6 Blue Matter: Scaling of N-Body Simulations to One Atom per Node

Blake G. Fitch; Aleksandr Rayshubskiy; Maria Eleftheriou; T. J. Christopher Ward; Mark E. Giampapa; Michael C. Pitman; Jed W. Pitera; William C. Swope; Robert S. Germain

N-body simulations present some of the most interesting challenges in the area of massively parallel computing, especially when the object is to improve the total time to solution for a fixed-size problem. The Blue Matter molecular simulation framework has been developed specifically to address these challenges, in order to explore programming models for massively parallel machine architectures in a concrete context and to support the scientific goals of the IBM Blue Gene project. This chapter reviews the key issues involved in achieving ultra-strong scaling of methodologically correct biomolecular simulations, in particular the treatment of the long-range electrostatic forces present in simulations of proteins in water and membranes. Blue Matter computes these forces using the Particle–Particle–Particle–Mesh Ewald (P3ME) method, which breaks the problem into two pieces: one requires the use of three-dimensional Fast Fourier Transforms with global data dependencies, and the other involves computing interactions between pairs of particles within a cut-off distance. We summarize the exploration of parallel decompositions for these finite-ranged interactions carried out as part of the Blue Matter development effort, describe some of the implementation details involved in these decompositions, and present the evolution in (strong-scaling) performance achieved over the course of this exploration, along with evidence for the quality of simulation achieved.
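For orientation, the standard Ewald-type splitting behind P3ME separates the Coulomb interaction into a short-ranged part handled directly within the cut-off and a smooth long-ranged part handled on the mesh via 3D FFTs; the textbook form is stated below rather than taken from the chapter:

```latex
% Standard Ewald splitting of the Coulomb kernel (textbook form):
\[
  \frac{1}{r}
  \;=\;
  \underbrace{\frac{\operatorname{erfc}(\beta r)}{r}}_{\text{short-ranged: pairs within the cut-off}}
  \;+\;
  \underbrace{\frac{\operatorname{erf}(\beta r)}{r}}_{\text{smooth: interpolated to the mesh, solved with 3D FFTs}}
\]
% The splitting parameter \beta trades work between the real-space and mesh parts.
```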


Archive | 2007

Increased precision in the computation of a reciprocal square root

Robert F. Enenkel; Robert L. Goldiez; T. J. Christopher Ward
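The archive entry carries no abstract. As generic background only, a standard technique for increasing the precision of a reciprocal square root estimate is Newton-Raphson refinement, sketched below; this is not claimed to be the method of the entry above:

```python
# Generic background: Newton-Raphson refinement of a reciprocal square root
# estimate. Each iteration roughly doubles the number of correct bits:
#   y_{n+1} = y_n * (3 - x * y_n^2) / 2

def refined_rsqrt(x: float, estimate: float, iterations: int = 2) -> float:
    y = estimate
    for _ in range(iterations):
        y = 0.5 * y * (3.0 - x * y * y)
    return y

# crude initial estimate (stand-in for a hardware estimate instruction)
x = 2.0
print(refined_rsqrt(x, 0.7), x ** -0.5)  # converges toward 1/sqrt(2) ~= 0.7071068
```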


Archive | 2006

IBM Research Report Blue Matter: Approaching the Limits of Concurrency for Classical Molecular Dynamics

Blake G. Fitch; Maria Eleftheriou; T. J. Christopher Ward; Mark E. Giampapa; Michael C. Pitman; Robert S. Germain
