Kevin J. Bowers | Researchain

Publication

Featured research published by Kevin J. Bowers.

Conference on High Performance Computing (Supercomputing) | 2006

Scalable algorithms for molecular dynamics simulations on commodity clusters

Kevin J. Bowers; Edmond Chow; Huafeng Xu; Ron O. Dror; Michael P. Eastwood; Brent A. Gregersen; John L. Klepeis; István Kolossváry; Mark A. Moraes; Federico D. Sacerdoti; John K. Salmon; Yibing Shan; David E. Shaw

Although molecular dynamics (MD) simulations of biomolecular systems often run for days to months, many events of great scientific interest and pharmaceutical relevance occur on long time scales that remain beyond reach. We present several new algorithms and implementation techniques that significantly accelerate parallel MD simulations compared with current state-of-the-art codes. These include a novel parallel decomposition method and message-passing techniques that reduce communication requirements, as well as novel communication primitives that further reduce communication time. We have also developed numerical techniques that maintain high accuracy while using single precision computation in order to exploit processor-level vector instructions. These methods are embodied in a newly developed MD code called Desmond that achieves unprecedented simulation throughput and parallel scalability on commodity clusters. Our results suggest that Desmond's parallel performance substantially surpasses that of any previously described code. For example, on a standard benchmark, Desmond's performance on a conventional Opteron cluster with 2K processors slightly exceeded the reported performance of IBM's Blue Gene/L machine with 32K processors running its Blue Matter MD code.

IEEE International Conference on High Performance Computing Data and Analytics | 2009

Millisecond-scale molecular dynamics simulations on Anton

David E. Shaw; Ron O. Dror; John K. Salmon; J. P. Grossman; Kenneth M. Mackenzie; Joseph A. Bank; Cliff Young; Martin M. Deneroff; Brannon Batson; Kevin J. Bowers; Edmond Chow; Michael P. Eastwood; Douglas J. Ierardi; John L. Klepeis; Jeffrey S. Kuskin; Richard H. Larson; Kresten Lindorff-Larsen; Paul Maragakis; Mark A. Moraes; Stefano Piana; Yibing Shan; Brian Towles

Anton is a recently completed special-purpose supercomputer designed for molecular dynamics (MD) simulations of biomolecular systems. The machine's specialized hardware dramatically increases the speed of MD calculations, making possible for the first time the simulation of biological molecules at an atomic level of detail for periods on the order of a millisecond, about two orders of magnitude beyond the previous state of the art. Anton is now running simulations on a timescale at which many critically important but poorly understood phenomena are known to occur, allowing the observation of aspects of protein dynamics that were previously inaccessible to both computational and experimental study. Here, we report Anton's performance when executing actual MD simulations whose accuracy has been validated against both existing MD software and experimental observations. We also discuss the manner in which novel algorithms have been coordinated with Anton's co-designed, application-specific hardware to achieve these results.

International Symposium on Computer Architecture | 2007

Anton, a special-purpose machine for molecular dynamics simulation

David E. Shaw; Martin M. Deneroff; Ron O. Dror; Jeffrey S. Kuskin; Richard H. Larson; John K. Salmon; Cliff Young; Brannon Batson; Kevin J. Bowers; Jack C. Chao; Michael P. Eastwood; Joseph Gagliardo; J. P. Grossman; C. Richard Ho; Douglas J. Ierardi; István Kolossváry; John L. Klepeis; Timothy Layman; Christine McLeavey; Mark A. Moraes; Rolf Mueller; Edward C. Priest; Yibing Shan; Jochen Spengler; Michael Theobald; Brian Towles; Stanley C. Wang

The ability to perform long, accurate molecular dynamics (MD) simulations involving proteins and other biological macromolecules could in principle provide answers to some of the most important currently outstanding questions in the fields of biology, chemistry and medicine. A wide range of biologically interesting phenomena, however, occur over time scales on the order of a millisecond, about three orders of magnitude beyond the duration of the longest current MD simulations. In this paper, we describe a massively parallel machine called Anton, which should be capable of executing millisecond-scale classical MD simulations of such biomolecular systems. The machine, which is scheduled for completion by the end of 2008, is based on 512 identical MD-specific ASICs that interact in a tightly coupled manner using a specialized high-speed communication network. Anton has been designed to use both novel parallel algorithms and special-purpose logic to dramatically accelerate those calculations that dominate the time required for a typical MD simulation. The remainder of the simulation algorithm is executed by a programmable portion of each chip that achieves a substantial degree of parallelism while preserving the flexibility necessary to accommodate anticipated advances in physical models and simulation methods.

Journal of Chemical Physics | 2006

The midpoint method for parallelization of particle simulations

Kevin J. Bowers; Ron O. Dror; David E. Shaw

The evaluation of interactions between nearby particles constitutes the majority of the computational workload involved in classical molecular dynamics (MD) simulations. In this paper, we introduce a new method for the parallelization of range-limited particle interactions that proves particularly suitable to MD applications. Because it applies not only to pairwise interactions but also to interactions involving three or more particles, the method can be used for evaluation of both nonbonded and bonded forces in a MD simulation. It requires less interprocessor data transfer than traditional spatial decomposition methods at all but the lowest levels of parallelism. It gains an additional practical advantage in certain commonly used interprocessor communication networks by distributing the communication burden more evenly across network links and by decreasing the associated latency. When used to parallelize MD, it further reduces communication requirements by allowing the computations associated with short-range nonbonded interactions, long-range electrostatics, bonded interactions, and particle migration to use much of the same communicated data. We also introduce certain variants of this method that can significantly improve the balance of computational load across processors.
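The abstract does not spell out the assignment rule, so the following is only a minimal sketch of the idea the method's name refers to, assuming the usual description: each range-limited interaction is computed by the processor whose box contains the midpoint of the interacting particles. The function names, the cubic box layout, and the sample coordinates are hypothetical, and periodic boundaries are ignored for brevity.

```python
import math
from collections import defaultdict

def home_box(pos, box_len):
    """Index of the cubic box (side box_len) that contains a point."""
    return tuple(math.floor(c / box_len) for c in pos)

def assign_pairs_by_midpoint(positions, box_len, cutoff):
    """Map each box index to the particle pairs whose midpoint lies in it.

    Only pairs within the cutoff distance are assigned. This brute-force
    O(N^2) loop is for illustration, not performance.
    """
    owners = defaultdict(list)
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            p, q = positions[i], positions[j]
            if math.dist(p, q) <= cutoff:
                mid = tuple((a + b) / 2.0 for a, b in zip(p, q))
                owners[home_box(mid, box_len)].append((i, j))
    return dict(owners)

# Hypothetical sample data: particles 0 and 1 are within the cutoff,
# particle 2 is far from both.
positions = [(0.8, 1.2, 0.5), (1.4, 0.9, 0.5), (2.9, 2.9, 0.5)]
owners = assign_pairs_by_midpoint(positions, box_len=1.0, cutoff=0.7)
```

In this sample the interacting pair is owned by a box that holds neither particle, which is why both positions would have to be communicated to that box in a parallel setting.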

Journal of Computational Physics | 2007

Zonal methods for the parallel execution of range-limited N-body simulations

Kevin J. Bowers; Ron O. Dror; David E. Shaw

Particle simulations in fields ranging from biochemistry to astrophysics require the evaluation of interactions between all pairs of particles separated by less than some fixed interaction radius. The applicability of such simulations is often limited by the time required for calculation, but the use of massive parallelism to accelerate these computations is typically limited by inter-processor communication requirements. Recently, Snir [M. Snir, A note on N-body computations with cutoffs, Theor. Comput. Syst. 37 (2004) 295-318] and Shaw [D.E. Shaw, A fast, scalable method for the parallel evaluation of distance-limited pairwise particle interactions, J. Comput. Chem. 26 (2005) 1318-1328] independently introduced two distinct methods that offer asymptotic reductions in the amount of data transferred between processors. In the present paper, we show that these schemes represent special cases of a more general class of methods, and introduce several new algorithms in this class that offer practical advantages over all previously described methods for a wide range of problem parameters. We also show that several of these algorithms approach an approximate lower bound on inter-processor data transfer.

Journal of Chemical Physics | 2007

A common, avoidable source of error in molecular dynamics integrators

Ross A. Lippert; Kevin J. Bowers; Ron O. Dror; Michael P. Eastwood; Brent A. Gregersen; John L. Klepeis; István Kolossváry; David E. Shaw

In constrained molecular dynamics simulations using some of the most popular molecular dynamics codes, calculation of the velocities of constrained particles is based solely on the differences in particle positions during two successive time steps. This creates a numerical instability that the authors show to be significant in a typical single-precision floating-point simulation. They describe a simple modification that eliminates this source of instability and demonstrate that this change substantially reduces the energy drift of a sample single-precision NVE simulation.
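The rounding effect behind this kind of error can be illustrated with a small sketch. It is not the authors' analysis or their modification, which the abstract does not detail. It only shows, under assumed values, that a small displacement recovered by subtracting two single-precision positions carries far more relative error than the same displacement stored directly in single precision. The coordinate (50.0) and step size (1.234e-3) are hypothetical.

```python
import struct

def to_f32(x):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Hypothetical values: a coordinate of order 50 and a per-step displacement
# roughly four orders of magnitude smaller.
x_old = to_f32(50.0)
step = 1.234e-3

# Position update stored in single precision, as a single-precision code would.
x_new = to_f32(x_old + step)

# Displacement recovered from the position difference: it inherits the
# absolute rounding error of a number of magnitude ~50.
step_from_diff = x_new - x_old

# Displacement stored directly in single precision: the rounding error is
# relative to the small number itself.
step_direct = to_f32(step)

rel_err_diff = abs(step_from_diff - step) / step
rel_err_direct = abs(step_direct - step) / step
```

Dividing either displacement by the time step gives the corresponding velocity estimate, so the relative errors carry over unchanged.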

Journal of Physics: Conference Series | 2005

Overview of neutral territory methods for the parallel evaluation of pairwise particle interactions

Kevin J. Bowers; Ron O. Dror; David E. Shaw

Particle simulations in fields ranging from biochemistry to astrophysics require evaluation of the interactions between all pairs of particles separated by less than some fixed interaction radius. The extent to which such simulations can be parallelized has historically been limited by the time required for inter-processor communication. Recently, Snir (1) and Shaw (2) independently introduced two distinct methods for parallelization that achieve asymptotic and practical advantages over traditional techniques. We give an overview of these methods and show that they represent special cases of a more general class of methods. We describe other methods in this class that can confer advantages over any previously described method in terms of communication bandwidth and latency. Practically speaking, the best choice among the broad category of methods depends on such parameters as the interaction radius, the size of the simulated system, and the number of processors. We analyze the best choice among a subset of these methods across a broad range of parameters.

Molecular Physics | 2012

Computationally efficient molecular dynamics integrators with improved sampling accuracy

Cristian Predescu; Ross A. Lippert; Michael P. Eastwood; Douglas J. Ierardi; Huafeng Xu; Morten Ø. Jensen; Kevin J. Bowers; Justin Gullingsrud; Charles A. Rendleman; Ron O. Dror; David E. Shaw

The design of numerical integrators for particle simulations with arbitrary potentials entails fundamental trade-offs between the accuracy achieved and the amount of computation required. Here we introduce a class of explicit variational integrators designed to achieve high accuracy for quadratic potentials, with little additional computation relative to traditional integrators. We show that, in practice, these new integrators also improve accuracy for classical biomolecular simulations, since the potential in the vicinity of a typical trajectory point in such a simulation is generally well modelled by a quadratic well. In particular, these integrators provide better sampling accuracy for biomolecular simulation than the commonly used Verlet integrators, as indicated by a weaker dependence of simulated ensemble properties on the time step. They also reduce short-timescale energy fluctuations, thus substantially improving the efficiency of Hybrid Monte Carlo methods, and are easy to implement through straightforward modification of codes based on Verlet integrators.

IEEE Transactions on Signal Processing | 2010

Improved Twiddle Access for Fast Fourier Transforms

Kevin J. Bowers; Ross A. Lippert; Ron O. Dror; David E. Shaw

Optimizing the number of arithmetic operations required in fast Fourier transform (FFT) algorithms has been the focus of extensive research, but memory management is of comparable importance on modern processors. In this article, we investigate two known FFT algorithms, G and GT, that are similar to Cooley-Tukey decimation-in-time and decimation-in-frequency FFT algorithms but that give an asymptotic reduction in the number of twiddle factor loads required for depth-first recursions. The algorithms also allow for aggressive vectorization (even for non-power-of-2 orders) and easier optimization of trivial twiddle factor multiplies. We benchmark G and GT implementations with comparable Cooley-Tukey implementations on commodity hardware. In a comparison designed to isolate the effect of twiddle factor access optimization, these benchmarks show typical speedups ranging from 10% to 65%, depending on transform order, precision, and vectorization. A more heavily optimized implementation of GT yields substantial performance improvements over the widely used code FFTW for many transform orders. The twiddle factor access optimization technique can be generalized to other common FFT algorithms, including real-data FFTs, split-radix FFTs, and multidimensional FFTs.

Science | 2007

Mechanism of Na+/H+ Antiporting

Isaiah T. Arkin; Huafeng Xu; Morten Ø. Jensen; Eyal Arbely; Estelle R. Bennett; Kevin J. Bowers; Edmond Chow; Ron O. Dror; Michael P. Eastwood; Ravenna Flitman-Tene; Brent A. Gregersen; John L. Klepeis; István Kolossváry; Yibing Shan; David E. Shaw
