Mark A. Moraes
D. E. Shaw Research
Publications
Featured research published by Mark A. Moraes.
conference on high performance computing (supercomputing) | 2006
Kevin J. Bowers; Edmond Chow; Huafeng Xu; Ron O. Dror; Michael P. Eastwood; Brent A. Gregersen; John L. Klepeis; István Kolossváry; Mark A. Moraes; Federico D. Sacerdoti; John K. Salmon; Yibing Shan; David E. Shaw
Although molecular dynamics (MD) simulations of biomolecular systems often run for days to months, many events of great scientific interest and pharmaceutical relevance occur on long time scales that remain beyond reach. We present several new algorithms and implementation techniques that significantly accelerate parallel MD simulations compared with current state-of-the-art codes. These include a novel parallel decomposition method and message-passing techniques that reduce communication requirements, as well as novel communication primitives that further reduce communication time. We have also developed numerical techniques that maintain high accuracy while using single precision computation in order to exploit processor-level vector instructions. These methods are embodied in a newly developed MD code called Desmond that achieves unprecedented simulation throughput and parallel scalability on commodity clusters. Our results suggest that Desmond's parallel performance substantially surpasses that of any previously described code. For example, on a standard benchmark, Desmond's performance on a conventional Opteron cluster with 2K processors slightly exceeded the reported performance of IBM's Blue Gene/L machine with 32K processors running its Blue Matter MD code.
ieee international conference on high performance computing data and analytics | 2009
David E. Shaw; Ron O. Dror; John K. Salmon; J. P. Grossman; Kenneth M. Mackenzie; Joseph A. Bank; Cliff Young; Martin M. Deneroff; Brannon Batson; Kevin J. Bowers; Edmond Chow; Michael P. Eastwood; Douglas J. Ierardi; John L. Klepeis; Jeffrey S. Kuskin; Richard H. Larson; Kresten Lindorff-Larsen; Paul Maragakis; Mark A. Moraes; Stefano Piana; Yibing Shan; Brian Towles
Anton is a recently completed special-purpose supercomputer designed for molecular dynamics (MD) simulations of biomolecular systems. The machine's specialized hardware dramatically increases the speed of MD calculations, making possible for the first time the simulation of biological molecules at an atomic level of detail for periods on the order of a millisecond, about two orders of magnitude beyond the previous state of the art. Anton is now running simulations on a timescale at which many critically important but poorly understood phenomena are known to occur, allowing the observation of aspects of protein dynamics that were previously inaccessible to both computational and experimental study. Here, we report Anton's performance when executing actual MD simulations whose accuracy has been validated against both existing MD software and experimental observations. We also discuss the manner in which novel algorithms have been coordinated with Anton's co-designed, application-specific hardware to achieve these results.
international symposium on computer architecture | 2007
David E. Shaw; Martin M. Deneroff; Ron O. Dror; Jeffrey S. Kuskin; Richard H. Larson; John K. Salmon; Cliff Young; Brannon Batson; Kevin J. Bowers; Jack C. Chao; Michael P. Eastwood; Joseph Gagliardo; J. P. Grossman; C. Richard Ho; Douglas J. Ierardi; István Kolossváry; John L. Klepeis; Timothy Layman; Christine McLeavey; Mark A. Moraes; Rolf Mueller; Edward C. Priest; Yibing Shan; Jochen Spengler; Michael Theobald; Brian Towles; Stanley C. Wang
The ability to perform long, accurate molecular dynamics (MD) simulations involving proteins and other biological macromolecules could in principle provide answers to some of the most important currently outstanding questions in the fields of biology, chemistry and medicine. A wide range of biologically interesting phenomena, however, occur over time scales on the order of a millisecond, about three orders of magnitude beyond the duration of the longest current MD simulations. In this paper, we describe a massively parallel machine called Anton, which should be capable of executing millisecond-scale classical MD simulations of such biomolecular systems. The machine, which is scheduled for completion by the end of 2008, is based on 512 identical MD-specific ASICs that interact in a tightly coupled manner using a specialized high-speed communication network. Anton has been designed to use both novel parallel algorithms and special-purpose logic to dramatically accelerate those calculations that dominate the time required for a typical MD simulation. The remainder of the simulation algorithm is executed by a programmable portion of each chip that achieves a substantial degree of parallelism while preserving the flexibility necessary to accommodate anticipated advances in physical models and simulation methods.
ieee international conference on high performance computing data and analytics | 2011
John K. Salmon; Mark A. Moraes; Ron O. Dror; David E. Shaw
Most pseudorandom number generators (PRNGs) scale poorly to massively parallel high-performance computation because they are designed as sequentially dependent state transformations. We demonstrate that independent, keyed transformations of counters produce a large alternative class of PRNGs with excellent statistical properties (long period, no discernible structure or correlation). These counter-based PRNGs are ideally suited to modern multi-core CPUs, GPUs, clusters, and special-purpose hardware because they vectorize and parallelize well, and require little or no memory for state. We introduce several counter-based PRNGs: some based on cryptographic standards (AES, Threefish) and some completely new (Philox). All our PRNGs pass rigorous statistical tests (including TestU01's BigCrush) and produce at least 2^64 unique parallel streams of random numbers, each with period 2^128 or more. In addition to essentially unlimited parallel scalability, our PRNGs offer excellent single-chip performance: Philox is faster than the CURAND library on a single NVIDIA GPU.
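The core idea in this abstract, that each random value is a keyed function of a counter rather than the next step of a sequential state, can be illustrated with a minimal Python sketch. This is not Philox or any generator from the paper: it substitutes keyed BLAKE2b from the standard library as the keyed transformation, and the names `cbrng`, `uniform`, and `stream` are hypothetical. It only shows the structural property that any element of any stream can be computed independently.

```python
import hashlib
import struct

MASK64 = (1 << 64) - 1


def cbrng(key: int, counter: int) -> int:
    """Return a 64-bit value that depends only on (key, counter).

    Assumption: keyed BLAKE2b stands in for the keyed transformation;
    the generators in the paper use different, much faster functions.
    `key` must fit in 64 bits and `counter` in 128 bits.
    """
    key_bytes = struct.pack("<Q", key)
    ctr_bytes = struct.pack("<QQ", counter & MASK64, counter >> 64)
    digest = hashlib.blake2b(ctr_bytes, key=key_bytes, digest_size=8).digest()
    return struct.unpack("<Q", digest)[0]


def uniform(key: int, counter: int) -> float:
    """Map the top 53 bits of the output to a float in [0, 1)."""
    return (cbrng(key, counter) >> 11) * 2.0 ** -53


def stream(key: int, start: int, n: int) -> list:
    """Produce n consecutive outputs of the stream selected by `key`."""
    return [cbrng(key, c) for c in range(start, start + n)]
```

Because no state is carried between calls, a worker can take its own key (for example, a particle or thread index) and jump straight to any counter value without coordinating with other workers.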
ieee international conference on high performance computing data and analytics | 2014
David E. Shaw; J. P. Grossman; Joseph A. Bank; Brannon Batson; J. Adam Butts; Jack C. Chao; Martin M. Deneroff; Ron O. Dror; Amos Even; Christopher H. Fenton; Anthony Forte; Joseph Gagliardo; Gennette Gill; Brian Greskamp; Richard C. Ho; Douglas J. Ierardi; Lev Iserovich; Jeffrey S. Kuskin; Richard H. Larson; Timothy Layman; Li-Siang Lee; Adam K. Lerer; Chester Li; Daniel Killebrew; Kenneth M. Mackenzie; Shark Yeuk-Hai Mok; Mark A. Moraes; Rolf Mueller; Lawrence J. Nociolo; Jon L. Peticolas
Anton 2 is a second-generation special-purpose supercomputer for molecular dynamics simulations that achieves significant gains in performance, programmability, and capacity compared to its predecessor, Anton 1. The architecture of Anton 2 is tailored for fine-grained event-driven operation, which improves performance by increasing the overlap of computation with communication, and also allows a wider range of algorithms to run efficiently, enabling many new software-based optimizations. A 512-node Anton 2 machine, currently in operation, is up to ten times faster than Anton 1 with the same number of nodes, greatly expanding the reach of all-atom biomolecular simulations. Anton 2 is the first platform to achieve simulation rates of multiple microseconds of physical time per day for systems with millions of atoms. Demonstrating strong scaling, the machine simulates a standard 23,558-atom benchmark system at a rate of 85 μs/day -- 180 times faster than any commodity hardware platform or general-purpose supercomputer.
ieee international conference on high performance computing data and analytics | 2010
Ron O. Dror; J.P. Grossman; Kenneth M. Mackenzie; Brian Towles; Edmond Chow; John K. Salmon; Cliff Young; Joseph A. Bank; Brannon Batson; Martin M. Deneroff; Jeffrey S. Kuskin; Richard H. Larson; Mark A. Moraes; David E. Shaw
Strong scaling of scientific applications on parallel architectures is increasingly limited by communication latency. This paper describes the techniques used to mitigate latency in Anton, a massively parallel special-purpose machine that accelerates molecular dynamics (MD) simulations by orders of magnitude compared with the previous state of the art. Achieving this speedup required a combination of hardware mechanisms and software constructs to reduce network latency, sender and receiver overhead, and synchronization costs. Key elements of Anton's approach, in addition to tightly integrated communication hardware, include formulating data transfer in terms of counted remote writes, leveraging fine-grained communication, and establishing fixed, optimized communication patterns. Anton delivers software-to-software inter-node latency significantly lower than any other large-scale parallel machine, and the total critical-path communication time for an Anton MD simulation is less than 4% that of the next fastest MD platform.
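The abstract does not spell out how counted remote writes work. The sketch below assumes the term means that senders deposit data directly into a preallocated receive buffer while the receiver simply waits for an arrival counter to reach a known, fixed number of writes. It is a thread-based software analogy only, not Anton's hardware mechanism, and the names `CountedReceiveBuffer` and `gather` are hypothetical.

```python
import threading


class CountedReceiveBuffer:
    """Hypothetical receive buffer that counts incoming writes."""

    def __init__(self, num_slots: int) -> None:
        self.slots = [None] * num_slots  # preallocated destination memory
        self._arrived = 0                # number of writes received so far
        self._cond = threading.Condition()

    def remote_write(self, slot: int, value) -> None:
        """Sender side: deposit data and increment the arrival counter."""
        with self._cond:
            self.slots[slot] = value
            self._arrived += 1
            self._cond.notify_all()

    def wait_for(self, expected: int, timeout: float = 5.0) -> bool:
        """Receiver side: block until `expected` writes have landed."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self._arrived >= expected, timeout
            )


def gather(values: list) -> list:
    """Have one sender thread per value write into a shared buffer."""
    buf = CountedReceiveBuffer(len(values))
    senders = [
        threading.Thread(target=buf.remote_write, args=(i, v))
        for i, v in enumerate(values)
    ]
    for t in senders:
        t.start()
    # The communication pattern is fixed, so the receiver knows in
    # advance how many writes to expect and needs no handshake.
    complete = buf.wait_for(len(values))
    for t in senders:
        t.join()
    if not complete:
        raise TimeoutError("not all writes arrived")
    return list(buf.slots)
```

Under this assumption the receiver never exchanges a request or acknowledgement with any sender; a single counter comparison is the only synchronization step, which is consistent with the abstract's emphasis on fixed communication patterns and low receiver overhead.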
international parallel and distributed processing symposium | 2013
Daniele Paolo Scarpazza; Douglas J. Ierardi; Adam K. Lerer; Kenneth M. Mackenzie; Albert C. Pan; Joseph A. Bank; Edmond Chow; Ron O. Dror; J. P. Grossman; Daniel Killebrew; Mark A. Moraes; Cristian Predescu; John K. Salmon; David E. Shaw
Special-purpose computing hardware can provide significantly better performance and power efficiency for certain applications than general-purpose processors. Even within a single application area, however, a special-purpose machine can be far more valuable if it is capable of efficiently supporting a number of different computational methods that, taken together, expand the machine's functionality and range of applicability. We have previously described a massively parallel special-purpose supercomputer, called Anton, and have shown that it executes traditional molecular dynamics simulations orders of magnitude faster than the previous state of the art. Here, we describe how we extended Anton's software to support a more diverse set of methods, allowing scientists to simulate a broader class of biological phenomena at extremely high speeds. Key elements of our approach, which exploits Anton's tightly integrated hardwired pipelines and programmable cores, are applicable to the hardware and software design of various other specialized or heterogeneous parallel computing platforms.
IEEE Micro | 2011
Ron O. Dror; J. P. Grossman; Kenneth M. Mackenzie; Brian Towles; Edmond Chow; John K. Salmon; Cliff Young; Joseph A. Bank; Brannon Batson; Martin M. Deneroff; Jeffrey S. Kuskin; Richard H. Larson; Mark A. Moraes; David E. Shaw
Anton, a massively parallel special-purpose machine that accelerates molecular dynamics simulations by orders of magnitude, uses a combination of specialized hardware mechanisms and restructured software algorithms to reduce and hide communication latency. Anton delivers end-to-end internode latency significantly lower than any other large-scale parallel machine, and its critical-path communication time for molecular dynamics simulations is less than 3 percent that of the next-fastest platform.
Archive | 1996
David E. Shaw; Charles E. Ardai; Brian D. Marsh; Mark A. Moraes; Dana B. Rudolph; Jon D. McAuliffe
Archive | 1996
Jon D. McAuliffe; Brian D. Marsh; Mark A. Moraes