Rahul S. Sampath
Oak Ridge National Laboratory
Publications
Featured research published by Rahul S. Sampath.
IEEE International Conference on High Performance Computing, Data, and Analytics | 2010
Abtin Rahimian; Ilya Lashuk; Shravan Veerapaneni; Aparna Chandramowlishwaran; Dhairya Malhotra; Logan Moon; Rahul S. Sampath; Aashay Shringarpure; Jeffrey S. Vetter; Richard W. Vuduc; Denis Zorin; George Biros
We present a fast, petaflop-scalable algorithm for Stokesian particulate flows. Our goal is the direct simulation of blood, which we model as a mixture of a Stokesian fluid (plasma) and red blood cells (RBCs). Directly simulating blood is a challenging multiscale, multiphysics problem. We report simulations with up to 200 million deformable RBCs. The largest simulation amounts to 90 billion unknowns in space. In terms of the number of cells, we improve the state of the art by several orders of magnitude: the previous largest simulation, at the same physical fidelity as ours, resolved the flow of O(1,000-10,000) RBCs. Our approach has three distinct characteristics: (1) we faithfully represent the physics of RBCs by using nonlinear solid mechanics to capture the deformations of each cell; (2) we accurately resolve the long-range, N-body, hydrodynamic interactions between RBCs (which are caused by the surrounding plasma); and (3) we allow for the highly non-uniform distribution of RBCs in space. The new method has been implemented in the software library MOBO (for “Moving Boundaries”). We designed MOBO to support parallelism at all levels, including inter-node distributed memory parallelism, intra-node shared memory parallelism, data parallelism (vectorization), and fine-grained multithreading for GPUs. We have implemented and optimized the majority of the computation kernels on both Intel/AMD x86 and NVIDIA's Tesla/Fermi platforms for single and double floating-point precision. Overall, the code has scaled on 256 CPU-GPUs on the TeraGrid's Lincoln cluster and on 200,000 AMD cores of the Oak Ridge National Laboratory's Jaguar PF system. In our largest simulation, we have achieved 0.7 Petaflop/s of sustained performance on Jaguar.
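As a hedged illustration of item (2): the long-range hydrodynamic interactions come from the Stokes single-layer kernel (the Stokeslet). The C++ sketch below shows the direct O(N^2) summation that fast summation schemes like the paper's fast multipole method replace; stokeslet_direct and all other names are illustrative assumptions, not MOBO's API.

```cpp
#include <array>
#include <cmath>
#include <cstdio>
#include <vector>

using Vec3 = std::array<double, 3>;

// Direct O(N^2) evaluation of the free-space Stokeslet sum
//   u(x_i) = 1/(8*pi*mu) * sum_{j != i} [ f_j / r + r_vec (r_vec . f_j) / r^3 ],
// the N-body hydrodynamic interaction that fast algorithms accelerate.
std::vector<Vec3> stokeslet_direct(const std::vector<Vec3>& x,
                                   const std::vector<Vec3>& f,
                                   double mu) {
  const double pi = 3.14159265358979323846;
  const double scale = 1.0 / (8.0 * pi * mu);
  std::vector<Vec3> u(x.size(), Vec3{0.0, 0.0, 0.0});
  for (std::size_t i = 0; i < x.size(); ++i) {
    for (std::size_t j = 0; j < x.size(); ++j) {
      if (i == j) continue;  // the kernel is singular at r = 0
      const Vec3 r{x[i][0] - x[j][0], x[i][1] - x[j][1], x[i][2] - x[j][2]};
      const double r2 = r[0] * r[0] + r[1] * r[1] + r[2] * r[2];
      const double rn = std::sqrt(r2);
      const double rdotf = r[0] * f[j][0] + r[1] * f[j][1] + r[2] * f[j][2];
      for (int d = 0; d < 3; ++d)
        u[i][d] += scale * (f[j][d] / rn + r[d] * rdotf / (rn * r2));
    }
  }
  return u;
}

int main() {
  // Two point forces in a unit-viscosity fluid; the velocity at the second
  // point is induced by the force at the first.
  const std::vector<Vec3> x{{0.0, 0.0, 0.0}, {1.0, 0.0, 0.0}};
  const std::vector<Vec3> f{{1.0, 0.0, 0.0}, {0.0, 0.0, 0.0}};
  const auto u = stokeslet_direct(x, f, 1.0);
  std::printf("u(x_2) = (%g, %g, %g)\n", u[1][0], u[1][1], u[1][2]);
}
```

A fast multipole method replaces this quadratic double loop with a tree traversal, which is what makes runs with 200 million cells feasible.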
SIAM Journal on Scientific Computing | 2008
Hari Sundar; Rahul S. Sampath; George Biros
In this article, we propose new parallel algorithms for the construction and 2:1 balance refinement of large linear octrees on distributed memory machines. Such octrees are used in many problems in computational science and engineering, e.g., object representation, image analysis, unstructured meshing, finite elements, adaptive mesh refinement, and N-body simulations. Fixed-size scalability and isogranular analyses of the algorithms, using an MPI-based parallel implementation, were performed on a variety of input data and demonstrated good scalability for different processor counts (1 to 1024 processors) on the Pittsburgh Supercomputing Center's TCS-1 AlphaServer. The results are consistent for different data distributions. Octrees with over a billion octants were constructed and balanced in less than a minute on 1024 processors. Like other existing algorithms for constructing and balancing octrees, our algorithms have $\mathcal{O}(N \log N)$ work and $\mathcal{O}(N)$ storage complexity. Under reasonable assumptions on the distribution of octants and the work per octant, the parallel time complexity is $\mathcal{O}(\frac{N}{n_p}\log(\frac{N}{n_p}) + n_p \log n_p)$, where $N$ is the size of the final linear octree and $n_p$ is the number of processors.
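For context on the data structure: a linear octree stores only the leaf octants, kept sorted by their Morton (Z-order) keys, and the parallel algorithms partition this sorted order across processors. Below is a minimal C++ sketch of the standard bit-interleaving encoding under assumed conventions (a fixed maximum depth and anchor coordinates only; real octree keys also encode the octant's level), with hypothetical names such as morton3.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

constexpr int kMaxDepth = 10;  // assumed maximum refinement depth

// Interleave the low kMaxDepth bits of the anchor coordinates (x, y, z)
// into a single Morton (Z-order) key.
uint64_t morton3(uint32_t x, uint32_t y, uint32_t z) {
  uint64_t key = 0;
  for (int b = 0; b < kMaxDepth; ++b) {
    key |= static_cast<uint64_t>((x >> b) & 1u) << (3 * b);
    key |= static_cast<uint64_t>((y >> b) & 1u) << (3 * b + 1);
    key |= static_cast<uint64_t>((z >> b) & 1u) << (3 * b + 2);
  }
  return key;
}

struct Octant {
  uint32_t x, y, z;  // anchor (corner) coordinates of the leaf octant
};

int main() {
  std::vector<Octant> leaves{{1, 1, 0}, {0, 0, 0}, {1, 0, 1}, {0, 1, 1}};
  // Sorting by Morton key yields the linear octree's total order; in the
  // distributed setting, contiguous chunks of this order go to each process.
  std::sort(leaves.begin(), leaves.end(),
            [](const Octant& a, const Octant& b) {
              return morton3(a.x, a.y, a.z) < morton3(b.x, b.y, b.z);
            });
  for (const auto& o : leaves)
    std::printf("(%u, %u, %u) -> key %llu\n", o.x, o.y, o.z,
                static_cast<unsigned long long>(morton3(o.x, o.y, o.z)));
}
```

The 2:1 balance refinement step then enforces that adjacent leaves differ by at most one refinement level, operating on this same sorted representation.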
SIAM Journal on Scientific Computing | 2010
Rahul S. Sampath; George Biros
Conference on High Performance Computing (Supercomputing) | 2007
Hari Sundar; Rahul S. Sampath; Santi S. Adavani; Christos Davatzikos; George Biros
IEEE International Conference on High Performance Computing, Data, and Analytics | 2008
Rahul S. Sampath; Santi S. Adavani; Hari Sundar; Ilya Lashuk; George Biros
IEEE International Conference on High Performance Computing, Data, and Analytics | 2010
Rahul S. Sampath; Hari Sundar; Shravan Veerapaneni
International Conference on Cluster Computing | 2013
Manjunath Gorentla Venkata; Pavel Shamis; Rahul S. Sampath; Richard L. Graham; Joshua S. Ladd
Journal of Computational Physics | 2015
Bobby Philip; M. Berrill; Srikanth Allu; Steven P. Hamilton; Rahul S. Sampath; Kevin T. Clarno; Gary A. Dilts
Nuclear Science and Engineering | 2014
Aaron M. Phillippe; James E Banfield; Kevin T. Clarno; Larry J. Ott; Bobby Philip; M. Berrill; Rahul S. Sampath; Srikanth Allu; Steven P. Hamilton
Archive | 2011
M. Berrill; Bobby Philip; Rahul S. Sampath; Srikanth Allu; Pallab Barai; Bill Cochran; Kevin T. Clarno; Gary A. Dilts