Publication


Featured research published by Aleksandr Rayshubskiy.


IBM Journal of Research and Development | 2005

Scalable framework for 3D FFTs on the Blue Gene/L supercomputer: implementation and early performance measurements

Maria Eleftheriou; Blake G. Fitch; Aleksandr Rayshubskiy; T. J. C. Ward; Robert S. Germain

This paper presents results on a communications-intensive kernel, the three-dimensional fast Fourier transform (3D FFT), running on the 2,048-node Blue Gene®/L (BG/L) prototype. Two implementations of the volumetric FFT algorithm were characterized, one built on the Message Passing Interface library and another built on an active packet Application Program Interface supported by the hardware bring-up environment, the BG/L advanced diagnostics environment. Preliminary performance experiments on the BG/L prototype indicate that both of our implementations scale well up to 1,024 nodes for 3D FFTs of size 128 × 128 × 128. The performance of the volumetric FFT is also compared with that of the Fastest Fourier Transform in the West (FFTW) library. In general, the volumetric FFT outperforms a port of the FFTW Version 2.1.5 library on large-node-count partitions.
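
As a quick illustration of the volumetric approach, the sketch below (a serial Python toy with an illustrative grid size, not the paper's hardware setup) factors a 3D FFT into three rounds of 1D FFTs, one per axis; in the parallel volumetric algorithm, each round is preceded by an all-to-all redistribution so that every node holds complete pencils along the current transform axis.

    import numpy as np

    # Serial sketch of the volumetric 3D FFT idea: the transform factors
    # into three rounds of 1D FFTs, one along each axis. In the parallel
    # algorithm, each round is preceded by an all-to-all redistribution
    # so that every node owns complete "pencils" along the current axis.
    n = 32                                   # illustrative size only
    rng = np.random.default_rng(0)
    data = rng.standard_normal((n, n, n)) + 1j * rng.standard_normal((n, n, n))

    out = data
    for axis in range(3):
        # on a 3D processor mesh, the all-to-all transpose would go here
        out = np.fft.fft(out, axis=axis)

    assert np.allclose(out, np.fft.fftn(data))   # matches the direct 3D FFT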


Conference on High Performance Computing (Supercomputing) | 2006

Blue Matter: approaching the limits of concurrency for classical molecular dynamics

Blake G. Fitch; Aleksandr Rayshubskiy; Maria Eleftheriou; T. J. Christopher Ward; Mark E. Giampapa; Michael C. Pitman; Robert S. Germain

This paper describes a novel spatial-force decomposition for N-body simulations for which we observe O(sqrt(p)) communication scaling. This has enabled Blue Matter to approach the effective limits of concurrency for molecular dynamics using particle-mesh (FFT-based) methods for handling electrostatic interactions. Using this decomposition, Blue Matter running on Blue Gene/L has achieved simulation rates in excess of 1000 time steps per second and demonstrated significant speed-ups down to O(1) atom per node. Blue Matter employs a communicating sequential process (CSP) style model with application communication state machines compiled to hardware interfaces. The scalability achieved has enabled methodologically rigorous biomolecular simulations on biologically interesting systems, such as membrane-bound proteins, whose time scales dwarf previous work on those systems. Major scaling improvements require exploration of alternative algorithms for treating the long-range electrostatics.
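
A back-of-envelope illustration of why a spatial-force decomposition can reach O(sqrt(p)) communication scaling: on a sqrt(p) x sqrt(p) node mesh where each node exchanges data only along its mesh row and column (a generic row/column pattern, assumed here for illustration rather than taken from the paper), the partner count grows as 2*sqrt(p) - 2.

    import math

    # Partner counts for a generic row/column exchange pattern on a
    # square node mesh: O(sqrt(p)) partners out of p nodes.
    for p in (256, 1024, 4096, 16384):
        side = math.isqrt(p)                  # mesh is side x side
        partners = 2 * side - 2               # rest of my row + my column
        print(f"p={p:6d}  partners={partners:4d}  fraction={partners / p:.4f}")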


International Conference on Computational Science | 2006

Blue Matter: strong scaling of molecular dynamics on Blue Gene/L

Blake G. Fitch; Aleksandr Rayshubskiy; Maria Eleftheriou; T. J. Christopher Ward; Mark E. Giampapa; Yuriy Zhestkov; Michael C. Pitman; Frank Suits; Alan Grossfield; Jed W. Pitera; William C. Swope; Ruhong Zhou; Scott E. Feller; Robert S. Germain

This paper presents strong scaling performance data for the Blue Matter molecular dynamics framework using a novel n-body spatial decomposition and a collective communications technique implemented on both MPI and low level hardware interfaces. Using Blue Matter on Blue Gene/L, we have measured scalability through 16,384 nodes with measured time per time-step of under 2.3 milliseconds for a 43,222 atom protein/lipid system. This is equivalent to a simulation rate of over 76 nanoseconds per day and represents an unprecedented time-to-solution for biomolecular simulation as well as continued speed-up to fewer than three atoms per node. On a smaller, solvated lipid system with 13,758 atoms, we have achieved continued speedups through fewer than one atom per node and less than 2 milliseconds/time-step. On a 92,224 atom system, we have achieved floating point performance of over 1.8 TeraFlops/second on 16,384 nodes. Strong scaling of fixed-size classical molecular dynamics of biological systems to large numbers of nodes is necessary to extend the simulation time to the scale required to make contact with experimental data and derive biologically relevant insights.
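
The quoted simulation rate follows from simple arithmetic once a timestep length is assumed; the sketch below uses a 2 fs timestep, which is our assumption (the abstract does not state the timestep used).

    # Back-of-envelope check of the quoted rate (2 fs timestep assumed).
    ms_per_step = 2.3                        # measured wall-clock per step
    fs_per_step = 2.0                        # assumed simulated time per step

    steps_per_day = 24 * 3600 * 1000 / ms_per_step
    ns_per_day = steps_per_day * fs_per_step / 1e6
    print(f"{ns_per_day:.1f} ns/day")        # ~75; "under 2.3 ms" gives 76+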


European Conference on Parallel Processing | 2005

Performance measurements of the 3D FFT on the Blue Gene/L supercomputer

Maria Eleftheriou; Blake G. Fitch; Aleksandr Rayshubskiy; T. J. Christopher Ward; Robert S. Germain

This paper presents performance characteristics of a communications-intensive kernel, the complex-data 3D FFT, running on the Blue Gene/L architecture. Two implementations of the volumetric FFT algorithm were characterized, one built on the MPI library using an optimized collective all-to-all operation [2] and another built on a low-level System Programming Interface (SPI) of the Blue Gene/L Advanced Diagnostics Environment (BG/L ADE) [17]. We compare the current results to those obtained using a reference MPI implementation (MPICH2 ported to BG/L with unoptimized collectives) and to a port of version 2.1.5 of the FFTW library [14]. Performance experiments on the Blue Gene/L prototype indicate that both of our implementations scale well, and the current MPI-based implementation shows a speedup of 730 on 2048 nodes for 3D FFTs of size 128 × 128 × 128. Moreover, the volumetric FFT outperforms the FFTW port by a factor of 8 for a 128 × 128 × 128 complex FFT on 2048 nodes.
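
Read as parallel efficiency, and assuming the speedup is quoted relative to a single-node run (an assumption; the baseline is not stated in this abstract), the figure works out as follows.

    # Parallel efficiency implied by the quoted speedup, assuming a
    # single-node baseline (not stated in the abstract).
    speedup, nodes = 730, 2048
    print(f"efficiency = {speedup / nodes:.1%}")   # ~35.6%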


IBM Journal of Research and Development | 2005

Early performance data on the Blue Matter molecular simulation framework

Robert S. Germain; Yuriy Zhestkov; Maria Eleftheriou; Aleksandr Rayshubskiy; Frank Suits; T. J. C. Ward; Blake G. Fitch

Blue Matter is the application framework being developed in conjunction with the scientific portion of the IBM Blue Gene® project. We describe the parallel decomposition currently being used to target the Blue Gene/L machine and discuss the application-based trace tools used to analyze the performance of the application. We also present the results of early performance studies (including a comparison of the performance of the Ewald and particle-particle particle-mesh Ewald (P3ME) methods), compare the measured performance of some key collective operations with the limitations imposed by the hardware, and discuss some future directions for research.


Petascale Data Storage Workshop | 2009

Using the Active Storage Fabrics model to address petascale storage challenges

Blake G. Fitch; Aleksandr Rayshubskiy; Michael C. Pitman; T. J. Christopher Ward; Robert S. Germain

We present the Active Storage Fabrics (ASF) model for storage-embedded parallel processing as a way to address petascale data-intensive challenges. ASF is aimed at emerging scalable system-on-a-chip, storage-class memory architectures, but may be realized in prototype form on current parallel systems. ASF can be used to transparently accelerate host workloads by close integration at the middleware data/storage boundary, or directly by data-intensive applications. We provide an overview of the major components involved in accelerating a parallel file system and a relational database management system, describe some early results, and outline our current research directions.


IBM Journal of Research and Development | 2008

Blue Matter: scaling of N-body simulations to one atom per node

Blake G. Fitch; Aleksandr Rayshubskiy; Maria Eleftheriou; T. J. C. Ward; Mark E. Giampapa; Mike Pitman; Jed W. Pitera; William C. Swope; Robert S. Germain

N-body simulations present some of the most interesting challenges in the area of massively parallel computing, especially when the object is to improve the time to solution for a fixed-size problem. The Blue Matter molecular simulation framework was developed specifically to address these challenges, to explore programming models for massively parallel machine architectures in a concrete context, and to support the scientific goals of the IBM Blue Gene® Project. This paper reviews the key issues involved in achieving ultrastrong scaling of methodologically correct biomolecular simulations, particularly the treatment of the long-range electrostatic forces present in simulations of proteins in water and membranes. Blue Matter computes these forces using the particle-particle particle-mesh Ewald (P3ME) method, which breaks the problem up into two pieces, one that requires the use of three-dimensional fast Fourier transforms with global data dependencies and another that involves computing interactions between pairs of particles within a cutoff distance. We summarize our exploration of the parallel decompositions used to compute these finite-ranged interactions, describe some of the implementation details involved in these decompositions, and present the evolution of strong-scaling performance achieved over the course of this exploration, along with evidence for the quality of simulation achieved.
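
The finite-ranged piece of P3ME reduces to summing interactions over particle pairs within the cutoff, and a standard serial way to enumerate such pairs in roughly O(N) time is a cell list. The sketch below is a minimal illustration of that generic technique under periodic boundaries, not Blue Matter's actual parallel decomposition; all sizes and parameters are made up.

    import itertools
    import numpy as np

    def cutoff_pairs(pos, box, rc):
        """Minimal serial cell-list sketch: return index pairs closer
        than rc in a periodic cubic box. Illustrative only."""
        ncell = max(3, int(box // rc))        # cells at least rc wide
        cell_size = box / ncell
        cells = {}
        for i, p in enumerate(pos):
            key = tuple((p // cell_size).astype(int) % ncell)
            cells.setdefault(key, []).append(i)
        pairs = []
        for key, members in cells.items():
            # scan this cell and its 26 neighbors (with wraparound)
            for d in itertools.product((-1, 0, 1), repeat=3):
                nkey = tuple((k + dk) % ncell for k, dk in zip(key, d))
                for i in members:
                    for j in cells.get(nkey, ()):
                        if i < j:
                            r = pos[i] - pos[j]
                            r -= box * np.round(r / box)   # minimum image
                            if np.dot(r, r) < rc * rc:
                                pairs.append((i, j))
        return pairs

    rng = np.random.default_rng(1)
    pts = rng.uniform(0.0, 10.0, size=(200, 3))
    print(len(cutoff_pairs(pts, box=10.0, rc=2.5)))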


International Conference of the IEEE Engineering in Medicine and Biology Society | 2009

Strong scaling and speedup to 16,384 processors in cardiac electro-mechanical simulations

Matthias Reumann; Blake G. Fitch; Aleksandr Rayshubskiy; David U. J. Keller; Gunnar Seemann; Olaf Dössel; Michael C. Pitman; John Rice

High-performance computing is required to make feasible simulations of whole-organ models of the heart with biophysically detailed cellular models in a clinical setting. Increasing model detail by simulating electrophysiology and mechanical models increases computational demands. We present scaling results of an electro-mechanical cardiac model of two ventricles and compare them to our previously published results using an electrophysiological model only. The anatomical data set was given by both ventricles of the Visible Female data set at 0.2 mm resolution. Fiber orientation was included. Data decomposition for distribution onto the distributed-memory system was carried out by orthogonal recursive bisection. Load weight ratios for non-tissue vs. tissue elements used in the data decomposition were 1:1, 1:2, 1:5, 1:10, 1:25, 1:38.85, 1:50 and 1:100. The ten Tusscher et al. (2004) electrophysiological cell model was used, with the Rice et al. (1999) model for the computation of the calcium-transient-dependent force. Scaling results for 512, 1024, 2048, 4096, 8192 and 16,384 processors were obtained for 1 ms simulation time. The simulations were carried out on an IBM Blue Gene/L supercomputer. The results show linear scaling from 512 to 16,384 processors, with speedup factors between 1.82 and 2.14 between partitions. The optimal load ratio was 1:25 on all partitions. However, a shift towards load ratios with higher weight for the tissue elements can be recognized, as is to be expected when adding computational complexity to the model while keeping the same communication setup. This work demonstrates that it is potentially possible to run simulations of 0.5 s using the presented electro-mechanical cardiac model within 1.5 hours.
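
As a minimal illustration of weighted orthogonal recursive bisection (a generic sketch, not the authors' production decomposition), the code below recursively splits points along the longest axis at the weighted median, so each half carries roughly equal load; a non-tissue:tissue weight ratio such as 1:25 enters as per-element weights.

    import numpy as np

    def orb(points, weights, depth):
        """Weighted orthogonal recursive bisection sketch: split along
        the longest axis at the weighted median, recurse. Returns a
        list of index arrays, one per partition."""
        def split(ids, d):
            if d == 0:
                return [ids]
            axis = np.argmax(points[ids].max(0) - points[ids].min(0))
            order = ids[np.argsort(points[ids, axis])]
            cum = np.cumsum(weights[order])
            cut = int(np.searchsorted(cum, cum[-1] / 2.0))
            return split(order[:cut + 1], d - 1) + split(order[cut + 1:], d - 1)
        return split(np.arange(len(points)), depth)

    rng = np.random.default_rng(2)
    pts = rng.uniform(size=(1000, 3))
    # e.g. tissue elements weighted 25x relative to non-tissue (cf. 1:25)
    w = np.where(rng.uniform(size=1000) < 0.3, 25.0, 1.0)
    parts = orb(pts, w, depth=3)             # 2**3 = 8 partitions
    print([round(w[p].sum()) for p in parts])   # roughly equal loads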


IBM Journal of Research and Development | 2005

Custom math functions for molecular dynamics

Robert F. Enenkel; Blake G. Fitch; Robert S. Germain; Fred G. Gustavson; Andrew K. Martin; Mark P. Mendell; Jed W. Pitera; Mike Pitman; Aleksandr Rayshubskiy; Frank Suits; William C. Swope; T. J. C. Ward

While developing the protein folding application for the IBM Blue Gene®/L supercomputer, some frequently executed computational kernels were encountered. These were significantly more complex than the linear algebra kernels that are normally provided as tuned libraries with modern machines. Using regular library functions for these would have resulted in an application that exploited only 5-10% of the potential floating-point throughput of the machine. This paper is a tour of the functions encountered; they have been expressed in C++ (and could be expressed in other languages such as Fortran or C). With the help of a good optimizing compiler, floating-point efficiency is much closer to 100%. The protein folding application was initially run by the life science researchers on IBM POWER3™ machines while the computer science researchers were designing and bringing up the Blue Gene/L hardware. Some of the work discussed resulted in enhanced compiler optimizations, which now improve the performance of floating-point-intensive applications compiled by the IBM VisualAge® series of compilers for POWER3, POWER4™, POWER4+™, and POWER5™. The implementations are offered in the hope that they may help in other implementations of molecular dynamics or in other fields of endeavor, and in the hope that others may adapt the ideas presented here to deliver additional mathematical functions at high throughput.
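
The abstract does not list the specific functions, so as an illustrative stand-in for the genre, the sketch below refines a crude reciprocal square root estimate (a quantity ubiquitous in molecular dynamics distance calculations) with Newton-Raphson steps that use only multiplies and adds, the kind of kernel a good optimizing compiler can software-pipeline.

    import math

    def rsqrt(a, seed, iterations=5):
        """Newton-Raphson refinement of 1/sqrt(a). Each step uses only
        multiplies and adds (good for fused multiply-add pipelines) and
        roughly doubles the number of correct digits. Illustrative of
        the custom-kernel genre; not code from the paper."""
        y = seed
        for _ in range(iterations):
            y = y * (1.5 - 0.5 * a * y * y)
        return y

    a = 2.0
    print(rsqrt(a, seed=0.5), 1.0 / math.sqrt(a))   # converges to 0.7071...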


International Symposium on Biomedical Imaging | 2012

Robust registration of multispectral images of the cortical surface in neurosurgery

Danail Stoyanov; Aleksandr Rayshubskiy; Elizabeth M. C. Hillman

Optical multispectral imaging during open neurosurgery can provide functional information equivalent to fMRI, as well as information about blood flow and oxygenation dynamics. To perform multispectral analysis, images acquired sequentially under different illumination wavelengths must be coregistered so that information about the same surface region can be obtained throughout the imaging time period. In this paper, we present a feature-driven registration technique that can register multispectral images and is robust to specular reflections and to the variable appearance of the brain's surface under different illumination conditions. We present preliminary results on data from two patients.
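
The abstract does not spell out the feature or model choices, so the sketch below is a generic analogue of feature-driven robust registration using OpenCV rather than the paper's method: ORB features matched between wavelength bands, with a RANSAC homography rejecting outlier matches such as those produced by specular highlights.

    import cv2
    import numpy as np

    def register(fixed, moving):
        """Generic feature-driven registration sketch (ORB features +
        RANSAC homography); not the paper's algorithm. RANSAC discards
        outlier matches, e.g. those caused by specular highlights.
        fixed, moving: single-channel uint8 images of the same scene."""
        orb = cv2.ORB_create(2000)
        k1, d1 = orb.detectAndCompute(fixed, None)
        k2, d2 = orb.detectAndCompute(moving, None)
        matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
        src = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
        return cv2.warpPerspective(moving, H, fixed.shape[1::-1])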
