Publication


Featured research published by Sayantan Chakravorty.


Engineering With Computers | 2006

ParFUM: a parallel framework for unstructured meshes for scalable dynamic physics applications

Orion Sky Lawlor; Sayantan Chakravorty; Terry Wilmarth; Nilesh Choudhury; Isaac Dooley; Gengbin Zheng; Laxmikant V. Kalé

Unstructured meshes are used in many engineering applications with irregular domains, from elastic deformation problems to crack propagation to fluid flow. Because of their complexity and dynamic behavior, the development of scalable parallel software for these applications is challenging. The Charm++ Parallel Framework for Unstructured Meshes allows one to write parallel programs that operate on unstructured meshes with only minimal knowledge of parallel computing, while making it possible to achieve excellent scalability even for complex applications. Charm++’s message-driven model enables computation/communication overlap, while its run-time load balancing capabilities make it possible to react to the changes in computational load that occur in dynamic physics applications. The framework is highly flexible and has been enhanced with numerous capabilities for the manipulation of unstructured meshes, such as parallel mesh adaptivity and collision detection.
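To make the mesh-manipulation support concrete, here is a toy version of one operation such a framework must provide: computing the one-layer "ghost" elements of a partition, i.e. the elements outside the partition that share a node with it. This is a plain-Python sketch, not ParFUM's API; the function name and mesh are illustrative.

```python
# Toy ghost-layer computation for a partitioned unstructured triangle mesh.
# Elements are node-id tuples; a partition is a set of element ids.

def ghost_layer(elements, partition):
    """Return ids of elements outside `partition` that touch one of its nodes."""
    owned_nodes = {n for e in partition for n in elements[e]}
    return {i for i, conn in enumerate(elements)
            if i not in partition and owned_nodes & set(conn)}

# four triangles fanned around node 2
tris = [(0, 1, 2), (1, 3, 2), (3, 4, 2), (4, 5, 2)]
ghosts = ghost_layer(tris, {0})
```

In a real framework this adjacency information is computed per partition and exchanged between processors so each partition can read neighboring data without communication during an iteration.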


International Parallel and Distributed Processing Symposium | 2007

A Fault Tolerance Protocol with Fast Fault Recovery

Sayantan Chakravorty; Laxmikant V. Kalé

Fault tolerance is an important issue for large machines with tens or hundreds of thousands of processors. Checkpoint-based methods, currently used on most machines, roll back all processors to previous checkpoints after a crash. This wastes a significant amount of computation, as every processor must redo all of its work from that checkpoint onwards. In addition, recovery time is bounded by the time between the last checkpoint and the crash. Protocols based on message logging avoid rolling back all processors to their earlier states. However, the recovery time of existing message logging protocols is no smaller than the time between the last checkpoint and the crash. In this paper, we present a fault tolerance protocol that provides fast restarts by combining message logging with object-based processor virtualization. We evaluate our implementation of the protocol in the Charm++/Adaptive MPI runtime system and show that it provides fast restarts and, for many applications, low fault-free overhead.
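As a rough illustration of the idea (this is a sketch of sender-based message logging in general, not the paper's protocol or the Charm++ implementation; all names are ours), the code below shows why only the failed process needs to roll back: senders keep a log of outgoing messages with sequence numbers, so after a crash the failed process restores its checkpoint and replays only the messages it had processed since then, while its peers never revert.

```python
# Sketch: sender-based message logging with localized restart.

class Process:
    def __init__(self, rank):
        self.rank = rank
        self.state = 0          # toy application state
        self.recv_count = 0     # messages processed so far
        self.send_log = []      # (dest_rank, seq, value), kept at the sender
        self.next_seq = {}      # per-destination sequence counter
        self.saved = (0, 0)     # last checkpoint: (state, recv_count)

    def send(self, dest, value):
        seq = self.next_seq.get(dest.rank, 0)
        self.next_seq[dest.rank] = seq + 1
        self.send_log.append((dest.rank, seq, value))  # log before delivery
        dest.receive(value)

    def receive(self, value):
        self.state += value     # deterministic handler, safe to replay
        self.recv_count += 1

    def checkpoint(self):
        self.saved = (self.state, self.recv_count)

def recover(failed, peers):
    # Only the failed process rolls back; peers keep their current state.
    failed.state, failed.recv_count = failed.saved
    replay = sorted((seq, v) for p in peers
                    for dest, seq, v in p.send_log if dest == failed.rank)
    for seq, v in replay:
        if seq >= failed.recv_count:   # skip messages already in the checkpoint
            failed.receive(v)

a, b = Process(0), Process(1)
a.send(b, 5)
b.checkpoint()                  # captures state 5 after 1 message
a.send(b, 7)
pre_crash = b.state             # 12

b.state, b.recv_count = 0, 0    # simulate the crash of b
recover(b, [a])                 # b replays only the post-checkpoint message
```

The sketch assumes a single sender per destination so that sender-assigned sequence numbers order deliveries; real message-logging protocols record delivery determinants to handle multiple senders and nondeterministic receive order.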


IEEE International Conference on High Performance Computing, Data, and Analytics | 2006

Proactive fault tolerance in MPI applications via task migration

Sayantan Chakravorty; Celso L. Mendes; Laxmikant V. Kalé

Failures are likely to be more frequent in systems with thousands of processors. Therefore, schemes for dealing with faults become increasingly important. In this paper, we present a fault tolerance solution for parallel applications that proactively migrates execution from processors where failure is imminent. Our approach assumes that some failures are predictable, and leverages the features in current hardware devices supporting early indication of faults. We use the concepts of processor virtualization and dynamic task migration, provided by Charm++ and Adaptive MPI (AMPI), to implement a mechanism that migrates tasks away from processors which are expected to fail. To demonstrate the feasibility of our approach, we present performance data from experiments with existing MPI applications. Our results show that proactive task migration is an effective technique to tolerate faults in MPI applications.
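A minimal sketch of the evacuation step, under the simplifying assumption that tasks are just load values and a placement is a dict from node names to task lists (all names and numbers are illustrative, not the AMPI implementation):

```python
# Toy proactive evacuation: on a fault warning, move the warned node's tasks
# to the least-loaded surviving nodes before the failure occurs.

def evacuate(assignment, warned_node):
    """assignment maps node name -> list of task load values (mutated in place)."""
    tasks = assignment.pop(warned_node)          # node is about to go down
    for load in sorted(tasks, reverse=True):     # place heaviest tasks first
        target = min(assignment, key=lambda n: sum(assignment[n]))
        assignment[target].append(load)
    return assignment

nodes = {"n0": [4, 3], "n1": [2, 2], "n2": [5]}
evacuate(nodes, "n1")    # e.g., hardware sensors predict n1 will fail
```

Because the tasks migrate before the node dies, no computation is lost and no rollback is needed; the runtime's regular load balancer can later smooth out any imbalance the evacuation introduced.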


International Parallel and Distributed Processing Symposium | 2004

A fault tolerant protocol for massively parallel systems

Sayantan Chakravorty; Laxmikant V. Kalé

As parallel machines grow larger, the mean time between failures shrinks; for the planned machines of the near future, fault tolerance will therefore become an important issue. The traditional method of dealing with faults is to checkpoint the entire application periodically and to restart from the last checkpoint after a crash. Such a strategy wastes resources, however, because it forces all processors to revert to an earlier state when only one processor has lost its current state. We present a scheme for fault tolerance that aims at low overhead on the forward path (i.e., when there are no failures) and fast recovery from faults, without wasting computation done by processors that have not faulted. The scheme does not require any individual component to be fault-free. We present the basic scheme and performance data on small clusters. Since it is based on Charm++ and Adaptive MPI, where each processor houses several virtual processors, the scheme has the potential to reduce fault recovery time significantly by migrating the recovering virtual processors.


Engineering With Computers | 2008

Parallel adaptive simulations of dynamic fracture events

Sandhya Mangala; Terry Wilmarth; Sayantan Chakravorty; Nilesh Choudhury; Laxmikant V. Kalé; Philippe H. Geubelle

Finite element simulations of dynamic fracture problems usually require very fine discretizations in the vicinity of the propagating stress waves and advancing crack fronts, while coarser meshes can be used in the remainder of the domain. This need for a constantly evolving discretization poses several challenges, especially when the simulation is performed on a parallel computing platform. To address this issue, we present a parallel computational framework developed specifically for unstructured meshes. This framework allows dynamic adaptive refinement and coarsening of finite element meshes and also performs load balancing between processors. We demonstrate the capability of this framework, called ParFUM, using two-dimensional structural dynamic problems involving the propagation of elastodynamic waves and the spontaneous initiation and propagation of cracks through a domain discretized with triangular finite elements.
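The refinement criterion can be made concrete with a hedged toy sketch (this is not ParFUM's algorithm; the geometry and names are invented for illustration): triangles whose centroid lies near the crack tip are bisected along their longest edge, while the rest of the mesh stays coarse.

```python
# Illustrative adaptive-refinement pass: refine only near the crack tip.
import math

def refine_near(tris, verts, tip, radius):
    refined = []
    for a, b, c in tris:
        cx = sum(verts[i][0] for i in (a, b, c)) / 3.0
        cy = sum(verts[i][1] for i in (a, b, c)) / 3.0
        if math.hypot(cx - tip[0], cy - tip[1]) < radius:
            # pick the longest edge (p, q) and split it at its midpoint m
            p, q, r = max([(a, b, c), (b, c, a), (c, a, b)],
                          key=lambda e: math.dist(verts[e[0]], verts[e[1]]))
            m = len(verts)
            verts.append(((verts[p][0] + verts[q][0]) / 2.0,
                          (verts[p][1] + verts[q][1]) / 2.0))
            refined += [(p, m, r), (m, q, r)]
        else:
            refined.append((a, b, c))     # leave coarse elements alone
    return refined

verts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
tris = [(0, 1, 2), (1, 3, 2)]
new_tris = refine_near(tris, verts, tip=(0.0, 0.0), radius=0.8)
```

In a parallel setting, a pass like this changes element counts unevenly across partitions, which is exactly why the framework must pair adaptivity with dynamic load balancing.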


IEEE International Conference on High Performance Computing, Data, and Analytics | 2006

Scalable cosmological simulations on parallel machines

Filippo Gioachin; Amit Sharma; Sayantan Chakravorty; Celso L. Mendes; Laxmikant V. Kalé; Thomas P. Quinn

Cosmological simulators are currently an important component in the study of the formation of galaxies and planetary systems. However, existing simulators do not scale effectively on more recent machines containing thousands of processors. In this paper, we introduce a new parallel simulator called ChaNGa (Charm N-body Gravity). This simulator is based on the Charm++ infrastructure, which provides a powerful runtime system that automatically maps computation to physical processors. Using Charm++ features, in particular its measurement-based load balancers, we were able to scale the gravitational force calculation of ChaNGa on up to one thousand processors, with astronomical datasets containing millions of particles. As we pursue the completion of a production version of the code, our current experimental results show that ChaNGa may become a powerful resource for the astronomy community.


Operating Systems Review | 2006

HPC-Colony: services and interfaces for very large systems

Sayantan Chakravorty; Celso L. Mendes; Laxmikant V. Kalé; Terry Jones; Andrew T. Tauferner; Todd A. Inglett; José E. Moreira

Traditional full-featured operating systems are known to have properties that limit the scalability of distributed memory parallel programs, the most common programming paradigm utilized in high end computing. Furthermore, as processor counts increase with the most capable systems, the necessary activity to manage the system becomes more of a burden. To make a general purpose operating system scale to such levels, new technology is required for parallel resource management and global system management (including fault management). In this paper, we describe the shortcomings of full-featured operating systems and runtime systems and discuss an approach to scale such systems to one hundred thousand processors with both scalable parallel application performance and efficient system management.


Languages and Compilers for Parallel Computing | 2008

A Case Study in Tightly Coupled Multi-paradigm Parallel Programming

Sayantan Chakravorty; Aaron T. Becker; Terry Wilmarth; Laxmikant V. Kalé

Programming paradigms are designed to express algorithms elegantly and efficiently. There are many parallel programming paradigms, each suited to a certain class of problems. Selecting the best parallel programming paradigm for a problem minimizes programming effort and maximizes performance. Given the increasing complexity of parallel applications, no one paradigm may be suitable for all components of an application. Today, most parallel scientific applications are programmed with a single paradigm, and the challenge of multi-paradigm parallel programming remains unmet in the broader community. We believe that each component of a parallel program should be programmed using the most suitable paradigm. Furthermore, it is not sufficient to simply bolt modules together: programmers should be able to switch between paradigms easily, and resource management across paradigms should be automatic. We present a pre-existing adaptive runtime system (ARTS) and show how it can be used to meet these challenges by allowing the simultaneous use of multiple parallel programming paradigms and supporting resource management across all of them. We discuss the implementation of some common paradigms within the ARTS and demonstrate the use of multiple paradigms within our feature-rich unstructured mesh framework. We show how this approach boosts performance and productivity for an application developed using this framework.


International Conference on Parallel Processing | 2009

CkDirect: Unsynchronized One-Sided Communication in a Message-Driven Paradigm

Eric J. Bohm; Sayantan Chakravorty; Pritish Jetley; Abhinav Bhatele; Laxmikant V. Kalé

A significant fraction of parallel scientific codes are iterative, with barriers between iterations or even between phases of the same iteration. The sender of a message is assured that the receiver is executing exactly the same iteration or phase. This opens up the opportunity to use one-sided communication without synchronization, explicit or implicit, between the sender and receiver of every message: the synchronization inherent in the application is sufficient to ensure correctness. We present CkDirect, an interface for such one-sided communication in the message-driven Charm++ runtime system. CkDirect helps avoid unnecessary synchronization, message copying, and scheduling overhead in iterative scientific codes. We describe the interface as well as its implementations on two different interconnects: InfiniBand and Blue Gene/P. We evaluate CkDirect through a micro-benchmark; two simple scientific codes, a stencil computation and a matrix multiplication; and a full-fledged quantum chemistry application called OpenAtom.
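The correctness argument — per-message synchronization is redundant when the application already has an iteration barrier — can be sketched with plain Python threads. This is illustrative only and is not the CkDirect interface; the buffer, barrier, and thread names are ours.

```python
# Sketch: unsynchronized one-sided puts made safe by the application's own
# iteration barrier.
import threading

N_ITERS = 3
buf = [0]                       # pre-registered "receive" buffer
barrier = threading.Barrier(2)  # the iteration barrier the app already has
received = []

def sender():
    for it in range(N_ITERS):
        buf[0] = it * 10        # one-sided put: no handshake, no extra copy
        barrier.wait()          # end of iteration: all puts are complete
        barrier.wait()          # proceed once the consumer has read

def receiver():
    for it in range(N_ITERS):
        barrier.wait()          # after the barrier, the put has landed
        received.append(buf[0])
        barrier.wait()

threads = [threading.Thread(target=f) for f in (sender, receiver)]
for t in threads: t.start()
for t in threads: t.join()
```

No per-message acknowledgment or matching receive is posted; the barrier that delimits each iteration is the only ordering point, which is exactly the property iterative codes can exploit.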


Archive | 2004

Proactive Fault Tolerance in Large Systems

Sayantan Chakravorty; Celso L. Mendes; Laxmikant V. Kalé

Collaboration


Dive into Sayantan Chakravorty's collaborations.

Top Co-Authors

Abhinav Bhatele
Lawrence Livermore National Laboratory

Terry Jones
Oak Ridge National Laboratory