Publication


Featured research published by Dinesh Subhraveti.


High Performance Distributed Computing | 2011

VMFlock: virtual machine co-migration for the cloud

Samer Al-Kiswany; Dinesh Subhraveti; Prasenjit Sarkar; Matei Ripeanu

This paper presents VMFlockMS, a migration service optimized for cross-datacenter transfer and instantiation of groups of virtual machine (VM) images that comprise an application-level solution (e.g., a three-tier web application). We dub these groups of related VM images VMFlocks. VMFlockMS employs two main techniques: first, data deduplication within the VMFlock to be migrated and between the VMFlock and the data already present at the destination datacenter, and, second, accelerated instantiation of the application at the target datacenter after transferring only a partial set of data blocks and prioritization of the remaining data based on previously observed access patterns originating from the running VMs. VMFlockMS is designed to be deployed as a set of virtual appliances which make efficient use of the available cloud resources to locally access and deduplicate the images and data in a distributed fashion with minimal requirements imposed on the cloud API to access the VM image repository. VMFlockMS provides an incrementally scalable and high-performance migration service. Our evaluation shows that VMFlockMS can reduce the data volumes to be transferred over the network to as low as 3% of the original VMFlock size, enables the complete transfer of the VM images belonging to a VMFlock over a transcontinental link up to 3.5x faster than alternative approaches, and enables booting these VM images with as little as 5% of the compressed VMFlock data available at the destination.
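
The block-level deduplication idea at the heart of this approach is easy to sketch: hash each fixed-size block of an image and ship only the blocks whose hashes are not already present at the destination. The Python sketch below illustrates the concept; the names (block_hashes, blocks_to_transfer) and the fixed 4 KB block size are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of block-level deduplication in the spirit of VMFlockMS:
# only blocks whose hashes are absent at the destination are transferred.
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size

def block_hashes(image_path):
    """Yield (offset, sha256) for each fixed-size block of a VM image."""
    with open(image_path, "rb") as f:
        offset = 0
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            yield offset, hashlib.sha256(block).hexdigest()
            offset += len(block)

def blocks_to_transfer(image_path, destination_hashes):
    """Return offsets of blocks missing at the destination.

    destination_hashes: set of hashes already present remotely
    (from prior images or earlier VMs in the same flock).
    """
    return [off for off, h in block_hashes(image_path)
            if h not in destination_hashes]
```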


Measurement and Modeling of Computer Systems | 2011

Record and transplay: partial checkpointing for replay debugging across heterogeneous systems

Dinesh Subhraveti; Jason Nieh

Software bugs that occur in production are often difficult to reproduce in the lab due to subtle differences in the application environment and nondeterminism. To address this problem, we present Transplay, a system that captures production software bugs into small per-bug recordings which are used to reproduce the bugs on a completely different operating system without access to any of the original software used in the production environment. Transplay introduces partial checkpointing, a new mechanism that efficiently captures the partial state necessary to reexecute just the last few moments of the application before it encountered a failure. The recorded state, which typically consists of a few megabytes of data, is used to replay the application without requiring the specific application binaries, libraries, support data, or the original execution environment. Transplay integrates with existing debuggers to provide standard debugging facilities to allow the user to examine the contents of variables and other program state at each source line of the application's replayed execution. We have implemented a Transplay prototype that can record unmodified Linux applications and replay them on different versions of Linux as well as Windows. Experiments with several applications including Apache and MySQL show that Transplay can reproduce real bugs and be used in production with modest recording overhead.
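
The core of the "record only the last few moments" idea can be illustrated with a bounded, rolling log of nondeterministic events: older events fall off as new ones arrive, so the recording stays small regardless of how long the application runs. This is a toy sketch; RollingRecorder and its event format are invented for illustration and are not Transplay's actual recording mechanism.

```python
# Minimal sketch of a rolling recording of nondeterministic inputs.
# At failure time, the buffer contents become the small per-bug recording.
from collections import deque

class RollingRecorder:
    def __init__(self, max_events=10000):
        # A bounded deque: older events are discarded automatically,
        # keeping the recording size independent of total run time.
        self.events = deque(maxlen=max_events)

    def record(self, syscall, args, result, data=b""):
        # Log enough to reproduce the call's effect during replay.
        self.events.append((syscall, args, result, data))

    def snapshot(self):
        # On failure, this snapshot is shipped off for replay elsewhere.
        return list(self.events)

def replay(events, handlers):
    """Feed recorded results back instead of performing real I/O."""
    for syscall, args, result, data in events:
        handlers[syscall](args, result, data)
```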


High Performance Distributed Computing | 2012

CAM: a topology aware minimum cost flow based resource manager for MapReduce applications in the cloud

Min Li; Dinesh Subhraveti; Ali Raza Butt; Aleksandr Khasymski; Prasenjit Sarkar

MapReduce has emerged as a prevailing distributed computation paradigm for enterprise and large-scale data-intensive computing. The model is also increasingly used in the massively-parallel cloud environment, where MapReduce jobs are run on a set of virtual machines (VMs) on a pay-as-needed basis. However, MapReduce jobs suffer from performance degradation when running in the cloud due to inefficient resource allocation. In particular, the MapReduce model is designed for and leverages information from native clusters to operate efficiently, whereas the cloud presents a virtual cluster topology that overlays and hides the actual network information. This results in two placement anomalies: loss of data locality and loss of job locality, where jobs are placed physically away from their data or other associated jobs, adversely affecting their performance. In this paper, we propose CAM, a cloud platform that provides an innovative resource scheduler particularly designed for hosting MapReduce applications in the cloud. CAM reconciles both data and VM resource allocation with a variety of competing constraints, such as storage utilization, changing CPU load, and network link capacities. CAM uses a flow-network-based algorithm that is able to optimize MapReduce performance under the specified constraints -- not only through initial placement, but also by readjusting through VM and data migration. Additionally, our platform exposes otherwise hidden lower-level topology information to the MapReduce job scheduler so that it can make optimal task assignments. Evaluation of CAM using both micro-benchmarks and simulations on a 23-VM cluster shows that, compared to a state-of-the-art resource allocator, our system reduces network traffic and average MapReduce job execution time by factors of 3 and 8.6, respectively.
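
A minimum-cost-flow placement can be pictured as a small flow network: a source supplying one unit per task, edges from tasks to machines whose weights encode locality cost, and capacitated edges from machines to a sink representing free slots. The sketch below solves such a network with networkx's min_cost_flow solver; the task/machine names, slot model, and costs are illustrative assumptions, and CAM's actual network construction and cost model are considerably richer.

```python
# Hedged sketch of min-cost-flow task placement, loosely in the spirit of CAM.
import networkx as nx

def place_tasks(tasks, machines, cost):
    """tasks: list of task ids; machines: {machine: free_slots};
    cost[(task, machine)]: placement cost (e.g., higher when the task's
    data lives on a distant rack). Returns {task: machine}."""
    g = nx.DiGraph()
    g.add_node("src", demand=-len(tasks))   # supplies one unit per task
    g.add_node("sink", demand=len(tasks))
    for t in tasks:
        g.add_edge("src", t, capacity=1, weight=0)
        for m in machines:
            g.add_edge(t, m, capacity=1, weight=cost[(t, m)])
    for m, slots in machines.items():
        g.add_edge(m, "sink", capacity=slots, weight=0)
    flow = nx.min_cost_flow(g)
    return {t: m for t in tasks for m, f in flow[t].items() if f}

# Example: data-local placement is cheaper, so the solver picks it.
assignment = place_tasks(
    ["t1", "t2"], {"m1": 1, "m2": 1},
    {("t1", "m1"): 0, ("t1", "m2"): 5,
     ("t2", "m1"): 5, ("t2", "m2"): 0})
assert assignment == {"t1": "m1", "t2": "m2"}
```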


IBM Journal of Research and Development | 2011

GPFS-SNC: an enterprise storage framework for virtual-machine clouds

Karan Gupta; Reshu Jain; Ioannis Koltsidas; Himabindu Pucha; Prasenjit Sarkar; Mark James Seaman; Dinesh Subhraveti

In a typical cloud computing environment, the users are provided with storage and compute capacity in the form of virtual machines. The underlying infrastructure for these services typically comprises large distributed clusters of commodity machines and direct-attached storage in concert with a server virtualization layer. The focus of this paper is on an enterprise storage framework that supports the timely and resource-efficient deployment of virtual machines in such a cloud environment. The proposed framework makes use of innovations in the General Parallel File System-Shared Nothing Clusters (GPFS®-SNC) file system, supports optimal allocation of resources to virtual machines in a hypervisor-agnostic fashion, achieves low latency when provisioning for new virtual machines, and adapts to the input-output needs of each virtual-machine instance in order to achieve high performance for all types of applications.


International Conference on Distributed Computing Systems | 2007

Fault Tolerance in Multiprocessor Systems Via Application Cloning

Philippe Bergheaud; Dinesh Subhraveti; Marc Vertes

Record and replay (RR) is a software-based state replication solution designed to support recording and subsequent replay of the execution of unmodified applications running on multiprocessor systems for fault tolerance. Multiple instances of the application are simultaneously executed in separate virtualized environments called containers. Containers facilitate state replication between the application instances by resolving resource conflicts and providing a uniform view of the underlying operating system across all clones. The virtualization layer that creates the container abstraction actively monitors the primary instance of the application and synchronizes its state with that of the clones by transferring the necessary information to enforce identical state among them. In particular, we address the replication of relevant operating system state, such as network state to preserve network connections across failures, and the state that results from nondeterministic interleaved accesses to shared memory in SMP systems. We have implemented RR's state replication mechanisms in the Linux operating system by making novel use of existing features on the Intel and PowerPC architectures.
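
The essence of this synchronization is that the primary logs the outcomes of nondeterministic operations and each clone consumes that log instead of re-executing them, so all instances compute identical state. The following toy Python sketch illustrates the idea with a single intercepted call; Interceptor and its record/replay modes are illustrative names, and the real system intercepts at the operating-system level inside containers.

```python
# Toy sketch of record/replay state synchronization: the primary records
# nondeterministic results and the clone reuses them verbatim.
import time

class Interceptor:
    def __init__(self, mode, log=None):
        self.mode = mode              # "record" on primary, "replay" on clone
        self.log = log if log is not None else []

    def gettimeofday(self):
        if self.mode == "record":
            t = time.time()           # genuinely nondeterministic source
            self.log.append(t)        # shipped to the clone out of band
            return t
        return self.log.pop(0)        # clone: reuse the primary's result

primary = Interceptor("record")
t1 = primary.gettimeofday()
clone = Interceptor("replay", log=list(primary.log))
assert clone.gettimeofday() == t1    # clone sees identical state
```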


IBM Journal of Research and Development | 2013

GPFS-SNC: an enterprise cluster file system for big data

Reshu Jain; Prasenjit Sarkar; Dinesh Subhraveti

A new class of data-intensive applications commonly referred to as Big Data applications (e.g., customer sentiment analysis based on click-stream logs) involves processing massive amounts of data with a focus on semantically transforming the data. This class of applications is massively parallel and well suited for the MapReduce programming framework that allows users to perform large-scale data analyses such that the application execution layer handles the system architecture, data partitioning, and task scheduling. In this paper, we introduce GPFS-SNC (General Parallel File System for Shared Nothing Clusters), a scalable file system that operates over a cluster of commodity machines and direct-attached storage and meets the requirements of analytics and traditional applications that are typically used together in analytics solutions. The architecture extends an existing enterprise cluster file system to support these emerging classes of workloads by applying five innovative optimizations: 1) locality awareness to allow compute jobs to be scheduled on nodes where the data resides, 2) metablocks that allow large and small block sizes to co-exist in the same file system to meet the needs of different types of applications, 3) write affinity that allows applications to dictate the layout of files on different nodes in order to maximize both write and read bandwidth, 4) pipelined replication to maximize use of network bandwidth for data replication, and 5) distributed recovery to minimize the effect of failures on ongoing computation.
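
The first optimization, locality awareness, can be sketched in a few lines: prefer to schedule a compute task on a free node that holds a replica of its input block, and fall back to a remote read over the network otherwise. The helper below is a hypothetical illustration (the names and replica map are assumed), not GPFS-SNC code.

```python
# Illustrative sketch of locality-aware task scheduling.
def schedule(task_block, replicas, free_nodes):
    """replicas: {block: [nodes holding a copy]};
    free_nodes: set of nodes with spare compute slots."""
    local = [n for n in replicas.get(task_block, []) if n in free_nodes]
    if local:
        return local[0]              # data-local: read from direct-attached disk
    return next(iter(free_nodes))    # otherwise: remote read over the network

node = schedule("blk-42", {"blk-42": ["n3", "n7"]}, {"n1", "n7"})
assert node == "n7"                  # the free replica holder wins
```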


International Conference on Distributed Computing Systems | 2009

CARP: Handling Silent Data Errors and Site Failures in an Integrated Program and Storage Replication Mechanism

Lanyue Lu; Prasenjit Sarkar; Dinesh Subhraveti; Soumitra Sarkar; Mark James Seaman; Reshu Jain; Ahmed Mohammad Bashir

This paper presents CARP, an integrated program and storage replication solution. CARP extends program replication systems which do not currently address storage errors, builds upon a record-and-replay scheme that handles nondeterminism in program execution, and uses a scheme based on recorded program state and I/O logs to enable efficient detection of silent data errors and efficient recovery from such errors. CARP is designed to be transparent to applications with minimal run-time impact and is general enough to be implemented on commodity machines. We implemented CARP as a prototype on the Linux operating system and conducted extensive sensitivity analysis of its overhead with different application profiles and system parameters. In particular, we evaluated CARP with standard unmodified email, database, and web server benchmarks and showed that it imposes acceptable overhead while providing sub-second program state recovery times on detecting a silent data error.
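
One way to picture I/O-log-based detection of silent data errors is a store that records a checksum for every write and verifies it on every read, repairing from a replica on mismatch. This is an assumption-laden toy (ChecksummedStore and the CRC32 checksum are our choices for illustration); CARP's actual scheme additionally integrates program replication and record-and-replay.

```python
# Toy sketch of silent-data-error detection via a checksum I/O log.
import zlib

class ChecksummedStore:
    def __init__(self):
        self.blocks = {}       # block_id -> bytes (may rot silently)
        self.io_log = {}       # block_id -> checksum recorded at write time

    def write(self, block_id, data):
        self.blocks[block_id] = data
        self.io_log[block_id] = zlib.crc32(data)

    def read(self, block_id, replica=None):
        data = self.blocks[block_id]
        if zlib.crc32(data) != self.io_log[block_id]:
            # Checksum mismatch: a silent data error. Recover from replica.
            if replica is None:
                raise IOError(f"silent data error on block {block_id}")
            data = replica.read(block_id)
            self.write(block_id, data)   # repair the corrupted copy
        return data
```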


Archive | 2009

Record and Transplay: Partial Checkpointing for Replay Debugging

Dinesh Subhraveti; Jason Nieh

Software bugs that occur in production are often difficult to reproduce in the lab due to subtle differences in the application environment and nondeterminism. Toward addressing this problem, we present Transplay, a system that captures application software bugs as they occur in production and deterministically reproduces them in a completely different environment, potentially running a different operating system, where the application, its binaries and other support data do not exist. Transplay introduces partial checkpointing, a new mechanism that provides two key properties. It efficiently captures the minimal state necessary to reexecute just the last few moments of the application before it encountered a failure. The recorded state, which typically consists of a few megabytes of data, is used to replay the application without requiring the specific application binaries or the original execution environment. Transplay integrates with existing debuggers to provide facilities such as breakpoints and single-stepping to allow the user to examine the contents of variables and other program state at each source line of the application’s replayed execution. We have implemented a Transplay prototype that can record unmodified Linux applications and replay them on different versions of Linux as well as Windows. Experiments with server applications such as the Apache web server show that Transplay can be used in production with modest recording overhead.


Archive | 2007

Transfer of Event Logs for Replication of Executing Programs

Philippe Bergheaud; Dinesh Subhraveti; Marc Vertes


Archive | 2012

Efficient execution of jobs in a shared pool of resources

Min Li; Prasenjit Sarkar; Dinesh Subhraveti
