Publication


Featured research published by Sandip Agarwala.


Dependable Systems and Networks | 2007

E2EProf: Automated End-to-End Performance Management for Enterprise Systems

Sandip Agarwala; Fernando Alegre; Karsten Schwan; Jegannathan Mehalingham

Distributed systems are becoming increasingly complex, caused by the prevalent use of Web services, multi-tier architectures, and grid computing, where dynamic sets of components interact with each other across distributed and heterogeneous computing infrastructures. For these applications to be able to predictably and efficiently deliver services to end users, it is therefore critical to understand and control their runtime behavior. In a datacenter environment, for instance, understanding the end-to-end dynamic behavior of certain IT subsystems, from the time requests are made to when responses are generated and, finally, received, is a key prerequisite for improving application response, providing required levels of performance, or meeting service level agreements (SLAs). The E2EProf toolkit enables the efficient and nonintrusive capture and analysis of end-to-end program behavior for complex enterprise applications. E2EProf permits an enterprise to recognize and analyze performance problems when they occur - online, to take corrective actions as soon as possible and wherever necessary along the paths currently taken by user requests - end-to-end, and to do so without the need to instrument applications - nonintrusively. Online analysis exploits a novel signal analysis algorithm, termed pathmap, which dynamically detects the causal paths taken by client requests through application and backend servers and annotates these paths with end-to-end latencies and with the contributions to these latencies from different path components. Thus, with pathmap, it is possible to dynamically identify the bottlenecks present in selected servers or services and to detect the abnormal or unusual performance behaviors indicative of potential problems or overloads. Pathmap and the E2EProf toolkit successfully detect causal request paths and associated performance bottlenecks in the RUBiS eBay-like multi-tier Web application and in one of the datacenters of our industry partner, Delta Air Lines.
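
The abstract describes pathmap only as a signal analysis algorithm and gives no further detail. As a rough illustration of the general idea, the sketch below assumes that per-interval message counts are available at two hypothetical observation points and estimates the delay between them by plain cross-correlation. The function name, the sample series, and the choice of unnormalized cross-correlation are assumptions made for this example, not details taken from the paper.

```python
def estimate_lag(upstream, downstream, max_lag):
    """Return the lag (in sampling intervals) at which the downstream
    series best matches the upstream series, using raw cross-correlation.
    Both series are assumed to have the same length."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate upstream[t] with downstream[t + lag].
        score = sum(
            upstream[t] * downstream[t + lag]
            for t in range(len(upstream) - lag)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical per-interval request counts seen at a front end and a back end.
front_end = [0, 5, 1, 0, 7, 2, 0, 0, 4, 0, 0, 0]
back_end = [0, 0, 0, 5, 1, 0, 7, 2, 0, 0, 4, 0]  # same pattern, 2 intervals later

lag = estimate_lag(front_end, back_end, max_lag=4)
print(lag)
```

A strong correlation peak at a nonzero lag suggests that traffic at the second point follows traffic at the first with roughly that delay, which is the kind of evidence a causal-path detector could build on.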


International Conference on Distributed Computing Systems | 2006

SysProf: Online Distributed Behavior Diagnosis through Fine-grain System Monitoring

Sandip Agarwala; Karsten Schwan

Runtime monitoring is key to the effective management of enterprise and high performance applications. To deal with the complex behaviors of today’s multi-tier applications running across shared platforms, such monitoring must meet three criteria: (1) fine granularity, including being able to track the resource usage of specific application behaviors like individual client-server interactions, (2) real-time response, referring to the monitoring system’s ability to both capture and analyze currently needed monitoring information with the delays required for online management, and (3) enterprise-wide operation, which means that the monitoring information captured and analyzed must span across the entire software stack and set of machines involved in request generation, request forwarding, service provision, and return. This paper presents the SysProf system-level monitoring toolkit, which provides a flexible, low overhead framework for enterprise-wide monitoring. The toolkit permits the capture of monitoring information at different levels of granularity, ranging from tracking the system-level activities triggered by a single system call, to capturing the client-server interactions associated with certain request classes, to characterizing the server resources consumed by sets of clients or client behaviors. The paper demonstrates the efficacy of SysProf by using it to manage two different enterprise applications: (1) detecting performance bottlenecks in a high performance shared network file service, and (2) enforcing service level agreements in a multi-tier auctioning web site.


International Conference on Autonomic Computing | 2006

QMON: QoS- and Utility-Aware Monitoring in Enterprise Systems

Sandip Agarwala; Yuan Chen; Dejan S. Milojicic; Karsten Schwan

The scale, reliability and cost requirements of enterprise data centers require automation of center management. Examples include provisioning, scheduling, capacity planning, logging and auditing. A key component of such automation functions is online monitoring. In contrast to monitoring systems designed for human users, a particular concern for online enterprise monitoring is Quality of Service (QoS). Since breaking service level agreements (SLAs) has direct financial and legal implications, enterprise monitoring must be conducted so as to maintain SLAs. This includes the ability to differentiate the QoS of monitoring itself for different classes of users or, more generally, for software components subject to different SLAs. Thus, without embedding notions of QoS into the monitoring systems used in next generation data centers, it will not be possible to accomplish the desired automation of their operation. This paper both demonstrates the importance of QoS in monitoring and presents a QoS-capable monitoring system, termed QMON. QMON supports utility-aware monitoring while also being able to differentiate between different classes of monitoring, corresponding to classes of SLAs. The implementation of QMON offers high levels of predictability for service delivery (i.e., predictable performance) and is dynamically configurable to deal with changes in enterprise needs or variations in services and applications. We demonstrate the importance of QoS in monitoring and the QoS capabilities of QMON in a series of case studies and experiments, using a multi-tier web service benchmark.


High Performance Distributed Computing | 2003

Resource-aware stream management with the customizable dproc distributed monitoring mechanisms

Sandip Agarwala; Christian Poellabauer; Jiantao Kong; Karsten Schwan; Matthew Wolf

Monitoring the resources of distributed systems is essential to the successful deployment and execution of grid applications, particularly when such applications have well-defined QoS requirements. The dproc system-level monitoring mechanisms implemented for standard Linux kernels have several key components. First, utilizing the familiar /proc filesystem, dproc extends this interface with resource information collected from both local and remote hosts. Second, to predictably capture and distribute monitoring information, dproc uses a kernel-level group communication facility, termed KECho, which is based on events and event channels. Third, and the focus of this paper, is dproc's run-time customizability for resource monitoring, which includes the generation and deployment of monitoring functionality within remote operating system kernels. Using dproc, we show that: (a) data streams can be customized according to a client's resource availabilities (dynamic stream management); (b) by dynamically varying distributed monitoring (dynamic filtering of monitoring information), an appropriate balance can be maintained between monitoring overheads and application quality; and (c) by performing monitoring at kernel-level, the information captured enables decision making that takes into account the multiple resources used by applications.
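
To make the notion of dynamic filtering of monitoring information more concrete, here is a minimal user-space sketch. It assumes one simple policy: a sample is forwarded to remote subscribers only when it differs from the last forwarded value by at least a threshold that can be changed at run time. The class name, the policy, and the sample data are hypothetical; dproc itself deploys its monitoring functionality inside the kernel, and the abstract does not describe the filters it uses.

```python
class ThresholdFilter:
    """Forward a sample only when it differs from the last forwarded
    value by at least `threshold` (a hypothetical filtering policy)."""

    def __init__(self, threshold):
        self.threshold = threshold  # may be changed at run time
        self._last_sent = None

    def offer(self, value):
        """Return True if the sample should be sent to remote subscribers."""
        if self._last_sent is None or abs(value - self._last_sent) >= self.threshold:
            self._last_sent = value
            return True
        return False


# Hypothetical CPU-load samples (percent), one per monitoring interval.
samples = [10, 11, 12, 30, 31, 29, 60, 61]

cpu_filter = ThresholdFilter(threshold=10)
sent = [s for s in samples if cpu_filter.offer(s)]
print(sent)
```

Raising the threshold lowers monitoring traffic at the cost of coarser information, which is the overhead-versus-quality trade-off the abstract refers to.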


Autonomic Computing Workshop | 2003

Service morphing: integrated system- and application-level service adaptation in autonomic systems

Christian Poellabauer; Karsten Schwan; Sandip Agarwala; Ada Gavrilovska; Greg Eisenhauer; Santosh Pande; Calton Pu; Matthew Wolf

Service morphing is a set of techniques used to continuously meet an application's quality of service (QoS) needs in the presence of run-time variations in service locations, platform capabilities, or end-user needs. These techniques provide high levels of flexibility in how, when, and where necessary processing and communication actions are performed. Lightweight middleware supports flexibility by permitting end-users to subscribe to information channels of interest to them whenever they desire, and then apply exactly the processing they require to such information. New compiler and binary code generation techniques dynamically generate, deploy, and specialize code in order to match current user needs to available platform resources. Finally, to deal with run-time changes in resource availability, kernel-level resource management mechanisms are associated with user-level middleware. Such associations range from loosely coupled, where kernel-level resource management monitors and occasionally responds to user-level events, to tightly coupled, where kernel-level mechanisms import, export, and use performance and control attributes in conjunction with each resource-relevant user-level event.


Self-Adaptive and Self-Organizing Systems | 2007

e-SAFE: An Extensible, Secure and Fault Tolerant Storage System

Sandip Agarwala; Arnab Paul; Karsten Schwan

This paper describes e-SAFE, a scalable utility-driven distributed storage system that offers very high availability at an archival scale and reduces management overhead such as periodic repairs. e-SAFE is designed to provide a storage utility for environments such as large-scale data centers in enterprise networks where the servers experience temporary unavailability (possibly high load, temporary downtimes due to repair or software/hardware upgrades). e-SAFE is based on a simple principle: efficiently sprinkle data all over a distributed storage system and robustly reconstruct it even when many of the servers are unavailable. e-SAFE also provides a strong guarantee of data integrity. The use of Fountain codes for replicating file data blocks, an efficient algorithm for fast parallel encoding and decoding over multiple file segments, a utility module for service differentiation and auto-adjustments of design parameters, and a background replication mechanism hiding the cost of replication and dissemination from the user, provide a fast, durable and autonomous storage solution.
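
The abstract names Fountain codes but does not say which variant e-SAFE uses or how its encoder works. As a hedged illustration of the rateless idea only, the sketch below implements a toy random linear fountain code over GF(2): each encoded symbol is the XOR of a random non-empty subset of source blocks, and the receiver keeps collecting symbols until Gaussian elimination recovers every block. All function names and data are invented for this example, and practical systems typically use LT or Raptor codes with faster decoders.

```python
import random


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def encode_symbol(blocks, rng):
    """Return (mask, payload): the XOR of a random non-empty subset of blocks.
    Bit i of `mask` records whether block i was included."""
    mask = rng.randrange(1, 1 << len(blocks))
    payload = bytes(len(blocks[0]))
    for i, block in enumerate(blocks):
        if (mask >> i) & 1:
            payload = xor_bytes(payload, block)
    return mask, payload


def decode(symbols, k):
    """Gaussian elimination over GF(2). Returns the k source blocks,
    or None if the symbols received so far do not determine them."""
    pivots = {}  # pivot bit -> (mask, payload), kept fully reduced
    for mask, payload in symbols:
        # Clear every known pivot bit from the incoming symbol.
        for bit, (pmask, ppayload) in pivots.items():
            if (mask >> bit) & 1:
                mask ^= pmask
                payload = xor_bytes(payload, ppayload)
        if mask == 0:
            continue  # linearly dependent symbol, carries no new information
        new_bit = mask.bit_length() - 1
        # Clear the new pivot bit from the rows already stored.
        for bit, (pmask, ppayload) in list(pivots.items()):
            if (pmask >> new_bit) & 1:
                pivots[bit] = (pmask ^ mask, xor_bytes(ppayload, payload))
        pivots[new_bit] = (mask, payload)
    if len(pivots) < k:
        return None
    return [pivots[i][1] for i in range(k)]


blocks = [b"ab", b"cd", b"ef", b"gh"]  # hypothetical equal-sized file blocks
rng = random.Random(0)
received = []
decoded = None
while decoded is None and len(received) < 200:
    received.append(encode_symbol(blocks, rng))
    decoded = decode(received, len(blocks))
```

Because any sufficiently large set of symbols suffices, it does not matter which storage servers happen to be reachable, only that enough symbols can be fetched.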


Journal of Grid Computing | 2003

System-Level Resource Monitoring in High-Performance Computing Environments

Sandip Agarwala; Christian Poellabauer; Jiantao Kong; Karsten Schwan; Matthew Wolf

Low-overhead resource monitoring is key to the successful management of distributed high-performance computing environments, particularly when applications have well-defined quality of service (QoS) requirements. The dproc system-level monitoring mechanisms provide tools both for efficiently monitoring system-level events and for notifying remote hosts of events relevant to their operation. Implemented as an extension to the Linux kernel, dproc provides several key functions. First, utilizing the familiar /proc virtual filesystem, dproc extends this interface with resource information collected from both local and remote hosts. Second, to predictably capture and distribute monitoring information, dproc uses a kernel-level group communication facility, termed KECho, which implements events and event channels. Third, and the focus of this paper, is dproc's run-time customizability for resource monitoring, which includes the generation and deployment of monitoring functionality within remote operating system kernels. Using dproc, we show that (a) data streams can be customized according to a client's resource availabilities (dynamic stream management), (b) by dynamically varying distributed monitoring (dynamic filtering of monitoring information), an appropriate balance can be maintained between monitoring overheads and application quality, and (c) by performing monitoring at kernel-level, the information captured enables decision making that takes into account the multiple resources used by applications.


International Conference on Distributed Computing Systems | 2005

Lightweight Morphing Support for Evolving Middleware Data Exchanges in Distributed Applications

Sandip Agarwala; Greg Eisenhauer; Karsten Schwan

Most systems must evolve as their missions or roles change and/or as they adapt to new execution environments. When evolving large distributed applications, it is particularly difficult to make changes to the data formats that underlie their components' communications, because such format evolution can affect all or many application components. Prior approaches to the problem of implementing changes in the communications of a deployed system have relied upon ad-hoc solutions or on protocol negotiation to avoid message format mismatches. Unfortunately, such solutions tend to increase the complexity of application code. This paper presents a novel approach to the problem of data format evolution that combines meta-data about the data being exchanged with dynamic binary code generation to create a robust data exchange system that naturally supports application evolution. The idea is to specialize the communications of application components by dynamically generating the code that can automatically transform incoming data into forms that receiving components can understand. A realistic example in the context of publish/subscribe middleware is used to illustrate how this technique can be applied to enhance interoperability between different versions of distributed applications.
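
The approach described above pairs format meta-data with code generated at run time. The following sketch imitates that idea in a much simplified form, assuming that each format can be described as a mapping from field names to default values. Instead of dynamic binary code generation, it builds a Python closure from the two format descriptions. The function name, the formats, and the rule of dropping unknown fields are assumptions made for illustration, not the paper's mechanism.

```python
def make_converter(sender_fields, receiver_fields):
    """Build, at run time, a function that reshapes a sender's record
    into the layout a receiver expects.

    Both arguments map field names to default values; they stand in for
    the format meta-data exchanged alongside the messages.
    """
    shared = [name for name in receiver_fields if name in sender_fields]
    missing = {
        name: default
        for name, default in receiver_fields.items()
        if name not in sender_fields
    }

    def convert(record):
        out = {name: record[name] for name in shared}  # copy common fields
        out.update(missing)  # fill fields the sender does not know about
        return out  # fields unknown to the receiver are dropped

    return convert


# Hypothetical formats: version 2 added "priority" and dropped "debug".
v1_format = {"id": 0, "payload": "", "debug": False}
v2_format = {"id": 0, "payload": "", "priority": 1}

v1_to_v2 = make_converter(v1_format, v2_format)
converted = v1_to_v2({"id": 7, "payload": "hello", "debug": True})
print(converted)
```

The converter is derived once per pair of formats and then reused for every message, so application code on either side never has to handle version mismatches itself.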


Challenges of Large Applications in Distributed Environments | 2004

Morphable messaging: efficient support for evolution in distributed applications

Sandip Agarwala; Greg Eisenhauer; Karsten Schwan

All but the most briefly used systems must evolve as their mission and roles change over time. Evolution in the context of large distributed systems is extraordinarily complex because of the difficulty of upgrading all components simultaneously and the fact that such systems are often very sensitive to changes in the message formats that underlie their communication. Prior approaches to the problem of implementing changes in a deployed system have relied upon ad-hoc solutions or protocol negotiation to avoid message format mismatches. We present a novel approach that combines message meta-data and dynamic code generation to create a robust messaging system that naturally supports application evolution.


Archive | 2006

AutoFlow: Autonomic Information Flows for Critical Information Systems

Zhongtang Cai; Ada Gavrilovska; Sandip Agarwala; Greg Eisenhauer; Brian F. Cooper; Patrick M. Widener; Jay F. Lofstead; Vibhore Kumar; Matt Wolf; Balasubramanian Seshasayee; Hasan Abbasi; Karsten Schwan; Mohamed S. Mansour

Collaboration


Top Co-Authors

Karsten Schwan, Georgia Institute of Technology

Greg Eisenhauer, Georgia Institute of Technology

Ada Gavrilovska, Georgia Institute of Technology

Matthew Wolf, Georgia Institute of Technology

Hasan Abbasi, Georgia Institute of Technology

Jay F. Lofstead, Sandia National Laboratories

Jiantao Kong, Georgia Institute of Technology

Mohamed S. Mansour, Georgia Institute of Technology