Shishir Bharathi | Researchain

Publication

Featured research published by Shishir Bharathi.

Workflows in Support of Large-Scale Science | 2008

Characterization of scientific workflows

Shishir Bharathi; Ann L. Chervenak; Ewa Deelman; Gaurang Mehta; Mei-Hui Su; Karan Vahi

Researchers working on the planning, scheduling and execution of scientific workflows need access to a wide variety of scientific workflows to evaluate the performance of their implementations. We describe basic workflow structures that are composed into complex workflows by scientific communities. We provide a characterization of workflows from five diverse scientific applications, describing their composition and data and computational requirements. We also describe the effect of the size of the input datasets on the structure and execution profiles of these workflows. Finally, we describe a workflow generator that produces synthetic, parameterizable workflows that closely resemble the workflows that we characterize. We make these workflows available to the community to be used as benchmarks for evaluating various workflow systems and scheduling algorithms.
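
The abstract mentions a generator for synthetic, parameterizable workflows built from basic structures. The Python sketch below is not the authors' generator. It is a minimal illustration, assuming a workflow is represented as a DAG in which each job name maps to its parent jobs, of how one such structure (a split job that distributes data to parallel pipelines, followed by a merge job that aggregates their outputs) can be parameterized by width and depth. The class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Workflow:
    """A workflow DAG: each job name maps to the list of its parent jobs."""
    parents: dict[str, list[str]] = field(default_factory=dict)

    def add_job(self, job: str, parents: list[str] | None = None) -> None:
        self.parents[job] = list(parents or [])

    def edges(self) -> list[tuple[str, str]]:
        # (parent, child) pairs, one per data dependency
        return [(p, c) for c, ps in self.parents.items() for p in ps]


def generate_fan_workflow(width: int, stages: int = 1) -> Workflow:
    """Hypothetical generator: a split job feeds `width` parallel pipelines of
    `stages` jobs each, and a merge job aggregates the pipeline outputs."""
    if width < 1 or stages < 1:
        raise ValueError("width and stages must both be >= 1")
    wf = Workflow()
    wf.add_job("split")                      # data distribution
    tails = []
    for i in range(width):
        prev = "split"
        for s in range(stages):              # one pipeline of `stages` jobs
            job = f"process_{i}_{s}"
            wf.add_job(job, [prev])
            prev = job
        tails.append(prev)
    wf.add_job("merge", tails)               # data aggregation
    return wf
```

For example, `generate_fan_workflow(4, 2)` yields 10 jobs (split, eight process jobs, merge) connected by 12 dependency edges. A fuller generator would also attach runtimes and file sizes to each job; those are omitted here.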

Conference on High Performance Computing (Supercomputing) | 2003

A Metadata Catalog Service for Data Intensive Applications

Gurmeet Singh; Shishir Bharathi; Ann L. Chervenak; Ewa Deelman; Carl Kesselman; Mary Manohar; Sonal Patil; Laura Pearlman

Advances in computational, storage and network technologies as well as middleware such as the Globus Toolkit allow scientists to expand the sophistication and scope of data-intensive applications. These applications produce and analyze terabytes and petabytes of data that are distributed in millions of files or objects. To manage these large data sets efficiently, the metadata (descriptive information about the data) must also be managed. There are various types of metadata, and it is likely that a range of metadata services will exist in Grid environments that are specialized for particular types of metadata cataloguing and discovery. In this paper, we present the design of a Metadata Catalog Service (MCS) that provides a mechanism for storing and accessing descriptive metadata and allows users to query for data items based on desired attributes. We describe our experience in using the MCS with several applications and present a scalability study of the service.
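
As an illustration of attribute-based lookup, the following Python sketch keeps descriptive metadata in memory and answers conjunctive attribute queries through an inverted index. It is a toy built on assumptions, not the MCS interface: the class name, the use of logical file names as keys, and the exact-match query semantics are all hypothetical choices for the example.

```python
from __future__ import annotations

from collections import defaultdict


class MetadataCatalog:
    """Toy in-memory catalog: logical file name -> attribute/value pairs."""

    def __init__(self) -> None:
        self._attrs: dict[str, dict[str, object]] = {}
        # Inverted index: (attribute, value) -> names of items carrying it
        self._index: defaultdict[tuple[str, object], set[str]] = defaultdict(set)

    def remove(self, lfn: str) -> None:
        for key, value in self._attrs.pop(lfn, {}).items():
            self._index[(key, value)].discard(lfn)

    def add(self, lfn: str, **attributes: object) -> None:
        self.remove(lfn)  # replace any previous metadata for this item
        self._attrs[lfn] = dict(attributes)
        for key, value in attributes.items():  # values must be hashable
            self._index[(key, value)].add(lfn)

    def query(self, **criteria: object) -> set[str]:
        """Return the items whose attributes match every key=value criterion."""
        if not criteria:
            return set(self._attrs)
        matches = [self._index.get((k, v), set()) for k, v in criteria.items()]
        return set.intersection(*matches)
```

With `cat.add("f1", variable="temperature", run=1)` and `cat.add("f2", variable="pressure", run=1)`, the call `cat.query(run=1)` returns `{"f1", "f2"}`, while `cat.query(variable="temperature", run=1)` returns `{"f1"}`.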

Future Generation Computer Systems | 2013

Characterizing and profiling scientific workflows

Gideon Juve; Ann L. Chervenak; Ewa Deelman; Shishir Bharathi; Gaurang Mehta; Karan Vahi

Researchers working on the planning, scheduling, and execution of scientific workflows need access to a wide variety of scientific workflows to evaluate the performance of their implementations. This paper provides a characterization of workflows from six diverse scientific applications, including astronomy, bioinformatics, earthquake science, and gravitational-wave physics. The characterization is based on novel workflow profiling tools that provide detailed information about the various computational tasks that are present in the workflow. This information includes I/O, memory and computational characteristics. Although the workflows are diverse, there is evidence that each workflow has a job type that consumes the most runtime. The study also uncovered an inefficiency in a workflow component implementation, where the component was re-reading the same data multiple times.
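
The two findings above (one job type dominating runtime, and a component re-reading its inputs) can be checked with simple aggregation over per-task profile records. The sketch below assumes a hypothetical `TaskRecord` holding runtime, bytes read, and total input size; it is not the output format of the authors' profiling tools, and the 1.5x re-read threshold is an arbitrary illustrative choice.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TaskRecord:
    """Hypothetical per-task profile record."""
    job_type: str
    runtime_s: float
    bytes_read: int
    input_bytes: int  # total size of the task's distinct input files


def runtime_by_type(records: list[TaskRecord]) -> dict[str, float]:
    totals: dict[str, float] = defaultdict(float)
    for r in records:
        totals[r.job_type] += r.runtime_s
    return dict(totals)


def dominant_job_type(records: list[TaskRecord]) -> str:
    """Job type with the largest total runtime (raises ValueError if empty)."""
    totals = runtime_by_type(records)
    return max(totals, key=totals.__getitem__)


def rereading_suspects(records: list[TaskRecord], factor: float = 1.5) -> list[TaskRecord]:
    """Tasks that read noticeably more bytes than their inputs contain."""
    return [r for r in records
            if r.input_bytes > 0 and r.bytes_read > factor * r.input_bytes]
```

A task that reads five times the size of its inputs would be flagged, which is the kind of signal that could point to repeated reads of the same data.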

arXiv: Computational Engineering, Finance, and Science | 2005

The Earth System Grid: Supporting the Next Generation of Climate Modeling Research

David E. Bernholdt; Shishir Bharathi; David Brown; Kasidit Chanchio; Meili Chen; Ann L. Chervenak; Luca Cinquini; Bob Drach; Ian T. Foster; Peter Fox; José I. García; Carl Kesselman; Rob S. Markel; Don Middleton; Veronika Nefedova; Line C. Pouchard; Arie Shoshani; Alex Sim; Gary Strand; Dean N. Williams

Understanding the Earth's climate system and how it might be changing is a preeminent scientific challenge. Global climate models are used to simulate past, present, and future climates, and experiments are executed continuously on an array of distributed supercomputers. The resulting data archive, spread over several sites, currently contains upwards of 100 TB of simulation data and is growing rapidly. Looking toward mid-decade and beyond, we must anticipate and prepare for distributed climate research data holdings of many petabytes. The Earth System Grid (ESG) is a collaborative interdisciplinary project aimed at addressing the challenge of enabling management, discovery, access, and analysis of these critically important datasets in a distributed and heterogeneous computational environment. The problem is fundamentally a Grid problem. Building upon the Globus toolkit and a variety of other technologies, ESG is developing an environment that addresses authentication, authorization for data access, large-scale data transport and management, services and abstractions for high-performance remote data access, mechanisms for scalable data replication, cataloging with rich semantic and syntactic information, data discovery, distributed monitoring, and Web-based portals for using the system.

High Performance Distributed Computing | 2004

Performance and scalability of a replica location service

Ann L. Chervenak; Naveen Palavalli; Shishir Bharathi; Carl Kesselman; Robert Schwartzkopf

We describe the implementation and evaluate the performance of a replica location service that is part of the Globus Toolkit Version 3.0. A replica location service (RLS) provides a mechanism for registering the existence of replicas and discovering them. Features of our implementation include the use of soft-state update protocols to populate a distributed index and optional Bloom filter compression to reduce the size of these updates. Our results demonstrate that RLS performance scales well for individual servers with millions of entries and up to 100 requesting threads. We also show that the distributed RLS index scales well when using Bloom filter compression for wide area updates.
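
Bloom filter compression can be illustrated as follows: a catalog inserts each logical name into a fixed-size bit array and ships that array to the index instead of the full name list, and the index then answers membership queries with possible false positives but no false negatives. This Python sketch derives its bit positions by double hashing over a SHA-256 digest; the filter size, number of hash functions, and hashing scheme are assumptions for illustration, not the parameters RLS uses.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Fixed-size Bloom filter over strings (k indices via double hashing)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        if num_bits < 1 or num_hashes < 1:
            raise ValueError("num_bits and num_hashes must be >= 1")
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True for every added item; may also be True for some absent items
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

A filter of 8,192 bits (1 KiB) holding 500 names with 4 hash functions has an expected false-positive rate of about 0.2% by the standard estimate (1 - e^(-kn/m))^k, and its size does not depend on how long the names are.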

Grid Computing | 2007

Data placement for scientific applications in distributed environments

Ann L. Chervenak; Ewa Deelman; Miron Livny; Mei-Hui Su; Robert Schuler; Shishir Bharathi; Gaurang Mehta; Karan Vahi

Scientific applications often perform complex computational analyses that consume and produce large data sets. We are concerned with data placement policies that distribute data in ways that are advantageous for application execution, for example, by placing data sets so that they may be staged into or out of computations efficiently or by replicating them for improved performance and reliability. In particular, we propose to study the relationship between data placement services and workflow management systems. In this paper, we explore the interactions between two services used in large-scale science today. We evaluate the benefits of prestaging data using the Data Replication Service versus using the native data stage-in mechanisms of the Pegasus workflow management system. We use the astronomy application, Montage, for our experiments and modify it to study the effect of input data size on the benefits of data prestaging. As the size of input data sets increases, prestaging using a data placement service can significantly improve the performance of the overall analysis.

IEEE Transactions on Parallel and Distributed Systems | 2009

The Globus Replica Location Service: Design and Experience

Ann L. Chervenak; Robert Schuler; Matei Ripeanu; M. Ali Amer; Shishir Bharathi; Ian T. Foster; Adriana Iamnitchi; Carl Kesselman

Distributed computing systems employ replication to improve overall system robustness, scalability, and performance. A replica location service (RLS) offers a mechanism to maintain and provide information about physical locations of replicas. This paper defines a design framework for RLSs that supports a variety of deployment options. We describe the RLS implementation that is distributed with the Globus toolkit and is in production use in several grid deployments. Features of our modular implementation include the use of soft-state protocols to populate a distributed index and Bloom filter compression to reduce overheads for distribution of index information. Our performance evaluation demonstrates that the RLS implementation scales well for individual servers with millions of entries and up to 100 clients. We describe the characteristics of existing RLS deployments and discuss how RLS has been integrated with higher-level data management services.
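
Soft-state protocols keep a distributed index current without explicit deletes: each catalog periodically resends what it holds, and index entries that are not refreshed within a timeout expire. The sketch below models that behavior in Python with an injectable clock so it can be tested deterministically. The class, the TTL value, and any catalog names are hypothetical, and per-name calls stand in for whatever full or Bloom-filter-compressed updates a real deployment would send.

```python
from __future__ import annotations

import time
from typing import Callable, Iterable


class SoftStateIndex:
    """Index of logical name -> catalogs; entries expire unless refreshed."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._expiry: dict[tuple[str, str], float] = {}  # (lfn, catalog) -> deadline

    def update(self, catalog: str, lfns: Iterable[str]) -> None:
        """Record (or refresh) a catalog's soft-state report."""
        deadline = self.clock() + self.ttl
        for lfn in lfns:
            self._expiry[(lfn, catalog)] = deadline

    def lookup(self, lfn: str) -> set[str]:
        """Catalogs whose unexpired reports mention `lfn` (linear scan)."""
        now = self.clock()
        return {cat for (name, cat), dl in self._expiry.items()
                if name == lfn and dl > now}

    def expire(self) -> int:
        """Drop stale entries; return how many were removed."""
        now = self.clock()
        stale = [key for key, dl in self._expiry.items() if dl <= now]
        for key in stale:
            del self._expiry[key]
        return len(stale)
```

Because expiry is implicit, a catalog that fails simply stops appearing in lookups once its last update is older than `ttl`; no separate removal message is required.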

International Conference on e-Science | 2006

Monitoring the Earth System Grid with MDS4

Ann L. Chervenak; Jennifer M. Schopf; Laura Pearlman; Mei-Hui Su; Shishir Bharathi; Luca Cinquini; Mike D'Arcy; Neill Miller; David E. Bernholdt

In production Grids for scientific applications, service and resource failures must be detected and addressed quickly. In this paper, we describe the monitoring infrastructure used by the Earth System Grid (ESG) project, a scientific collaboration that supports global climate research. ESG uses the Globus Toolkit Monitoring and Discovery System (MDS4) to monitor its resources. We describe how the MDS4 Index Service collects information about ESG resources and how the MDS4 Trigger Service checks specified failure conditions and notifies system administrators when failures occur. We present monitoring statistics for May 2006 and describe our experiences using MDS4 to monitor ESG resources over the last two years.
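
A trigger of the kind described checks a condition against collected resource status and notifies administrators when the condition holds. The following Python sketch shows only that pattern; it does not use the MDS4 Trigger Service API, and the rule structure, status fields, and resource names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class TriggerRule:
    """Hypothetical rule: fire when `condition` holds for a resource's status."""
    name: str
    condition: Callable[[dict[str, Any]], bool]
    message: str


def evaluate_triggers(
    status: dict[str, dict[str, Any]],
    rules: list[TriggerRule],
    notify: Callable[[str], None],
) -> int:
    """Check every rule against every resource; notify on each match."""
    fired = 0
    for resource, info in status.items():
        for rule in rules:
            if rule.condition(info):
                notify(f"[{rule.name}] {resource}: {rule.message}")
                fired += 1
    return fired
```

Passing `notify` as a function keeps the delivery mechanism (email, pager, log) separate from the failure conditions themselves.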

Workflows in Support of Large-Scale Science | 2009

Scheduling data-intensive workflows on storage constrained resources

Shishir Bharathi; Ann L. Chervenak

Data-intensive workflows stage large amounts of data in and out of compute resources. The data staging strategies employed during the execution of such workflows can have a significant impact on the time taken to complete the execution or on the overall cost of the execution. We describe the problem of minimizing the overall time taken for execution and present a heuristic based on ordering clean-up jobs in the workflow. Next, we develop genetic-algorithm-based approaches to solving the same problem and demonstrate that the results obtained with the heuristic are comparable to the best results obtained with the genetic-algorithm-based approaches. We also describe the problem of minimizing the overall cost of execution and extend our genetic algorithm to generate low-cost schedules that vary the number of processors and the amount of storage provisioned for execution.
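
The abstract does not spell out the clean-up ordering heuristic, so the following is not that heuristic. It is a Python sketch of the storage accounting such a heuristic depends on: given a job order, each job's input and output files, and file sizes (all hypothetical inputs), it simulates sequential execution on one resource, deletes each file right after its last consumer finishes, and reports the peak storage footprint. Comparing orders shows why interleaving clean-up with computation matters on storage-constrained resources.

```python
from __future__ import annotations


def peak_storage(
    order: list[str],
    inputs: dict[str, list[str]],
    outputs: dict[str, list[str]],
    sizes: dict[str, int],
    cleanup: bool = True,
) -> int:
    """Peak bytes resident when jobs run one at a time in `order`.

    Inputs not yet present are staged in before a job runs; outputs are written
    while it runs. With cleanup=True a file is deleted right after its last
    consumer finishes. Files nobody consumes are kept as final products.
    `order` is assumed to be a valid topological order (not checked).
    """
    remaining: dict[str, int] = {}  # jobs still needing each file
    for job in order:
        for f in inputs[job]:
            remaining[f] = remaining.get(f, 0) + 1

    resident: set[str] = set()
    current = peak = 0
    for job in order:
        for f in [*inputs[job], *outputs[job]]:
            if f not in resident:
                resident.add(f)
                current += sizes[f]
        peak = max(peak, current)
        if cleanup:
            for f in inputs[job]:
                remaining[f] -= 1
                if remaining[f] == 0:
                    resident.remove(f)
                    current -= sizes[f]
    return peak
```

With two independent two-step pipelines whose staged inputs and intermediate files are 100 units each and whose final outputs are 1 unit each, running each pipeline to completion before starting the next peaks at 201 units, while starting both pipelines first peaks at 300; without clean-up the footprint grows to 402.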

Proceedings of the Second International Workshop on Data-Aware Distributed Computing | 2009

Data Staging Strategies and Their Impact on the Execution of Scientific Workflows

Shishir Bharathi; Ann L. Chervenak

Data-intensive workflows process and generate large amounts of data. Strategies employed to stage data in and out of compute resources can often have a significant impact on the overall execution of a workflow. We study the relationships between data placement services that perform the staging and workflow managers that control the release of computational jobs. We describe a framework that classifies data staging strategies into decoupled, loosely-coupled and tightly-coupled modes, based on the degree of their interaction with the workflow manager. We present the results of simulation studies that investigate the effect of decoupled, loosely-coupled and tightly-coupled data staging strategies on synthetic workflows resembling those from real-world scientific applications.
