Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Gaurang Mehta is active.

Publication


Featured research published by Gaurang Mehta.


Scientific Programming | 2005

Pegasus: A framework for mapping complex scientific workflows onto distributed systems

Ewa Deelman; Gurmeet Singh; Mei-Hui Su; Jim Blythe; Yolanda Gil; Carl Kesselman; Gaurang Mehta; Karan Vahi; G. Bruce Berriman; John C. Good; Anastasia C. Laity; Joseph C. Jacob; Daniel S. Katz

This paper describes the Pegasus framework that can be used to map complex scientific workflows onto distributed resources. Pegasus enables users to represent the workflows at an abstract level without needing to worry about the particulars of the target execution systems. The paper describes general issues in mapping applications and the functionality of Pegasus. We present the results of improving application performance through workflow restructuring which clusters multiple tasks in a workflow into single entities. A real-life astronomy application is used as the basis for the study.
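
As a rough illustration of the task clustering described above, the sketch below groups the tasks of a small abstract workflow by their depth in the DAG and merges same-level tasks into clusters. This is a minimal Python sketch with a made-up workflow and function names; it is not the Pegasus implementation or its API.

```python
from collections import defaultdict

# Hypothetical abstract workflow: task -> list of parent tasks it depends on.
workflow = {
    "preprocess": [],
    "analyze_1": ["preprocess"],
    "analyze_2": ["preprocess"],
    "analyze_3": ["preprocess"],
    "merge": ["analyze_1", "analyze_2", "analyze_3"],
}

def cluster_by_level(dag, cluster_size=2):
    """Group tasks at the same DAG depth into clusters of at most cluster_size."""
    memo = {}

    def level(task):
        # Depth of a task: 0 for roots, 1 + deepest parent otherwise.
        if task not in memo:
            parents = dag[task]
            memo[task] = 0 if not parents else 1 + max(level(p) for p in parents)
        return memo[task]

    by_level = defaultdict(list)
    for task in dag:
        by_level[level(task)].append(task)

    clusters = []
    for lvl in sorted(by_level):
        tasks = by_level[lvl]
        clusters.extend(tasks[i:i + cluster_size]
                        for i in range(0, len(tasks), cluster_size))
    return clusters

print(cluster_by_level(workflow))
# [['preprocess'], ['analyze_1', 'analyze_2'], ['analyze_3'], ['merge']]
```

Each cluster could then be submitted as a single job, which is the intuition behind the performance improvement the abstract reports for workflows with many short tasks.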


Lecture Notes in Computer Science | 2004

Pegasus: Mapping Scientific Workflows onto the Grid

Ewa Deelman; Jim Blythe; Yolanda Gil; Carl Kesselman; Gaurang Mehta; Sonal Patil; Mei-Hui Su; Karan Vahi; Miron Livny

In this paper we describe the Pegasus system that can map complex workflows onto the Grid. Pegasus takes an abstract description of a workflow and finds the appropriate data and Grid resources to execute the workflow. Pegasus is being released as part of the GriPhyN Virtual Data Toolkit and has been used in a variety of applications, including astronomy, biology, gravitational-wave science, and high-energy physics. A deferred planning mode of Pegasus is also introduced.


IEEE International Conference on eScience | 2008

On the Use of Cloud Computing for Scientific Workflows

Christina Hoffa; Gaurang Mehta; Timothy Freeman; Ewa Deelman; Kate Keahey; G. Bruce Berriman; John C. Good

This paper explores the use of cloud computing for scientific workflows, focusing on a widely used astronomy application, Montage. The approach is to evaluate, from the point of view of a scientific workflow, the trade-offs between running in a local environment, if one is available, and running in a virtual environment via remote, wide-area network resource access. Our results show that for Montage, a workflow with short job runtimes, the virtual environment can provide good compute-time performance but can suffer from resource scheduling delays and wide-area communication overhead.


Workflows in Support of Large-Scale Science | 2008

Characterization of scientific workflows

Shishir Bharathi; Ann L. Chervenak; Ewa Deelman; Gaurang Mehta; Mei-Hui Su; Karan Vahi

Researchers working on the planning, scheduling and execution of scientific workflows need access to a wide variety of scientific workflows to evaluate the performance of their implementations. We describe basic workflow structures that are composed into complex workflows by scientific communities. We provide a characterization of workflows from five diverse scientific applications, describing their composition and data and computational requirements. We also describe the effect of the size of the input datasets on the structure and execution profiles of these workflows. Finally, we describe a workflow generator that produces synthetic, parameterizable workflows that closely resemble the workflows that we characterize. We make these workflows available to the community to be used as benchmarks for evaluating various workflow systems and scheduling algorithms.
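
The generator described in the abstract is not reproduced here, but the idea of a synthetic, parameterizable workflow can be sketched as follows; the parameters (number of tasks, maximum fan-in, random seed) and the random-DAG construction are illustrative assumptions only, not the authors' generator.

```python
import random

def synthetic_workflow(num_tasks=20, max_fan_in=3, seed=0):
    """Build a random DAG: each task depends on up to max_fan_in earlier tasks."""
    rng = random.Random(seed)
    dag = {}
    for i in range(num_tasks):
        earlier = list(range(i))
        k = min(len(earlier), rng.randint(0, max_fan_in))
        dag[f"task_{i}"] = [f"task_{p}" for p in rng.sample(earlier, k)]
    return dag

for task, parents in synthetic_workflow(num_tasks=8, max_fan_in=2).items():
    print(task, "<-", parents)
```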


Future Generation Computer Systems | 2013

Characterizing and profiling scientific workflows

Gideon Juve; Ann L. Chervenak; Ewa Deelman; Shishir Bharathi; Gaurang Mehta; Karan Vahi

Researchers working on the planning, scheduling, and execution of scientific workflows need access to a wide variety of scientific workflows to evaluate the performance of their implementations. This paper provides a characterization of workflows from six diverse scientific applications, including astronomy, bioinformatics, earthquake science, and gravitational-wave physics. The characterization is based on novel workflow profiling tools that provide detailed information about the various computational tasks present in the workflow, including I/O, memory, and computational characteristics. Although the workflows are diverse, there is evidence that each workflow has a job type that consumes most of the runtime. The study also uncovered an inefficiency in one workflow component implementation, which was re-reading the same data multiple times.
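
A per-job-type runtime summary of the kind this profiling study reports could, in principle, be computed along the following lines; the record layout and the Montage-style job names are assumptions for illustration, not the authors' profiling tools.

```python
from collections import defaultdict

# Hypothetical per-task profile records; field names are made up for illustration.
records = [
    {"type": "mProject", "runtime": 12.4, "io_read_mb": 210.0, "peak_mem_mb": 512},
    {"type": "mProject", "runtime": 11.9, "io_read_mb": 205.0, "peak_mem_mb": 520},
    {"type": "mAdd",     "runtime": 95.0, "io_read_mb": 900.0, "peak_mem_mb": 2048},
]

# Sum runtime per job type to find the type that dominates the workflow.
runtime_by_type = defaultdict(float)
for rec in records:
    runtime_by_type[rec["type"]] += rec["runtime"]

dominant = max(runtime_by_type, key=runtime_by_type.get)
print(f"job type with the largest share of runtime: {dominant} "
      f"({runtime_by_type[dominant]:.1f} s)")
```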


International Conference on e-Science | 2009

Scientific workflow applications on Amazon EC2

Gideon Juve; Ewa Deelman; Karan Vahi; Gaurang Mehta; G. Bruce Berriman; Benjamin P. Berman; Philip J. Maechling

The proliferation of commercial cloud computing providers has generated significant interest in the scientific computing community. Much recent research has attempted to determine the benefits and drawbacks of cloud computing for scientific applications. Although clouds have many attractive features, such as virtualization, on-demand provisioning, and “pay as you go” usage-based pricing, it is not clear whether they are able to deliver the performance required for scientific applications at a reasonable price. In this paper we examine the performance and cost of clouds from the perspective of scientific workflow applications. We use three characteristic workflows to compare the performance of a commercial cloud with that of a typical HPC system, and we analyze the various costs associated with running those workflows in the cloud. We find that the performance of clouds is not unreasonable given the hardware resources provided, and that performance comparable to HPC systems can be achieved given similar resources. We also find that the cost of running workflows on a commercial cloud can be reduced by storing data in the cloud rather than transferring it from outside.
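
The cost side of such an analysis can be approximated with a simple model like the one below; the instance price, core counts, and transfer charge are placeholder values, not figures from the paper.

```python
import math

def cloud_cost(core_hours, cores_per_instance, price_per_instance_hour,
               data_out_gb, price_per_gb_out):
    """Estimate cost as billed instance-hours plus outbound data transfer."""
    instance_hours = math.ceil(core_hours / cores_per_instance)
    return instance_hours * price_per_instance_hour + data_out_gb * price_per_gb_out

# Example with placeholder numbers: 128 core-hours on 8-core instances.
print(f"estimated cost: ${cloud_cost(128, 8, 0.68, 4, 0.12):.2f}")  # $11.36
```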


High Performance Distributed Computing | 2002

GriPhyN and LIGO, building a virtual data Grid for gravitational wave scientists

Ewa Deelman; Carl Kesselman; Gaurang Mehta; Leila Meshkat; Laura Pearlman; K. Blackburn; Phil Ehrens; Albert Lazzarini; Roy Williams; S. Koranda

Many physics experiments today generate large volumes of data, which is then processed in a variety of ways to further the understanding of fundamental physical phenomena. The goal of the NSF-funded GriPhyN project (Grid Physics Network) is to enable scientists to seamlessly access data, whether it is raw experimental data or a data product that is the result of further processing. GriPhyN provides a new degree of transparency in how data-handling and processing capabilities are integrated to deliver data products to end users or applications, so that requests for such products are easily mapped into computation and/or data access at multiple locations. GriPhyN refers to the set of all data products available to the user as virtual data. Among the physics applications participating in the project is the Laser Interferometer Gravitational-wave Observatory (LIGO), which is being built to observe the gravitational waves predicted by general relativity. We describe our initial design and prototype of a virtual data Grid for LIGO.


International Conference on e-Science | 2006

Managing Large-Scale Workflow Execution from Resource Provisioning to Provenance Tracking: The CyberShake Example

Ewa Deelman; Scott Callaghan; Edward H. Field; H. Francoeur; Robert W. Graves; Nitin Gupta; Vipin Gupta; Thomas H. Jordan; Carl Kesselman; Philip J. Maechling; John Mehringer; Gaurang Mehta; David A. Okaya; Karan Vahi; Li Zhao

This paper discusses the process of building an environment where large-scale, complex scientific analyses can be scheduled onto a heterogeneous collection of computational and storage resources. The example application is the Southern California Earthquake Center (SCEC) CyberShake project, an analysis designed to compute probabilistic seismic hazard curves for sites in the Los Angeles area. We explain which software tools were used to build the system and describe their functionality and interactions. We show the results of running the CyberShake analysis, which included over 250,000 jobs, using resources available through SCEC and the TeraGrid.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2010

Data Sharing Options for Scientific Workflows on Amazon EC2

Gideon Juve; Ewa Deelman; Karan Vahi; Gaurang Mehta; G. Bruce Berriman; Benjamin P. Berman; Philip J. Maechling

Efficient data management is a key component in achieving good performance for scientific workflows in distributed environments. Workflow applications typically communicate data between tasks using files. When tasks are distributed, these files are either transferred from one computational node to another or accessed through a shared storage system. In grids and clusters, workflow data is often stored on network and parallel file systems. In this paper we investigate some of the ways in which data can be managed for workflows in the cloud. We ran experiments using three typical workflow applications on Amazon's EC2. We discuss the various storage and file systems we used, describe the issues and problems we encountered deploying them on EC2, and analyze the resulting performance and cost of the workflows.
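
The two basic data-sharing strategies mentioned here, copying files between nodes versus reading them from shared storage, can be contrasted with a toy estimate such as the following; the bandwidth figures are illustrative assumptions, not measurements from the experiments.

```python
def access_time_seconds(file_gb, strategy):
    """Toy estimate of how long a task waits for one input file."""
    if strategy == "transfer":   # copy to local disk over an assumed 1 Gbps link
        return file_gb * 8 / 1.0
    if strategy == "shared_fs":  # read in place from shared storage at an assumed 0.4 GB/s
        return file_gb / 0.4
    raise ValueError(strategy)

for strategy in ("transfer", "shared_fs"):
    print(strategy, f"{access_time_seconds(2.0, strategy):.1f} s for a 2 GB input")
```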


Grid Computing | 2007

Data placement for scientific applications in distributed environments

Ann L. Chervenak; Ewa Deelman; Miron Livny; Mei-Hui Su; Robert Schuler; Shishir Bharathi; Gaurang Mehta; Karan Vahi

Scientific applications often perform complex computational analyses that consume and produce large data sets. We are concerned with data placement policies that distribute data in ways that are advantageous for application execution, for example, by placing data sets so that they may be staged into or out of computations efficiently or by replicating them for improved performance and reliability. In particular, we propose to study the relationship between data placement services and workflow management systems. In this paper, we explore the interactions between two services used in large-scale science today. We evaluate the benefits of prestaging data using the Data Replication Service versus using the native data stage-in mechanisms of the Pegasus workflow management system. We use the astronomy application, Montage, for our experiments and modify it to study the effect of input data size on the benefits of data prestaging. As the size of input data sets increases, prestaging using a data placement service can significantly improve the performance of the overall analysis.
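
The effect of prestaging that the experiments measure can be sketched as a simple timing comparison; the bandwidth, compute time, and input sizes below are illustrative assumptions rather than results from the paper.

```python
def makespan_hours(input_gb, bandwidth_gbps, compute_hours, prestaged):
    """If the input was prestaged by a placement service, stage-in time is off the
    critical path; otherwise it is added to the workflow runtime."""
    stage_in = 0.0 if prestaged else (input_gb * 8 / bandwidth_gbps) / 3600
    return stage_in + compute_hours

for size_gb in (10, 100, 1000):  # growing input data sets, as in the study's setup
    on_demand = makespan_hours(size_gb, bandwidth_gbps=1.0, compute_hours=2.0, prestaged=False)
    ahead = makespan_hours(size_gb, bandwidth_gbps=1.0, compute_hours=2.0, prestaged=True)
    print(f"{size_gb:5d} GB input: stage-in at runtime {on_demand:.2f} h, prestaged {ahead:.2f} h")
```

The gap between the two columns grows with the input size, which mirrors the paper's observation that prestaging pays off most for large data sets.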

Collaboration


Dive into Gaurang Mehta's collaborations.

Top Co-Authors

Ewa Deelman, University of Southern California
Karan Vahi, University of Southern California
Carl Kesselman, University of Southern California
Gideon Juve, University of Southern California
Philip J. Maechling, University of Southern California
Yolanda Gil, University of Southern California
Mei-Hui Su, University of Southern California
G. Bruce Berriman, California Institute of Technology
Jihie Kim, University of Southern California
Thomas H. Jordan, University of Southern California