Karan Vahi

University of Southern California

Publication

Featured research published by Karan Vahi.

Scientific Programming | 2005

Pegasus: A framework for mapping complex scientific workflows onto distributed systems

Ewa Deelman; Gurmeet Singh; Mei-Hui Su; Jim Blythe; Yolanda Gil; Carl Kesselman; Gaurang Mehta; Karan Vahi; G. Bruce Berriman; John C. Good; Anastasia C. Laity; Joseph C. Jacob; Daniel S. Katz

This paper describes the Pegasus framework that can be used to map complex scientific workflows onto distributed resources. Pegasus enables users to represent the workflows at an abstract level without needing to worry about the particulars of the target execution systems. The paper describes general issues in mapping applications and the functionality of Pegasus. We present the results of improving application performance through workflow restructuring which clusters multiple tasks in a workflow into single entities. A real-life astronomy application is used as the basis for the study.
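The restructuring mentioned in this abstract, clustering multiple tasks into single entities, can be illustrated with a minimal sketch. It is not Pegasus code: the function names, the `deps` dictionary layout, and the level-based grouping rule are assumptions chosen for illustration, and the task names only loosely echo an astronomy mosaic pipeline.

```python
from collections import defaultdict


def task_levels(deps):
    """Map each task to its depth in the DAG (tasks with no parents are level 0)."""
    levels = {}

    def depth(task):
        if task not in levels:
            parents = deps.get(task, [])
            levels[task] = 1 + max((depth(p) for p in parents), default=-1)
        return levels[task]

    for task in deps:
        depth(task)
    return levels


def cluster_by_level(deps, cluster_size):
    """Group tasks that share a level into clusters of at most `cluster_size`.

    Tasks at the same level do not depend on each other, so each cluster
    could be submitted as one job. This grouping rule is an assumption,
    not the algorithm from the paper.
    """
    by_level = defaultdict(list)
    for task, level in sorted(task_levels(deps).items()):
        by_level[level].append(task)

    clusters = []
    for level in sorted(by_level):
        tasks = by_level[level]
        for i in range(0, len(tasks), cluster_size):
            clusters.append(tasks[i:i + cluster_size])
    return clusters


# Hypothetical workflow: task -> list of parent tasks.
deps = {
    "project_1": [], "project_2": [], "project_3": [], "project_4": [],
    "diff_1": ["project_1", "project_2"],
    "diff_2": ["project_3", "project_4"],
    "add": ["diff_1", "diff_2"],
}
clusters = cluster_by_level(deps, cluster_size=2)
for cluster in clusters:
    print(cluster)
```

In this toy workflow, seven tasks collapse into four clustered jobs, which reduces the number of separate submissions and hence the per-job scheduling overhead.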

Lecture Notes in Computer Science | 2004

Pegasus: Mapping Scientific Workflows onto the Grid

Ewa Deelman; Jim Blythe; Yolanda Gil; Carl Kesselman; Gaurang Mehta; Sonal Patil; Mei-Hui Su; Karan Vahi; Miron Livny

In this paper we describe the Pegasus system that can map complex workflows onto the Grid. Pegasus takes an abstract description of a workflow and finds the appropriate data and Grid resources to execute the workflow. Pegasus is being released as part of the GriPhyN Virtual Data Toolkit and has been used in a variety of applications, including astronomy, biology, gravitational-wave science, and high-energy physics. A deferred planning mode of Pegasus is also introduced.

workflows in support of large-scale science | 2008

Characterization of scientific workflows

Shishir Bharathi; Ann L. Chervenak; Ewa Deelman; Gaurang Mehta; Mei-Hui Su; Karan Vahi

Researchers working on the planning, scheduling and execution of scientific workflows need access to a wide variety of scientific workflows to evaluate the performance of their implementations. We describe basic workflow structures that are composed into complex workflows by scientific communities. We provide a characterization of workflows from five diverse scientific applications, describing their composition and data and computational requirements. We also describe the effect of the size of the input datasets on the structure and execution profiles of these workflows. Finally, we describe a workflow generator that produces synthetic, parameterizable workflows that closely resemble the workflows that we characterize. We make these workflows available to the community to be used as benchmarks for evaluating various workflow systems and scheduling algorithms.

Future Generation Computer Systems | 2013

Characterizing and profiling scientific workflows

Gideon Juve; Ann L. Chervenak; Ewa Deelman; Shishir Bharathi; Gaurang Mehta; Karan Vahi

Researchers working on the planning, scheduling, and execution of scientific workflows need access to a wide variety of scientific workflows to evaluate the performance of their implementations. This paper provides a characterization of workflows from six diverse scientific applications, including astronomy, bioinformatics, earthquake science, and gravitational-wave physics. The characterization is based on novel workflow profiling tools that provide detailed information about the various computational tasks that are present in the workflow. This information includes I/O, memory and computational characteristics. Although the workflows are diverse, there is evidence that each workflow has a job type that consumes the most runtime. The study also uncovered inefficiency in a workflow component implementation, where the component was re-reading the same data multiple times.

international conference on e-science | 2009

Scientific workflow applications on Amazon EC2

Gideon Juve; Ewa Deelman; Karan Vahi; Gaurang Mehta; G. Bruce Berriman; Benjamin P. Berman; Philip J. Maechling

The proliferation of commercial cloud computing providers has generated significant interest in the scientific computing community. Much recent research has attempted to determine the benefits and drawbacks of cloud computing for scientific applications. Although clouds have many attractive features, such as virtualization, on-demand provisioning, and “pay as you go” usage-based pricing, it is not clear whether they are able to deliver the performance required for scientific applications at a reasonable price. In this paper we examine the performance and cost of clouds from the perspective of scientific workflow applications. We use three characteristic workflows to compare the performance of a commercial cloud with that of a typical HPC system, and we analyze the various costs associated with running those workflows in the cloud. We find that the performance of clouds is not unreasonable given the hardware resources provided, and that performance comparable to HPC systems can be achieved given similar resources. We also find that the cost of running workflows on a commercial cloud can be reduced by storing data in the cloud rather than transferring it from outside.

Future Generation Computer Systems | 2015

Pegasus, a workflow management system for science automation

Ewa Deelman; Karan Vahi; Gideon Juve; Mats Rynge; Scott Callaghan; Philip J. Maechling; Rajiv Mayani; Weiwei Chen; Rafael Ferreira da Silva; Miron Livny; Kent Wenger

Modern science often requires the execution of large-scale, multi-stage simulation and data analysis pipelines to enable the study of complex systems. The amount of computation and data involved in these pipelines requires scalable workflow management systems that are able to reliably and efficiently coordinate and automate data movement and task execution on distributed computational resources: campus clusters, national cyberinfrastructures, and commercial and academic clouds. This paper describes the design, development and evolution of the Pegasus Workflow Management System, which maps abstract workflow descriptions onto distributed computing infrastructures. Pegasus has been used for more than twelve years by scientists in a wide variety of domains, including astronomy, seismology, bioinformatics, physics and others. This paper provides an integrated view of the Pegasus system, showing its capabilities that have been developed over time in response to application needs and to the evolution of the scientific computing platforms. The paper describes how Pegasus achieves reliable, scalable workflow execution across a wide variety of computing infrastructures.

Comprehensive description of the Pegasus Workflow Management System. Detailed explanation of Pegasus workflow transformations. Data management in Pegasus. Earthquake science application example.

cluster computing and the grid | 2007

Scheduling Data-Intensive Workflows onto Storage-Constrained Distributed Resources

Arun Ramakrishnan; Gurmeet Singh; Henan Zhao; Ewa Deelman; Rizos Sakellariou; Karan Vahi; K. Blackburn; David Meyers; Michael Samidi

In this paper we examine the issue of optimizing disk usage and of scheduling large-scale scientific workflows onto distributed resources where the workflows are data-intensive, requiring large amounts of data storage, and where the resources have limited storage capacity. Our approach is two-fold: we minimize the amount of space a workflow requires during execution by removing data files at runtime when they are no longer required, and we schedule the workflows in a way that ensures that the amount of data required and generated by the workflow fits onto the individual resources. For a workflow used by gravitational-wave physicists, we were able to reduce the amount of storage required by the workflow by up to 57%. We also designed an algorithm that can not only find feasible solutions for workflow task assignment to resources in disk-space constrained environments, but can also improve the overall workflow performance.
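The first half of that approach, removing files at runtime once nothing else needs them, can be sketched as follows. This is an illustrative model under stated assumptions, not the algorithm from the paper: tasks run one at a time in a fixed order, inputs are staged in when first used, and the function name `peak_storage`, the schedule layout, and the file sizes are all hypothetical.

```python
def peak_storage(schedule, sizes, keep=(), cleanup=True):
    """Return the peak disk usage for tasks run sequentially in `schedule` order.

    schedule: list of (task_name, input_files, output_files) tuples
    sizes:    mapping from file name to size (any consistent unit)
    keep:     files that must never be deleted (e.g. final outputs)
    cleanup:  if True, delete a file right after its last consumer finishes
    """
    # Record the last step at which each file is read.
    last_use = {}
    for step, (_, inputs, _) in enumerate(schedule):
        for name in inputs:
            last_use[name] = step

    on_disk = set()
    peak = 0
    for step, (_, inputs, outputs) in enumerate(schedule):
        on_disk.update(inputs)   # stage in any inputs not already present
        on_disk.update(outputs)  # the task writes its outputs
        peak = max(peak, sum(sizes[name] for name in on_disk))
        if cleanup:
            # Files with no later reader are dropped unless explicitly kept.
            on_disk = {
                name for name in on_disk
                if name in keep or last_use.get(name, step) > step
            }
    return peak


# Hypothetical three-task pipeline with made-up file sizes.
schedule = [
    ("t1", ["raw"], ["a"]),
    ("t2", ["a"], ["b"]),
    ("t3", ["b"], ["out"]),
]
sizes = {"raw": 4, "a": 3, "b": 2, "out": 1}

without_cleanup = peak_storage(schedule, sizes, cleanup=False)
with_cleanup = peak_storage(schedule, sizes, keep=("out",))
print(without_cleanup, with_cleanup)
```

In this toy pipeline the peak footprint drops from 10 units to 7 with cleanup enabled. The numbers are illustrative only and are unrelated to the result reported in the abstract.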

international conference on e-science | 2006

Managing Large-Scale Workflow Execution from Resource Provisioning to Provenance Tracking: The CyberShake Example

Ewa Deelman; Scott Callaghan; Edward H. Field; H. Francoeur; Robert W. Graves; Nitin Gupta; Vipin Gupta; Thomas H. Jordan; Carl Kesselman; Philip J. Maechling; John Mehringer; Gaurang Mehta; David A. Okaya; Karan Vahi; Li Zhao

This paper discusses the process of building an environment where large-scale, complex, scientific analysis can be scheduled onto a heterogeneous collection of computational and storage resources. The example application is the Southern California Earthquake Center (SCEC) CyberShake project, an analysis designed to compute probabilistic seismic hazard curves for sites in the Los Angeles area. We explain which software tools were used to build the system and describe their functionality and interactions. We show the results of running the CyberShake analysis that included over 250,000 jobs using resources available through SCEC and the TeraGrid.

ieee international conference on high performance computing data and analytics | 2010

Data Sharing Options for Scientific Workflows on Amazon EC2

Gideon Juve; Ewa Deelman; Karan Vahi; Gaurang Mehta; G. Bruce Berriman; Benjamin P. Berman; Philip J. Maechling

Efficient data management is a key component in achieving good performance for scientific workflows in distributed environments. Workflow applications typically communicate data between tasks using files. When tasks are distributed, these files are either transferred from one computational node to another, or accessed through a shared storage system. In grids and clusters, workflow data is often stored on network and parallel file systems. In this paper we investigate some of the ways in which data can be managed for workflows in the cloud. We ran experiments using three typical workflow applications on Amazon's EC2. We discuss the various storage and file systems we used, describe the issues and problems we encountered deploying them on EC2, and analyze the resulting performance and cost of the workflows.

grid computing | 2007

Data placement for scientific applications in distributed environments

Ann L. Chervenak; Ewa Deelman; Miron Livny; Mei-Hui Su; Robert Schuler; Shishir Bharathi; Gaurang Mehta; Karan Vahi

Scientific applications often perform complex computational analyses that consume and produce large data sets. We are concerned with data placement policies that distribute data in ways that are advantageous for application execution, for example, by placing data sets so that they may be staged into or out of computations efficiently or by replicating them for improved performance and reliability. In particular, we propose to study the relationship between data placement services and workflow management systems. In this paper, we explore the interactions between two services used in large-scale science today. We evaluate the benefits of prestaging data using the Data Replication Service versus using the native data stage-in mechanisms of the Pegasus workflow management system. We use the astronomy application, Montage, for our experiments and modify it to study the effect of input data size on the benefits of data prestaging. As the size of input data sets increases, prestaging using a data placement service can significantly improve the performance of the overall analysis.
