
Publication


Featured research published by Gideon Juve.


Future Generation Computer Systems | 2013

Characterizing and profiling scientific workflows

Gideon Juve; Ann L. Chervenak; Ewa Deelman; Shishir Bharathi; Gaurang Mehta; Karan Vahi

Researchers working on the planning, scheduling, and execution of scientific workflows need access to a wide variety of scientific workflows to evaluate the performance of their implementations. This paper provides a characterization of workflows from six diverse scientific applications, including astronomy, bioinformatics, earthquake science, and gravitational-wave physics. The characterization is based on novel workflow profiling tools that provide detailed information about the various computational tasks that are present in the workflow. This information includes I/O, memory, and computational characteristics. Although the workflows are diverse, there is evidence that each workflow has a job type that consumes the majority of the total runtime. The study also uncovered an inefficiency in a workflow component implementation, where the component was re-reading the same data multiple times.
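The kind of per-job-type analysis the profiling study describes can be sketched in a few lines: aggregate task runtimes by job type and report each type's share of the total, which surfaces the dominant job type. The job names and runtimes below are made up for illustration.

```python
# Illustrative sketch (hypothetical data): aggregate per-task profiling
# records by job type to find the type that dominates total runtime.
from collections import defaultdict

# Each record: (job_type, runtime_seconds) -- invented numbers.
records = [
    ("mProjectPP", 12.4), ("mProjectPP", 11.9), ("mDiffFit", 3.2),
    ("mDiffFit", 2.8), ("mConcatFit", 45.0), ("mBackground", 1.1),
]

totals = defaultdict(float)
for job_type, runtime in records:
    totals[job_type] += runtime

grand_total = sum(totals.values())
for job_type, t in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{job_type:12s} {t:7.1f}s  {100 * t / grand_total:5.1f}%")
```

With real profiling records in place of the toy list, the top row of this report is the job type the abstract refers to.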


International Conference on e-Science | 2009

Scientific workflow applications on Amazon EC2

Gideon Juve; Ewa Deelman; Karan Vahi; Gaurang Mehta; G. Bruce Berriman; Benjamin P. Berman; Philip J. Maechling

The proliferation of commercial cloud computing providers has generated significant interest in the scientific computing community. Much recent research has attempted to determine the benefits and drawbacks of cloud computing for scientific applications. Although clouds have many attractive features, such as virtualization, on-demand provisioning, and “pay as you go” usage-based pricing, it is not clear whether they are able to deliver the performance required for scientific applications at a reasonable price. In this paper we examine the performance and cost of clouds from the perspective of scientific workflow applications. We use three characteristic workflows to compare the performance of a commercial cloud with that of a typical HPC system, and we analyze the various costs associated with running those workflows in the cloud. We find that the performance of clouds is not unreasonable given the hardware resources provided, and that performance comparable to HPC systems can be achieved given similar resources. We also find that the cost of running workflows on a commercial cloud can be reduced by storing data in the cloud rather than transferring it from outside.
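The paper's closing observation, that storing input data in the cloud can be cheaper than transferring it in for every run, comes down to simple arithmetic. The sketch below uses entirely hypothetical prices (not the paper's measured costs) to show the shape of the comparison.

```python
# Back-of-the-envelope sketch of transfer-per-run vs. keep-in-cloud costs.
# All prices are hypothetical placeholders, not actual EC2/S3 rates.
def monthly_cost(runs_per_month, input_gb,
                 transfer_per_gb=0.10,   # assumed per-GB transfer price
                 storage_per_gb=0.03):   # assumed per-GB monthly storage price
    transfer_only = runs_per_month * input_gb * transfer_per_gb
    store_in_cloud = input_gb * storage_per_gb  # pay storage once per month
    return transfer_only, store_in_cloud

transfer, stored = monthly_cost(runs_per_month=20, input_gb=50)
print(f"transfer each run: ${transfer:.2f}, keep in cloud: ${stored:.2f}")
```

The more often a workflow reuses the same input data, the more the balance tips toward in-cloud storage.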


Future Generation Computer Systems | 2015

Pegasus, a workflow management system for science automation

Ewa Deelman; Karan Vahi; Gideon Juve; Mats Rynge; Scott Callaghan; Philip J. Maechling; Rajiv Mayani; Weiwei Chen; Rafael Ferreira da Silva; Miron Livny; Kent Wenger

Modern science often requires the execution of large-scale, multi-stage simulation and data analysis pipelines to enable the study of complex systems. The amount of computation and data involved in these pipelines requires scalable workflow management systems that are able to reliably and efficiently coordinate and automate data movement and task execution on distributed computational resources: campus clusters, national cyberinfrastructures, and commercial and academic clouds. This paper describes the design, development, and evolution of the Pegasus Workflow Management System, which maps abstract workflow descriptions onto distributed computing infrastructures. Pegasus has been used for more than twelve years by scientists in a wide variety of domains, including astronomy, seismology, bioinformatics, physics, and others. This paper provides an integrated view of the Pegasus system, showing its capabilities that have been developed over time in response to application needs and to the evolution of the scientific computing platforms. The paper describes how Pegasus achieves reliable, scalable workflow execution across a wide variety of computing infrastructures. Highlights: a comprehensive description of the Pegasus Workflow Management System; a detailed explanation of Pegasus workflow transformations; data management in Pegasus; an earthquake science application example.


IEEE International Conference on High Performance Computing, Data and Analytics | 2012

Cost- and deadline-constrained provisioning for scientific workflow ensembles in IaaS clouds

Maciej Malawski; Gideon Juve; Ewa Deelman; Jarek Nabrzyski

Large-scale applications expressed as scientific workflows are often grouped into ensembles of inter-related workflows. In this paper, we address a new and important problem concerning the efficient management of such ensembles under budget and deadline constraints on Infrastructure-as-a-Service (IaaS) clouds. We discuss, develop, and assess algorithms based on static and dynamic strategies for both task scheduling and resource provisioning. We perform the evaluation via simulation using a set of scientific workflow ensembles with a broad range of budget and deadline parameters, taking into account uncertainties in task runtime estimations, provisioning delays, and failures. We find that the key factor determining the performance of an algorithm is its ability to decide which workflows in an ensemble to admit or reject for execution. Our results show that an admission procedure based on workflow structure and estimates of task runtimes can significantly improve the quality of solutions.
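The admit-or-reject decision the abstract highlights can be illustrated with a deliberately simple greedy sketch: walk the ensemble in priority order and admit a workflow only if its estimated cost and runtime still fit the remaining budget and deadline. This is not one of the paper's algorithms (those handle dynamic provisioning and uncertainty), just a minimal illustration of the admission idea; it also assumes admitted workflows run sequentially.

```python
# Greedy admission sketch for a workflow ensemble under budget and deadline.
# Hypothetical simplification, not the paper's static/dynamic algorithms.
def admit(ensemble, budget, deadline):
    """ensemble: list of (name, est_cost, est_runtime), highest priority first."""
    admitted, spent, elapsed = [], 0.0, 0.0
    for name, cost, runtime in ensemble:
        if spent + cost <= budget and elapsed + runtime <= deadline:
            admitted.append(name)
            spent += cost
            elapsed += runtime
        # else: reject this workflow; later, cheaper ones may still fit
    return admitted

ensemble = [("wf-A", 40.0, 3.0), ("wf-B", 70.0, 2.0), ("wf-C", 25.0, 1.5)]
print(admit(ensemble, budget=100.0, deadline=5.0))  # wf-B is rejected
```

Even in this toy form, the quality of the runtime and cost estimates determines which workflows get admitted, which mirrors the paper's finding.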


IEEE International Conference on High Performance Computing, Data and Analytics | 2010

Data Sharing Options for Scientific Workflows on Amazon EC2

Gideon Juve; Ewa Deelman; Karan Vahi; Gaurang Mehta; G. Bruce Berriman; Benjamin P. Berman; Philip J. Maechling

Efficient data management is a key component in achieving good performance for scientific workflows in distributed environments. Workflow applications typically communicate data between tasks using files. When tasks are distributed, these files are either transferred from one computational node to another, or accessed through a shared storage system. In grids and clusters, workflow data is often stored on network and parallel file systems. In this paper we investigate some of the ways in which data can be managed for workflows in the cloud. We ran experiments using three typical workflow applications on Amazon's EC2. We discuss the various storage and file systems we used, describe the issues and problems we encountered deploying them on EC2, and analyze the resulting performance and cost of the workflows.


Scientific Cloud Computing | 2011

Experiences using cloud computing for a scientific workflow application

Jens-Sönke Vöckler; Gideon Juve; Ewa Deelman; Mats Rynge; Bruce Berriman

Clouds are rapidly becoming an important platform for scientific applications. In this paper we describe our experiences running a scientific workflow application in the cloud. The application was developed to process astronomy data released by the Kepler project, a NASA mission to search for Earth-like planets orbiting other stars. This workflow was deployed across multiple clouds using the Pegasus Workflow Management System. The clouds used include several sites within the FutureGrid, NERSC's Magellan cloud, and Amazon EC2. We describe how the application was deployed, evaluate its performance executing in different clouds (based on Nimbus, Eucalyptus, and EC2), and discuss the challenges of deploying and executing workflows in a cloud environment. We also demonstrate how Pegasus was able to support sky computing by executing a single workflow across multiple cloud infrastructures simultaneously.


ACM Crossroads Student Magazine | 2010

Scientific workflows and clouds

Gideon Juve; Ewa Deelman

In recent years, empirical science has been evolving from physical experimentation to computation-based research. In astronomy, researchers seldom spend time at a telescope, but instead access the large number of image databases that are created and curated by the community [42]. In bioinformatics, data repositories hosted by entities such as the National Institutes of Health [29] provide the data gathered by Genome-Wide Association Studies and enable researchers to link particular genotypes to a variety of diseases.


Grid Computing | 2012

An Evaluation of the Cost and Performance of Scientific Workflows on Amazon EC2

Gideon Juve; Ewa Deelman; G. Bruce Berriman; Benjamin P. Berman; Philip J. Maechling

Workflows are used to orchestrate data-intensive applications in many different scientific domains. Workflow applications typically communicate data between processing steps using intermediate files. When tasks are distributed, these files are either transferred from one computational node to another, or accessed through a shared storage system. As a result, the efficient management of data is a key factor in achieving good performance for workflow applications in distributed environments. In this paper we investigate some of the ways in which data can be managed for workflows in the cloud. We ran experiments using three typical workflow applications on Amazon’s EC2 cloud computing platform. We discuss the various storage and file systems we used, describe the issues and problems we encountered deploying them on EC2, and analyze the resulting performance and cost of the workflows.


IEEE International Conference on Cloud Computing Technology and Science | 2011

Automating Application Deployment in Infrastructure Clouds

Gideon Juve; Ewa Deelman

Cloud computing systems are becoming an important platform for distributed applications in science and engineering. Infrastructure as a Service (IaaS) clouds provide the capability to provision virtual machines (VMs) on demand with a specific configuration of hardware resources, but they do not provide functionality for managing resources once they are provisioned. In order for such clouds to be used effectively, tools need to be developed that can help users to deploy their applications in the cloud. In this paper we describe a system we have developed to provision, configure, and manage virtual machine deployments in the cloud. We also describe our experiences using the system to provision resources for scientific workflow applications, and identify areas for further research.
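One core task for a deployment tool like the one this paper describes is starting the nodes of a virtual cluster in dependency order, e.g. bringing up a shared file server before the workers that mount it. The sketch below shows that idea only; the node names and function are illustrative inventions, not the system's real API.

```python
# Hypothetical sketch of dependency-ordered startup for a virtual cluster.
# Assumes the dependency graph is acyclic (no cycle detection here).
def deployment_order(nodes):
    """nodes: {name: [names it depends on]}; returns a valid start order."""
    order, started = [], set()

    def start(name):
        if name in started:
            return
        for dep in nodes[name]:   # start dependencies first
            start(dep)
        started.add(name)
        order.append(name)

    for name in nodes:
        start(name)
    return order

cluster = {"nfs-server": [], "worker-1": ["nfs-server"], "worker-2": ["nfs-server"]}
print(deployment_order(cluster))
```

A real deployment system layers provisioning, configuration, and monitoring on top of an ordering like this; the topological walk is just the skeleton.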


IEEE International Conference on eScience | 2008

Resource Provisioning Options for Large-Scale Scientific Workflows

Gideon Juve; Ewa Deelman

Scientists in many fields are developing large-scale workflows containing millions of tasks and requiring thousands of hours of aggregate computation time. Acquiring the computational resources to execute these workflows poses many challenges for application developers. Although the grid provides ready access to large pools of computational resources, the traditional approach to accessing these resources suffers from many overheads that lead to poor performance. In this paper we examine several techniques based on resource provisioning that may be used to reduce these overheads. These techniques include: advance reservations, multi-level scheduling, and infrastructure as a service (IaaS). We explain the advantages and disadvantages of these techniques in terms of cost, performance and usability.

Collaboration


Dive into Gideon Juve's collaborations.

Top Co-Authors

Ewa Deelman | University of Southern California
Karan Vahi | University of Southern California
Gaurang Mehta | University of Southern California
Mats Rynge | University of Southern California
Philip J. Maechling | University of Southern California
G. Bruce Berriman | California Institute of Technology
Rafael Ferreira da Silva | University of Southern California
Thomas H. Jordan | University of Southern California
Robert W. Graves | United States Geological Survey
Scott Callaghan | University of Southern California