Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where G. Bruce Berriman is active.

Publications


Featured research published by G. Bruce Berriman.


Scientific Programming | 2005

Pegasus: A framework for mapping complex scientific workflows onto distributed systems

Ewa Deelman; Gurmeet Singh; Mei-Hui Su; Jim Blythe; Yolanda Gil; Carl Kesselman; Gaurang Mehta; Karan Vahi; G. Bruce Berriman; John C. Good; Anastasia C. Laity; Joseph C. Jacob; Daniel S. Katz

This paper describes the Pegasus framework, which can be used to map complex scientific workflows onto distributed resources. Pegasus enables users to represent workflows at an abstract level without needing to worry about the particulars of the target execution systems. The paper describes general issues in mapping applications and the functionality of Pegasus. We present results on improving application performance through workflow restructuring, which clusters multiple tasks in a workflow into single entities. A real-life astronomy application is used as the basis for the study.
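
A minimal sketch of the idea described above, assuming nothing about the actual Pegasus interfaces: an abstract workflow is a directed acyclic graph of tasks, and mapping assigns each task to a concrete execution site. The task names, the `sites` list, and `map_workflow` are hypothetical illustrations, not Pegasus code.

```python
# Illustrative sketch of abstract-workflow mapping; NOT the Pegasus API.
# Task names, the `sites` list, and `map_workflow` are hypothetical.

abstract_workflow = {
    "mProject_1": [],                        # task -> tasks it depends on
    "mProject_2": [],
    "mDiff_1": ["mProject_1", "mProject_2"],
    "mAdd_1": ["mDiff_1"],
}

sites = ["cluster.example.edu", "grid.example.org"]  # hypothetical resources

def map_workflow(dag, sites):
    """Assign each abstract task to a concrete site (round-robin, for illustration)."""
    return {task: sites[i % len(sites)] for i, task in enumerate(dag)}

print(map_workflow(abstract_workflow, sites))
```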


IEEE International Conference on High Performance Computing, Data and Analytics | 2008

The cost of doing science on the cloud: the Montage example

Ewa Deelman; Gurmeet Singh; Miron Livny; G. Bruce Berriman; John C. Good

Utility grids such as the Amazon EC2 cloud and Amazon S3 offer computational and storage resources that can be used on-demand for a fee by compute- and data-intensive applications. The cost of running an application on such a cloud depends on the compute, storage and communication resources it will provision and consume. Different execution plans of the same application may result in significantly different costs. Using the Amazon cloud fee structure and a real-life astronomy application, we study via simulation the cost-performance tradeoffs of different execution and resource provisioning plans. We also study these tradeoffs in the context of the storage and communication fees of Amazon S3 when used for long-term application data archival. Our results show that by provisioning the right amount of storage and compute resources, cost can be significantly reduced with no significant impact on application performance.
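
To make the cost trade-off concrete, here is a back-of-the-envelope sketch under assumed numbers: the per-unit prices and resource amounts below are hypothetical placeholders, not the Amazon fee structure used in the paper.

```python
# Back-of-the-envelope cloud cost sketch. All prices and resource amounts are
# hypothetical placeholders, not Amazon's actual fee schedule.

def run_cost(cpu_hours, storage_gb_months, transfer_gb,
             price_cpu_hour=0.10, price_gb_month=0.10, price_transfer_gb=0.10):
    """Total cost = compute + storage + data-transfer charges."""
    return (cpu_hours * price_cpu_hour
            + storage_gb_months * price_gb_month
            + transfer_gb * price_transfer_gb)

# Two execution plans for the same workflow can provision and move different
# amounts of data, and therefore differ significantly in cost.
plan_a = run_cost(cpu_hours=100, storage_gb_months=50, transfer_gb=200)
plan_b = run_cost(cpu_hours=110, storage_gb_months=10, transfer_gb=20)
print(f"plan A: ${plan_a:.2f}, plan B: ${plan_b:.2f}")
```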


IEEE International Conference on eScience | 2008

On the Use of Cloud Computing for Scientific Workflows

Christina Hoffa; Gaurang Mehta; Timothy Freeman; Ewa Deelman; Kate Keahey; G. Bruce Berriman; John C. Good

This paper explores the use of cloud computing for scientific workflows, focusing on a widely used astronomy application, Montage. The approach is to evaluate, from the point of view of a scientific workflow, the tradeoffs between running in a local environment, if one is available, and running in a virtual environment via remote, wide-area network resource access. Our results show that for Montage, a workflow with short job runtimes, the virtual environment can provide good compute time performance, but it can suffer from resource scheduling delays and wide-area communication overheads.


International Conference on e-Science | 2009

Scientific workflow applications on Amazon EC2

Gideon Juve; Ewa Deelman; Karan Vahi; Gaurang Mehta; G. Bruce Berriman; Benjamin P. Berman; Philip J. Maechling

The proliferation of commercial cloud computing providers has generated significant interest in the scientific computing community. Much recent research has attempted to determine the benefits and drawbacks of cloud computing for scientific applications. Although clouds have many attractive features, such as virtualization, on-demand provisioning, and “pay as you go” usage-based pricing, it is not clear whether they are able to deliver the performance required for scientific applications at a reasonable price. In this paper we examine the performance and cost of clouds from the perspective of scientific workflow applications. We use three characteristic workflows to compare the performance of a commercial cloud with that of a typical HPC system, and we analyze the various costs associated with running those workflows in the cloud. We find that the performance of clouds is not unreasonable given the hardware resources provided, and that performance comparable to HPC systems can be achieved given similar resources. We also find that the cost of running workflows on a commercial cloud can be reduced by storing data in the cloud rather than transferring it from outside.


Computational Science and Engineering | 2009

Montage: a grid portal and software toolkit for science-grade astronomical image mosaicking

Joseph C. Jacob; Daniel S. Katz; G. Bruce Berriman; John C. Good; Anastasia C. Laity; Ewa Deelman; Carl Kesselman; Gurmeet Singh; Mei-Hui Su; Thomas A. Prince; Roy Williams

Montage is a portable software toolkit to construct custom, science-grade mosaics that preserve the astrometry and photometry of astronomical sources. The user specifies the dataset, wavelength, sky location, mosaic size, coordinate system, projection, and spatial sampling. Montage supports massive astronomical datasets that may be stored in distributed archives. Montage can be run on both single- and multi-processor computers, including clusters and grids. Standard grid tools are used to access remote data or run Montage on remote computers. This paper describes the architecture, algorithms, performance, and usage of Montage as both a software toolkit and a grid portal.
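
The user-facing parameters listed above can be pictured as a small request object; the class and field names below are illustrative only and do not correspond to Montage's actual command-line interface or module names.

```python
# Illustrative container for a mosaic request as described above. Field names
# are hypothetical and do not reflect Montage's actual interface.
from dataclasses import dataclass

@dataclass
class MosaicRequest:
    dataset: str                 # e.g. "2MASS"
    wavelength: str              # band, e.g. "K"
    sky_location: str            # mosaic center, e.g. "M17"
    size_deg: float              # mosaic size on the sky
    coordinate_system: str       # e.g. "Equatorial J2000"
    projection: str              # e.g. "TAN"
    sampling_arcsec: float       # spatial sampling of the output pixels

request = MosaicRequest("2MASS", "K", "M17", 1.0, "Equatorial J2000", "TAN", 1.0)
print(request)
```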


IEEE International Conference on High Performance Computing, Data and Analytics | 2010

Data Sharing Options for Scientific Workflows on Amazon EC2

Gideon Juve; Ewa Deelman; Karan Vahi; Gaurang Mehta; G. Bruce Berriman; Benjamin P. Berman; Philip J. Maechling

Efficient data management is a key component in achieving good performance for scientific workflows in distributed environments. Workflow applications typically communicate data between tasks using files. When tasks are distributed, these files are either transferred from one computational node to another, or accessed through a shared storage system. In grids and clusters, workflow data is often stored on network and parallel file systems. In this paper we investigate some of the ways in which data can be managed for workflows in the cloud. We ran experiments using three typical workflow applications on Amazon's EC2. We discuss the various storage and file systems we used, describe the issues and problems we encountered deploying them on EC2, and analyze the resulting performance and cost of the workflows.
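
As a sketch of the two data-management styles contrasted above (transferring files between nodes versus placing them on shared storage), the snippet below stages a task output one way or the other; the paths and the plain file copy are stand-ins for real transfer tools and for network or parallel file systems.

```python
# Sketch of the two data-sharing styles: per-task file transfer versus a
# shared file system. Paths are hypothetical; a real deployment would use a
# transfer service or a network/parallel file system instead of shutil.copy.
import shutil
from pathlib import Path

def stage_output(src: Path, dest_node_dir: Path, shared_dir: Path,
                 use_shared_storage: bool) -> Path:
    """Make one task's output file visible to the task that consumes it."""
    if use_shared_storage:
        target = shared_dir / src.name      # one copy, visible from every node
    else:
        target = dest_node_dir / src.name   # copy the file to the consuming node
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(src, target)
    return target
```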


The Astrophysical Journal | 2008

A Cross-Match of 2MASS and SDSS: Newly Found L and T Dwarfs and an Estimate of the Space Density of T Dwarfs

Stanimir Metchev; J. Davy Kirkpatrick; G. Bruce Berriman; Dagny L. Looper

We report new L and T dwarfs found in a cross-match of the SDSS Data Release 1 and 2MASS. Our simultaneous search of the two databases effectively allows us to relax the criteria for object detection in either survey and to explore the combined databases to a greater completeness level. We find two new T dwarfs in addition to the 13 already known in the SDSS DR1 footprint. We also identify 22 new candidate and bona fide L dwarfs, including a new young L2 dwarf and a peculiar potentially metal-poor L2 dwarf with unusually blue near-IR colors. These discoveries underscore the utility of simultaneous database cross-correlation in searching for rare objects. Our cross-match completes the census of T dwarfs within the joint SDSS and 2MASS flux limits to the ≈97% level. Hence, we are able to accurately infer the space density of T dwarfs. We employ Monte Carlo tools to simulate the observed population of SDSS DR1 T dwarfs with 2MASS counterparts and find that the space density of T0-T8 dwarf systems is 0.0070^{+0.0032}_{-0.0030} pc^-3 (95% confidence interval), i.e., about one per 140 pc^3. Compared to predictions for the T dwarf space density that depend on various assumptions for the substellar mass function, this result is most consistent with models that assume a flat substellar mass function dN/dM ∝ M^0.0. No >T8 dwarfs were discovered in the present cross-match, although less than one was expected in the limited area (2099 deg^2) of SDSS DR1.
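
The cross-match itself amounts to pairing sources from the two catalogs by sky position. A minimal sketch using astropy is shown below; the coordinate arrays and the 2 arcsecond matching tolerance are illustrative values, not the selection used in the paper.

```python
# Minimal positional cross-match sketch using astropy. The coordinates and
# the 2 arcsec tolerance are illustrative, not the paper's actual selection.
import astropy.units as u
from astropy.coordinates import SkyCoord

sdss = SkyCoord(ra=[150.10, 150.45] * u.deg, dec=[2.20, 2.35] * u.deg)
twomass = SkyCoord(ra=[150.1001, 151.00] * u.deg, dec=[2.2001, 2.90] * u.deg)

idx, sep2d, _ = sdss.match_to_catalog_sky(twomass)
matched = sep2d < 2 * u.arcsec          # keep only close pairs

# The quoted density implies roughly one T0-T8 system per 1/0.0070 ≈ 140 pc^3,
# consistent with the "about one per 140 pc^3" statement above.
print(matched, 1 / 0.0070)
```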


Proceedings of the 15th ACM Mardi Gras Conference on From lightweight mash-ups to lambda grids: Understanding the spectrum of distributed computing requirements, applications, tools, infrastructures, interoperability, and the incremental adoption of key capabilities | 2008

Workflow task clustering for best effort systems with Pegasus

Gurmeet Singh; Mei-Hui Su; Karan Vahi; Ewa Deelman; G. Bruce Berriman; John C. Good; Daniel S. Katz; Gaurang Mehta

Many scientific workflows are composed of fine-grained computational tasks, yet they contain thousands of such tasks and are data-intensive in nature, thus requiring resources such as the TeraGrid to execute efficiently. In order to improve the performance of such applications, we often employ task clustering techniques that increase the computational granularity of workflow tasks. The goal is to minimize the completion time of the workflow by reducing the impact of queue wait times. In this paper, we examine the performance impact of these clustering techniques using the Pegasus workflow management system. Experiments performed using an astronomy workflow on the NCSA TeraGrid cluster show that clustering can achieve a significant reduction in the workflow completion time (up to 97%).
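
A toy sketch of the clustering idea follows: many short tasks are grouped into a few clustered jobs so that each batch pays the queue wait only once. The cluster size, queue wait, and task runtime are hypothetical numbers, and the timing model is deliberately simplistic.

```python
# Toy sketch of task clustering: group many fine-grained tasks into a few
# clustered jobs so each batch pays the queue wait once. All numbers are
# hypothetical and the sequential timing model is deliberately simplistic.

def cluster(tasks, cluster_size):
    """Group tasks into clusters of at most `cluster_size`."""
    return [tasks[i:i + cluster_size] for i in range(0, len(tasks), cluster_size)]

tasks = [f"mProject_{i}" for i in range(100)]   # 100 fine-grained tasks
clusters = cluster(tasks, cluster_size=20)      # 5 clustered jobs instead of 100

queue_wait, task_runtime = 300.0, 10.0          # seconds, illustrative
unclustered = len(tasks) * (queue_wait + task_runtime)
clustered = sum(queue_wait + task_runtime * len(c) for c in clusters)
print(unclustered, clustered)   # clustering removes most of the queue-wait overhead
```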


Grid Computing | 2012

An Evaluation of the Cost and Performance of Scientific Workflows on Amazon EC2

Gideon Juve; Ewa Deelman; G. Bruce Berriman; Benjamin P. Berman; Philip J. Maechling

Workflows are used to orchestrate data-intensive applications in many different scientific domains. Workflow applications typically communicate data between processing steps using intermediate files. When tasks are distributed, these files are either transferred from one computational node to another, or accessed through a shared storage system. As a result, the efficient management of data is a key factor in achieving good performance for workflow applications in distributed environments. In this paper we investigate some of the ways in which data can be managed for workflows in the cloud. We ran experiments using three typical workflow applications on Amazon’s EC2 cloud computing platform. We discuss the various storage and file systems we used, describe the issues and problems we encountered deploying them on EC2, and analyze the resulting performance and cost of the workflows.


ACM Symposium on Applied Computing | 2005

The Pegasus portal: web based grid computing

Gurmeet Singh; Ewa Deelman; Gaurang Mehta; Karan Vahi; Mei-Hui Su; G. Bruce Berriman; John C. Good; Joseph C. Jacob; Daniel S. Katz; Albert Lazzarini; K. Blackburn; S. Koranda

Pegasus is a planning framework for mapping abstract workflows for execution on the Grid. This paper presents the implementation of a web-based portal for submitting workflows to the Grid using Pegasus. The portal also includes components for generating abstract workflows based on a metadata description of the desired data products and application-specific services. We describe our experiences in using this portal for two Grid applications. A major contribution of our work is the introduction of several components that are useful for Grid portals and hence should be included in Grid portal development toolkits.

Collaboration


Dive into G. Bruce Berriman's collaborations.

Top Co-Authors

John C. Good
California Institute of Technology

Ewa Deelman
University of Southern California

Anastasia C. Laity
California Institute of Technology

Gurmeet Singh
University of Southern California

Joseph C. Jacob
California Institute of Technology

Gideon Juve
University of Southern California

Robert J. Hanisch
Space Telescope Science Institute

Thomas A. Prince
California Institute of Technology

Gaurang Mehta
University of Southern California