
Publication


Featured research published by Roger S. Barga.


High Performance Distributed Computing | 2010

AzureBlast: a case study of developing science applications on the cloud

Wei Lu; Jared Jackson; Roger S. Barga

Cloud computing has emerged as a new approach to large-scale computing and is attracting considerable attention from the scientific and research computing communities. Despite its growing popularity, it is still unclear how well the cloud model of computation will serve scientific applications. In this paper we analyze the applicability of the cloud to the sciences by investigating an implementation of a well-known and computationally intensive algorithm called BLAST, a popular life-sciences algorithm commonly used in bioinformatics research. BLAST makes an excellent case study because it is crucial to many life-science applications and its characteristics are representative of many applications important to data-intensive scientific research. We introduce a methodology for studying the applicability of cloud platforms to scientific computing and analyze the results of our study. In particular, we examine best practices for handling large-scale parallelism and large volumes of data. While we carry out our performance evaluation on Microsoft's Windows Azure, the results readily generalize to other cloud platforms.
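The execution pattern AzureBlast exploits, partitioning a large set of independent BLAST queries across many cloud workers, can be sketched in a few lines. This is an illustrative sketch under our own naming (`split_queries`, round-robin assignment), not code from the paper:

```python
def split_queries(records, n_workers):
    """Partition independent query sequences into roughly equal chunks,
    one per worker, for embarrassingly parallel execution.

    Round-robin assignment keeps chunk sizes balanced even when the
    number of records is not a multiple of the worker count.
    """
    chunks = [[] for _ in range(n_workers)]
    for i, rec in enumerate(records):
        chunks[i % n_workers].append(rec)
    return chunks
```

Each chunk would then be handed to a separate worker instance, which runs BLAST locally against its share of the queries.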


Extending Database Technology | 2000

Persistent Client-Server Database Sessions

Roger S. Barga; David B. Lomet; Thomas Baby; Sanjay Agrawal

Database systems support recovery, providing high database availability. However, database applications may lose work because of a server failure: if a database server crashes, volatile server state associated with a client application's session is lost, and applications may require operator-assisted restart. This prevents masking server failures and degrades application availability. In this paper, we show how to provide persistent database sessions to client applications across server failures, without the application itself needing to take measures for its recoverability. This offers improved application availability and reduces the application programming task of coping with system errors. Our approach is based on (i) capturing the client application's interactions with the database server and (ii) materializing database session state as persistent database tables that are logged on the database server. We exploit a virtual database session: our procedures detect database server failure and re-map the virtual session to a new session, into which we install the saved old session state once the server has recovered. This integrates database server recovery with transparent session recovery. The result is persistent client-server database sessions that survive a server crash without the client application being aware of the outage, except for possible timing considerations. We demonstrate the viability of this approach by describing the design and implementation of Phoenix/ODBC, a prototype system that provides persistent ODBC database sessions, and present early results from a performance evaluation of the costs to persist and recover ODBC database sessions.
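The session-recovery idea can be illustrated with a small sketch: wrap the connection, record session-shaping commands, and replay them onto a fresh connection after a failure. The class and method names below are hypothetical and the "persistent" log lives in memory; this is not the Phoenix/ODBC implementation:

```python
class PersistentSession:
    """Illustrative sketch of a virtual database session: capture the
    commands that shape session state, and on server failure re-map the
    virtual session onto a new connection, re-installing that state."""

    def __init__(self, connect):
        self._connect = connect       # factory for new raw connections
        self._conn = connect()
        self._session_log = []        # commands that shape session state

    def execute(self, command):
        try:
            result = self._conn.execute(command)
        except ConnectionError:
            # Server recovered under a new session: re-map and replay.
            self._conn = self._connect()
            for cmd in self._session_log:
                self._conn.execute(cmd)
            result = self._conn.execute(command)
        if command.upper().startswith("SET"):
            self._session_log.append(command)
        return result
```

The application keeps calling `execute` and never observes the outage, mirroring the transparency property the paper targets; a real system would persist the captured state in logged database tables rather than in memory.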


IEEE International Conference on eScience | 2008

The Trident Scientific Workflow Workbench

Roger S. Barga; Jared Jackson; Nelson Araujo; Dean Guo; Nitin Gautam; Yogesh Simmhan

In our demonstration we present Trident, a scientific workflow workbench built on top of a commercial workflow system to leverage existing functionality to the extent possible. Trident is being developed in collaboration with the scientific computing community for use in a number of ongoing eScience projects that make use of scientific workflows, in particular the Pan-STARRS sky survey project and the Ocean Observatory Initiative. In our demonstration of Trident we will illustrate the ability to utilize both local and cloud resources for storage and execution, as well as services such as provenance, monitoring, logging and scheduling workflows over clusters. Our goal is to release Trident in early 2009 as an open source accelerator for others to use for eScience projects and to continue extending with support for new workflow features and services.


Archive | 2007

Scientific versus Business Workflows

Roger S. Barga; Dennis Gannon

The formal concept of a workflow has existed in the business world for a long time. An entire industry of tools and technology devoted to workflow management has been developed and marketed to meet the needs of commercial enterprises. The Workflow Management Coalition (WfMC) has existed for over ten years and has developed a large set of reference models, documents, and standards. Why has the scientific community not adopted these existing standards? While it is not uncommon for the scientific community to reinvent technology rather than purchase existing solutions, there are issues involved in the technical applications that are unique to science, and we will attempt to characterize some of these here. There are, however, many core concepts that have been developed in the business workflow community that directly relate to science, and we will outline them below.


International Conference on Data Engineering | 2002

Recovery guarantees for general multi-tier applications

Roger S. Barga; David B. Lomet; Gerhard Weikum

Database recovery does not mask failures to applications and users. Recovery is needed that considers data, messages, and application components. Special cases have been studied, but clear principles for recovery guarantees in general multi-tier applications, such as Web-based e-services, are missing. We develop a framework for recovery guarantees that masks almost all failures. The main concept is an interaction contract between two components: a pledge as to message and state persistence, and contract release. Contracts are composed into system-wide agreements so that a set of components is provably recoverable with exactly-once message delivery and execution, except perhaps for crash-interrupted user input or output. Our implementation techniques reduce the data logging cost, allow effective log truncation, and provide independent recovery for critical server components. Interaction contracts form the basis of our Phoenix/COM project on persistent components. Our framework's utility is demonstrated with a case study of a Web-based e-service.
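The exactly-once guarantee at the heart of an interaction contract can be illustrated by the receiver-side duplicate elimination it requires: a message resent after a crash must not re-execute the handler, but must return the previously logged result. The sketch below uses hypothetical names and keeps its log in memory, omitting the persistent logging a real implementation needs:

```python
class Receiver:
    """Sketch of the receiver side of an interaction contract:
    messages resent after a crash are detected by id, so the handler
    runs exactly once per message and replays give the logged result."""

    def __init__(self):
        self.seen = set()     # ids of messages already executed
        self.results = {}     # logged results, returned on resend

    def deliver(self, msg_id, payload, handler):
        if msg_id in self.seen:
            return self.results[msg_id]   # duplicate: replay, don't re-run
        result = handler(payload)         # first delivery: execute once
        self.seen.add(msg_id)
        self.results[msg_id] = result     # log result for future replays
        return result
```

The sender's matching pledge is to keep resending until it learns the message was installed; together the two sides give exactly-once delivery and execution.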


2009 Third International Conference on Advanced Engineering Computing and Applications in Sciences | 2009

Building the Trident Scientific Workflow Workbench for Data Management in the Cloud

Yogesh Simmhan; Roger S. Barga; Catharine van Ingen; Edward D. Lazowska; Alexander S. Szalay

Scientific workflows have gained popularity for modeling and executing in silico experiments by scientists for problem solving. These workflows primarily engage in computation and data transformation tasks to perform scientific analysis in the Science Cloud. Increasingly, workflows are also used to manage scientific data as it arrives from external sensors and is prepared to become science-ready and available for use in the Cloud. While not directly part of the scientific analysis, these workflows, operating behind the Cloud on behalf of the "data valets", play an important role in the end-to-end management of scientific data products. They share several features with traditional scientific workflows: both are data intensive and use Cloud resources. However, they also differ in significant respects, for example in the reliability required, scheduling constraints, and the use of the provenance collected. In this article, we investigate these two classes of workflows, Science Application workflows and Data Preparation workflows, and use them to derive common and distinct requirements for workflow systems for eScience in the Cloud. We use workflow examples from two collaborations, the NEPTUNE oceanography project and the Pan-STARRS astronomy project, to draw out our comparison. Our analysis of these workflow classes can guide the evolution of workflow systems to support emerging applications in the Cloud; the Trident Scientific Workflow Workbench is one such workflow system that has directly benefited from this analysis to meet the needs of these two eScience projects.


International Conference on e-Science | 2009

DryadLINQ for Scientific Analyses

Jaliya Ekanayake; Thilina Gunarathne; Geoffrey C. Fox; Atilla Soner Balkir; Christophe Poulain; Nelson Araujo; Roger S. Barga

Applying high level parallel runtimes to data/compute intensive applications is becoming increasingly common. The simplicity of the MapReduce programming model and the availability of open source MapReduce runtimes such as Hadoop, are attracting more users to the MapReduce programming model. Recently, Microsoft has released DryadLINQ for academic use, allowing users to experience a new programming model and a runtime that is capable of performing large scale data/compute intensive analyses. In this paper, we present our experience in applying DryadLINQ for a series of scientific data analysis applications, identify their mapping to the DryadLINQ programming model, and compare their performances with Hadoop implementations of the same applications.
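The MapReduce model the paper compares against can be captured in a minimal in-memory skeleton: a mapper emits key/value pairs, the runtime groups them by key, and a reducer folds each group. This is a generic illustration, not DryadLINQ or Hadoop code:

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Minimal MapReduce skeleton: mapper emits (key, value) pairs,
    pairs are grouped by key, and reducer folds each group's values."""
    groups = defaultdict(list)
    for rec in records:                 # "map" phase
        for key, value in mapper(rec):
            groups[key].append(value)
    return {k: reducer(vs)              # "reduce" phase, per key
            for k, vs in groups.items()}
```

A distributed runtime such as Hadoop or Dryad parallelizes both phases and shuffles the grouped pairs across machines, but the programming model a user sees is essentially this pair of functions.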


IEEE International Conference on Cloud Computing Technology and Science | 2010

Performing Large Science Experiments on Azure: Pitfalls and Solutions

Wei Lu; Jared Jackson; Jaliya Ekanayake; Roger S. Barga; Nelson Araujo

Carrying out science at extreme scale is the next generational challenge facing the broad field of scientific research. Cloud computing offers the potential for an increasing number of researchers to have ready access to the large-scale compute resources required to tackle new challenges in their fields. Unfortunately, barriers of complexity remain for researchers untrained in cloud programming. In this paper we examine how cloud-based architectures can be used to run large-scale research experiments in a manner that is easily accessible to researchers with limited programming experience, using their existing computational tools. We examine the top challenges identified in our own large-scale science experiments running on the Windows Azure platform and then describe a cloud-based parameter-sweep prototype (dubbed Cirrus) that provides a framework of solutions for each challenge.
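A parameter sweep of the kind Cirrus automates boils down to expanding a parameter grid into independent tasks and queueing each task to a worker instance. The sketch below shows only the expansion step, with hypothetical names; it is not the Cirrus framework:

```python
from itertools import product

def sweep_tasks(param_grid):
    """Expand a parameter grid (name -> list of values) into one task
    description per combination, ready to queue to cloud workers."""
    names = sorted(param_grid)                       # stable task ordering
    for values in product(*(param_grid[n] for n in names)):
        yield dict(zip(names, values))
```

Each yielded dictionary becomes the command-line arguments for one run of the researcher's existing tool, which is what lets a sweep scale out without rewriting the tool for the cloud.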


International Conference on Data Engineering | 2007

Categorization and Optimization of Synchronization Dependencies in Business Processes

Qinyi Wu; Calton Pu; Akhil Sahai; Roger S. Barga

The current approach for modeling synchronization in business processes relies on sequencing constructs, such as sequence and parallel. However, sequencing constructs obfuscate the true source of dependencies in a business process. Moreover, because of the nested structure and scattered code that result from using sequencing constructs, it is hard to add or delete additional constraints without over-specifying necessary constraints or invalidating existing ones. We propose a dataflow programming approach in which dependencies are explicitly modeled to guide activity scheduling. We first give a systematic categorization of dependencies: data, control, service, and cooperation. Each dimension models dependency from its own point of view. Then we show that dependencies of various kinds can be first merged and then optimized to generate a minimal dependency set, which guarantees high concurrency and minimal maintenance cost for process execution.
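The dependency-minimization step described above corresponds, for an acyclic dependency set, to a transitive reduction of the dependency graph: an edge is redundant when the same ordering is already implied by a chain of other dependencies. A minimal sketch of this idea (our own illustration, not the paper's algorithm):

```python
def transitive_reduction(edges):
    """Drop redundant dependencies from an acyclic edge set: edge
    (u, v) is removed when v is still reachable from u without it."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)

    def reachable(start, target, skip_edge):
        # Depth-first search that ignores the candidate edge itself.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            for nxt in succ.get(node, ()):
                if (node, nxt) == skip_edge or nxt in seen:
                    continue
                if nxt == target:
                    return True
                seen.add(nxt)
                stack.append(nxt)
        return False

    return {(u, v) for u, v in edges
            if not reachable(u, v, (u, v))}
```

With explicit dependencies, the scheduler only has to honor this reduced set, which is what preserves concurrency while keeping the constraint set small to maintain.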


IEEE Congress on Services | 2008

Trident: Scientific Workflow Workbench for Oceanography

Roger S. Barga; Jared Jackson; Nelson Araujo; Dean Guo; Nitin Gautam; Keith Grochow; Edward D. Lazowska

We introduce Trident, a scientific workflow workbench built on top of a commercial workflow system to leverage existing functionality. Trident is being developed in collaboration with the scientific community for oceanography, but the workbench itself can be used by any science project that makes use of scientific workflows.

Collaboration


Dive into Roger S. Barga's collaborations.

Top Co-Authors

Yogesh Simmhan

Indian Institute of Science

View shared research outputs