Ann L. Chervenak | Researchain

Publication

Featured research published by Ann L. Chervenak.

Journal of Network and Computer Applications | 2000

The data grid: Towards an architecture for the distributed management and analysis of large scientific datasets

Ann L. Chervenak; Ian T. Foster; Carl Kesselman; Charles Salisbury; Steven Tuecke

In an increasing number of scientific disciplines, large data collections are emerging as important community resources. In this paper, we introduce design principles for a data management architecture called the data grid. We describe two basic services that we believe are fundamental to the design of a data grid, namely, storage systems and metadata management. Next, we explain how these services can be used to develop higher-level services for replica management and replica selection. We conclude by describing our initial implementation of data grid functionality.

parallel computing | 2002

Data management and transfer in high-performance computational grid environments

Bill Allcock; Joe Bester; John Bresnahan; Ann L. Chervenak; Ian T. Foster; Carl Kesselman; Sam Meder; Veronika Nefedova; Steven Tuecke

An emerging class of data-intensive applications involves the geographically dispersed extraction of complex scientific information from very large collections of measured or computed data. Such applications arise, for example, in experimental physics, where the data in question is generated by accelerators, and in simulation science, where the data is generated by supercomputers. So-called Data Grids provide essential infrastructure for such applications, much as the Internet provides essential services for applications such as e-mail and the Web. We describe here two services that we believe are fundamental to any Data Grid: reliable, high-speed transport and replica management. Our high-speed transport service, GridFTP, extends the popular FTP protocol with new features required for Data Grid applications, such as striping and partial file access. Our replica management service integrates a replica catalog with GridFTP transfers to provide for the creation, registration, location, and management of dataset replicas. We present the design of both services and also preliminary performance results. Our implementations exploit security and other services provided by the Globus Toolkit.

conference on high performance computing (supercomputing) | 2002

Giggle: A Framework for Constructing Scalable Replica Location Services

Ann L. Chervenak; Ewa Deelman; Ian T. Foster; Leanne Guy; Wolfgang Hoschek; Adriana Iamnitchi; Carl Kesselman; Peter Z. Kunszt; Matei Ripeanu; Bob Schwartzkopf; Heinz Stockinger; Kurt Stockinger; Brian Tierney

In wide area computing systems, it is often desirable to create remote read-only copies (replicas) of files. Replication can be used to reduce access latency, improve data locality, and/or increase robustness, scalability and performance for distributed applications. We define a replica location service (RLS) as a system that maintains and provides access to information about the physical locations of copies. An RLS typically functions as one component of a data grid architecture. This paper makes the following contributions. First, we characterize RLS requirements. Next, we describe a parameterized architectural framework, which we name Giggle (for GIGa-scale Global Location Engine), within which a wide range of RLSs can be defined. We define several concrete instantiations of this framework with different performance characteristics. Finally, we present initial performance results for an RLS prototype, demonstrating that RLS systems can be constructed that meet performance goals.
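The core idea of a replica location service, mapping a logical file name to the physical locations of its copies, can be illustrated with a minimal in-memory sketch. This is not the Giggle framework or the Globus implementation; the class name `ReplicaCatalog`, its methods, and the example URLs are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy catalog: logical file name -> set of physical replica locations.

    Hypothetical illustration only; not the API of any real RLS.
    """

    def __init__(self):
        self._locations = defaultdict(set)

    def register(self, logical_name, physical_url):
        # Record that a copy of logical_name exists at physical_url.
        self._locations[logical_name].add(physical_url)

    def unregister(self, logical_name, physical_url):
        # Forget one copy; drop the logical name once no copies remain.
        self._locations[logical_name].discard(physical_url)
        if not self._locations[logical_name]:
            del self._locations[logical_name]

    def lookup(self, logical_name):
        # Return known locations in sorted order; empty list if unknown.
        return sorted(self._locations.get(logical_name, ()))


catalog = ReplicaCatalog()
catalog.register("lfn://climate/run42.nc", "gsiftp://site-a.example.org/data/run42.nc")
catalog.register("lfn://climate/run42.nc", "gsiftp://site-b.example.org/data/run42.nc")
print(catalog.lookup("lfn://climate/run42.nc"))
```

A real service would distribute this mapping across many servers and keep it consistent over a wide area, which is the part the abstract's parameterized framework addresses.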

ieee conference on mass storage systems and technologies | 2001

Secure, Efficient Data Transport and Replica Management for High-Performance Data-Intensive Computing

Bill Allcock; Joseph Bester; John Bresnahan; Ann L. Chervenak; Carl Kesselman; Sam Meder; Veronika Nefedova; Steven Tuecke; Ian T. Foster

An emerging class of data-intensive applications involves the geographically dispersed extraction of complex scientific information from very large collections of measured or computed data. Such applications arise, for example, in experimental physics, where the data in question is generated by accelerators, and in simulation science, where the data is generated by supercomputers. So-called Data Grids provide essential infrastructure for such applications, much as the Internet provides essential services for applications such as e-mail and the Web. We describe here two services that we believe are fundamental to any Data Grid: reliable, high-speed transport and replica management. Our high-speed transport service, GridFTP, extends the popular FTP protocol with new features required for Data Grid applications, such as striping and partial file access. Our replica management service integrates a replica catalog with GridFTP transfers to provide for the creation, registration, location, and management of dataset replicas. We present the design of both services and also preliminary performance results. Our implementations exploit security and other services provided by the Globus Toolkit.

workflows in support of large-scale science | 2008

Characterization of scientific workflows

Shishir Bharathi; Ann L. Chervenak; Ewa Deelman; Gaurang Mehta; Mei-Hui Su; Karan Vahi

Researchers working on the planning, scheduling and execution of scientific workflows need access to a wide variety of scientific workflows to evaluate the performance of their implementations. We describe basic workflow structures that are composed into complex workflows by scientific communities. We provide a characterization of workflows from five diverse scientific applications, describing their composition and data and computational requirements. We also describe the effect of the size of the input datasets on the structure and execution profiles of these workflows. Finally, we describe a workflow generator that produces synthetic, parameterizable workflows that closely resemble the workflows that we characterize. We make these workflows available to the community to be used as benchmarks for evaluating various workflow systems and scheduling algorithms.

conference on high performance computing (supercomputing) | 2003

A Metadata Catalog Service for Data Intensive Applications

Gurmeet Singh; Shishir Bharathi; Ann L. Chervenak; Ewa Deelman; Carl Kesselman; Mary Manohar; Sonal Patil; Laura Pearlman

Advances in computational, storage and network technologies as well as middleware such as the Globus Toolkit allow scientists to expand the sophistication and scope of data-intensive applications. These applications produce and analyze terabytes and petabytes of data that are distributed in millions of files or objects. To manage these large data sets efficiently, metadata or descriptive information about the data needs to be managed. There are various types of metadata, and it is likely that a range of metadata services will exist in Grid environments that are specialized for particular types of metadata cataloguing and discovery. In this paper, we present the design of a Metadata Catalog Service (MCS) that provides a mechanism for storing and accessing descriptive metadata and allows users to query for data items based on desired attributes. We describe our experience in using the MCS with several applications and present a scalability study of the service.
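Querying for data items by desired attributes can be pictured with a small sketch. It is an assumption-laden toy, not the MCS design or its interface: the class `MetadataCatalog`, its `add` and `query` methods, and the attribute names are hypothetical.

```python
class MetadataCatalog:
    """Toy descriptive-metadata store; hypothetical, not the MCS API."""

    def __init__(self):
        # item name -> dict of attribute name -> value
        self._attributes = {}

    def add(self, item_name, **attributes):
        # Attach (or update) descriptive attributes for a data item.
        self._attributes.setdefault(item_name, {}).update(attributes)

    def query(self, **wanted):
        # Return names of items whose attributes match every requested value.
        return sorted(
            name
            for name, attrs in self._attributes.items()
            if all(key in attrs and attrs[key] == value
                   for key, value in wanted.items())
        )


mcs = MetadataCatalog()
mcs.add("run41.nc", model="ccsm", year=1999)
mcs.add("run42.nc", model="ccsm", year=2000)
mcs.add("scan07.dat", instrument="detector-a", year=2000)
print(mcs.query(year=2000))
print(mcs.query(model="ccsm", year=2000))
```

A linear scan like this would not scale to millions of items; a production catalog would presumably index attributes in a database.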

Future Generation Computer Systems | 2013

Characterizing and profiling scientific workflows

Gideon Juve; Ann L. Chervenak; Ewa Deelman; Shishir Bharathi; Gaurang Mehta; Karan Vahi

Researchers working on the planning, scheduling, and execution of scientific workflows need access to a wide variety of scientific workflows to evaluate the performance of their implementations. This paper provides a characterization of workflows from six diverse scientific applications, including astronomy, bioinformatics, earthquake science, and gravitational-wave physics. The characterization is based on novel workflow profiling tools that provide detailed information about the various computational tasks that are present in the workflow. This information includes I/O, memory and computational characteristics. Although the workflows are diverse, there is evidence that each workflow has a job type that consumes the most runtime. The study also uncovered inefficiency in a workflow component implementation, where the component was re-reading the same data multiple times.

arXiv: Computational Engineering, Finance, and Science | 2005

The Earth System Grid: Supporting the Next Generation of Climate Modeling Research

David E. Bernholdt; Shishir Bharathi; David Brown; Kasidit Chanchio; Meili Chen; Ann L. Chervenak; Luca Cinquini; Bob Drach; Ian T. Foster; Peter Fox; José I. García; Carl Kesselman; Rob S. Markel; Don Middleton; Veronika Nefedova; Line C. Pouchard; Arie Shoshani; Alex Sim; Gary Strand; Dean N. Williams

Understanding the Earth's climate system and how it might be changing is a preeminent scientific challenge. Global climate models are used to simulate past, present, and future climates, and experiments are executed continuously on an array of distributed supercomputers. The resulting data archive, spread over several sites, currently contains upwards of 100 TB of simulation data and is growing rapidly. Looking toward mid-decade and beyond, we must anticipate and prepare for distributed climate research data holdings of many petabytes. The Earth System Grid (ESG) is a collaborative interdisciplinary project aimed at addressing the challenge of enabling management, discovery, access, and analysis of these critically important datasets in a distributed and heterogeneous computational environment. The problem is fundamentally a Grid problem. Building upon the Globus Toolkit and a variety of other technologies, ESG is developing an environment that addresses authentication, authorization for data access, large-scale data transport and management, services and abstractions for high-performance remote data access, mechanisms for scalable data replication, cataloging with rich semantic and syntactic information, data discovery, distributed monitoring, and Web-based portals for using the system.

high performance distributed computing | 2004

Performance and scalability of a replica location service

Ann L. Chervenak; Naveen Palavalli; Shishir Bharathi; Carl Kesselman; Robert Schwartzkopf

We describe the implementation and evaluate the performance of a replica location service that is part of the Globus Toolkit Version 3.0. A replica location service (RLS) provides a mechanism for registering the existence of replicas and discovering them. Features of our implementation include the use of soft state update protocols to populate a distributed index and optional Bloom filter compression to reduce the size of these updates. Our results demonstrate that RLS performance scales well for individual servers with millions of entries and up to 100 requesting threads. We also show that the distributed RLS index scales well when using Bloom filter compression for wide area updates.
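To see why Bloom filter compression shrinks index updates, consider the following minimal sketch. Instead of sending every registered logical name, a server sends a fixed-size bit array from which membership can be tested, at the cost of occasional false positives. The filter size, number of hash functions, and SHA-256-based hashing here are assumptions made for illustration, not the parameters or hash scheme of the Globus RLS.

```python
import hashlib


class BloomFilter:
    """Fixed-size probabilistic set: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def summarize(logical_names, num_bits=1024, num_hashes=3):
    """Build the compact summary a catalog might send as an index update."""
    summary = BloomFilter(num_bits, num_hashes)
    for name in logical_names:
        summary.add(name)
    return summary


names = [f"lfn://dataset/file{i}" for i in range(100)]
update = summarize(names)
print(len(update.bits), "bytes sent instead of", sum(len(n) for n in names))
print("lfn://dataset/file7" in update)
```

The update size is fixed by `num_bits` regardless of how many names are registered, which is the trade the abstract alludes to: smaller wide-area updates in exchange for a tunable false-positive rate at the index.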

cluster computing and the grid | 2008

Data Management Challenges of Data-Intensive Scientific Workflows

Ewa Deelman; Ann L. Chervenak

Scientific workflows play an important role in today's science. Many disciplines rely on workflow technologies to orchestrate the execution of thousands of computational tasks. Much research to date focuses on efficient, scalable, and robust workflow execution, especially in distributed environments. However, many challenges remain in the area of data management related to workflow creation, execution, and result management. In this paper we examine some of these issues in the context of the entire workflow lifecycle.