
Publication


Featured research published by Mei-Hui Su.


Scientific Programming | 2005

Pegasus: A framework for mapping complex scientific workflows onto distributed systems

Ewa Deelman; Gurmeet Singh; Mei-Hui Su; Jim Blythe; Yolanda Gil; Carl Kesselman; Gaurang Mehta; Karan Vahi; G. Bruce Berriman; John C. Good; Anastasia C. Laity; Joseph C. Jacob; Daniel S. Katz

This paper describes the Pegasus framework that can be used to map complex scientific workflows onto distributed resources. Pegasus enables users to represent the workflows at an abstract level without needing to worry about the particulars of the target execution systems. The paper describes general issues in mapping applications and the functionality of Pegasus. We present the results of improving application performance through workflow restructuring, which clusters multiple tasks in a workflow into single entities. A real-life astronomy application is used as the basis for the study.


Lecture Notes in Computer Science | 2004

Pegasus: Mapping Scientific Workflows onto the Grid

Ewa Deelman; Jim Blythe; Yolanda Gil; Carl Kesselman; Gaurang Mehta; Sonal Patil; Mei-Hui Su; Karan Vahi; Miron Livny

In this paper we describe the Pegasus system that can map complex workflows onto the Grid. Pegasus takes an abstract description of a workflow and finds the appropriate data and Grid resources to execute the workflow. Pegasus is being released as part of the GriPhyN Virtual Data Toolkit and has been used in a variety of applications, including astronomy, biology, gravitational-wave science, and high-energy physics. A deferred planning mode of Pegasus is also introduced.


Workflows in Support of Large-Scale Science | 2008

Characterization of scientific workflows

Shishir Bharathi; Ann L. Chervenak; Ewa Deelman; Gaurang Mehta; Mei-Hui Su; Karan Vahi

Researchers working on the planning, scheduling and execution of scientific workflows need access to a wide variety of scientific workflows to evaluate the performance of their implementations. We describe basic workflow structures that are composed into complex workflows by scientific communities. We provide a characterization of workflows from five diverse scientific applications, describing their composition and data and computational requirements. We also describe the effect of the size of the input datasets on the structure and execution profiles of these workflows. Finally, we describe a workflow generator that produces synthetic, parameterizable workflows that closely resemble the workflows that we characterize. We make these workflows available to the community to be used as benchmarks for evaluating various workflow systems and scheduling algorithms.
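The paper's synthetic, parameterizable workflow generator can be illustrated with a minimal sketch (hypothetical code, not the authors' actual tool): generate a fork-join workflow, one of the basic structures scientific workflows are composed from, whose fan-out width is a parameter.

```python
def synthetic_fork_join(width):
    """Generate a fork-join workflow: one split task fans out to
    `width` parallel worker tasks, whose outputs a final join task
    merges. Returns (tasks, deps), where deps maps each task to the
    list of tasks it depends on."""
    workers = [f"work{i}" for i in range(width)]
    tasks = ["split"] + workers + ["join"]
    deps = {w: ["split"] for w in workers}   # each worker waits on split
    deps["join"] = list(workers)             # join waits on all workers
    return tasks, deps

tasks, deps = synthetic_fork_join(4)
print(len(tasks))  # 6 tasks: split, 4 workers, join
```

Varying `width` changes the structure and execution profile of the generated workflow, mirroring the paper's point that input-dataset size shapes workflow structure.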


Review of Scientific Instruments | 2001

A high-throughput x-ray microtomography system at the Advanced Photon Source

Yuxin Wang; Francesco De Carlo; Derrick C. Mancini; Ian McNulty; Brian Tieman; John Bresnahan; Ian T. Foster; Joseph A. Insley; Peter Lane; Gregor von Laszewski; Carl Kesselman; Mei-Hui Su; Marcus Thiebaux

(Received 14 November 2000; accepted for publication 23 January 2001)

A third-generation synchrotron radiation source provides enough brilliance to acquire complete tomographic data sets at 100 nm or better resolution in a few minutes. To take advantage of such high-brilliance sources at the Advanced Photon Source, we have constructed a pipelined data acquisition and reconstruction system that combines a fast detector system, high-speed data networks, and massively parallel computers to rapidly acquire the projection data and perform the reconstruction and rendering calculations. With the current setup, a data set can be obtained and reconstructed in tens of minutes. A specialized visualization computer makes rendered three-dimensional (3D) images available to the beamline users minutes after the data acquisition is completed. This system is capable of examining a large number of samples at sub-μm 3D resolution or studying the full 3D structure of a dynamically evolving sample on a 10 min temporal scale. In the near future, we expect to increase the spatial resolution to below 100 nm by using zone-plate x-ray focusing optics and to improve the time resolution by the use of a broadband x-ray monochromator and a faster detector system.


statistical and scientific database management | 2004

Grid-based metadata services

Ewa Deelman; Gurmeet Singh; Malcolm P. Atkinson; Ann L. Chervenak; N.P. Chue Hong; Carl Kesselman; Sonal Patil; Laurie Anne Pearlman; Mei-Hui Su

Data sets being managed in grid environments today are growing at a rapid rate, expected to reach hundreds of petabytes in the near future. Managing such large data sets poses challenges for efficient data access, data publication and data discovery. In this paper we focus on the data publication and discovery process through the use of descriptive metadata. This metadata describes the properties of individual data items and collections. We discuss issues of metadata services in service-rich environments, such as the grid. We describe the requirements and the architecture for such services in the context of the grid and the available grid services. We present a data model that can capture the complexity of the data publication and discovery process. Based on that model we identify a set of interfaces and operations that need to be provided to support metadata management. We present a particular implementation of a grid metadata service, basing it on existing grid services technologies. Finally, we examine alternative implementations of that service.
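The publish-and-discover interface the paper describes can be sketched as a toy attribute-based metadata catalog (hypothetical names and data model, not the paper's actual service): publish attributes for a logical data item, then discover items by attribute query.

```python
class MetadataCatalog:
    """Toy attribute-based metadata catalog: publish attributes for a
    logical data item, then discover items whose attributes match a
    query. Illustrative only; a real grid metadata service adds
    collections, authorization, and service interfaces."""

    def __init__(self):
        self._items = {}  # logical name -> {attribute: value}

    def publish(self, name, **attributes):
        """Attach (or update) descriptive attributes for a data item."""
        self._items.setdefault(name, {}).update(attributes)

    def query(self, **attributes):
        """Return names of items matching all given attribute values."""
        return [name for name, attrs in self._items.items()
                if all(attrs.get(k) == v for k, v in attributes.items())]

catalog = MetadataCatalog()
catalog.publish("run42.dat", instrument="APS", year=2004)
catalog.publish("run43.dat", instrument="ESG", year=2004)
print(catalog.query(instrument="APS"))  # ['run42.dat']
```

The separation between `publish` and `query` mirrors the paper's split between the data publication and data discovery operations.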


Bulletin of the American Meteorological Society | 2009

The Earth System Grid: Enabling Access to Multimodel Climate Simulation Data

Dean N. Williams; Rachana Ananthakrishnan; David E. Bernholdt; S. Bharathi; D. Brown; M. Chen; A. L. Chervenak; L. Cinquini; R. Drach; I. T. Foster; P. Fox; Dan Fraser; J. A. Garcia; S. Hankin; P. Jones; D. E. Middleton; J. Schwidder; R. Schweitzer; Robert Schuler; A. Shoshani; F. Siebenlist; A. Sim; Warren G. Strand; Mei-Hui Su; N. Wilhelmi

By leveraging current technologies to manage distributed climate data in a unified virtual environment, the Earth System Grid (ESG) project is promoting data sharing between international research centers and diverse users. In transforming these data into a collaborative community resource, ESG is changing the way global climate research is conducted. Since ESG's production beginnings in 2004, its most notable accomplishment has been to efficiently store and distribute climate simulation data of some 20 global coupled ocean-atmosphere models to the scores of scientific contributors to the Fourth Assessment Report (AR4) of the Intergovernmental Panel on Climate Change (IPCC); the IPCC's collective scientific achievement was recognized by the award of a 2007 Nobel Peace Prize. Other international climate stakeholders such as the North American Regional Climate Change Assessment Program (NARCCAP) and the developers of the Community Climate System Model (CCSM) and of the Climate Science Computational End Station (CC...


grid computing | 2007

Data placement for scientific applications in distributed environments

Ann L. Chervenak; Ewa Deelman; Miron Livny; Mei-Hui Su; Robert Schuler; Shishir Bharathi; Gaurang Mehta; Karan Vahi

Scientific applications often perform complex computational analyses that consume and produce large data sets. We are concerned with data placement policies that distribute data in ways that are advantageous for application execution, for example, by placing data sets so that they may be staged into or out of computations efficiently or by replicating them for improved performance and reliability. In particular, we propose to study the relationship between data placement services and workflow management systems. In this paper, we explore the interactions between two services used in large-scale science today. We evaluate the benefits of prestaging data using the Data Replication Service versus using the native data stage-in mechanisms of the Pegasus workflow management system. We use the astronomy application, Montage, for our experiments and modify it to study the effect of input data size on the benefits of data prestaging. As the size of input data sets increases, prestaging using a data placement service can significantly improve the performance of the overall analysis.


Archive | 2007

Pegasus: Mapping Large-Scale Workflows to Distributed Resources

Ewa Deelman; Gaurang Mehta; Gurmeet Singh; Mei-Hui Su; Karan Vahi

Many scientific advances today are derived from analyzing large amounts of data. The computations themselves can be very complex and consume significant resources. Scientific efforts are also not conducted by individual scientists; rather, they rely on collaborations that encompass many researchers from various organizations. The analysis is often composed of several individual application components designed by different scientists. To describe the desired analysis, the components are assembled in a workflow where the dependencies between them are defined and the data needed for the analysis are identified. To support the scale of the applications, many resources are needed in order to provide adequate performance. These resources are often drawn from a heterogeneous pool of geographically distributed compute and data resources. Running large-scale, collaborative applications in such environments has many challenges. Among them are systematic management of the applications, their components, and the data, as well as successful and efficient execution on the distributed resources.


Proceedings of the 15th ACM Mardi Gras conference on From lightweight mash-ups to lambda grids: Understanding the spectrum of distributed computing requirements, applications, tools, infrastructures, interoperability, and the incremental adoption of key capabilities | 2008

Workflow task clustering for best effort systems with Pegasus

Gurmeet Singh; Mei-Hui Su; Karan Vahi; Ewa Deelman; G. Bruce Berriman; John C. Good; Daniel S. Katz; Gaurang Mehta

Many scientific workflows are composed of thousands of fine-granularity computational tasks and are data intensive in nature, thus requiring resources such as the TeraGrid to execute efficiently. In order to improve the performance of such applications, we often employ task clustering techniques to increase the computational granularity of workflow tasks. The goal is to minimize the completion time of the workflow by reducing the impact of queue wait times. In this paper, we examine the performance impact of the clustering techniques using the Pegasus workflow management system. Experiments performed using an astronomy workflow on the NCSA TeraGrid cluster show that clustering can achieve a significant reduction in the workflow completion time (up to 97%).
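The clustering idea can be sketched minimally (illustrative code, not Pegasus's implementation): group tasks at the same workflow level, which have no dependencies among themselves, into clusters of a fixed size so each cluster is submitted to the queue as a single job.

```python
def cluster_tasks(levels, cluster_size):
    """Group tasks level by level into clusters of at most cluster_size.

    levels: list of lists; levels[i] holds the tasks at depth i of the
            workflow, which are independent of one another.
    Returns a list of clusters (lists of tasks), preserving level order
    so dependencies between levels are respected.
    """
    clusters = []
    for level in levels:
        # Slice each level into fixed-size groups; each group becomes
        # one submitted job, so queue wait time is paid once per group.
        for i in range(0, len(level), cluster_size):
            clusters.append(level[i:i + cluster_size])
    return clusters

# 100 independent projection tasks followed by one merge task:
levels = [[f"proj{i}" for i in range(100)], ["merge"]]
clusters = cluster_tasks(levels, cluster_size=25)
print(len(clusters))  # 4 clusters of projections + 1 merge cluster
```

With `cluster_size=25`, 101 queue submissions collapse to 5, which is the mechanism behind the reduced completion times reported above.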


international conference on parallel processing | 2005

A comparison of two methods for building astronomical image mosaics on a grid

Daniel S. Katz; Joseph C. Jacob; Ewa Deelman; Carl Kesselman; Gurmeet Singh; Mei-Hui Su; G.B. Berriman; John C. Good; Anastasia C. Laity; Thomas A. Prince

This paper compares two methods for running an application composed of a set of modules on a grid. The set of modules (collectively called Montage) generates large astronomical image mosaics by composing multiple small images. The workflow that describes a particular run of Montage can be expressed as a directed acyclic graph (DAG), or as a short sequence of parallel (MPI) and sequential programs. In the first case, Pegasus can be used to run the workflow. In the second case, a short shell script that calls each program can be run. In this paper, we discuss the Montage modules, the workflow run for a sample job, and the two methods of actually running the workflow. We examine the run time for each method and compare the portions that differ between the two methods.
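The DAG formulation mentioned above can be illustrated with a small sketch (hypothetical task names loosely modeled on Montage modules, not Pegasus's actual API): represent the workflow as tasks plus dependencies and derive a valid execution order topologically.

```python
from collections import deque

def topological_order(tasks, deps):
    """Return tasks in an order where every task follows the tasks it
    depends on, raising if the graph is not a DAG.

    tasks: iterable of task names.
    deps:  dict mapping task -> list of tasks it depends on.
    """
    indegree = {t: 0 for t in tasks}
    children = {t: [] for t in tasks}
    for task, parents in deps.items():
        for p in parents:
            indegree[task] += 1
            children[p].append(task)
    ready = deque(t for t in tasks if indegree[t] == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for c in children[t]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    if len(order) != len(indegree):
        raise ValueError("workflow graph contains a cycle")
    return order

# Hypothetical mosaic stages: project two images, difference the
# overlap, then add the final mosaic.
tasks = ["mProject1", "mProject2", "mDiff", "mAdd"]
deps = {"mDiff": ["mProject1", "mProject2"], "mAdd": ["mDiff"]}
print(topological_order(tasks, deps))
```

The same graph could equally be flattened into the paper's alternative form, a short sequence of parallel and sequential program invocations.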

Collaboration


Dive into Mei-Hui Su's collaborations.

Top Co-Authors

Ewa Deelman
USC Information Sciences Institute

Carl Kesselman
University of Southern California

Gurmeet Singh
University of Southern California

Gaurang Mehta
University of Southern California

John C. Good
California Institute of Technology

Anastasia C. Laity
California Institute of Technology

Joseph C. Jacob
California Institute of Technology

Karan Vahi
University of Southern California

Ian T. Foster
Argonne National Laboratory