Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Matthew Wolf is active.

Publication


Featured research published by Matthew Wolf.


Concurrency and Computation: Practice and Experience | 2014

Hello ADIOS: the challenges and lessons of developing leadership class I/O frameworks

Qing Liu; Jeremy Logan; Yuan Tian; Hasan Abbasi; Norbert Podhorszki; Jong Youl Choi; Scott Klasky; Roselyne Tchoua; Jay F. Lofstead; Ron A. Oldfield; Manish Parashar; Nagiza F. Samatova; Karsten Schwan; Arie Shoshani; Matthew Wolf; Kesheng Wu; Weikuan Yu

Applications running on leadership platforms are increasingly bottlenecked by storage input/output (I/O). In an effort to combat the growing disparity between I/O throughput and compute capability, we created the Adaptable IO System (ADIOS) in 2005. Focusing on putting users first with a service-oriented architecture, we combined cutting-edge research into new I/O techniques with a design effort to create near-optimal I/O methods. As a result, ADIOS provides the highest level of synchronous I/O performance for a number of mission-critical applications at various Department of Energy Leadership Computing Facilities. Meanwhile, ADIOS is leading the push toward next-generation techniques, including staging and data processing pipelines. In this paper, we describe the startling observations we have made in the last half decade of I/O research and development, and elaborate on the lessons we have learned along this journey. We also detail some of the challenges that remain as we look toward the coming exascale era.
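
The service-oriented emphasis shows up in the ADIOS programming model: the application declares what to write, and the I/O method actually used is selected in an external configuration file. Below is a minimal sketch of the classic ADIOS 1.x C write path; exact signatures vary across 1.x releases, and the "config.xml", group, and variable names here are illustrative.

    /* Minimal ADIOS 1.x-style write path (signatures vary by release). */
    #include <stdint.h>
    #include <mpi.h>
    #include "adios.h"

    int main(int argc, char **argv)
    {
        int rank, nx = 1000;
        double t[1000];
        int64_t fh;
        uint64_t groupsize = sizeof(int) + 1000 * sizeof(double), totalsize;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        for (int i = 0; i < nx; i++) t[i] = rank + i * 0.01;  /* stand-in data */

        adios_init("config.xml", MPI_COMM_WORLD); /* I/O method chosen in XML */
        adios_open(&fh, "temperature", "out.bp", "w", MPI_COMM_WORLD);
        adios_group_size(fh, groupsize, &totalsize);
        adios_write(fh, "NX", &nx);               /* names declared in config.xml */
        adios_write(fh, "temperature", t);
        adios_close(fh);                          /* actual transport runs here */

        adios_finalize(rank);
        MPI_Finalize();
        return 0;
    }

Because the transport is named in the configuration file rather than in code, the same program can switch between, say, POSIX and staged I/O without recompilation.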


International Conference on Autonomic Computing | 2011

A flexible architecture integrating monitoring and analytics for managing large-scale data centers

Chengwei Wang; Karsten Schwan; Vanish Talwar; Greg Eisenhauer; Liting Hu; Matthew Wolf

To effectively manage large-scale data centers and utility clouds, operators must understand current system and application behaviors. This requires continuous, real-time monitoring along with online analysis of the data captured by the monitoring system, i.e., integrated monitoring and analytics -- Monalytics [28]. A key challenge with such integration is to balance the costs and delays incurred against the benefits of identifying and reacting to undesirable or non-performing system states in a timely fashion. This paper presents a novel, flexible architecture for Monalytics in which such trade-offs are easily made by dynamically constructing software overlays called Distributed Computation Graphs (DCGs) to implement desired analytics functions. A Monalytics prototype implementing this flexible architecture is evaluated with motivating use cases in small-scale data center experiments, and a series of analytical models is used to understand these trade-offs at large scales. Results show that the approach provides the flexibility needed to meet the demands of autonomic management at large scale, with considerably better performance/cost than traditional and brute-force solutions.
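
To make the DCG idea concrete, here is a hypothetical sketch of composing monitoring sources and an aggregation function into a small graph whose output gates a heavier analysis step. All function names are invented for illustration; the abstract does not show Monalytics' actual API.

    /* Hypothetical DCG composition: leaves sample metrics, an inner node
     * aggregates, and the result gates a heavier analysis. Invented names. */
    #include <stdio.h>

    typedef double (*metric_fn)(void);        /* leaf: sample one metric   */

    static double cpu_util(void) { return 0.72; }   /* stand-in samples */
    static double mem_util(void) { return 0.41; }

    static double avg(double *v, int n) {     /* inner node: aggregate */
        double s = 0;
        for (int i = 0; i < n; i++) s += v[i];
        return n ? s / n : 0;
    }

    int main(void) {
        metric_fn leaves[] = { cpu_util, mem_util };  /* edges: leaves -> avg */
        double samples[2];
        for (int i = 0; i < 2; i++) samples[i] = leaves[i]();
        double a = avg(samples, 2);
        if (a > 0.9) printf("anomaly: utilization %.2f\n", a); /* escalate */
        else         printf("ok: utilization %.2f\n", a);
        return 0;
    }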


Conference on High Performance Computing (Supercomputing) | 2002

SmartPointers: Personalized Scientific Data Portals In Your Hand

Matthew Wolf; Zhongtang Cai; Weiyun Huang; Karsten Schwan

The SmartPointer system provides a paradigm for utilizing multiple lightweight client endpoints in a real-time scientific visualization infrastructure. Together, the client and server infrastructure form a new type of data portal for scientific computing. The clients can be used to personalize data for the needs of the individual scientist. This personalization of a shared dataset is designed to allow multiple scientists, each with their own laptop or iPAQ, to explore the dataset from different angles and with different personalized filters. As an example, iPAQ clients can display derived 2D data functions that dynamically update and annotate the shared data space, which might be visualized separately on a large immersive display such as a CAVE. Measurements are presented for such a system, built upon the ECho middleware developed at Georgia Tech.
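
The per-client filtering idea can be illustrated with a small sketch: several clients attach different filter functions to the same shared dataset and each sees only its own derived view. All names here are invented; SmartPointer's actual ECho-based filter interface is not shown in the abstract.

    /* Hypothetical per-client personalization over a shared dataset. */
    #include <stdio.h>

    typedef double (*filter_fn)(double);

    static double raw[5] = { 1.0, -2.0, 3.0, -4.0, 5.0 };  /* shared dataset */

    static double magnitude(double x) { return x < 0 ? -x : x; }  /* client A */
    static double threshold(double x) { return x > 2.0 ? x : 0; } /* client B */

    static void view(const char *who, filter_fn f) {
        printf("%s:", who);
        for (int i = 0; i < 5; i++) printf(" %.1f", f(raw[i]));
        printf("\n");
    }

    int main(void) {
        view("client A (magnitude)", magnitude);  /* same data, two views */
        view("client B (threshold)", threshold);
        return 0;
    }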


International Parallel and Distributed Processing Symposium | 2013

FlexIO: I/O Middleware for Location-Flexible Scientific Data Analytics

Fang Zheng; Hongbo Zou; Greg Eisenhauer; Karsten Schwan; Matthew Wolf; Jai Dayal; Tuan-Anh Nguyen; Jianting Cao; Hasan Abbasi; Scott Klasky; Norbert Podhorszki; Hongfeng Yu

Increasingly severe I/O bottlenecks on High-End Computing machines are prompting scientists to process simulation output data online while simulations are running and before storing data on disk. There are several options for placing data analytics along the I/O path: on compute nodes, on separate nodes dedicated to analytics, or after data is stored on persistent storage. Since different placements have different impacts on performance and cost, there is a consequent need for flexibility in the location of data analytics. The FlexIO middleware described in this paper makes it easy for scientists to obtain such flexibility by offering simple abstractions and diverse data movement methods to couple simulations with analytics. Various placement policies can be built on top of FlexIO to exploit the trade-offs of performing analytics at different levels of the I/O hierarchy. Experimental results demonstrate that FlexIO can support a variety of simulation and analytics workloads at large scale through flexible placement options, efficient data movement, and dynamic deployment of data manipulation functionalities.
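
As a concrete illustration of placement flexibility, here is a hypothetical sketch of a policy that picks where an analytics kernel runs from rough cost inputs. The enum values and the cost model are invented; FlexIO's real placement interfaces are not shown in the abstract.

    /* Hypothetical placement policy over the three options named above. */
    #include <stdio.h>

    enum placement { INLINE_COMPUTE_NODE, DEDICATED_STAGING_NODE, OFFLINE_FROM_DISK };

    static enum placement choose(double data_gb, double cycles_free_pct) {
        if (cycles_free_pct > 20.0) return INLINE_COMPUTE_NODE;    /* reuse idle cycles */
        if (data_gb < 100.0)        return DEDICATED_STAGING_NODE; /* move data once */
        return OFFLINE_FROM_DISK;                                  /* too big to move online */
    }

    int main(void) {
        const char *names[] = { "compute node", "staging node", "post-hoc from disk" };
        printf("place analytics on: %s\n", names[choose(50.0, 5.0)]);
        return 0;
    }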


IEEE International Conference on High Performance Computing, Data and Analytics | 2013

GoldRush: resource efficient in situ scientific data analytics using fine-grained interference aware execution

Fang Zheng; Hongfeng Yu; Can Hantaş; Matthew Wolf; Greg Eisenhauer; Karsten Schwan; Hasan Abbasi; Scott Klasky

Severe I/O bottlenecks on High-End Computing platforms call for running data analytics in situ. Observing that compute nodes hold considerable resources left unused by typical high-end scientific simulations, we leverage this fact by creating an agile runtime, termed GoldRush, that can harvest those otherwise wasted, idle resources to efficiently run in situ data analytics. GoldRush uses fine-grained scheduling to "steal" idle resources, in ways that minimize interference between the simulation and in situ analytics. This involves recognizing the potential causes of on-node resource contention and then using scheduling methods that prevent them. Experiments with representative science applications at large scales show that resources harvested on compute nodes can be leveraged to perform useful analytics, significantly improving resource efficiency, reducing the data movement costs incurred by alternative solutions, and imposing negligible impact on scientific simulations.
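
One crude way to approximate this harvesting on stock Linux is to pin analytics to the SCHED_IDLE scheduling class, so it runs only when simulation threads leave a core idle. This is only a sketch of the general idea; GoldRush's actual fine-grained, interference-aware mechanism is more sophisticated than a scheduling-class change.

    /* Run analytics on a SCHED_IDLE thread so it yields to simulation work.
     * Compile with -pthread; SCHED_IDLE is Linux-specific. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    static void *analytics(void *arg) {
        struct sched_param sp = { .sched_priority = 0 };  /* SCHED_IDLE needs 0 */
        pthread_setschedparam(pthread_self(), SCHED_IDLE, &sp);
        /* ... in situ analytics here; scheduled only on idle cycles ... */
        puts("analytics running on harvested idle cycles");
        return NULL;
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, analytics, NULL);
        /* simulation work would proceed here at normal priority */
        pthread_join(t, NULL);
        return 0;
    }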


Petascale Data Storage Workshop | 2009

...and eat it too: high read performance in write-optimized HPC I/O middleware file formats

Milo Polte; Jay F. Lofstead; John M. Bent; Garth A. Gibson; Scott Klasky; Qing Liu; Manish Parashar; Norbert Podhorszki; Karsten Schwan; Meghan Wingate; Matthew Wolf

As HPC applications run on increasingly high process counts on larger and larger machines, both the frequency of checkpoints needed for fault tolerance [14] and the resolution and size of Data Analysis Dumps are expected to increase proportionally. In order to maintain an acceptable ratio of time spent performing useful computation to time spent performing I/O, write bandwidth to the underlying storage system must increase proportionally to this growth in checkpoint and computation size. Unfortunately, popular scientific self-describing file formats such as netCDF [8] and HDF5 [3] are designed with a focus on portability and flexibility; careful crafting of the output structure and API calls is required to optimize for write performance using these APIs. To provide sufficient write bandwidth to continue to support the demands of scientific applications, the HPC community has developed a number of I/O middleware layers that structure output into write-optimized file formats. However, the obvious concern with any write-optimized file format is a corresponding penalty on reads. In the log-structured filesystem [13], for example, a file generated by random writes could be written efficiently, but reading the file back sequentially later would result in very poor performance. Simulation results require efficient read-back for visualization and analytics, and though most checkpoint files are never used, the efficiency of a restart is very important in the face of inevitable failures. The utility of write-speed-improving middleware would be greatly diminished if it sacrificed acceptable read performance. In this paper we examine the read performance of two write-optimized middleware layers on large parallel machines and compare it to reading data natively in popular file formats.
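
The tension the abstract describes can be seen in miniature below: chunks are appended in write-optimized, log-like order, and read-back stays cheap only because an index footer records where each chunk landed. This is a generic illustration, not the layout of any specific middleware examined in the paper.

    /* Append-only chunk writes plus an index footer for cheap read-back. */
    #include <stdio.h>

    int main(void) {
        const int nchunks = 4, n = 8;
        long offsets[4];
        FILE *f = fopen("log.bin", "wb");
        if (!f) return 1;

        for (int c = 0; c < nchunks; c++) {          /* append-only writes */
            double buf[8];
            for (int i = 0; i < n; i++) buf[i] = c + i * 0.1;
            offsets[c] = ftell(f);
            fwrite(buf, sizeof(double), n, f);
        }
        fwrite(offsets, sizeof(long), nchunks, f);   /* footer: chunk offsets */
        fclose(f);

        f = fopen("log.bin", "rb");                  /* read chunk 2, no scan */
        fseek(f, -(long)(nchunks * sizeof(long)), SEEK_END);
        fread(offsets, sizeof(long), nchunks, f);
        double buf[8];
        fseek(f, offsets[2], SEEK_SET);
        fread(buf, sizeof(double), n, f);
        printf("chunk 2 starts with %.1f\n", buf[0]);
        fclose(f);
        return 0;
    }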


High Performance Distributed Computing | 2003

Resource-aware stream management with the customizable dproc distributed monitoring mechanisms

Sandip Agarwala; Christian Poellabauer; Jiantao Kong; Karsten Schwan; Matthew Wolf

Monitoring the resources of distributed systems is essential to the successful deployment and execution of grid applications, particularly when such applications have well-defined QoS requirements. The dproc system-level monitoring mechanisms implemented for standard Linux kernels have several key components. First, utilizing the familiar /proc filesystem, dproc extends this interface with resource information collected from both local and remote hosts. Second, to predictably capture and distribute monitoring information, dproc uses a kernel-level group communication facility, termed KECho, which is based on events and event channels. Third, and the focus of this paper, is dproc's run-time customizability for resource monitoring, which includes the generation and deployment of monitoring functionality within remote operating system kernels. Using dproc, we show that: (a) data streams can be customized according to a client's resource availability (dynamic stream management); (b) by dynamically varying distributed monitoring (dynamic filtering of monitoring information), an appropriate balance can be maintained between monitoring overheads and application quality; and (c) by performing monitoring at kernel level, the information captured enables decision making that takes into account the multiple resources used by applications.
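
Since dproc presents itself through the familiar /proc interface, a client consumes its data the same way it would any procfs entry. The sketch below reads a stock Linux /proc file; dproc's added entries (for example, per-remote-host resource files) would be read identically, but their exact names are not given in the abstract.

    /* Reading a standard /proc entry, the access pattern dproc extends. */
    #include <stdio.h>

    int main(void) {
        double load1, load5, load15;
        FILE *f = fopen("/proc/loadavg", "r");
        if (!f) { perror("/proc/loadavg"); return 1; }
        if (fscanf(f, "%lf %lf %lf", &load1, &load5, &load15) == 3)
            printf("load: %.2f (1m) %.2f (5m) %.2f (15m)\n", load1, load5, load15);
        fclose(f);
        return 0;
    }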


Cluster Computing and the Grid | 2014

Flexpath: Type-Based Publish/Subscribe System for Large-Scale Science Analytics

Jai Dayal; Drew Bratcher; Greg Eisenhauer; Karsten Schwan; Matthew Wolf; Xuechen Zhang; Hasan Abbasi; Scott Klasky; Norbert Podhorszki

As high-end systems move toward exascale sizes, a new model of scientific inquiry is being developed in which online data analytics run concurrently with the high-end simulations producing data outputs. The goals are to gain rapid insights into the ongoing scientific processes, assess their scientific validity, and/or initiate corrective or supplementary actions by launching additional computations when needed. The Flexpath system presented in this paper addresses the fundamental problem of how to structure and efficiently implement the communications between high-end simulations and concurrently running online data analytics, the latter comprised of componentized dynamic services and service pipelines. Using a type-based publish/subscribe approach, Flexpath encourages diversity by permitting analytics services to differ in their computational and scaling characteristics and even in their internal execution models. Flexpath uses direct and MxN connections between interacting services to reduce data movement, to allow runtime connectivity changes that accommodate component arrivals and departures, and to support the multiple underlying communication protocols used for analytics workflows in which simulation outputs are processed by analytics services residing on the same nodes where they are generated, on the same machine, and/or on attached or remote analytics engines. This paper describes the design and implementation of Flexpath, and evaluates it with two widely used scientific applications and their associated data analytics methods.
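
The essence of type-based publish/subscribe is that subscribers register interest in a record type rather than in a particular sender. The hypothetical sketch below shows that matching in miniature; all names are invented, and Flexpath's real interfaces are not shown in the abstract.

    /* Hypothetical type-based publish/subscribe: dispatch by record type. */
    #include <stdio.h>
    #include <string.h>

    typedef struct { int step; double energy; } sim_record; /* shared type contract */
    typedef void (*handler)(const sim_record *);

    static struct { const char *type; handler fn; } subs[8];
    static int nsubs;

    static void subscribe(const char *type, handler fn) {
        subs[nsubs].type = type;
        subs[nsubs].fn = fn;
        nsubs++;
    }

    static void publish(const char *type, const sim_record *r) {
        for (int i = 0; i < nsubs; i++)          /* match on type, not endpoint */
            if (strcmp(subs[i].type, type) == 0)
                subs[i].fn(r);
    }

    static void analytics(const sim_record *r) { /* analytics joins by type */
        printf("step %d: energy %.3f\n", r->step, r->energy);
    }

    int main(void) {
        subscribe("sim_record", analytics);
        sim_record r = { 42, -1.234 };
        publish("sim_record", &r);               /* simulation side emits */
        return 0;
    }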


International Conference on Cluster Computing | 2007

LIVE data workspace: A flexible, dynamic and extensible platform for petascale applications

Hasan Abbasi; Matthew Wolf; Karsten Schwan

The data needs of current and future PetaScale applications have increased over the last half decade to the extent that appropriate data management has become a crucial requirement. This concerns not only the storage of data produced by the new class of PetaScale applications, but also the data exchanges needed for coupling applications with concurrent analysis, online data visualization for validation, and others. To address such dynamic code coupling, we introduce the concept of an extensible, dynamic, and flexible data workspace, termed LIVE. In contrast to the data exchanges programmed with MPI, MPI-IO, or grid software, LIVE focuses on data exchanges carried out without a priori knowledge of potential data requirements. Examples include exchanges required by ad hoc or dynamically determined methods for data validation, for general data analysis tasks, or for data visualization. Run on an execution environment comprised of integrated dynamic discovery and online management services, LIVE is used to create a 'data workspace' for a working molecular dynamics code base utilized by mechanical and materials engineers at Georgia Tech for multi-scale materials modeling. Measurements of both this application's data workspace and the basic primitives in the LIVE framework demonstrate that the environment's substantial flexibility has minimal impact on overall performance and, in fact, improves performance in a number of usage scenarios. In particular, for a visualization pipeline example derived from our collaborators, we see a slight improvement over a solution based on MPI-IO, and a further improvement of up to 5% by utilizing LIVE's ability to overlap communication with user-specified computation.
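
The overlap credited in the last sentence is the standard nonblocking pattern: start a transfer, compute while it is in flight, then wait. The sketch below expresses it with plain MPI calls (run with at least two ranks); LIVE's own primitives are not shown in the abstract, so this only illustrates the technique.

    /* Communication/computation overlap with nonblocking MPI. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        double buf[1024] = { 0 };
        MPI_Request req;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size < 2) { MPI_Finalize(); return 1; }    /* needs two ranks */

        if (rank == 0) {
            MPI_Isend(buf, 1024, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
            double s = 0;                               /* user computation   */
            for (int i = 0; i < 1024; i++) s += buf[i]; /* overlaps transfer  */
            MPI_Wait(&req, MPI_STATUS_IGNORE);
            printf("overlapped sum: %.1f\n", s);
        } else if (rank == 1) {
            MPI_Recv(buf, 1024, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
        MPI_Finalize();
        return 0;
    }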


International Middleware Conference | 2012

VScope: middleware for troubleshooting time-sensitive data center applications

Chengwei Wang; Infantdani Abel Rayan; Greg Eisenhauer; Karsten Schwan; Vanish Talwar; Matthew Wolf; Chad Marcus Huneycutt

Data-intensive infrastructures are increasingly used for online processing of live data to guide operations and decision making. VScope is a flexible monitoring and analysis middleware for troubleshooting such large-scale, time-sensitive, multi-tier applications. With VScope, lightweight anomaly detection and interaction tracking methods can be run continuously throughout an application's execution. The runtime events generated by these methods can then initiate more detailed and heavier-weight analyses, which are dynamically deployed in the places where they are most likely to be fruitful for root cause diagnosis and mitigation. We comprehensively evaluate the VScope prototype in a virtualized data center environment with over 1000 virtual machines (VMs), and apply VScope to a representative online log processing application. Experimental results show that VScope can deploy and operate a variety of online analytics functions and metrics within a few seconds at large scale. Compared to traditional logging approaches, VScope-based troubleshooting has substantially lower perturbation and generates much smaller log data volumes. It can also resolve complex cross-tier or cross-software-level issues unsolvable solely by application-level or per-tier mechanisms.
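
The escalation pattern, where a cheap continuous check triggers dynamic deployment of heavier analysis, can be sketched with runtime loading. Everything here is hypothetical: the threshold, the metric, and the "deep_analysis.so" plugin name are invented, and VScope's actual deployment mechanism is not shown in the abstract.

    /* Lightweight check escalating to a dynamically loaded analysis.
     * Compile with -ldl; ./deep_analysis.so is a hypothetical plugin. */
    #include <dlfcn.h>
    #include <stdio.h>

    int main(void) {
        double latency_ms = 950.0;                 /* stand-in metric sample */
        if (latency_ms > 500.0) {                  /* lightweight anomaly check */
            void *h = dlopen("./deep_analysis.so", RTLD_NOW);
            if (!h) { fprintf(stderr, "deploy failed: %s\n", dlerror()); return 1; }
            void (*run)(void) = (void (*)(void))dlsym(h, "run_deep_analysis");
            if (run) run();                        /* heavier, targeted diagnosis */
            dlclose(h);
        }
        return 0;
    }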

Collaboration


Dive into Matthew Wolf's collaborations.

Top Co-Authors

Karsten Schwan, Georgia Institute of Technology
Scott Klasky, Oak Ridge National Laboratory
Greg Eisenhauer, Georgia Institute of Technology
Hasan Abbasi, Georgia Institute of Technology
Norbert Podhorszki, Oak Ridge National Laboratory
Qing Liu, University of Tennessee
Jay F. Lofstead, Sandia National Laboratories
Manish Parashar, Georgia Institute of Technology
Fang Zheng, Georgia Institute of Technology
Jong Youl Choi, Oak Ridge National Laboratory