
Publication


Featured research published by Douglas Thain.


IEEE International Conference on Cloud Computing Technology and Science | 2010

A Comparison and Critique of Eucalyptus, OpenNebula and Nimbus

Peter Sempolinski; Douglas Thain

Eucalyptus, OpenNebula, and Nimbus are three major open-source cloud-computing software platforms. The overall function of these systems is to manage the provisioning of virtual machines for a cloud providing infrastructure-as-a-service. These open-source projects provide an important alternative for those who do not wish to use a commercially provided cloud. We provide a comparison and analysis of each of these systems. We begin with a short summary comparing the current raw feature set of these projects. After that, we deepen our analysis by describing how these cloud management frameworks relate to the many other software components required to create a functioning cloud computing system. We also analyze the overall structure of each project and address how the differing features and implementations reflect its differing goals. Lastly, we discuss some of the common challenges that emerge in setting up any of these frameworks and suggest avenues for further research and development. These include the problem of fair scheduling in the absence of money, eviction, or preemption; the difficulties of network configuration; and the frequent lack of clean abstractions.


International Parallel and Distributed Processing Symposium | 2008

Qthreads: An API for programming with millions of lightweight threads

Kyle Bruce Wheeler; Richard C. Murphy; Douglas Thain

Large scale hardware-supported multithreading, an attractive means of increasing computational power, benefits significantly from low per-thread costs. Hardware support for lightweight threads is a developing area of research. Each architecture with such support provides a unique interface, hindering development for them and comparisons between them. A portable abstraction that provides basic lightweight thread control and synchronization primitives is needed. Such an abstraction would assist in exploring both the architectural needs of large scale threading and the semantic power of existing languages. Managing thread resources is a problem that must be addressed if massive parallelism is to be popularized. The qthread abstraction enables development of large-scale multithreading applications on commodity architectures. This paper introduces the qthread API and its Unix implementation, discusses resource management, and presents performance results from the HPCCG benchmark.
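Since the contribution here is the API itself, a minimal example may help make it concrete. The following C sketch, assuming the qthread.h header and linkage against the qthreads library, spawns a single lightweight thread with qthread_fork and then blocks on the thread's return slot with qthread_readFF, the library's full/empty-bit synchronization primitive; the task body is purely illustrative.

```c
#include <stdio.h>
#include <qthread/qthread.h>

/* A qthread body: a function taking a void* and returning an aligned_t. */
static aligned_t greet(void *arg)
{
    (void)arg;                          /* unused in this sketch */
    printf("hello from a lightweight thread\n");
    return 42;
}

int main(void)
{
    aligned_t ret, result;

    qthread_initialize();              /* start the qthreads runtime */
    qthread_fork(greet, NULL, &ret);   /* spawn one lightweight thread;
                                          its return slot starts empty */
    qthread_readFF(&result, &ret);     /* full/empty-bit wait: blocks until
                                          greet() fills 'ret' by returning */
    printf("thread returned %lu\n", (unsigned long)result);
    return 0;
}
```

The same fork/readFF pattern is meant to scale to very large thread counts, since each qthread is cheap and is multiplexed over a fixed set of worker threads rather than mapped one-to-one onto OS threads.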


High Performance Distributed Computing | 2001

The Kangaroo approach to data movement on the Grid

Douglas Thain; Jim Basney; Se-Chang Son; Miron Livny

Access to remote data is one of the principal challenges of Grid computing. While performing I/O, Grid applications must be prepared for server crashes, performance variations and exhausted resources. To achieve high throughput in such a hostile environment, applications need a resilient service that moves data while hiding errors and latencies. We illustrate this idea with Kangaroo, a simple data movement system that makes opportunistic use of disks and networks to keep applications running. We demonstrate that Kangaroo can achieve better end-to-end performance than traditional data movement techniques, even though its individual components do not achieve high performance.
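Kangaroo's core idea is a write-behind path: the application's output is spooled to local disk immediately, and a background mover forwards it toward the destination, retrying across failures. The C sketch below is not Kangaroo's actual interface, only an illustration of that spool-then-forward structure; send_to_server is a stand-in for a real transfer to a flaky server.

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for a network transfer to the destination server;
 * it fails intermittently to model a hostile Grid environment. */
static int send_to_server(const char *buf, size_t len)
{
    (void)buf; (void)len;
    return (rand() % 4 == 0) ? -1 : 0;
}

/* Fast path: append the application's write to a local spool file and
 * return at local-disk speed, hiding server latency from the caller. */
static int spooled_write(FILE *spool, const char *buf, size_t len)
{
    return fwrite(buf, 1, len, spool) == len ? 0 : -1;
}

/* Background path: drain the spool toward the server, retrying on
 * failure so that errors never surface to the application. */
static void drain_spool(FILE *spool)
{
    char buf[4096];
    size_t n;

    rewind(spool);
    while ((n = fread(buf, 1, sizeof buf, spool)) > 0) {
        while (send_to_server(buf, n) != 0)
            ;   /* retry until it succeeds; a real mover would back off */
    }
}

int main(void)
{
    FILE *spool = tmpfile();            /* the local "hop" */
    if (!spool)
        return 1;

    spooled_write(spool, "output record\n", 14);
    drain_spool(spool);                 /* in Kangaroo this runs concurrently */
    fclose(spool);
    return 0;
}
```

The separation of the two paths is the point: the application sees only the fast local write, while the retries and latency live entirely in the background mover.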


International Parallel and Distributed Processing Symposium | 2008

All-pairs: An abstraction for data-intensive cloud computing

Christopher Moretti; Jared Bulosan; Douglas Thain; Patrick J. Flynn

Although modern parallel and distributed computing systems provide easy access to large amounts of computing power, it is not always easy for non-expert users to harness these large systems effectively. A large workload composed in what seems to be the obvious way by a naive user may accidentally abuse shared resources and achieve very poor performance. To address this problem, we propose that production systems should provide end users with high-level abstractions that allow for the easy expression and efficient execution of data-intensive workloads. We present one example of such an abstraction, All-Pairs, that fits the needs of several data-intensive scientific applications. We demonstrate that an optimized All-Pairs abstraction is easier to use than the underlying system, achieves performance orders of magnitude better than the obvious but naive approach, and runs twice as fast as a hand-optimized conventional approach.
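The semantics of the abstraction are easy to state: given sets A and B and a function F, All-Pairs produces the matrix M[i][j] = F(A[i], B[j]). Here is a minimal sequential C sketch of those semantics; the function names and the toy comparison are illustrative, and the point of the paper is that the production system executes this loop nest efficiently across a distributed system rather than on one node.

```c
#include <stdio.h>
#include <stdlib.h>

/* The user supplies F: a pairwise comparison between two items. */
typedef double (*compare_fn)(const char *a, const char *b);

/* Sequential semantics of All-Pairs: M[i][j] = F(A[i], B[j]).
 * The real system partitions this loop nest across a cluster and
 * handles data distribution and scheduling on the user's behalf. */
static double *allpairs(const char **A, size_t na,
                        const char **B, size_t nb, compare_fn F)
{
    double *M = malloc(na * nb * sizeof *M);
    for (size_t i = 0; i < na; i++)
        for (size_t j = 0; j < nb; j++)
            M[i * nb + j] = F(A[i], B[j]);
    return M;
}

/* Toy F: length of the shared prefix of two strings. */
static double prefix_len(const char *a, const char *b)
{
    size_t k = 0;
    while (a[k] && b[k] && a[k] == b[k])
        k++;
    return (double)k;
}

int main(void)
{
    const char *A[] = {"iris-001", "iris-002"};
    const char *B[] = {"iris-001", "iris-900"};
    double *M = allpairs(A, 2, B, 2, prefix_len);

    for (int i = 0; i < 2; i++)
        printf("%4.0f %4.0f\n", M[i * 2 + 0], M[i * 2 + 1]);
    free(M);
    return 0;
}
```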


The Grid 2: Blueprint for a New Computing Infrastructure (2nd Edition) | 2004

Chapter 19 – Building Reliable Clients and Services

Douglas Thain; Miron Livny

This chapter presents a range of principles and techniques that can be used to construct Grid services and clients that execute reliably (from a client perspective) despite various classes of failures. Grid computing is a partnership between clients and servers. Grid clients have more responsibilities than do traditional clients and must be equipped with powerful mechanisms for dealing with and recovering from failures, whether they occur in the context of remote execution, work management, or data output. When clients are powerful, servers must accommodate them by using careful protocols. Many challenges remain in the design and implementation of Grid computing systems. Although today's Grids are accessible to technologists and other users willing to suffer through experimental and incomplete systems, many obstacles must be overcome before large-scale systems can be used without special knowledge. Grids intended for users of ordinary competence must be designed with as much attention paid to the consequences of failure as to the potential benefits of success.
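One recurring discipline in this line of work is that a reliable client should treat individual failures as ordinary events and bound its recovery effort, rather than aborting on the first error. The following is a minimal C sketch of that pattern, with a hypothetical submit_job standing in for any remote operation; the retry bounds and backoff cap are illustrative, not taken from the chapter.

```c
#include <stdio.h>
#include <unistd.h>

/* Stand-in for a remote operation that may fail transiently
 * (server crash, exhausted resources, network trouble). */
static int submit_job(void)
{
    return -1;   /* always fails here, to exercise the retry path */
}

/* Retry with exponential backoff: treat each failure as a normal
 * event, and surface an error only after a bounded effort. */
static int reliable_submit(int max_attempts)
{
    unsigned delay = 1;

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (submit_job() == 0)
            return 0;                   /* success */
        fprintf(stderr, "attempt %d failed; retrying in %us\n",
                attempt, delay);
        sleep(delay);
        if (delay < 64)
            delay *= 2;                 /* back off, up to a cap */
    }
    return -1;                          /* failure budget exhausted */
}

int main(void)
{
    if (reliable_submit(5) != 0)
        fprintf(stderr, "giving up after bounded retries\n");
    return 0;
}
```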


Conference on High Performance Computing (Supercomputing) | 2001

Gathering at the Well: Creating Communities for Grid I/O

Douglas Thain; John M. Bent; Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau; Miron Livny

Grid applications have demanding I/O needs. Schedulers must bring jobs and data in close proximity in order to satisfy throughput, scalability, and policy requirements. Most systems accomplish this by making either jobs or data mobile. We propose a system that allows jobs and data to meet by binding execution and storage sites together into I/O communities which then participate in the wide-area system. The relationships between participants in a community may be expressed by the ClassAd framework. Extensions to the framework allow community members to express indirect relations. We demonstrate our implementation of I/O communities by improving the performance of a key high-energy physics simulation on an international distributed system.


International Conference on Management of Data | 2012

Makeflow: a portable abstraction for data intensive computing on clusters, clouds, and grids

Michael Albrecht; Patrick Donnelly; Peter Bui; Douglas Thain

In recent years, there has been renewed interest in languages and systems for large-scale distributed computing. Unfortunately, most systems available to the end user employ a custom description language tightly coupled to a specific runtime implementation, making it difficult to transfer applications between systems. To address this problem we introduce Makeflow, a simple system for expressing and running a data-intensive workflow across multiple execution engines without requiring changes to the application or workflow description. Makeflow allows any user familiar with basic Unix Make syntax to generate a workflow and run it on one of many supported execution systems. Furthermore, in order to assess the performance characteristics of the various execution engines available to users and to assist them in selecting one, we introduce Workbench, a suite of benchmarks designed for analyzing common workflow patterns. We evaluate Workbench on two physical architectures, using a variety of execution engines: the first a storage cluster with local disks and a slower network, the second a high-performance computing cluster with a central parallel filesystem and fast network. We conclude by demonstrating three applications that use Makeflow to execute data-intensive workloads consisting of thousands of jobs.
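Because a Makeflow workflow is written in the Unix Make syntax the abstract mentions, a small example conveys the model; the file names and the simulate.py program here are hypothetical. Each rule lists its output files, its input files, and the command that produces the former from the latter, which is exactly the dependency information the engine needs to schedule jobs on a cluster, cloud, or grid.

```make
# Two independent simulation runs: Makeflow can dispatch these as
# parallel jobs, since neither depends on the other's outputs.
output.1: simulate.py input.dat
	python simulate.py -p 1 < input.dat > output.1

output.2: simulate.py input.dat
	python simulate.py -p 2 < input.dat > output.2

# The final rule depends on both runs, so it executes last.
result.dat: output.1 output.2
	cat output.1 output.2 > result.dat
```

The same file can then be handed to different execution engines without modification, for example running locally by default or submitting each rule as a batch job via Makeflow's batch-system selection option, per the Makeflow documentation.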


IEEE Transactions on Parallel and Distributed Systems | 2010

All-Pairs: An Abstraction for Data-Intensive Computing on Campus Grids

Christopher Moretti; Hoang Bui; Karen Hollingsworth; Brandon Rich; Patrick J. Flynn; Douglas Thain

Today, campus grids provide users with easy access to thousands of CPUs. However, it is not always easy for non-expert users to harness these systems effectively. A large workload composed in what seems to be the obvious way by a naive user may accidentally abuse shared resources and achieve very poor performance. To address this problem, we argue that campus grids should provide end users with high-level abstractions that allow for the easy expression and efficient execution of data-intensive workloads. We present one example of an abstraction, All-Pairs, that fits the needs of several applications in biometrics, bioinformatics, and data mining. We demonstrate that an optimized All-Pairs abstraction is easier to use than the underlying system, achieves performance orders of magnitude better than the obvious but naive approach, and is both faster and more efficient than a tuned conventional approach. This abstraction has been in production use for one year on a 500-CPU campus grid at the University of Notre Dame and has been used to carry out a groundbreaking analysis of biometric data.


High Performance Distributed Computing | 2003

Pipeline and batch sharing in grid workloads

Douglas Thain; John M. Bent; Andrea C. Arpaci-Dusseau; Remzi H. Arpaci-Dusseau; Miron Livny

We present a study of six batch-pipeline scientific workloads that are candidates for execution on computational grids. Whereas other studies focus on the behavior of single applications, this study characterizes workloads composed of pipelines of sequential processes that use file storage for communication. We share measurements of the memory, CPU, and I/O requirements of individual components, as well as analyses of I/O sharing within complete batches. We conclude with a discussion of the ramifications of these workloads for end-to-end scalability and overall system design.


Journal of Grid Computing | 2009

Chirp: a practical global filesystem for cluster and Grid computing

Douglas Thain; Christopher Moretti; Jeffrey Hemmes

Traditional distributed filesystem technologies designed for local and campus area networks do not adapt well to wide-area Grid computing environments. To address this problem, we have built the Chirp distributed filesystem, designed from the ground up to meet the needs of Grid computing. Chirp is easily deployed without special privileges, and it provides strong and flexible security mechanisms, tunable consistency semantics, and clustering to increase capacity and throughput. We demonstrate that many of these features also provide order-of-magnitude performance increases over wide-area networks. We describe three applications in bioinformatics, biometrics, and gamma-ray physics that each employ Chirp to attack large-scale data-intensive problems.

Collaboration


Dive into Douglas Thain's collaborations.

Top Co-Authors

Miron Livny, University of Wisconsin-Madison
Paul Brenner, University of Notre Dame
Peter Bui, University of Notre Dame
Aaron Striegel, University of Notre Dame
Haiyan Meng, University of Notre Dame