Peter Bui
University of Notre Dame
Publications
Featured research published by Peter Bui.
international conference on management of data | 2012
Michael Albrecht; Patrick Donnelly; Peter Bui; Douglas Thain
In recent years, there has been a renewed interest in languages and systems for large-scale distributed computing. Unfortunately, most systems available to the end user use a custom description language tightly coupled to a specific runtime implementation, making it difficult to transfer applications between systems. To address this problem we introduce Makeflow, a simple system for expressing and running a data-intensive workflow across multiple execution engines without requiring changes to the application or workflow description. Makeflow allows any user familiar with basic Unix Make syntax to generate a workflow and run it on one of many supported execution systems. Furthermore, in order to assess the performance characteristics of the various execution engines available to users and assist them in selecting one for use, we introduce Workbench, a suite of benchmarks designed for analyzing common workflow patterns. We evaluate Workbench on two physical architectures -- the first a storage cluster with local disks and a slower network, and the second a high performance computing cluster with a central parallel filesystem and fast network -- using a variety of execution engines. We conclude by demonstrating three applications that use Makeflow to execute data-intensive workflows consisting of thousands of jobs.
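As an illustrative sketch (not taken from the paper), a Makeflow workflow in Make-style syntax can be generated programmatically. The rule triples, file names, and the `to_makeflow` helper below are hypothetical; each rule lists output files, input files, and the command that produces the outputs.

```python
# Hypothetical image-conversion tasks expressed as (outputs, inputs, command)
rules = [
    (["out1.png"], ["in1.jpg"], "convert in1.jpg out1.png"),
    (["out2.png"], ["in2.jpg"], "convert in2.jpg out2.png"),
]

def to_makeflow(rules):
    """Render (outputs, inputs, command) triples as Make-style rules."""
    lines = []
    for outputs, inputs, command in rules:
        lines.append("%s: %s" % (" ".join(outputs), " ".join(inputs)))
        lines.append("\t%s" % command)
    return "\n".join(lines) + "\n"

print(to_makeflow(rules))
```

The resulting file could then be handed to the `makeflow` tool, which dispatches the tasks to whichever execution engine is selected (for example, `makeflow -T condor workflow.mf`).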
workflows in support of large-scale science | 2010
Andrew Thrasher; Rory Carmichael; Peter Bui; Li Yu; Douglas Thain; Scott J. Emrich
In this paper we discuss challenges of common bioinformatics applications when deployed outside their initial development environments. We propose a three-tiered approach to mitigate some of these issues by leveraging an encapsulation tool, a high-level workflow language, and a portable intermediary. As a case study, we apply this approach to refactor a custom EST analysis pipeline. The Starch tool encapsulates program dependencies to simplify task specification and deployment. The Weaver language provides abstractions for distributed computing and naturally encourages code modularity. The Makeflow workflow engine provides batch-system-agnostic execution of compiled Weaver code. To illustrate the benefits of our framework, we compare implementations, show their performance, and discuss benefits derived from our new workflow approach relative to traditional bioinformatics development.
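The kind of distributed-computing abstraction described above can be sketched with a tiny Python stand-in: a map-style construct that expands one command template into an independent task per input file. This is only an analogue of the idea, not Weaver itself; the `map_tasks` name, the command template, and the file names are all hypothetical.

```python
def map_tasks(command_template, inputs, output_suffix=".out"):
    """Expand one command template into one independent task per input file."""
    tasks = []
    for infile in inputs:
        outfile = infile + output_suffix
        tasks.append({
            "inputs": [infile],
            "outputs": [outfile],
            "command": command_template.format(IN=infile, OUT=outfile),
        })
    return tasks

# Hypothetical EST-analysis step mapped over two sequence files.
tasks = map_tasks("blastall -i {IN} -o {OUT}", ["seq1.fa", "seq2.fa"])
for t in tasks:
    print(t["command"])
```

A compiler in this style could then render such task records as Make-style rules for a workflow engine to execute on any supported batch system.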
ieee international conference on cloud computing technology and science | 2010
Patrick Donnelly; Peter Bui; Douglas Thain
The Hadoop file system is a large scale distributed file system used to manage and quickly process extremely large data sets. We want to utilize Hadoop to assist with data-intensive workloads in a distributed campus grid environment. Unfortunately, the Hadoop file system is not designed to work in such an environment easily or securely. We present a solution that bridges the Chirp distributed file system to Hadoop for simple access to large data sets. Chirp layers many grid computing features on top of Hadoop, including simple deployment without special privileges, easy access via Parrot, and strong, flexible security through Access Control Lists (ACLs). We discuss the challenges involved in using Hadoop on a campus grid and evaluate the performance of the combined systems.
Concurrency and Computation: Practice and Experience | 2014
Irena Lanc; Peter Bui; Douglas Thain; Scott J. Emrich
The advent of new sequencing technologies has generated extremely large amounts of information. To successfully apply bioinformatics tools to such large datasets, they need to exhibit scalability and ideally elasticity in diverse computing environments. We describe the application of previously obtained lessons to a new workflow with and without shared file storage. Because the original workflows have intractable sequential running times on large datasets, we propose lessons and results for refactoring bioinformatics tools for elastic scaling on personal clouds. Our case studies describe the various challenges faced when constructing such a workflow, from dealing with failure detection, to managing dependencies, to handling the quirks of the underlying operating systems. The practice of scaling bioinformatics tools is increasingly commonplace. As such, this hands-on application of refactoring techniques can serve as a valuable guide. Significantly, our customized Makeflow framework enabled generalizable deployment on a wider variety of systems while substantially reducing wall clock runtimes using hundreds of cores.
international conference on cluster computing | 2013
Peter Bui; Travis Boettcher; Nicholas Jaeger; Jeffrey Westphal
With distributed and parallel computing becoming increasingly important in both industrial and scientific endeavors, it is imperative that students are introduced to the challenges and methods of high performance and high throughput computing. Because these topics are often absent in standard undergraduate computer science curricula, it is necessary to encourage and support independent research and study involving distributed and parallel computing. In this paper, we present three undergraduate research projects that utilize distributed computing clusters: animation rendering, photo processing, and image transcoding. We describe the challenges faced in each project, examine the solutions developed by the students, and then evaluate the performance and behavior of each system. At the end, we reflect on our experience using distributed clusters in undergraduate research and offer six general guidelines for mentoring and pursuing distributed computing research projects with undergraduates. Overall, these projects effectively promote skills in high performance and high throughput computing while enhancing the undergraduate educational experience.
Distributed and Parallel Databases | 2012
Hoang Bui; Peter Bui; Patrick J. Flynn; Douglas Thain
As scientific research becomes more data intensive, there is an increasing need for scalable, reliable, and high performance storage systems. Such data repositories must provide both data archival services and rich metadata, and cleanly integrate with large scale computing resources. ROARS is a hybrid approach to distributed storage that provides both large, robust, scalable storage and efficient rich metadata queries for scientific applications. In this paper, we present the design and implementation of ROARS, focusing primarily on the challenge of maintaining data integrity across long time scales. We evaluate the performance of ROARS on a storage cluster, comparing to the Hadoop distributed file system and a centralized file server. We observe that ROARS has read and write performance that scales with the number of storage nodes, and integrity checking that scales with the size of the largest node. We demonstrate the ability of ROARS to function correctly through multiple system failures and reconfigurations. ROARS has been in production use for over three years as the primary data repository for a biometrics research lab at the University of Notre Dame.
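The pairing of checksummed storage with rich metadata queries can be illustrated with a minimal, in-memory Python sketch. The `Repository` class and its methods are hypothetical stand-ins for the ideas described above, not the ROARS implementation.

```python
import hashlib

class Repository:
    """Toy repository: content-addressed objects plus queryable metadata."""

    def __init__(self):
        self.objects = {}   # checksum -> bytes (stands in for storage nodes)
        self.metadata = []  # list of dicts, each carrying a "checksum" key

    def put(self, data, **meta):
        checksum = hashlib.sha1(data).hexdigest()
        self.objects[checksum] = data
        self.metadata.append(dict(meta, checksum=checksum))
        return checksum

    def query(self, **criteria):
        """Return metadata records matching all given key/value pairs."""
        return [m for m in self.metadata
                if all(m.get(k) == v for k, v in criteria.items())]

    def verify(self, checksum):
        """Integrity check: recompute and compare the stored checksum."""
        return hashlib.sha1(self.objects[checksum]).hexdigest() == checksum

# Hypothetical biometrics-style usage.
repo = Repository()
c = repo.put(b"iris-scan-bytes", subject="S001", modality="iris")
print(repo.query(modality="iris"))
print(repo.verify(c))
```

Periodically re-running a check like `verify` over every stored object is one simple way to detect silent corruption over long time scales, which is the integrity concern the paper focuses on.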
2015 IEEE Blocks and Beyond Workshop (Blocks and Beyond) | 2015
Chris Johnson; Peter Bui
Madeup is a programming language for making things up, literally. Programmers write sequences of commands to move and turn through space, tracing out shapes with algorithms and mathematical operations. The language is designed to teach computation from a tangible, first-person perspective and help students integrate computation back into the physical world. We describe the language in general and reflect specifically on our recent implementation of its block interface.
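The move-and-turn model described above can be sketched in Python. This is not Madeup itself, only a minimal analogue of its first-person tracing, restricted to yaw turns in the XZ plane; the `Turtle3D` class and its method names are hypothetical.

```python
import math

class Turtle3D:
    """Trace a path by moving forward and turning, Logo-style, in 3D space."""

    def __init__(self):
        self.x, self.y, self.z = 0.0, 0.0, 0.0
        self.heading = 0.0           # degrees of yaw in the XZ plane
        self.path = [(0.0, 0.0, 0.0)]

    def move(self, distance):
        r = math.radians(self.heading)
        self.x += distance * math.cos(r)
        self.z += distance * math.sin(r)
        self.path.append((round(self.x, 6), self.y, round(self.z, 6)))

    def turn(self, degrees):
        self.heading += degrees

# Walk a unit square, as a Madeup "speaker" might trace it.
t = Turtle3D()
for _ in range(4):
    t.move(1)
    t.turn(90)
print(t.path)
```

In Madeup, vertices traced this way are swept into solid, printable 3D models; here the path simply closes back on its starting point.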
technical symposium on computer science education | 2016
Chris Johnson; Heather Amthauer; Ryan Hardt; Peter Bui
Madeup is a text- and blocks-based programming language for making things up---literally. Programmers write sequences of commands to move and turn through space, tracing out printable 3D shapes with algorithms and mathematical operations. The language is designed to teach computation from a tangible, first-person perspective and help students integrate computation back into the physical world. In this workshop, we empower educators to use the freely-available and browser-based Madeup programming environment in their classrooms. Participants should expect to learn actively.
technical symposium on computer science education | 2015
Chris Johnson; Peter Bui
Madeup is a programming language for making things up. Its speakers walk paths through space to generate printable 3D models. The language is designed to teach computation from a tangible, first-person perspective and help students integrate computation back into the physical world. Madeup is inspired largely by Seymour Papert, whose goal was to provide learners with objects to think with. Madeup joins a significant crowd of existing introductory teaching tools. What sets Madeup apart from many of these other projects is its physical product. The model that a programmer creates does not remain virtual. It can be printed, felt, carried in a pocket, and handed to a parent or friend - all of which may make computation more real and relevant in the eyes of the programmer. In this poster we demonstrate the language and how we have used it in outreach with local schools and libraries.
high performance distributed computing | 2010
Peter Bui; Li Yu; Douglas Thain