Publication


Featured research published by John Paul Walters.


Advanced Information Networking and Applications | 2008

A Comparison of Virtualization Technologies for HPC

Vipin Chaudhary; Minsuk Cha; John Paul Walters; S. Guercio; Steven M. Gallo

Virtualization is a common strategy for improving the utilization of existing computing resources, particularly within data centers. However, its use for high performance computing (HPC) applications is currently limited despite its potential for both improving resource utilization and providing resource guarantees to its users. This paper systematically evaluates various VMs for computationally intensive HPC applications using standard benchmarks. Using VMWare Server, Xen, and OpenVZ, we examine the suitability of full virtualization, paravirtualization, and operating system-level virtualization in terms of network utilization, SMP performance, file system performance, and MPI scalability. We show that the operating system-level virtualization provided by OpenVZ delivers the best overall performance, particularly for MPI scalability.
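The MPI scalability results in this kind of study come from running standard benchmarks inside each virtualization environment. As a minimal illustration of the class of network tests involved, the hedged C sketch below measures round-trip latency between two MPI ranks; it is not code from the paper, and the message size and iteration count are arbitrary.

```c
/* Minimal MPI ping-pong latency sketch (illustrative only; not from the paper).
 * Compile with mpicc and run with at least two ranks. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define MSG_BYTES 1024
#define ITERS     1000

int main(int argc, char **argv) {
    int rank, size;
    char buf[MSG_BYTES];
    memset(buf, 0, sizeof(buf));

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size >= 2) {
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < ITERS; i++) {
            if (rank == 0) {
                MPI_Send(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();
        if (rank == 0)
            printf("avg round-trip latency: %.3f us\n", (t1 - t0) / ITERS * 1e6);
    }
    MPI_Finalize();
    return 0;
}
```

Running the same microbenchmark natively and inside each VM exposes the kind of network and MPI overhead differences the paper reports.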


International Conference on Cluster Computing | 2011

Heterogeneous Cloud Computing

Stephen P. Crago; Kyle Dunn; Patrick Eads; Lorin Hochstein; Dong-In Kang; Mikyung Kang; Devendra Modium; Karandeep Singh; Jinwoo Suh; John Paul Walters

Current cloud computing infrastructure typically assumes a homogeneous collection of commodity hardware, with details about hardware variation intentionally hidden from users. In this paper, we present our approach for extending the traditional notions of cloud computing to provide a cloud-based access model to clusters that contain heterogeneous architectures and accelerators. We describe our ongoing work extending the OpenStack cloud computing stack to support heterogeneous architectures and accelerators, and our experiences running OpenStack on our local heterogeneous cluster test bed.


International Parallel and Distributed Processing Symposium | 2009

Evaluating the use of GPUs in liver image segmentation and HMMER database searches

John Paul Walters; Vidyananth Balu; Suryaprakash Kompalli; Vipin Chaudhary

In this paper we present the results of parallelizing two life sciences applications, Markov random fields-based (MRF) liver segmentation and HMMER's Viterbi algorithm, using GPUs. We relate our experiences in porting both applications to the GPU as well as the techniques and optimizations that are most beneficial. The unique characteristics of both algorithms are demonstrated by implementations on an NVIDIA 8800 GTX Ultra using the CUDA programming environment. We test multiple enhancements in our GPU kernels in order to demonstrate the effectiveness of each strategy. Our optimized MRF kernel achieves over 130× speedup, and our hmmsearch implementation achieves up to 38× speedup. We show that the difference in speedup between MRF and hmmsearch is due primarily to the frequency at which hmmsearch must read from the GPU's DRAM.
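Nearly all of hmmsearch's time is spent in the Viterbi dynamic-programming recurrence over a profile HMM, and that recurrence is what the GPU port parallelizes and what drives the DRAM traffic mentioned above. The sketch below is a simplified serial version of that recurrence in C, hedged: the score arrays are hypothetical and the transition costs are scalars rather than HMMER's position-specific scores, so it is not the authors' CUDA kernel.

```c
/* Simplified profile-HMM Viterbi recurrence (hedged sketch; arrays and scalar
 * transition costs are placeholders, not HMMER's data layout).
 * M, I, D are (seq_len+1) x (model_len+1) DP matrices of log-odds scores. */
#include <limits.h>

#define NEG_INF (INT_MIN / 4)   /* headroom so adding costs cannot overflow */

static int max3(int a, int b, int c) {
    int m = a > b ? a : b;
    return m > c ? m : c;
}

void viterbi_fill_row(int **M, int **I, int **D,
                      const int *match_emit, /* emission score per model pos for the current residue */
                      const int *ins_emit,   /* insert emission score per model pos */
                      int tMM, int tIM, int tDM, int tMI, int tII, int tMD, int tDD,
                      int i /* residue index, i >= 1 */, int model_len)
{
    for (int j = 1; j <= model_len; j++) {
        /* Match: enter state M_j from M, I, or D at cell (i-1, j-1). */
        M[i][j] = max3(M[i-1][j-1] + tMM,
                       I[i-1][j-1] + tIM,
                       D[i-1][j-1] + tDM) + match_emit[j];

        /* Insert: stay at model position j while consuming a residue. */
        I[i][j] = (M[i-1][j] + tMI > I[i-1][j] + tII ?
                   M[i-1][j] + tMI : I[i-1][j] + tII) + ins_emit[j];

        /* Delete: skip model position j without consuming a residue.
         * Note the dependence on cells in the SAME row (j-1), which is what
         * makes GPU parallelization of this loop nontrivial. */
        D[i][j] = (M[i][j-1] + tMD > D[i][j-1] + tDD ?
                   M[i][j-1] + tMD : D[i][j-1] + tDD);
    }
}
```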


IEEE Transactions on Parallel and Distributed Systems | 2009

Replication-Based Fault Tolerance for MPI Applications

John Paul Walters; Vipin Chaudhary

As computational clusters increase in size, their mean time to failure reduces drastically. Typically, checkpointing is used to minimize the loss of computation. Most checkpointing techniques, however, require central storage for storing checkpoints. This results in a bottleneck and severely limits the scalability of checkpointing, while also proving to be too expensive for dedicated checkpointing networks and storage systems. We propose a scalable replication-based MPI checkpointing facility. Our reference implementation is based on LAM/MPI; however, it is directly applicable to any MPI implementation. We extend the existing state of fault-tolerant MPI with asynchronous replication, eliminating the need for central or network storage. We evaluate centralized storage, a Sun-X4500-based solution, an EMC storage area network (SAN), and the Ibrix commercial parallel file system and show that they are not scalable, particularly beyond 64 CPUs. We demonstrate the low overhead of our checkpointing and replication scheme with the NAS Parallel Benchmarks and the High-Performance LINPACK benchmark with tests up to 256 nodes while demonstrating that checkpointing and replication can be achieved with a much lower overhead than that provided by current techniques. Finally, we show that the monetary cost of our solution is as low as 25 percent of that of a typical SAN/parallel-file-system-equipped storage system.
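The core idea, replicating each node's checkpoint to a peer instead of to central storage, can be outlined in a few lines of MPI. The following is a hedged illustration only: the ring-partner choice, tag, and in-memory buffers are assumptions for the sketch, not the paper's LAM/MPI implementation.

```c
/* Hedged sketch of peer-to-peer checkpoint replication over MPI.
 * Each rank replicates its local checkpoint buffer to (rank+1) % size with a
 * non-blocking send so replication can overlap with computation.
 * Illustrative only; not the paper's LAM/MPI code. */
#include <mpi.h>

void replicate_checkpoint(const void *ckpt, int nbytes,
                          void *peer_copy, /* buffer holding the partner's replica */
                          MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    int dst = (rank + 1) % size;          /* partner that stores our replica  */
    int src = (rank - 1 + size) % size;   /* partner whose replica we store   */

    MPI_Request reqs[2];
    /* Non-blocking: the application can keep computing while the checkpoint
     * drains to the partner node. */
    MPI_Isend(ckpt, nbytes, MPI_BYTE, dst, 42, comm, &reqs[0]);
    MPI_Irecv(peer_copy, nbytes, MPI_BYTE, src, 42, comm, &reqs[1]);

    /* A real system would defer this wait; here we wait so the buffers can be
     * reused safely by the caller. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
}
```

Because every node holds a replica of some other node's state, no central file system or dedicated checkpoint server is required, which is where the scalability and cost advantages described above come from.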


Advanced Information Networking and Applications | 2006

Accelerating the HMMER sequence analysis suite using conventional processors

John Paul Walters; Bashar Qudah; Vipin Chaudhary

Due to the ever-increasing size of sequence databases, it has become clear that faster techniques must be employed to effectively perform biological sequence analysis in a reasonable amount of time. Exploiting the inherent parallelism between sequences is a common strategy. In this paper we enhance both the fine-grained and coarse-grained parallelism within the HMMER sequence analysis suite. Our strategies are complementary to one another and, where necessary, can be used as drop-in replacements for the strategies already provided within HMMER. We use conventional processors (Intel Pentium IV Xeon) as well as the freely available MPICH parallel programming environment. Our results show that the MPICH implementation greatly outperforms the PVM HMMER implementation, and our SSE2 implementation also lends greater computational power at no cost to the user.
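The fine-grained speedup comes from processing several dynamic-programming cells per instruction with SSE2, which the Pentium IV Xeon supports natively. Below is a hedged sketch of that style of vectorization (eight 16-bit scores per 128-bit register, hypothetical arrays); it is not HMMER's actual SSE2 kernel.

```c
/* Hedged SSE2 sketch: add transition scores to eight 16-bit DP cells at once
 * and keep a running element-wise maximum. Arrays are hypothetical; this
 * shows the style of vectorization, not the HMMER kernel itself. */
#include <emmintrin.h>   /* SSE2 intrinsics */

void vector_score_row(const short *scores, const short *trans, short *best, int n)
{
    /* Assumes n is a multiple of 8 and the arrays are 16-byte aligned. */
    for (int k = 0; k < n; k += 8) {
        __m128i s = _mm_load_si128((const __m128i *)(scores + k));
        __m128i t = _mm_load_si128((const __m128i *)(trans + k));
        __m128i b = _mm_load_si128((const __m128i *)(best + k));

        s = _mm_adds_epi16(s, t);   /* saturating add of transition scores   */
        b = _mm_max_epi16(b, s);    /* element-wise max against best so far  */

        _mm_store_si128((__m128i *)(best + k), b);
    }
}
```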


International Conference on Cloud Computing | 2014

GPU Passthrough Performance: A Comparison of KVM, Xen, VMWare ESXi, and LXC for CUDA and OpenCL Applications

John Paul Walters; Andrew J. Younge; Dong In Kang; Ke Thia Yao; Mikyung Kang; Stephen P. Crago; Geoffrey C. Fox

As more scientific workloads are moved into the cloud, the need for high performance accelerators increases. Accelerators such as GPUs offer improvements in both performance and power efficiency over traditional multi-core processors; however, their use in the cloud has been limited. Today, several common hypervisors support GPU passthrough, but their performance has not been systematically characterized. In this paper we show that low overhead GPU passthrough is achievable across 4 major hypervisors and two processor microarchitectures. We compare the performance of two generations of NVIDIA GPUs within the Xen, VMWare ESXi, and KVM hypervisors, and we also compare the performance to that of Linux Containers (LXC). We show that GPU passthrough to KVM achieves 98--100% of the base system's performance across two architectures, while Xen and VMWare achieve 96--99% of the base system's performance, respectively. In addition, we describe several valuable lessons learned through our analysis and share the advantages and disadvantages of each hypervisor/GPU passthrough solution.


Signal Processing Systems | 2007

MPI-HMMER-Boost: Distributed FPGA Acceleration

John Paul Walters; Xiandong Meng; Vipin Chaudhary; Timothy F. Oliver; Leow Yuan Yeow; Bertil Schmidt; Darran Nathan; Joseph Landman

HMMER, based on the profile Hidden Markov Model (HMM), is one of the most widely used sequence database searching tools, allowing researchers to compare HMMs to sequence databases or sequences to HMM databases. Such searches often take many hours and consume a great number of CPU cycles on modern computers. We present a cluster-enabled hardware/software-accelerated implementation of the HMMER search tool hmmsearch. Our results show that combining the parallel efficiency of a cluster with one or more high-speed hardware accelerators (FPGAs) can significantly improve performance for even the most time consuming searches, often reducing search times from several hours to minutes.
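The coarse-grained side of this design, splitting the sequence database across cluster nodes that each drive their own accelerator, can be outlined as below. The partitioning scheme and the score_on_accelerator call are hypothetical stand-ins, not the MPI-HMMER-Boost code.

```c
/* Hedged sketch: static partitioning of a sequence database across MPI ranks.
 * score_on_accelerator() is a hypothetical stand-in for whatever scores one
 * sequence (FPGA, SIMD, or plain C); this is not the MPI-HMMER-Boost code. */
#include <mpi.h>

extern float score_on_accelerator(int seq_index);   /* hypothetical helper */

float search_database(int num_seqs, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    /* Each rank takes a contiguous slice of the database. */
    int chunk = (num_seqs + size - 1) / size;
    int begin = rank * chunk;
    int end   = begin + chunk < num_seqs ? begin + chunk : num_seqs;

    float local_best = -1e30f;
    for (int i = begin; i < end; i++) {
        float s = score_on_accelerator(i);
        if (s > local_best) local_best = s;
    }

    /* Combine per-node results into the global best-scoring hit. */
    float global_best;
    MPI_Allreduce(&local_best, &global_best, 1, MPI_FLOAT, MPI_MAX, comm);
    return global_best;
}
```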


The Journal of Supercomputing | 2009

A fault-tolerant strategy for virtualized HPC clusters

John Paul Walters; Vipin Chaudhary

Virtualization is a common strategy for improving the utilization of existing computing resources, particularly within data centers. However, its use for high performance computing (HPC) applications is currently limited despite its potential for both improving resource utilization and providing resource guarantees to its users. In this article, we systematically evaluate three major virtual machine implementations for computationally intensive HPC applications using various standard benchmarks. Using VMWare Server, Xen, and OpenVZ, we examine the suitability of full virtualization (VMWare), paravirtualization (Xen), and operating system-level virtualization (OpenVZ) in terms of network utilization, SMP performance, file system performance, and MPI scalability. We show that the operating system-level virtualization provided by OpenVZ delivers the best overall performance, particularly for MPI scalability. With the knowledge gained by our VM evaluation, we extend OpenVZ to include support for checkpointing and fault-tolerance for MPI-based virtual server distributed computing.


International Conference on Parallel Processing | 2006

An adaptive heterogeneous software DSM

John Paul Walters; Hai Jiang; Vipin Chaudhary

This paper presents a mechanism to run parallel applications in heterogeneous, dynamic environments while maintaining thread synchrony. A heterogeneous software DSM is used to provide synchronization constructs similar to Pthreads, while providing for individual thread mobility. An asymmetric data conversion scheme is adopted to restore thread states among different computers during thread migration. Within this framework we create a mechanism capable of maintaining the distributed state between migrated (and possibly heterogeneous) threads. We show that thread synchrony can be maintained with minimal overhead and minimal burden to the programmer.
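"Asymmetric data conversion" here means that migrated state travels in the sender's native representation and only a receiver whose representation differs performs the conversion, rather than both sides converting to a canonical format. A hedged C sketch of that idea follows; the thread-state struct and its fields are hypothetical placeholders, not the DSM's actual format.

```c
/* Hedged sketch of asymmetric data conversion during thread migration:
 * state is shipped in the sender's native byte order, tagged with that order,
 * and only a mismatched receiver performs the swap.
 * The thread_state struct is a hypothetical placeholder. */
#include <stdint.h>

struct thread_state {
    uint8_t  sender_big_endian;   /* byte-order tag written by the sender */
    uint32_t program_counter;     /* illustrative fields only */
    uint32_t stack_pointer;
};

static int host_is_big_endian(void) {
    uint16_t probe = 1;
    return *(uint8_t *)&probe == 0;
}

static uint32_t swap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

void convert_on_receive(struct thread_state *st) {
    /* Asymmetric: conversion cost is paid only by mismatched receivers. */
    if (st->sender_big_endian != host_is_big_endian()) {
        st->program_counter = swap32(st->program_counter);
        st->stack_pointer   = swap32(st->stack_pointer);
    }
}
```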


IEEE Aerospace Conference | 2011

Software-based fault tolerance for the Maestro many-core processor

John Paul Walters; Robert Kost; Karandeep Singh; Jinwoo Suh; Stephen P. Crago

The current generation of radiation-hardened general-purpose processors, such as the RAD750, lags far behind its commercial counterparts in terms of performance. To combat this, a new many-core processor was designed to allow space applications to leverage up to 49 general-purpose processing cores. The Maestro processor, based on Tilera's TILE64 chip, is the result of this effort. Maestro is rad-hard by design, but there is still the possibility of both hardware and software errors.
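As a generic illustration of the kind of software-based fault tolerance the title refers to, and not necessarily the scheme used on Maestro, the sketch below runs the same work item redundantly on several cores and majority-votes the results; the run_on_core helper is hypothetical.

```c
/* Hedged sketch of redundant execution with majority voting, a common
 * software-based fault-tolerance pattern. run_on_core() is a hypothetical
 * stand-in for dispatching work to a given core; this is a generic
 * illustration, not the Maestro-specific scheme in the paper. */
#include <stdint.h>

extern uint64_t run_on_core(int core_id, const void *input);   /* hypothetical */

/* Execute the same work item on three cores and vote: if any two results
 * agree, that value wins; otherwise the disagreement is reported upward. */
int tmr_execute(const void *input, uint64_t *result)
{
    uint64_t r0 = run_on_core(0, input);
    uint64_t r1 = run_on_core(1, input);
    uint64_t r2 = run_on_core(2, input);

    if (r0 == r1 || r0 == r2) { *result = r0; return 0; }
    if (r1 == r2)             { *result = r1; return 0; }
    return -1;   /* no majority: signal an uncorrectable error */
}
```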

Collaboration


Dive into John Paul Walters's collaborations.

Top Co-Authors

Stephen P. Crago, University of Southern California
Andrew J. Younge, Indiana University Bloomington
Dong-In Kang, University of Southern California
Matthew French, University of Southern California
Geoffrey C. Fox, Indiana University Bloomington
Karandeep Singh, University of Southern California
Mikyung Kang, University of Southern California
Geoffrey Phi C. Tran, University of Southern California