John Feo
Pacific Northwest National Laboratory
Publication
Featured research published by John Feo.
Journal of Parallel and Distributed Computing | 1990
John Feo; David C. Cann; Rodney R. Oldehoeft
Sisal (Streams and Iterations in Single Assignment Language) is a general-purpose applicative language intended for use on both conventional and novel multiprocessor systems. In this report we discuss the project's objectives, philosophy, and accomplishments, and state our future plans. Four significant results of the Sisal project are compilation techniques for high-performance parallel applicative computation, a microtasking environment that supports dataflow on conventional shared-memory architectures, execution times comparable to those of Fortran, and cost-effective speedup on shared-memory multiprocessors.
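By way of illustration, here is a minimal Python sketch (not Sisal) of the single-assignment, implicitly parallel style the language encourages: the loop body is a pure function of its index, so a runtime is free to schedule iterations concurrently, loosely analogous to Sisal's microtasking on shared memory. The function names and the use of ProcessPoolExecutor are illustrative assumptions, not part of the paper.

# Illustrative only: a Python analogue of the single-assignment, loop-parallel
# style that Sisal expressions encourage. Each iteration is a pure function of
# its index, so iterations carry no hidden dependences and a runtime may
# schedule them in parallel.

from concurrent.futures import ProcessPoolExecutor

def body(i, y, z):
    # Pure loop body: reads its inputs, produces one result, mutates nothing.
    return 0.5 * (y[i] + z[i])

def parallel_map(y, z):
    n = len(y)
    with ProcessPoolExecutor() as pool:
        # The runtime partitions the index space across workers, loosely like
        # Sisal's microtasking environment on shared-memory machines.
        return list(pool.map(body, range(n), [y] * n, [z] * n))

if __name__ == "__main__":
    y = [float(i) for i in range(8)]
    z = [float(2 * i) for i in range(8)]
    print(parallel_map(y, z))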
parallel computing | 1988
John Feo
This paper presents and analyzes the computational and parallel complexity of the Livermore Loops. The Loops represent the type of computational kernels typically found in large-scale scientific computing and have been used to benchmark computer systems since the mid-60s. On parallel systems, a process's computational structure can greatly affect its efficiency. If the Loops are to be used to benchmark such systems, their computations must be understood thoroughly, so that efficient implementations may be written. This paper addresses that concern.
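As a hint of what the Loops look like, below is a Python sketch of the hydro fragment, commonly listed as Kernel 1. Each iteration writes a distinct element from fixed read-only offsets, so this particular Loop has no loop-carried dependences, while others (recurrences, for example) are much harder to parallelize. The code is illustrative, not taken from the paper.

import numpy as np

# Sketch of a Livermore-style kernel (the "hydro fragment", commonly listed
# as Kernel 1). Each iteration reads z at fixed offsets and writes its own
# x[k], so the loop is trivially parallel. Assumes len(z) >= n + 11.

def hydro_fragment(n, q, r, t, y, z):
    x = np.empty(n)
    for k in range(n):
        x[k] = q + y[k] * (r * z[k + 10] + t * z[k + 11])
    return x

# Equivalent vectorized form that NumPy (or a parallelizing compiler) can
# execute without any serial dependence:
def hydro_fragment_vec(n, q, r, t, y, z):
    return q + y[:n] * (r * z[10:n + 10] + t * z[11:n + 11])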
international conference on e-science | 2009
Ian Gorton; Zhenyu Huang; Yousu Chen; Benson K. Kalahar; Shuangshuang Jin; Daniel G. Chavarría-Miranda; Douglas J. Baxter; John Feo
Operating the electrical power grid to prevent power blackouts is a complex task. An important aspect of this is contingency analysis, which involves understanding and mitigating potential failures in power grid elements such as transmission lines. When taking into account the potential for multiple simultaneous failures (known as the N-x contingency problem), contingency analysis becomes a massive computational task. In this paper we describe a novel hybrid computational approach to contingency analysis. This approach exploits the unique graph processing performance of the Cray XMT in conjunction with a conventional massively parallel compute cluster to identify likely simultaneous failures that could cause widespread cascading power failures with massive economic and social impact on society. The approach has the potential to provide the first practical and scalable solution to the N-x contingency problem. When deployed in power grid operations, it will increase the grid operator's ability to deal effectively with outages and failures of power grid components while preserving stable and safe operation of the grid. The paper describes the architecture of our solution and presents preliminary performance results that validate the efficacy of our approach.
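A hypothetical Python sketch of why the N-x problem is computationally massive: even reduced to a pure connectivity question (ignoring the power-flow analysis a real contingency study performs), the number of outage combinations to screen grows combinatorially with x. The names and data layout below are assumptions for illustration only and do not reflect the paper's implementation.

from itertools import combinations

# Hypothetical sketch: treat the grid as a graph and ask which sets of x
# simultaneous line outages disconnect it. Real contingency analysis solves
# power-flow equations per contingency; this only shows how the case count
# explodes as x grows.

def is_connected(nodes, edges):
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(adj[v] - seen)
    return len(seen) == len(nodes)

def screen_contingencies(nodes, lines, x):
    """Return every combination of x line outages that splits the grid."""
    risky = []
    for outage in combinations(lines, x):
        remaining = [e for e in lines if e not in outage]
        if not is_connected(nodes, remaining):
            risky.append(outage)
    return risky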
Proceedings of IEEE International Symposium on Parallel Algorithms/Architecture Synthesis | 1997
Jean-Luc Gaudiot; Wim Bohm; Walid A. Najjar; Tom DeBoni; John Feo; Patrick Miller
Programming a massively-parallel machine is a daunting task for any human programmer, and parallelization may even be impossible for any compiler. Instead, the functional programming paradigm may prove to be an ideal solution by providing an implicitly parallel interface to the programmer. We describe the Sisal (Stream and Iteration in a Single Assignment Language) project. Its goal is to provide a general-purpose user interface for a wide range of parallel processing platforms.
ieee high performance extreme computing conference | 2013
Tim Mattson; David A. Bader; Jonathan W. Berry; Aydin Buluç; Jack J. Dongarra; Christos Faloutsos; John Feo; John R. Gilbert; Joseph E. Gonzalez; Bruce Hendrickson; Jeremy Kepner; Charles E. Leiserson; Andrew Lumsdaine; David A. Padua; Stephen W. Poole; Steven P. Reinhardt; Michael Stonebraker; Steve Wallach; Andrew Yoo
It is our view that the state of the art in constructing a large collection of graph algorithms in terms of linear algebraic operations is mature enough to support the emergence of a standard set of primitive building blocks. This paper is a position paper defining the problem and announcing our intention to launch an open effort to define this standard.
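The flavor of such primitives can be suggested with a small NumPy sketch: one breadth-first-search level expressed as a matrix-vector product of the adjacency matrix with the current frontier, masked by already-visited vertices. The function below is purely illustrative and does not reflect the API the proposed standard eventually defines.

import numpy as np

# Graph traversal as linear algebra: one BFS level is a "matvec" of the
# adjacency matrix with the frontier vector, masked by visited vertices.
# adj[i, j] nonzero means there is an edge i -> j.

def bfs_levels(adj: np.ndarray, source: int) -> np.ndarray:
    n = adj.shape[0]
    levels = np.full(n, -1)
    frontier = np.zeros(n, dtype=bool)
    frontier[source] = True
    level = 0
    while frontier.any():
        levels[frontier] = level
        # Vertices reachable from the frontier, restricted to unvisited ones.
        reached = (adj.T.astype(int) @ frontier.astype(int)) > 0
        frontier = reached & (levels == -1)
        level += 1
    return levels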
parallel computing | 2012
Ümit V. Çatalyürek; John Feo; Assefaw Hadish Gebremedhin; Mahantesh Halappanavar; Alex Pothen
We explore the interplay between architectures and algorithm design in the context of shared-memory platforms and a specific graph problem of central importance in scientific and high-performance computing, distance-1 graph coloring. We introduce two different kinds of multithreaded heuristic algorithms for the stated, NP-hard, problem. The first algorithm relies on speculation and iteration, and is suitable for any shared-memory system. The second algorithm uses dataflow principles, and is targeted at the non-conventional, massively multithreaded Cray XMT system. We study the performance of the algorithms on the Cray XMT and two multi-core systems, Sun Niagara 2 and Intel Nehalem. Together, the three systems represent a spectrum of multithreading capabilities and memory structure. As testbed, we use synthetically generated large-scale graphs carefully chosen to cover a wide range of input types. The results show that the algorithms have scalable runtime performance and use nearly the same number of colors as the underlying serial algorithm, which in turn is effective in practice. The study provides insight into the design of high performance algorithms for irregular problems on many-core architectures.
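A minimal sequential Python sketch of the speculation-and-iteration idea behind the first class of algorithms: color optimistically, detect conflicts between neighbors, and re-color only the conflicted vertices. This is a simplification for illustration, not the paper's multithreaded implementation.

# Speculation-and-iteration greedy coloring, sequential sketch. The phases
# marked "parallel" are the ones that would run concurrently on a
# shared-memory machine. Assumes integer vertex ids.

def color_graph(adj):                     # adj: dict vertex -> set of neighbors
    colors = {v: None for v in adj}
    to_color = set(adj)
    while to_color:
        # Speculative phase (parallel): greedily color every pending vertex
        # with the smallest color not used by its neighbors, ignoring the
        # races this can cause between adjacent vertices.
        for v in to_color:
            used = {colors[u] for u in adj[v] if colors[u] is not None}
            colors[v] = next(c for c in range(len(adj)) if c not in used)
        # Conflict detection phase (parallel): adjacent vertices that ended
        # up with the same color send the lower-numbered one back for
        # another round.
        conflicts = set()
        for v in to_color:
            for u in adj[v]:
                if colors[u] == colors[v] and v < u:
                    conflicts.add(v)
        to_color = conflicts
    return colors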
computing frontiers | 2007
Jarek Nieplocha; Andres Marquez; John Feo; Daniel G. Chavarría-Miranda; George Chin; Chad Scherrer; Nathaniel Beagley
The resurgence of current and upcoming multithreaded architectures and programming models led us to conduct a detailed study to understand the potential of these platforms to increase the performance of data-intensive, irregular scientific applications. Our study is based on a power system state estimation application and a novel anomaly detection application applied to network traffic data. We also conducted a detailed evaluation of the platforms using microbenchmarks in order to gain insight into their architectural capabilities and their interaction with programming models and application software. The evaluation was performed on the Cray MTA-2 and the Sun Niagara.
ieee international symposium on parallel distributed processing workshops and phd forum | 2010
Eric Goodman; David J. Haglin; Chad Scherrer; Daniel G. Chavarría-Miranda; Jace A. Mogill; John Feo
Two of the most commonly used hashing strategies, linear probing and hashing with chaining, are adapted for efficient execution on a Cray XMT. These strategies are designed to minimize memory contention. Datasets that follow a power law distribution cause significant performance challenges to shared memory parallel hashing implementations. Experimental results show good scalability up to 128 processors on two power law datasets with different data types: integer and string. These implementations can be used in a wide range of applications.
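For reference, a minimal sequential Python sketch of linear probing, one of the two adapted strategies. The XMT-specific fine-grained synchronization that limits memory contention under power-law key skew is not shown here.

# Minimal linear-probing hash table (sequential sketch; assumes the table
# never fills). On a collision, the probe simply steps to the next slot.

class LinearProbingTable:
    EMPTY = object()

    def __init__(self, capacity):
        self.keys = [self.EMPTY] * capacity
        self.vals = [None] * capacity

    def _probe(self, key):
        i = hash(key) % len(self.keys)
        while self.keys[i] is not self.EMPTY and self.keys[i] != key:
            i = (i + 1) % len(self.keys)   # linear step to the next slot
        return i

    def insert(self, key, value):
        i = self._probe(key)
        self.keys[i], self.vals[i] = key, value

    def lookup(self, key):
        i = self._probe(key)
        return self.vals[i] if self.keys[i] == key else None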
Journal of Parallel and Distributed Computing | 1993
Richard Wolski; John Feo
Program partitioning and scheduling are essential steps in programming non-shared-memory computer systems. Partitioning is the separation of program operations into sequential tasks, and scheduling is the assignment of tasks to processors. To be effective, automatic methods require an accurate representation of the model of computation and the target architecture. Current partitioning methods assume today's most prevalent models: macro dataflow and a homogeneous/two-level multicomputer system. Because both are based on communication channels, neither model represents well the emerging class of NUMA multiprocessor computer systems consisting of hierarchical read/write memories. Consequently, the partitions generated by extant methods do not execute well on these systems. In this paper, we extend the conventional graph representation of the macro-dataflow model to enable mapping heuristics to consider the complex communication options supported by NUMA architectures. We describe two such heuristics. Simulated execution times of program graphs show that our model and heuristics generate higher quality program mappings than current methods for NUMA architectures.
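A toy Python sketch of the scheduling half of the problem, under the simplifying assumption of a flat machine with no communication costs: list scheduling that places each ready task on the processor giving the earliest finish time. The paper's heuristics additionally model the hierarchical communication costs of NUMA memories, which this sketch ignores.

# List scheduling of a task DAG onto processors (toy, communication-free).
# tasks: iterable of ids; deps: dict task -> set of predecessor tasks;
# cost: dict task -> execution time; returns task -> (proc, start, end).

def list_schedule(tasks, deps, cost, num_procs):
    proc_free = [0.0] * num_procs
    schedule = {}
    done = set()
    pending = list(tasks)
    while pending:
        # Pick any ready task (all of its predecessors already scheduled).
        t = next(t for t in pending if deps.get(t, set()) <= done)
        ready_at = max((schedule[p][2] for p in deps.get(t, set())), default=0.0)
        # Greedily choose the processor that lets the task finish earliest.
        proc = min(range(num_procs), key=lambda q: max(proc_free[q], ready_at))
        start = max(proc_free[proc], ready_at)
        end = start + cost[t]
        schedule[t] = (proc, start, end)
        proc_free[proc] = end
        done.add(t)
        pending.remove(t)
    return schedule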
conference on high performance computing (supercomputing) | 1990
David C. Cann; John Feo
The authors compare the performance of SISAL, an applicative language for parallel numerical computations, and Fortran. The intent is to show that applicative programs, when compiled using a set of powerful yet simple optimization techniques, can achieve sequential execution speeds comparable to Fortran, and automatically utilize conventional shared-memory multiprocessors.