Jerrell Watts
California Institute of Technology
Publication
Featured research published by Jerrell Watts.
Concurrency and Computation: Practice and Experience | 1995
Robert A. van de Geijn; Jerrell Watts
In this paper, we give a straightforward, highly efficient, scalable implementation of common matrix multiplication operations. The algorithms are much simpler than previously published methods, yield better performance, and require less workspace. MPI implementations are given, as are performance results on the Intel Paragon system.
IEEE International Conference on High Performance Computing, Data, and Analytics | 1994
Mike Barnett; Lance Shuler; R.A. van de Geijn; Satya Gupta; David G. Payne; Jerrell Watts
We outline a unified approach for building a library of collective communication operations that performs well on a cross-section of problems encountered in real applications. The target architecture is a two-dimensional mesh with wormhole routing, but the techniques also apply to higher-dimensional meshes and hypercubes. We stress a general approach, addressing the need for implementations that perform well for vectors of various sizes and for various grid dimensions, including non-power-of-two grids. This requires the development of general techniques for building hybrid algorithms. Finally, the approach also supports collective communication within a group of nodes, which is required by many scalable algorithms. Results from the Intel Paragon system are included.
Journal of Parallel and Distributed Computing | 1996
Mike Barnett; David G. Payne; Robert A. van de Geijn; Jerrell Watts
We address the problem of broadcasting on two-dimensional mesh architectures with an arbitrary (non-power-of-two) number of nodes in each dimension. It is assumed that such mesh architectures employ cut-through or wormhole routing. The primary focus is on avoiding network conflicts in the various proposed algorithms. We give algorithms for performing a conflict-free minimum spanning tree broadcast, a pipelined algorithm that is similar to Ho and Johnsson's EDST algorithm for hypercubes, and a novel scatter-collect approach that is a natural choice for communication libraries due to its simplicity. Results obtained on the Intel Paragon system are included.
Conference on High Performance Computing (Supercomputing) | 1994
Michael Barnett; Satya Gupta; David G. Payne; Lance Shuler; Robert A. van de Geijn; Jerrell Watts
We report on a project to develop a unified approach for building a library of collective communication operations that performs well on a cross-section of problems encountered in real applications. The target architecture is a two-dimensional mesh with wormhole routing, but the techniques are more general. The approach differs from traditional library implementations in that we address the need for implementations that perform well for vectors of various sizes and for various grid dimensions, including non-power-of-two grids. We show how a general approach to hybrid algorithms yields performance across the entire range of vector lengths. Moreover, many scalable implementations of application libraries require collective communication within groups of nodes. Our approach yields the same kind of performance for group collective communication. Results from the Intel Paragon system are included.
Parallel Processing Letters | 1995
Jerrell Watts; Robert A. van de Geijn
We address the problem of performing a pipelined broadcast on a mesh architecture. Meshes require a different approach than other topologies, and their very nature puts a tighter bound on the performance that one can hope to achieve. By using the appropriate techniques, however, one can obtain excellent performance for sufficiently long messages. The resulting algorithm will work on meshes of any dimension with any number of nodes. Our model assumes that the mesh is a torus and/or that it has bidirectional links and uses wormhole routing. Performance data from the Cray T3D are included.
IEEE Parallel & Distributed Technology: Systems & Applications | 1996
Stephen Taylor; Jerrell Watts; Marc Rieffel; Michael E. Palmer
The article describes basic programming techniques and technology to support large-scale irregular applications on hybrid architectures. This support protects investments in applications by providing portability, scalability, and maintainability. An application is developed in terms of a concurrent graph library, which provides a clear conceptual framework for developing large-scale irregular applications on hybrid parallel architectures. The library allows adaptive refinement of computations, automatic load balancing, and interactive, on-the-fly visualization.
International Workshop on Parallel Algorithms for Irregularly Structured Problems | 1996
Jerrell Watts; Marc Rieffel; Stephen Taylor
In this paper, we present a cohesive, practical load balancing framework that addresses many shortcomings of existing strategies. These techniques are portable to a broad range of prevalent architectures, including massively parallel machines such as the Cray T3D and Intel Paragon, shared memory systems such as the SGI Power Challenge, and networks of workstations. This scheme improves on earlier work in this area and can be analyzed using well-understood techniques. The algorithm operates using nearest-neighbor communication and inherently maintains existing locality in the application. A simple software interface allows the programmer to use load balancing with very little effort. Unlike many previous efforts in this arena, the techniques have been applied to large-scale industrial applications, one of which is described herein.
Archive | 1995
Prasenjit Mitra; David G. Payne; Lance Shuler; Robert A. van de Geijn; Jerrell Watts
Archive | 1994
Mike Barnett; Satya Gupta; David G. Payne; Lance Shuler; Robert A. van de Geijn; Jerrell Watts
Archive | 1998
Jerrell Watts; Marc Rieffel; Stephen Taylor