Jerrell Watts

California Institute of Technology

Publication

Featured research published by Jerrell Watts.

Concurrency and Computation: Practice and Experience | 1995

SUMMA: Scalable Universal Matrix Multiplication Algorithm

Robert A. van de Geijn; Jerrell Watts

In this paper, we give a straightforward, highly efficient, scalable implementation of common matrix multiplication operations. The algorithms are much simpler than previously published methods, yield better performance, and require less work space. MPI implementations are given, as are performance results on the Intel Paragon system.
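To make the SUMMA pattern concrete, here is a minimal serial simulation in Python, written for this summary rather than taken from the paper (which gives MPI code). It assumes a square n x n matrix distributed in a 2D block layout over a hypothetical pr x pc process grid, with a panel width nb that divides both local block dimensions. At each step the owners of the current column panel of A broadcast it along their process rows, the owners of the matching row panel of B broadcast it along their process columns, and every process performs a local rank-nb update. Here the broadcasts are modeled simply by handing each process a copy of the panel it would receive.

```python
def block(M, r0, r1, c0, c1):
    """Return the submatrix M[r0:r1, c0:c1] of a list-of-lists matrix."""
    return [row[c0:c1] for row in M[r0:r1]]


def matmul_add(C, A, B):
    """In-place C += A @ B for list-of-lists matrices."""
    for i in range(len(A)):
        for k in range(len(B)):
            a = A[i][k]
            for j in range(len(B[0])):
                C[i][j] += a * B[k][j]


def summa(A, B, pr, pc, nb):
    """Simulate SUMMA for C = A @ B on a pr x pc process grid with panel width nb."""
    n = len(A)
    mb, nl = n // pr, n // pc  # local block height and width
    assert n % pr == 0 and n % pc == 0 and mb % nb == 0 and nl % nb == 0

    # 2D block distribution: process (i, j) owns block (i, j) of A, B and C.
    grid = [(i, j) for i in range(pr) for j in range(pc)]
    A_loc = {(i, j): block(A, i * mb, (i + 1) * mb, j * nl, (j + 1) * nl) for i, j in grid}
    B_loc = {(i, j): block(B, i * mb, (i + 1) * mb, j * nl, (j + 1) * nl) for i, j in grid}
    C_loc = {(i, j): [[0] * nl for _ in range(mb)] for i, j in grid}

    for k0 in range(0, n, nb):
        # Columns k0..k0+nb-1 of A live in process column jc; process (i, jc)
        # "broadcasts" its part of that panel to every process in row i.
        jc, lc = divmod(k0, nl)
        a_panel = [block(A_loc[(i, jc)], 0, mb, lc, lc + nb) for i in range(pr)]
        # Rows k0..k0+nb-1 of B live in process row ir; process (ir, j)
        # "broadcasts" its part of that panel to every process in column j.
        ir, lr = divmod(k0, mb)
        b_panel = [block(B_loc[(ir, j)], lr, lr + nb, 0, nl) for j in range(pc)]
        # Every process performs a local rank-nb update with what it received.
        for i, j in grid:
            matmul_add(C_loc[(i, j)], a_panel[i], b_panel[j])

    # Gather the distributed C back into one matrix for inspection.
    C = [[0] * n for _ in range(n)]
    for (i, j), blk in C_loc.items():
        for r in range(mb):
            for c in range(nl):
                C[i * mb + r][j * nl + c] = blk[r][c]
    return C


if __name__ == "__main__":
    print(summa([[1, 2], [3, 4]], [[5, 6], [7, 8]], pr=2, pc=2, nb=1))  # [[19, 22], [43, 50]]
```

Because each step needs only one broadcast within process rows and one within process columns, an MPI version would typically run those broadcasts over row and column communicators (for example, created with MPI_Comm_split).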

IEEE International Conference on High Performance Computing, Data and Analytics | 1994

Interprocessor collective communication library (InterCom)

Mike Barnett; Lance Shuler; R.A. van de Geijn; Satya Gupta; David G. Payne; Jerrell Watts

We outline a unified approach for building a library of collective communication operations that performs well on a cross-section of problems encountered in real applications. The target architecture is a two-dimensional mesh with wormhole routing, but the techniques also apply to higher-dimensional meshes and hypercubes. We stress a general approach, addressing the need for implementations that perform well for various sized vectors and grid dimensions, including non-power-of-two grids. This requires the development of general techniques for building hybrid algorithms. Finally, the approach also supports collective communication within a group of nodes, which is required by many scalable algorithms. Results from the Intel Paragon system are included.
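For long vectors, collective libraries of this kind commonly build a broadcast from two bandwidth-efficient pieces: a scatter that leaves one piece of the vector on each node, followed by a collect (allgather) that reassembles the full vector everywhere. The Python sketch below simulates only the data movement for a hypothetical one-dimensional ring of p nodes. It illustrates that general technique and is not InterCom's implementation; it ignores mesh geometry, timing, and network conflicts, and the function name is made up for this example.

```python
def scatter_collect_broadcast(vector, p, root=0):
    """Simulate a scatter-collect broadcast of `vector` (a list) on a ring of p nodes.

    Returns (copies, scatter_steps, collect_steps); copies[i] is the vector
    as reassembled on node i.
    """
    n = len(vector)
    bounds = [i * n // p for i in range(p + 1)]
    pieces = [vector[bounds[i]:bounds[i + 1]] for i in range(p)]  # piece i -> node i

    # holdings[node] maps piece index -> piece data held by that node.
    holdings = [{} for _ in range(p)]
    holdings[root] = dict(enumerate(pieces))

    # Phase 1: recursive-halving scatter. A segment (lo, hi, h) is a range of
    # nodes whose pieces are all currently held by node h, with lo <= h < hi.
    segments = [(0, p, root)]
    scatter_steps = 0
    while any(hi - lo > 1 for lo, hi, _ in segments):
        nxt = []
        for lo, hi, h in segments:
            if hi - lo == 1:
                nxt.append((lo, hi, h))
                continue
            mid = (lo + hi) // 2
            if h < mid:  # h keeps [lo, mid) and ships the pieces of [mid, hi) to node mid
                dst, olo, ohi = mid, mid, hi
                nxt += [(lo, mid, h), (mid, hi, dst)]
            else:        # h keeps [mid, hi) and ships the pieces of [lo, mid) to node mid - 1
                dst, olo, ohi = mid - 1, lo, mid
                nxt += [(lo, mid, dst), (mid, hi, h)]
            for k in range(olo, ohi):  # one message carrying several pieces
                holdings[dst][k] = holdings[h].pop(k)
        segments = nxt
        scatter_steps += 1

    # Phase 2: bucket collect. In step s, node i forwards piece (i - s) mod p,
    # which it already holds, to its right neighbour. No node reads a piece it
    # receives in the same step, so applying the p sends in order is equivalent
    # to performing them simultaneously.
    for s in range(p - 1):
        for i in range(p):
            k = (i - s) % p
            holdings[(i + 1) % p][k] = holdings[i][k]

    copies = [[x for k in range(p) for x in holdings[i][k]] for i in range(p)]
    return copies, scatter_steps, p - 1


if __name__ == "__main__":
    copies, s1, s2 = scatter_collect_broadcast(list(range(10)), p=5, root=2)
    print(copies[0], s1, s2)
```

The scatter takes ceil(log2 p) steps and the collect p - 1 steps, but each message carries only a fraction of the vector; that is why this pairing suits long vectors while a simple tree broadcast suits short ones.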

Journal of Parallel and Distributed Computing | 1996

Broadcasting on Meshes with Wormhole Routing

Mike Barnett; David G. Payne; Robert A. van de Geijn; Jerrell Watts

We address the problem of broadcasting on two-dimensional mesh architectures with an arbitrary (non-power-of-two) number of nodes in each dimension. It is assumed that such mesh architectures employ cut-through or wormhole routing. The primary focus is on avoiding network conflicts in the various proposed algorithms. We give algorithms for performing a conflict-free minimum-spanning-tree broadcast, a pipelined algorithm that is similar to Ho and Johnsson's EDST algorithm for hypercubes, and a novel scatter-collect approach that is a natural choice for communication libraries due to its simplicity. Results obtained on the Intel Paragon system are included.
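A minimum-spanning-tree broadcast can be built by recursive halving, which works for any node count, not only powers of two. The Python sketch below generates such a schedule for a one-dimensional array of n nodes (a single mesh row or column, assuming bidirectional links) and checks that no directed link is used twice within a step. It is a simplified illustration under those assumptions, not the paper's two-dimensional algorithm, and both function names are hypothetical.

```python
def mst_broadcast_schedule(n, root=0):
    """Recursive-halving (minimum-spanning-tree) broadcast on a linear array of n nodes.

    Returns a list of steps; each step is a list of (src, dst) messages.
    """
    steps = []
    segments = [(0, n, root)]  # (lo, hi, holder): nodes lo..hi-1, holder has the data
    while any(hi - lo > 1 for lo, hi, _ in segments):
        step, nxt = [], []
        for lo, hi, holder in segments:
            if hi - lo == 1:
                nxt.append((lo, hi, holder))
                continue
            mid = (lo + hi) // 2
            if holder < mid:   # holder is in [lo, mid): send to the nearest node of [mid, hi)
                dst = mid
                nxt += [(lo, mid, holder), (mid, hi, dst)]
            else:              # holder is in [mid, hi): send to the nearest node of [lo, mid)
                dst = mid - 1
                nxt += [(lo, mid, dst), (mid, hi, holder)]
            step.append((holder, dst))
        steps.append(step)
        segments = nxt
    return steps


def is_conflict_free(step):
    """True if no two messages in one step traverse the same directed link."""
    used = set()
    for src, dst in step:
        d = 1 if dst > src else -1
        for k in range(src, dst, d):
            link = (k, k + d)
            if link in used:
                return False
            used.add(link)
    return True


if __name__ == "__main__":
    for step in mst_broadcast_schedule(7, root=3):
        print(step, is_conflict_free(step))
```

Each message stays inside its own segment and the segments of a step are disjoint, so the schedule is conflict-free and finishes in ceil(log2 n) steps. On an r x c mesh, one simple conflict-free extension runs it along the root's column and then along every row at once, which takes ceil(log2 r) + ceil(log2 c) steps, at most one more than ceil(log2 rc).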

Conference on High Performance Computing (Supercomputing) | 1994

Building a high-performance collective communication library

Michael Barnett; Satya Gupta; David G. Payne; Lance Shuler; Robert A. van de Geijn; Jerrell Watts

We report on a project to develop a unified approach for building a library of collective communication operations that performs well on a cross-section of problems encountered in real applications. The target architecture is a two-dimensional mesh with wormhole routing, but the techniques are more general. The approach differs from traditional library implementations in that we address the need for implementations that perform well for various sized vectors and grid dimensions, including non-power-of-two grids. We show how a general approach to hybrid algorithms yields performance across the entire range of vector lengths. Moreover, many scalable implementations of application libraries require collective communication within groups of nodes. Our approach yields the same kind of performance for group collective communication. Results from the Intel Paragon system are included.
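The trade-off that motivates such hybrids can be seen in the common alpha-beta cost model, in which sending an n-byte message costs alpha + n*beta. The Python sketch below compares a minimum-spanning-tree broadcast with a scatter-collect broadcast on p nodes and picks the cheaper one. The formulas are the standard estimates for these two algorithms, the alpha and beta values are hypothetical, and the model ignores network conflicts and mesh dimensions; the hybrids described in the paper are richer than this two-way choice.

```python
def ceil_log2(p):
    """ceil(log2 p) for p >= 1, using integer arithmetic."""
    return (p - 1).bit_length()


def t_mst(p, n, alpha, beta):
    """Minimum-spanning-tree broadcast: ceil(log2 p) rounds, each sending all n bytes."""
    return ceil_log2(p) * (alpha + n * beta)


def t_scatter_collect(p, n, alpha, beta):
    """Scatter (ceil(log2 p) start-ups) followed by a bucket collect (p - 1 start-ups);
    each phase moves about (p - 1)/p * n bytes along its critical path."""
    frac = (p - 1) / p
    return (ceil_log2(p) + p - 1) * alpha + 2 * frac * n * beta


def choose_broadcast(p, n, alpha, beta):
    """Return the name of the cheaper algorithm under this model."""
    if t_mst(p, n, alpha, beta) <= t_scatter_collect(p, n, alpha, beta):
        return "mst"
    return "scatter-collect"


if __name__ == "__main__":
    ALPHA, BETA = 1e-5, 1e-9  # hypothetical start-up (s) and per-byte (s/byte) costs
    for n in (8, 1_000, 100_000, 10_000_000):
        print(n, choose_broadcast(16, n, ALPHA, BETA))
```

For p = 16, setting the two expressions equal gives n = 15*alpha / (2.125*beta), about 70,600 bytes with these made-up constants: below that the tree's few start-ups win, above it the scatter-collect's lower bandwidth term wins.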

Parallel Processing Letters | 1995

A pipelined broadcast for multidimensional meshes

Jerrell Watts; Robert A. van de Geijn

We address the problem of performing a pipelined broadcast on a mesh architecture. Meshes require a different approach than other topologies, and their very nature puts a tighter bound on the performance that one can hope to achieve. By using the appropriate techniques, however, one can obtain excellent performance for sufficiently long messages. The resulting algorithm will work on meshes of any dimension with any number of nodes. Our model assumes that the mesh is a torus and/or that it has bidirectional links and uses wormhole routing. Performance data from the Cray T3D are included.
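Why pipelining pays off for long messages can be seen from a generic model of a chain of h sends in which an n-byte message is cut into k packets: the first packet needs h steps to reach the end and the remaining k - 1 packets follow one step apart, so T(k) = (h + k - 1)(alpha + n*beta/k). Setting dT/dk = 0 gives k* = sqrt((h - 1)*n*beta/alpha). The Python sketch below checks this numerically. It models a single pipelined path under the usual alpha-beta assumptions with hypothetical constants, not the multidirectional mesh algorithm of the paper.

```python
import math


def pipeline_time(k, hops, n, alpha, beta):
    """Time to push an n-byte message as k packets through a chain of `hops` sends."""
    return (hops + k - 1) * (alpha + n * beta / k)


def best_packet_count(hops, n, alpha, beta):
    """Exhaustive search for the integer k in [1, n] that minimises pipeline_time."""
    return min(range(1, n + 1), key=lambda k: pipeline_time(k, hops, n, alpha, beta))


def continuous_optimum(hops, n, alpha, beta):
    """Stationary point of pipeline_time in k: sqrt((hops - 1) * n * beta / alpha)."""
    return math.sqrt((hops - 1) * n * beta / alpha)


if __name__ == "__main__":
    hops, n, alpha, beta = 15, 100_000, 1e-5, 1e-8  # hypothetical values
    k = best_packet_count(hops, n, alpha, beta)
    print(k, round(continuous_optimum(hops, n, alpha, beta), 1))
    print(pipeline_time(1, hops, n, alpha, beta), pipeline_time(k, hops, n, alpha, beta))
```

Since T(k) is convex for k > 0, the best integer packet count is the floor or ceiling of k*; for short messages k* drops toward 1 and pipelining no longer helps, consistent with the abstract's restriction to sufficiently long messages.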

IEEE Parallel & Distributed Technology: Systems & Applications | 1996

The concurrent graph: basic technology for irregular problems

Stephen Taylor; Jerrell Watts; Marc Rieffel; Michael E. Palmer

The article describes basic programming techniques and technology to support large-scale irregular applications on hybrid architectures. This support preserves investments in applications by providing portability, scalability, and maintainability. Applications are developed in terms of a concurrent graph library, which provides a clear conceptual framework for developing large-scale, irregular applications on hybrid parallel architectures. It allows adaptive refinement of computations, automatic load balancing, and interactive, on-the-fly visualization.

International Workshop on Parallel Algorithms for Irregularly Structured Problems | 1996

Practical Dynamic Load Balancing for Irregular Problems

Jerrell Watts; Marc Rieffel; Stephen Taylor

In this paper, we present a cohesive, practical load balancing framework that addresses many shortcomings of existing strategies. These techniques are portable to a broad range of prevalent architectures, including massively parallel machines such as the Cray T3D and Intel Paragon, shared memory systems such as the SGI Power Challenge, and networks of workstations. This scheme improves on earlier work in this area and can be analyzed using well-understood techniques. The algorithm operates using nearest-neighbor communication and inherently maintains existing locality in the application. A simple software interface allows the programmer to use load balancing with very little effort. Unlike many previous efforts in this arena, the techniques have been applied to large-scale industrial applications, one of which is described herein.
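Nearest-neighbour load balancing is often explained through diffusion: each processor repeatedly exchanges a fraction of its load difference with each neighbour. The Python sketch below implements the classic first-order diffusion iteration on an arbitrary processor graph, with a ring as the example topology. It is a swapped-in textbook scheme used for illustration, not necessarily the algorithm in the paper, and it treats work as a continuous quantity rather than as discrete tasks that must migrate; the helper names are hypothetical.

```python
def diffuse(loads, neighbors, alpha, steps):
    """Run `steps` rounds of first-order diffusion and return the new load vector.

    loads:     per-processor work amounts (floats)
    neighbors: neighbors[i] lists the processors adjacent to i (must be symmetric)
    alpha:     diffusion coefficient; 1 / (degree + 1) is a common safe choice
    """
    w = list(loads)
    for _ in range(steps):
        # Each processor looks only at its nearest neighbours. The flow from i to j,
        # alpha * (w[i] - w[j]), is the negative of the flow from j to i, so the
        # total amount of work is conserved by every round.
        w = [w[i] + alpha * sum(w[j] - w[i] for j in neighbors[i]) for i in range(len(w))]
    return w


def ring(p):
    """Neighbour lists for a ring of p >= 3 processors."""
    return [[(i - 1) % p, (i + 1) % p] for i in range(p)]


if __name__ == "__main__":
    w = diffuse([80.0] + [0.0] * 7, ring(8), alpha=1 / 3, steps=50)
    print([round(x, 2) for x in w])
```

Because every transfer is between neighbours, each round needs only nearest-neighbour communication and moves work a short distance, which fits the abstract's point about preserving existing locality.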

Archive | 1995