Rodney W. Johnson | Researchain

Publication

Featured researches published by Rodney W. Johnson.

international conference on supercomputing | 1994

An approach to communication-efficient data redistribution

S. D. Kaushik; Chua-Huang Huang; Rodney W. Johnson; P. Sadayappan

We address the development of efficient methods for performing data redistribution of arrays on distributed-memory machines. Data redistribution is important for the distributed-memory implementation of data parallel languages such as High Performance Fortran. An algebraic representation of regular data distributions is used to develop an analytical model for evaluating the communication cost of data redistribution. Using this algebraic representation and the analytical model, an approach to communication-efficient data redistribution is developed. Implementation results on the Intel iPSC/860 are reported.
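
As a rough illustration of why redistribution generates interprocessor traffic, the sketch below counts how many elements must move between each pair of processors when a one-dimensional array changes from a block to a cyclic distribution. It is not the algebraic representation or the analytical cost model from the paper; the function names `block_owner`, `cyclic_owner`, and `block_to_cyclic_traffic` are hypothetical and chosen only for this example.

```python
from collections import Counter


def block_owner(i, n, p):
    """Processor owning element i when n elements are split into p contiguous blocks."""
    block = -(-n // p)  # ceiling of n / p
    return i // block


def cyclic_owner(i, p):
    """Processor owning element i when elements are dealt out round-robin to p processors."""
    return i % p


def block_to_cyclic_traffic(n, p):
    """Count elements sent from each source to each destination processor.

    Elements whose owner does not change are not counted, since they need no communication.
    """
    traffic = Counter()
    for i in range(n):
        src = block_owner(i, n, p)
        dst = cyclic_owner(i, p)
        if src != dst:
            traffic[(src, dst)] += 1
    return traffic


if __name__ == "__main__":
    for pair, count in sorted(block_to_cyclic_traffic(16, 4).items()):
        print(pair, count)
```

Counting elements per processor pair is only a first-order proxy for cost; a realistic model would also account for message startup time and contention.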

conference on high performance computing (supercomputing) | 1993

Efficient transposition algorithms for large matrices

S. D. Kaushik; Chua-Huang Huang; John R. Johnson; Rodney W. Johnson; P. Sadayappan

The authors present transposition algorithms for matrices that do not fit in main memory. Transposition is interpreted as a permutation of the vector obtained by mapping a matrix to linear memory. Algorithms are derived from factorizations of this permutation, using a class of permutations related to the tensor product. Using this formulation of transposition, the authors first obtain several known algorithms and then they derive a new algorithm which reduces the number of disk accesses required. The new algorithm was compared to existing algorithms using an implementation on the Intel iPSC/860. This comparison shows the benefits of the new algorithm.
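
The idea of treating transposition as a permutation of a vector can be sketched in a few lines. The example below assumes row-major storage and works entirely in memory, so it shows only the index mapping, not the out-of-core algorithms or the factorizations described in the abstract. The helper names are hypothetical.

```python
def transpose_permutation(rows, cols):
    """Return perm where perm[k] is the destination index of source index k.

    The source is a rows x cols matrix in row-major order; the destination
    is its cols x rows transpose, also in row-major order.
    """
    return [(k % cols) * rows + (k // cols) for k in range(rows * cols)]


def apply_permutation(vec, perm):
    """Move vec[src] to position perm[src] for every source index."""
    out = [None] * len(vec)
    for src, dst in enumerate(perm):
        out[dst] = vec[src]
    return out


def transpose_flat(vec, rows, cols):
    """Transpose a row-major rows x cols matrix stored as a flat list."""
    return apply_permutation(vec, transpose_permutation(rows, cols))


if __name__ == "__main__":
    # The 2 x 3 matrix [[0, 1, 2], [3, 4, 5]] stored as a flat list.
    print(transpose_flat([0, 1, 2, 3, 4, 5], 2, 3))
```

Because the whole operation is a single permutation, it can in principle be rewritten as a product of simpler permutations, which is the degree of freedom the abstract describes exploiting.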

Journal of Parallel and Distributed Computing | 1986

A Framework for Generating Distributed-Memory Parallel Programs for Block Recursive Algorithms

Sandeep K. S. Gupta; Chua-Huang Huang; P. Sadayappan; Rodney W. Johnson

A framework for synthesizing communication-efficient distributed-memory parallel programs for block recursive algorithms such as the fast Fourier transform (FFT) and Strassen's matrix multiplication is presented. This framework is based on an algebraic representation of the algorithms, which involves the tensor (Kronecker) product and other matrix operations. This representation is useful in analyzing the communication implications of computation partitioning and data distributions. The programs are synthesized under two different target program models. These two models are based on different ways of managing the distribution of data for optimizing communication. The first model uses point-to-point interprocessor communication primitives, whereas the second model uses data redistribution primitives involving collective all-to-many communication. These two program models are shown to be suitable for different ranges of problem size. The methodology is illustrated by synthesizing communication-efficient programs for the FFT. This framework has been incorporated into the EXTENT system for automatic generation of parallel/vector programs for block recursive algorithms.

conference on high performance computing (supercomputing) | 1994

EXTENT: a portable programming environment for designing and implementing high-performance block recursive algorithms

D. L. Dai; Sandeep K. S. Gupta; S. D. Kaushik; J. H. Lu; R. V. Singh; Chua-Huang Huang; P. Sadayappan; Rodney W. Johnson

This paper presents EXTENT (EXpert system for TENsor product formula Translation), a programming environment for the automatic generation of parallel/vector programs from tensor product formulas. A tensor (Kronecker) product based programming methodology is used for designing high-performance programs on various architectures. In this programming methodology, block recursive algorithms such as the fast Fourier transform and Strassen's matrix multiplication algorithm are expressed as tensor product formulas involving tensor product and other matrix operations. A tensor product formula can be systematically translated into parallel and/or vector code for various parallel architectures. A prototype system which generates programs for the Cray Y-MP, Cray T3D and Intel Paragon has been developed. Performance results for some generated programs are presented.
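
To give a sense of how a tensor product formula can map to loop code, the sketch below uses two standard identities: applying (I_m ⊗ A) to a vector amounts to applying A to m contiguous blocks, and applying (A ⊗ I_m) amounts to applying A to m interleaved subvectors at stride m. This is a minimal illustration under those assumptions, not EXTENT's actual translation scheme or output; all function names are hypothetical.

```python
def matvec(a, x):
    """Dense matrix-vector product with a given as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def apply_i_tensor_a(m, a, x):
    """Compute (I_m tensor A) x: A acts independently on m contiguous blocks.

    The m iterations are independent, so this form suggests a parallel loop.
    """
    n = len(a[0])
    y = []
    for b in range(m):
        y.extend(matvec(a, x[b * n:(b + 1) * n]))
    return y


def apply_a_tensor_i(a, m, x):
    """Compute (A tensor I_m) x: A acts on m subvectors taken at stride m.

    Each scalar operation of A becomes an operation on length-m vectors,
    so this form suggests vector code.
    """
    y = [0] * (len(a) * m)
    for s in range(m):
        y[s::m] = matvec(a, x[s::m])
    return y


if __name__ == "__main__":
    f2 = [[1, 1], [1, -1]]  # the 2-point Fourier transform matrix
    x = [1, 2, 3, 4]
    print(apply_i_tensor_a(2, f2, x))
    print(apply_a_tensor_i(f2, 2, x))
```

Neither routine ever forms the full Kronecker product matrix, which is the practical appeal of working from the formula rather than from the expanded matrix.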

Scientific Programming | 1995

A tensor product formulation of Strassen's matrix multiplication algorithm with memory reduction

Bharat Kumar; Chua-Huang Huang; P. Sadayappan; Rodney W. Johnson

In this article, we present a program generation strategy for Strassen's matrix multiplication algorithm using a programming methodology based on tensor product formulas. In this methodology, block recursive programs such as the fast Fourier transform and Strassen's matrix multiplication algorithm are expressed as algebraic formulas involving tensor products and other matrix operations. Such formulas can be systematically translated to high-performance parallel/vector codes for various architectures. In this article, we present a nonrecursive implementation of Strassen's algorithm for shared memory vector processors such as the Cray Y-MP. A previous implementation of Strassen's algorithm synthesized from tensor product formulas required working storage of size O(7

international parallel and distributed processing symposium | 1992