S. D. Kaushik
Ohio State University
Publication
Featured research published by S. D. Kaushik.
international conference on parallel processing | 1993
Sandeep K. S. Gupta; S. D. Kaushik; S. Mufti; Sanjay Sharma; Chua-Huang Huang; P. Sadayappan
Efficient generation of communication sets and local index sets is important for the evaluation of array expressions in scientific languages such as Fortran-90 and High Performance Fortran implemented on distributed-memory machines. We show that for arrays affinely aligned with templates that are distributed across multiple processors with a block-cyclic distribution, the local memory access sequence and communication sets can be efficiently enumerated using closed forms. First, closed-form solutions are presented for arrays aligned with the identity template and distributed using block or cyclic distributions.
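As a rough illustration of the setting described above (a sketch, not the paper's closed forms), the following Python fragment shows ownership and local indexing for a one-dimensional array under a block-cyclic(b) distribution over P processors, and a naive enumeration of the local indices touched by a regular section l:h:s; the paper's closed forms avoid the element-by-element test.

```python
# A sketch, not the paper's closed forms: ownership and local indexing for a
# 1-D array under block-cyclic(b) distribution over P processors (0-based).

def owner(i, b, P):
    """Processor owning global index i."""
    return (i // b) % P

def local_index(i, b, P):
    """Local (per-processor) index of global index i on its owner."""
    return ((i // b) // P) * b + i % b

def local_access_sequence(l, h, s, b, P, p):
    """Local indices on processor p touched by the section l:h:s.
    Naive element-by-element scan; the paper's closed forms avoid this."""
    return [local_index(i, b, P) for i in range(l, h + 1, s)
            if owner(i, b, P) == p]

# Example: section 0:99:3 of a block-cyclic(4) array on 4 processors, proc 1.
print(local_access_sequence(0, 99, 3, 4, 4, 1))
```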
international conference on supercomputing | 1994
S. D. Kaushik; Chua-Huang Huang; Rodney W. Johnson; P. Sadayappan
We address the development of efficient methods for performing data redistribution of arrays on distributed-memory machines. Data redistribution is important for the distributed-memory implementation of data parallel languages such as High Performance Fortran. An algebraic representation of regular data distributions is used to develop an analytical model for evaluating the communication cost of data redistribution. Using this algebraic representation and the analytical model, an approach to communication-efficient data redistribution is developed. Implementation results on the Intel iPSC/860 are reported.
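A minimal sketch of the underlying bookkeeping, assuming a one-dimensional array and the same processor set on both sides (illustrative only, not the authors' algebraic representation or cost model): the send sets for a block-cyclic(b1) to block-cyclic(b2) redistribution can be obtained by comparing source and destination owners element by element, and a crude cost estimate can then count message pairs and volume.

```python
# Illustrative only (not the authors' algebraic model): send sets and a crude
# cost estimate for redistributing a length-n array from block-cyclic(b1) to
# block-cyclic(b2) over the same P processors.

from collections import defaultdict

def owner(i, b, P):
    return (i // b) % P

def send_sets(n, b1, b2, P):
    """send[(p, q)] = global indices processor p must send to processor q."""
    send = defaultdict(list)
    for i in range(n):
        p, q = owner(i, b1, P), owner(i, b2, P)
        if p != q:
            send[(p, q)].append(i)
    return send

sets = send_sets(n=64, b1=2, b2=8, P=4)
messages = len(sets)                          # distinct sender/receiver pairs
volume = sum(len(v) for v in sets.values())   # elements that change owner
print(messages, volume)
```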
international parallel processing symposium | 1995
S. D. Kaushik; Chua-Huang Huang; J. Ramanujam; P. Sadayappan
Array redistribution is used in languages such as High Performance Fortran to allow programmers to dynamically change the distribution of arrays across processors. Distributed-memory implementations of several scientific applications require array redistribution. In this paper, efficient methods for performing array redistribution are presented. Precise closed forms for determining the processors involved in the communication and the data elements to be communicated are developed for two special cases of array redistribution involving block-cyclically distributed arrays. The general array redistribution problem involving block-cyclically distributed arrays can be expressed in terms of these special cases. Using the closed forms, a cost model for estimating the communication overhead of array redistribution is developed. A multi-phase approach for reducing the communication cost of array redistribution is presented. Experimental results on the Cray T3D evaluating the multi-phase approach are provided.
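The multi-phase idea can be illustrated with a toy comparison (the linear cost model with startup and per_elem parameters is an assumption, not the paper's model): a cyclic(x) to cyclic(K*x) redistribution can be performed directly or through intermediate block sizes, and the two schedules can be compared.

```python
# Toy comparison of direct vs. multi-phase redistribution schedules. The
# linear cost model (startup + per_elem per element moved) is an assumption.

def owner(i, b, P):
    return (i // b) % P

def phase_stats(n, b_from, b_to, P):
    """(sender, receiver) pairs and number of elements that change owner."""
    pairs, moved = set(), 0
    for i in range(n):
        p, q = owner(i, b_from, P), owner(i, b_to, P)
        if p != q:
            pairs.add((p, q))
            moved += 1
    return len(pairs), moved

def schedule_cost(n, x, factors, P, startup=100.0, per_elem=1.0):
    """Cost of redistributing cyclic(x) -> cyclic(x * product of factors),
    one phase per factor."""
    cost, b = 0.0, x
    for k in factors:
        npairs, moved = phase_stats(n, b, b * k, P)
        cost += npairs * startup + moved * per_elem
        b *= k
    return cost

n, x, P = 4096, 1, 16
print(schedule_cost(n, x, [12], P))    # direct:     cyclic(1) -> cyclic(12)
print(schedule_cost(n, x, [3, 4], P))  # two phases: cyclic(1) -> cyclic(3) -> cyclic(12)
```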
conference on high performance computing (supercomputing) | 1993
S. D. Kaushik; Chua-Huang Huang; John R. Johnson; Rodney W. Johnson; P. Sadayappan
The authors present transposition algorithms for matrices that do not fit in main memory. Transposition is interpreted as a permutation of the vector obtained by mapping a matrix to linear memory. Algorithms are derived from factorizations of this permutation, using a class of permutations related to the tensor product. Using this formulation of transposition, the authors first obtain several known algorithms and then they derive a new algorithm which reduces the number of disk accesses required. The new algorithm was compared to existing algorithms using an implementation on the Intel iPSC/860. This comparison shows the benefits of the new algorithm.
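For context, a classical square-tiling out-of-core transpose looks like the following sketch (file paths and the numpy memmap layout are placeholders; this is not the paper's new factorization-based algorithm, which further reduces disk accesses).

```python
# A classical square-tiling out-of-core transpose, for context; the paper's
# new algorithm is derived differently (by factoring the transposition
# permutation) and needs fewer disk accesses. File paths are placeholders.

import numpy as np

def tiled_transpose(src_path, dst_path, n, tile, dtype=np.float64):
    src = np.memmap(src_path, dtype=dtype, mode="r", shape=(n, n))
    dst = np.memmap(dst_path, dtype=dtype, mode="w+", shape=(n, n))
    for i in range(0, n, tile):            # each tile: one read + one write
        for j in range(0, n, tile):
            dst[j:j + tile, i:i + tile] = src[i:i + tile, j:j + tile].T
    dst.flush()

# Viewing the n x n matrix as a vector of length n*n, transposition is the
# permutation sending position i*n + j to j*n + i; the paper factors this
# permutation (using tensor-product-related permutations) to derive algorithms.
```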
conference on high performance computing (supercomputing) | 1994
D. L. Dai; Sandeep K. S. Gupta; S. D. Kaushik; J. H. Lu; R. V. Singh; Chua-Huang Huang; P. Sadayappan; Rodney W. Johnson
Presents EXTENT (EXpert system for TENsor product formula Translation), a programming environment for the automatic generation of parallel/vector programs from tensor product formulas. A tensor (Kronecker) product based programming methodology is used for designing high-performance programs on various architectures. In this programming methodology, block recursive algorithms such as the fast Fourier transform and Strassen's matrix multiplication algorithm are expressed as tensor product formulas involving tensor product and other matrix operations. A tensor product formula can be systematically translated into parallel and/or vector code for various parallel architectures. A prototype system which generates programs for the Cray Y-MP, Cray T3D and Intel Paragon has been developed. Performance results for some generated programs are presented.
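The core of the tensor-product programming style can be shown in a few lines of numpy (illustrative only, not EXTENT output): factors such as (I_m ⊗ A) and (A ⊗ I_m) are never formed explicitly but are applied to a vector as reshaped matrix products, which is what the generated loop code effectively does.

```python
# Minimal numpy sketch of applying Kronecker-product factors without forming
# them explicitly; this is illustrative, not code generated by EXTENT.

import numpy as np

def apply_I_kron_A(A, x, m):
    """y = (I_m ⊗ A) x: apply A independently to m contiguous chunks of x."""
    n = A.shape[1]
    return (x.reshape(m, n) @ A.T).reshape(-1)

def apply_A_kron_I(A, x, m):
    """y = (A ⊗ I_m) x: apply A across chunks with stride m."""
    n = A.shape[1]
    return (A @ x.reshape(n, m)).reshape(-1)

# Sanity check against explicit Kronecker products.
rng = np.random.default_rng(0)
A, m = rng.standard_normal((3, 3)), 4
x = rng.standard_normal(3 * m)
assert np.allclose(apply_I_kron_A(A, x, m), np.kron(np.eye(m), A) @ x)
assert np.allclose(apply_A_kron_I(A, x, m), np.kron(A, np.eye(m)) @ x)
```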
languages and compilers for parallel computing | 1995
S. D. Kaushik; Chua-Huang Huang; P. Sadayappan
In languages such as High Performance Fortran (HPF), array statements are used for expressing data parallelism. In compiling array statements for distributed-memory machines, efficient enumeration of local index sets and communication sets is important. The virtual processor approach, among several other methods, has been proposed for efficient enumeration of these index sets. In this paper, using simple mathematical properties of regular sections, we extend the virtual processor approach to address the memory allocation and index set enumeration problems for array statements involving arrays mapped using the two-level mapping supported by HPF. Performance results on the Cray T3D are presented to demonstrate the efficacy of the extensions and identify various tradeoffs associated with the proposed method.
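A sketch of the virtual-processor view for the simple one-level case (the two-level HPF alignment handled in the paper is omitted; the block size, processor count, and section bounds below are illustrative): block-cyclic(b) over P physical processors is treated as a block distribution over virtual processors v = 0, 1, ..., with v mapped to physical processor v mod P, so the local elements of a section l:h:s on processor p can be enumerated block by block.

```python
# Virtual-processor sketch for the one-level case (the paper handles HPF's
# two-level alignment + distribution): view block-cyclic(b) over P physical
# processors as a block distribution over virtual processors v = 0, 1, ...,
# with v mapped to physical processor v mod P.

def local_elements(l, h, s, b, P, p, n):
    """Global indices of section l:h:s owned by physical processor p,
    enumerated one virtual block at a time."""
    out = []
    v = p                                        # first virtual block on p
    while v * b < n:
        lo, hi = v * b, min((v + 1) * b, n) - 1  # extent of this block
        first = l if lo <= l else l + ((lo - l + s - 1) // s) * s
        for i in range(first, min(hi, h) + 1, s):
            out.append(i)        # local index would be (v // P) * b + (i - lo)
        v += P
    return out

print(local_elements(l=0, h=99, s=3, b=4, P=4, p=1, n=100))
```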
international parallel and distributed processing symposium | 1992
Sandeep K. S. Gupta; S. D. Kaushik; Chua-Huang Huang; John R. Johnson; Rodney W. Johnson; P. Sadayappan
The authors present an algebraic theory, based on the tensor product, for describing the semantics of regular data distributions such as block, cyclic, and block-cyclic distributions. These distributions have been proposed in High Performance Fortran, an ongoing effort to develop a Fortran extension for massively parallel computing. This algebraic theory has been used for designing and implementing block recursive algorithms on shared-memory and vector multiprocessors. In the present work, the authors extend this theory to generate programs with explicit data distribution commands from tensor product formulas. A methodology to generate data distributions that optimize communication is described. This methodology is demonstrated by generating efficient programs with data distribution for the fast Fourier transform.
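One small fact of the kind this algebra encodes, written in numpy for concreteness (the example sizes and the reshape/transpose realization of the stride permutation are assumptions): cyclic and block distributions of a length P*b vector differ exactly by a stride permutation, so block-distributing the permuted vector gives each processor the same elements as cyclic-distributing the original.

```python
# Cyclic distribution of x vs. block distribution of the stride-permuted
# vector y: every processor receives the same elements.

import numpy as np

P, b = 4, 3
x = np.arange(P * b)

y = x.reshape(b, P).T.reshape(-1)        # stride permutation of x

for p in range(P):
    # block piece of y on processor p  ==  cyclic piece of x on processor p
    assert np.array_equal(y[p * b:(p + 1) * b], x[p::P])
print("cyclic(x) and block(permuted x) agree")
```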
conference on high performance computing (supercomputing) | 1992
S. D. Kaushik; Sanjay Sharma; Chua-Huang Huang; Jeremy R. Johnson; Rodney W. Johnson; P. Sadayappan
The authors present an algebraic theory based on tensor products for modeling direct interconnection networks. This theory has been used for designing and implementing block recursive numerical algorithms on shared-memory vector multiprocessors. This theory can be used for mapping algorithms expressed in tensor product form onto distributed-memory architectures. The authors focus on the modeling of direct interconnection networks. Rings, n-dimensional meshes, and hypercubes are represented in tensor product form. Algorithm mapping using tensor product formulation is demonstrated by mapping matrix transposition and matrix multiplication onto different networks.
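As one concrete instance of such a representation (a standard construction stated here for illustration, not necessarily in the paper's exact notation): the adjacency matrix of a d-dimensional hypercube is a sum of d Kronecker products I_2 ⊗ ... ⊗ X ⊗ ... ⊗ I_2, where the 2x2 exchange matrix X sits in position i and identities fill the other positions.

```python
# A standard construction, shown for illustration: the hypercube adjacency
# matrix as a sum of Kronecker products with the exchange matrix X in one slot.

import numpy as np

X = np.array([[0, 1], [1, 0]])

def hypercube_adjacency(d):
    A = np.zeros((2 ** d, 2 ** d), dtype=int)
    for i in range(d):
        term = np.array([[1]])
        for j in range(d):
            term = np.kron(term, X if j == i else np.eye(2, dtype=int))
        A += term
    return A

print(hypercube_adjacency(3))            # 8 nodes, each of degree 3
```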
languages and compilers for parallel computing | 1994
S. D. Kaushik; Chua-Huang Huang; P. Sadayappan
In compiling array statements for distributed-memory machines, efficient generation of local index sets and communication sets is important. Several techniques for enumerating these sets for block-cyclically distributed arrays have been presented in the literature. When sufficient compile-time information is not available, generation of the structures that facilitate efficient enumeration of these sets is performed at run time. In this paper, we address the incremental generation of local index sets and communication sets to reduce the run-time cost of array statement execution. We develop techniques for performing the incremental generation using the virtual processor approach for the execution of array statements involving block-cyclically distributed arrays.
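A toy sketch of the table-reuse flavor of this idea (the period argument and caching scheme below are assumptions, not the paper's construction): for block-cyclic(b) over P processors and a section with stride s, the local access pattern is periodic with period lcm(s, P*b), so a table built once per (b, P, p, s) can be cached and reused, shifted, by later array statements rather than regenerated element by element.

```python
# Toy sketch of table reuse (assumed, not the paper's construction): the local
# access pattern of a stride-s section under block-cyclic(b) over P processors
# repeats with period L = lcm(s, P*b); the table for one period is cached per
# (b, P, p, s) and reused, shifted by L // P, for later periods and statements.

from math import lcm
from functools import lru_cache

@lru_cache(maxsize=None)
def period_table(b, P, p, s):
    """Local offsets on processor p of indices 0, s, 2s, ... within one period
    (the section lower bound is taken as 0 here for simplicity)."""
    L = lcm(s, P * b)
    offsets = tuple(((i // b) // P) * b + i % b
                    for i in range(0, L, s) if (i // b) % P == p)
    return L, offsets

print(period_table(b=4, P=4, p=1, s=3))   # repeated calls hit the cache
```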
languages and compilers for parallel computing | 1993
S. D. Kaushik; Chua-Huang Huang; Rodney W. Johnson; P. Sadayappan
In this paper, we address the issue of automatic generation of disk-based algorithms from tensor product formulas. Disk-based algorithms are required in scientific applications which work with large data sets that do not fit entirely into main memory. Tensor products have been used for designing and implementing block recursive algorithms on shared-memory, vector and distributed-memory multiprocessors. We extend this theory to generate disk-based code from tensor product formulas. The methodology is based on generating algebraically equivalent tensor product formulas which have better disk performance. We demonstrate this methodology by generating disk-based code for the fast Fourier transform.
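The disk-friendly building block such formulas compile to can be sketched as follows (file layout and the memmap-based I/O are assumptions): applying y = (I_a ⊗ A ⊗ I_c) x to a vector of length a*m*c stored on disk, one contiguous slab of m*c elements at a time, so each slab is read and written exactly once; rewriting a formula so that every factor takes this shape is what yields good disk performance.

```python
# Illustrative building block (file layout and memmap I/O are assumptions):
# apply y = (I_a ⊗ A ⊗ I_c) x out of core, one contiguous slab of m*c
# elements at a time, so each slab is read and written exactly once.

import numpy as np

def apply_I_A_I_on_disk(src_path, dst_path, A, a, c, dtype=np.complex128):
    m = A.shape[0]
    src = np.memmap(src_path, dtype=dtype, mode="r", shape=(a, m, c))
    dst = np.memmap(dst_path, dtype=dtype, mode="w+", shape=(a, m, c))
    for alpha in range(a):                 # one slab = m*c contiguous elements
        dst[alpha] = A @ src[alpha]        # (m, m) @ (m, c) block multiply
    dst.flush()
```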