Publications


Featured research published by Eleanor Chu.


Parallel Computing | 1987

Gaussian elimination with partial pivoting and load balancing on a multiprocessor

Eleanor Chu; Alan George

A row-oriented implementation of Gaussian elimination with partial pivoting on a local-memory multiprocessor is described. In the absence of pivoting, the initial data loading of the node processors leads to a balanced computation. However, if interchanges occur, the computational loads on the processors may become unbalanced, leading to inefficiency. A simple load-balancing scheme is described which is inexpensive and which maintains computational balance in the presence of pivoting. Using some reasonable assumptions about the probability of pivoting occurring, an analysis of the communication costs of the algorithm is developed, along with an analysis of the computation performed in each node processor. This model is then used to derive the expected speedup of the algorithm. Finally, experiments using an Intel hypercube are presented in order to demonstrate the extent to which the analytical model predicts the performance.
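
As a rough illustration of the load-balancing problem the paper addresses, the sketch below runs serial Gaussian elimination with partial pivoting while tracking which processor would own each row under a row-wrapped assignment. The `owner` bookkeeping and the processor count `p` are illustrative assumptions, not the paper's scheme.

```python
import numpy as np

def ge_partial_pivoting(A, p=4):
    """In-place LU factorization by Gaussian elimination with partial
    pivoting, tracking a hypothetical row-wrapped processor mapping."""
    A = A.astype(float).copy()
    n = A.shape[0]
    owner = [i % p for i in range(n)]               # row i starts on processor i mod p
    for k in range(n - 1):
        piv = k + int(np.argmax(np.abs(A[k:, k])))  # partial pivoting: largest entry in column k
        if piv != k:
            A[[k, piv]] = A[[piv, k]]               # row interchange
            owner[k], owner[piv] = owner[piv], owner[k]
        A[k+1:, k] /= A[k, k]                       # store multipliers below the diagonal
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])
    return A, owner
```

Whenever an interchange swaps rows owned by different processors, the remaining elimination work migrates with the pivot row; that drift is the imbalance the paper's inexpensive rebalancing scheme corrects.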


SIAM Journal on Scientific and Statistical Computing | 1990

QR factorization of a dense matrix on a hypercube multiprocessor

Eleanor Chu; Alan George

In this article a new algorithm for computing the QR factorization of a rectangular matrix on a hypercube multiprocessor is described. The hypercube network is configured as a two-dimensional subcube-grid in the proposed scheme. A global communication scheme that uses redundant computation to maintain data proximity is employed, and the mapping strategy is such that for a fixed number of processors the processor idle time is small and either constant or grows linearly with the dimension of the matrix. A complexity analysis shows what the aspect ratio of the configured grid should be in terms of the shape of the matrix and the relative speeds of communication and computation. Numerical experiments performed on an Intel Hypercube multiprocessor support the theoretical results.
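
One standard way to configure a hypercube as a two-dimensional subcube-grid is to split each node's binary address into row bits and column bits; the sketch below assumes that bit-splitting convention, which may differ in detail from the paper's mapping.

```python
def subcube_grid(d, r):
    """View a d-dimensional hypercube as a 2**r x 2**(d-r) grid.

    The high r bits of a node's address select the grid row and the
    low d-r bits select the column, so every grid row and every grid
    column is itself a subcube; row/column broadcasts then stay
    inside subcubes.
    """
    rows, cols = 1 << r, 1 << (d - r)
    return [[(i << (d - r)) | j for j in range(cols)] for i in range(rows)]

# a 3-cube viewed as a 2 x 4 subcube-grid
for row in subcube_grid(3, 1):
    print([format(node, "03b") for node in row])
```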


Parallel Computing | 1989

QR factorization of a dense matrix on a shared-memory multiprocessor

Eleanor Chu; Alan George

A new algorithm for computing an orthogonal decomposition of a rectangular m × n matrix A on a shared-memory parallel computer is described. The algorithm uses Givens rotations, and has the feature that its synchronization cost is low. In particular, for a multiprocessor having p processors, an analysis of the algorithm shows that this cost is O(n²/p) if m/p ≥ n, and O(mn/p²) if m/p < n. Note that in the latter case, the synchronization cost is smaller than O(n²/p). Therefore, the synchronization cost of the algorithm proposed in this article is bounded by O(n²/p) when m ≥ n. This is important for machines where synchronization cost is high, and when m ≫ n. Analysis and experiments show that the algorithm is effective in balancing the load and producing high efficiency (speedup).
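
For reference, a minimal serial Givens-rotation QR factorization is sketched below; it shows why Givens schemes parallelize with little synchronization (each rotation touches only two rows), but it is not the paper's parallel ordering.

```python
import numpy as np

def givens_qr(A):
    """QR factorization by Givens rotations: zero subdiagonal entries
    column by column, rotating two adjacent rows at a time."""
    R = A.astype(float).copy()
    m, n = R.shape
    Q = np.eye(m)
    for j in range(n):
        for i in range(m - 1, j, -1):      # annihilate R[i, j] against R[i-1, j]
            a, b = R[i - 1, j], R[i, j]
            if b == 0.0:
                continue
            r = np.hypot(a, b)
            c, s = a / r, b / r
            G = np.array([[c, s], [-s, c]])
            R[[i - 1, i], j:] = G @ R[[i - 1, i], j:]
            Q[:, [i - 1, i]] = Q[:, [i - 1, i]] @ G.T
    return Q, R                             # A = Q @ R, with R upper triangular
```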


Parallel Computing | 1993

Parallel matrix inversion on a subcube-grid

Eleanor Chu; Alan George; Darcy Quesnel

In this paper we propose a new medium-grain parallel algorithm for computing a matrix inverse on a hypercube multiprocessor. The algorithm implements Gauss-Jordan inversion with column interchanges. The hypercube network is configured as a two-dimensional subcube-grid to support submatrix partitionings. For some algorithms on some types of hypercubes, submatrix partitionings are known to have communication advantages not shared by partitions limited to rows or columns. We show that such advantages can be extended to Gauss-Jordan inversion on an Intel iPSC/860, the most recent third generation of hypercubes, and that there is little extra programming effort to include it in the subcube-grid library used in various other matrix computations. An actual aggregate execution rate of 200 MFLOPS (million floating-point operations per second) is achieved when inverting a 2000 × 2000 matrix (in double-precision Fortran 77) using 64 iPSC/860 processors configured as an 8 × 8 subcube-grid.
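
A minimal serial sketch of Gauss-Jordan inversion with column interchanges follows. The permutation bookkeeping (`perm`) that undoes the variable reordering is an illustrative detail; the paper's medium-grain algorithm additionally distributes submatrix blocks over the subcube-grid.

```python
import numpy as np

def gauss_jordan_inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination with
    column interchanges, working on the augmented matrix [A | I]."""
    n = A.shape[0]
    aug = np.hstack([A.astype(float), np.eye(n)])
    perm = list(range(n))
    for k in range(n):
        # column interchange: largest magnitude in pivot row k (left block only)
        piv = k + int(np.argmax(np.abs(aug[k, k:n])))
        if piv != k:
            aug[:, [k, piv]] = aug[:, [piv, k]]
            perm[k], perm[piv] = perm[piv], perm[k]
        aug[k] /= aug[k, k]            # normalize the pivot row
        for i in range(n):             # eliminate the pivot column elsewhere
            if i != k:
                aug[i] -= aug[i, k] * aug[k]
    inv = np.empty((n, n))
    inv[perm] = aug[:, n:]             # undo the column (variable) permutation
    return inv
```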


SIAM Journal on Matrix Analysis and Applications | 1990

Sparse orthogonal decomposition on a hypercube multiprocessor

Eleanor Chu; Alan George

In this article the orthogonal decomposition of large sparse matrices on a hypercube multiprocessor is considered. The proposed algorithm offers a parallel implementation of the general row merging scheme for sparse Givens transformations recently developed by Joseph Liu. The proposed parallel algorithm is novel in several aspects. First, a new mapping strategy whose goal is to reduce the communication cost and balance the work load during the entire computing process is proposed. Second, a new sequential algorithm for merging two upper trapezoidal matrices (possibly of different dimensions) is described, wherein the order of computation is different from the standard Givens scheme, and is more suitable for parallel implementation. Third, it is shown that the hypercube network can be employed as a multi-loop multiprocessor. The performance of the parallel algorithm applied to a model problem is analyzed and computation/communication complexity results are presented. Finally it is shown that the parallel s...
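
To make the row-merging idea concrete, the sketch below merges two equal-size upper-triangular factors with Givens rotations. Liu's general scheme (and the paper's parallel version) handles upper trapezoidal blocks of different dimensions, so the equal sizes here are a simplifying assumption.

```python
import numpy as np

def merge_triangular(R1, R2):
    """Merge two n x n upper-triangular factors into one upper-triangular
    factor by eliminating each row of R2 against the diagonal of R1."""
    R = R1.astype(float).copy()
    n = R.shape[0]
    for row in R2.astype(float):
        row = row.copy()
        for j in range(n):
            if row[j] == 0.0:          # leading zeros need no rotation
                continue
            a, b = R[j, j], row[j]
            r = np.hypot(a, b)
            c, s = a / r, b / r
            # one Givens rotation zeroes row[j] and updates both rows
            R[j, j:], row[j:] = c * R[j, j:] + s * row[j:], -s * R[j, j:] + c * row[j:]
    return R
```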


ACM SIGNUM Newsletter | 1985

A note on estimating the error in Gaussian elimination without pivoting

Eleanor Chu; Alan George

This article deals with the problem of estimating the error in the computed solution to a system of equations when that solution is obtained by using Gaussian elimination without pivoting. The corresponding problem, where either partial or complete pivoting is used, has received considerable attention, and efficient and reliable methods have been developed. However, in the context of solving large sparse systems, it is often very attractive to apply Gaussian elimination without pivoting, even though it cannot be guaranteed a priori that the computation is numerically stable. When this is done, it is important to be able to determine when serious numerical errors have occurred, and to be able to estimate the error in the computed solution. In this paper a method for achieving this goal is described. Results of a large number of numerical experiments suggest that the method is both inexpensive and reliable.
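
The abstract does not spell out the estimator, so the sketch below shows a generic a-posteriori check in the same spirit, not the paper's method: one step of iterative refinement, with the size of the correction serving as an estimate of the error in the computed solution.

```python
import numpy as np

def refine_error_estimate(A, b, x):
    """One step of iterative refinement as an a-posteriori error check.
    In practice the correction would reuse the existing LU factors
    rather than calling solve() again."""
    r = b - A @ x                            # residual of the computed solution
    d = np.linalg.solve(A, r)                # correction from the same system
    rel_err_est = np.linalg.norm(d) / np.linalg.norm(x)
    return rel_err_est, x + d                # estimate and refined solution
```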


SIAM Journal on Scientific Computing | 1993

Parallel algorithms and subcube embedding on a hypercube

Eleanor Chu; Alan George

It is well known that the connection in a hypercube multiprocessor is rich enough to allow the embedding of a variety of topologies within it. For a given problem, the best choice of topology is naturally the one that incurs the least amount of communication and allows parallel execution of as many tasks as possible. In a previous paper we proposed efficient parallel algorithms for performing QR factorization on a hypercube multiprocessor, where the hypercube network is configured as a two-dimensional subcube-grid with an aspect ratio optimally chosen for each problem. In view of the very substantial net saving in execution time and storage usage obtained in performing QR factorization on an optimally configured subcube-grid, similar strategies are developed in this work to provide highly efficient implementations for three fundamental numerical algorithms: Gaussian elimination with partial pivoting, QR factorization with column pivoting, and multiple least squares updating. Timing results on Intel iPSC/2...
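
As a small example of the embedding flexibility the paper exploits, the classic binary-reflected Gray code embeds a ring in a hypercube so that neighbouring ring positions sit on hypercube neighbours; this is a textbook embedding, not one of the paper's specific mappings.

```python
def gray_ring(d):
    """Embed a ring of 2**d nodes in a d-dimensional hypercube.

    Consecutive Gray codes differ in exactly one bit, so consecutive
    ring positions (including the wrap-around pair) map to adjacent
    hypercube nodes.
    """
    return [i ^ (i >> 1) for i in range(1 << d)]

print([format(node, "03b") for node in gray_ring(3)])
# each entry differs from the next (and the last from the first) in one bit
```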


Archive | 1999

An In-Place Radix-2 DIT FFT for Input in Natural Order

Eleanor Chu; Alan George

The NR, RN, and NN algorithms implementing DIF (decimation-in-frequency) FFT were presented in Chapters 4, 5, and 6. Corresponding to them, there are also three variants of the DIT (decimation-in-time) FFT, and they are developed in this and the following two chapters. The three DIT FFT algorithms will be presented using the notation developed in the previous chapters. Accordingly, they are referred to as DITNR, DITRN, and DITNN FFT algorithms. The DITNR and DITRN algorithms implement in-place DIT FFT on naturally ordered and bit-reversed input data, whereas the DITNN algorithm allows repeated permutation of the intermediate results and can thus produce naturally ordered output from naturally ordered input. Since both DIF FFT and DIT FFT implement the same Discrete Fourier Transform, one may argue intuitively that the final result which overwrites an input element xk must remain unchanged in either implementation, and that the many results obtained previously in Chapters 4, 5, and 6 for the three DIF FFT algorithms should apply to the corresponding DIT FFT. However, to make this chapter self-contained, it is useful to develop these iterative DIT FFT algorithms from their recursive definitions, and this approach is adopted here. Since the concepts introduced before for the DIF FFT will not be repeated, it is recommended that Chapters 4, 5, and 6 be studied before Chapter 7.
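
For orientation, here is a textbook iterative in-place radix-2 DIT FFT that accepts naturally ordered input by first applying an explicit bit-reversal permutation. The chapter's DITNR algorithm instead avoids the permutation pass by leaving its output in bit-reversed order, so this sketch illustrates only the in-place butterfly structure, not the book's exact variant.

```python
import cmath

def fft_dit(x):
    """Iterative in-place radix-2 DIT FFT; len(x) must be a power of two."""
    a = list(x)
    n = len(a)
    # bit-reversal permutation puts the data in the order the DIT
    # butterflies expect, so the output comes out in natural order
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # log2(n) stages of in-place butterflies
    m = 2
    while m <= n:
        w_m = cmath.exp(-2j * cmath.pi / m)    # principal m-th root of unity
        for k in range(0, n, m):
            w = 1.0 + 0j
            for t in range(m // 2):
                u, v = a[k + t], w * a[k + t + m // 2]
                a[k + t], a[k + t + m // 2] = u + v, u - v
                w *= w_m
        m <<= 1
    return a

print(fft_dit([1, 0, 0, 0]))   # DFT of an impulse: all ones
```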


Archive | 1999

Inside the FFT Black Box: Serial and Parallel Fast Fourier Transform Algorithms

Eleanor Chu; Alan George


Archive | 1978

User Guide for SPARSPAK: Waterloo Sparse Linear Equations Package

Eleanor Chu; Alan George; Joseph W. H. Liu; Esmond Ng

Collaboration


Dive into Eleanor Chu's collaborations.

Top Co-Authors

Alan George

University of Waterloo
