Publications


Featured research published by Joel H. Saltz.


Journal of Parallel and Distributed Computing | 1990

Run-time scheduling and execution of loops on message passing machines

Joel H. Saltz; Kathleen Crowley; Ravi Mirchandaney; Harry Berryman

We examine the effectiveness of optimizations aimed at allowing distributed-memory machines to efficiently compute inner loops over globally defined data structures. Our optimizations are specifically targeted toward loops in which some array references are made through a level of indirection. Unstructured mesh codes and sparse matrix solvers are examples of programs with kernels of this sort. Experimental data that quantify the performance obtainable using the methods discussed here are included.
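
Loops of this form are the natural fit for an inspector/executor scheme: a preprocessing pass inspects the indirection array, records which references are off-processor, and builds a gather schedule; the loop then executes entirely on local data. The sketch below illustrates the idea for a toy y[i] += x[idx[i]] loop; the two-"processor" model, block distribution, and all names are our own illustrative assumptions, not the paper's code.

```c
#include <stdio.h>

/* Toy inspector/executor (illustrative names, not the authors' code).
 * Computes y[i] += x[idx[i]] on "processor 0", where idx[] may refer
 * to elements owned by "processor 1". The inspector scans idx[] once
 * and records which references are off-processor; the executor gathers
 * those values into a buffer, then runs the loop on local data only. */

#define N 8                         /* elements owned per processor   */

static int owner(int g)    { return g / N; }   /* block distribution  */
static int local_of(int g) { return g % N; }

int main(void) {
    double x[2][N];                 /* x[p] = piece of x on processor p */
    double y[N] = {0};              /* result owned by processor 0      */
    int idx[N] = {1, 9, 3, 12, 0, 15, 6, 2};   /* indirection array    */
    for (int p = 0; p < 2; p++)
        for (int i = 0; i < N; i++) x[p][i] = 10.0 * p + i;

    /* --- inspector: build the gather schedule -------------------- */
    int fetch[N], nfetch = 0;       /* global indices we must fetch   */
    int where[N];                   /* loop slot -> buffer slot or -1 */
    for (int i = 0; i < N; i++) {
        if (owner(idx[i]) != 0) { where[i] = nfetch; fetch[nfetch++] = idx[i]; }
        else                    { where[i] = -1; }
    }

    /* --- communication: one gather of off-processor values ------- */
    double buf[N];
    for (int k = 0; k < nfetch; k++)
        buf[k] = x[owner(fetch[k])][local_of(fetch[k])];

    /* --- executor: the loop now touches local data only ---------- */
    for (int i = 0; i < N; i++)
        y[i] += (where[i] < 0) ? x[0][local_of(idx[i])] : buf[where[i]];

    for (int i = 0; i < N; i++) printf("y[%d] = %g\n", i, y[i]);
    return 0;
}
```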


International Conference on Supercomputing | 1988

Principles of runtime support for parallel processors

Ravi Mirchandaney; Joel H. Saltz; R. M. Smith; D. M. Nicol; Kay Crowley

There exists substantial data-level parallelism in scientific problems. The PARTY runtime system is an attempt to obtain efficient parallel implementations of scientific computations, particularly those in which the data dependencies become manifest only at runtime, which can preclude compiler-based detection of certain types of parallelism. The automated system is structured as follows: an appropriate level of granularity is first selected for the computations; a directed acyclic graph representation of the program is generated, on which various aggregation techniques may be employed to generate efficient schedules; these schedules are then mapped onto the target machine. We describe some initial results from experiments conducted on the Intel Hypercube and the Encore Multimax that indicate the usefulness of our approach.
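
The final step, mapping scheduled tasks onto the machine, can be illustrated with a greedy list scheduler: repeatedly take a ready task (all predecessors finished) and place it on the processor that frees up first. The DAG, task costs, and processor count below are invented for illustration; PARTY's actual aggregation and scheduling techniques are considerably more elaborate.

```c
#include <stdio.h>

/* Toy list scheduler: map a task DAG onto P processors, greedily
 * placing each ready task on the earliest-free processor. A task may
 * start only after its predecessors finish and its processor is free. */

#define T 6   /* tasks */
#define P 2   /* processors */

int main(void) {
    double cost[T] = {2, 3, 1, 4, 2, 3};
    int dep[T][T] = {0};            /* dep[u][v] = 1 : u must precede v */
    dep[0][2] = dep[1][2] = dep[1][3] = dep[2][4] = dep[3][5] = dep[4][5] = 1;

    double finish[T] = {0}, free_at[P] = {0};
    int indeg[T] = {0}, done[T] = {0};
    for (int u = 0; u < T; u++)
        for (int v = 0; v < T; v++) indeg[v] += dep[u][v];

    for (int scheduled = 0; scheduled < T; scheduled++) {
        int t = -1;                 /* pick any ready task */
        for (int v = 0; v < T && t < 0; v++)
            if (!done[v] && indeg[v] == 0) t = v;

        int p = 0;                  /* earliest-free processor */
        for (int q = 1; q < P; q++) if (free_at[q] < free_at[p]) p = q;

        double start = free_at[p];
        for (int u = 0; u < T; u++)           /* wait for predecessors */
            if (dep[u][t] && finish[u] > start) start = finish[u];

        finish[t] = start + cost[t];
        free_at[p] = finish[t];
        done[t] = 1;
        for (int v = 0; v < T; v++) if (dep[t][v]) indeg[v]--;
        printf("task %d on proc %d: start %.1f finish %.1f\n",
               t, p, start, finish[t]);
    }
    return 0;
}
```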


IEEE Transactions on Computers | 1988

Dynamic remapping of parallel computations with varying resource demands

David M. Nicol; Joel H. Saltz

The issue of deciding when to invoke a global load remapping mechanism is studied. Such a decision policy must effectively weigh the costs of remapping against the performance benefits, and should be general enough to apply automatically to a wide range of computations. The authors propose a general mapping decision heuristic, then study its effectiveness and its anticipated behavior on two very different models of load evolution. Assuming only that the remapping cost is known, this policy dynamically minimizes system degradation (including the cost of remapping) for each computation step. This policy is quite simple, choosing to remap when the first local minimum in the degradation function is detected. Simulations show that the decision obtained provides significantly better performance than that achieved by never remapping. The authors also observe that the average intermapping frequency is quite close to the optimal fixed remapping frequency.
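
One minimal reading of the "first local minimum" rule: maintain W(t), the remapping cost plus the imbalance accumulated over the t steps since the last remap, averaged per step, and trigger a remap as soon as W(t) turns upward. The degradation series and remap cost in this sketch are synthetic, and the heuristic shown is a simplification of the paper's analysis.

```c
#include <stdio.h>

/* Sketch of a "stop at the first local minimum" remapping rule:
 * W(t) = (accumulated imbalance + remap cost) / t is the average
 * degradation per step since the last remap; remap when W(t) stops
 * decreasing. The imbalance series below is made up. */

#define STEPS 12

int main(void) {
    /* extra time per step due to growing load imbalance (synthetic) */
    double imbalance[STEPS] = {0.1, 0.1, 0.2, 0.3, 0.5, 0.8,
                               1.2, 1.7, 2.3, 3.0, 3.8, 4.7};
    double remap_cost = 3.0;

    double sum = 0.0, prevW = 1e30;
    for (int t = 1; t <= STEPS; t++) {
        sum += imbalance[t - 1];
        double W = (sum + remap_cost) / t;    /* avg degradation/step */
        printf("step %2d: W = %.3f\n", t, W);
        if (W > prevW) {                      /* first local minimum  */
            printf("remap after step %d\n", t - 1);
            break;
        }
        prevW = W;
    }
    return 0;
}
```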


International Journal of Parallel Programming | 1989

Parallel processing of biological sequence comparison algorithms

Elizabeth W. Edmiston; Nolan G. Core; Joel H. Saltz; Roger M. Smith

Comparison of biological (DNA or protein) sequences provides insight into molecular structure, function, and homology, and is increasingly important as the available databases become larger and more numerous. One method of increasing the speed of the calculations is to perform them in parallel. We present the results of initial investigations using the Intel iPSC/1 hypercube and the Connection Machine (CM-1) for these comparisons. Since these machines have very different architectures, the issues and performance trade-offs discussed have wide applicability for the parallel processing of biological sequence comparisons.
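
The available parallelism comes from the shape of the dynamic programming recurrence: every cell on an anti-diagonal of the comparison table depends only on the two preceding diagonals, so an entire diagonal can be computed simultaneously. The sequential sketch below makes that ordering explicit, using plain edit distance as a stand-in for the paper's scoring schemes.

```c
#include <stdio.h>
#include <string.h>

/* Sequence-comparison DP evaluated by anti-diagonals. Within one
 * diagonal d = i + j, all cells are mutually independent -- the
 * property that hypercube and Connection Machine implementations
 * exploit. This sketch computes plain edit distance. */

#define MAXN 64

static int min3(int a, int b, int c) {
    return a < b ? (a < c ? a : c) : (b < c ? b : c);
}

int main(void) {
    const char *s = "GATTACA", *t = "GCATGCA";
    int m = (int)strlen(s), n = (int)strlen(t);
    int D[MAXN + 1][MAXN + 1];

    /* sweep anti-diagonals; each inner loop could run in parallel */
    for (int d = 0; d <= m + n; d++)
        for (int i = (d > n ? d - n : 0); i <= (d < m ? d : m); i++) {
            int j = d - i;
            if (i == 0)      D[i][j] = j;            /* insertions   */
            else if (j == 0) D[i][j] = i;            /* deletions    */
            else D[i][j] = min3(D[i-1][j] + 1,       /* delete s[i]  */
                                D[i][j-1] + 1,       /* insert t[j]  */
                                D[i-1][j-1] + (s[i-1] != t[j-1]));
        }
    printf("edit distance(%s, %s) = %d\n", s, t, D[m][n]);
    return 0;
}
```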


Hypercube Concurrent Computers and Applications | 1989

An experimental study of methods for parallel preconditioned Krylov methods

Doug Baxter; Joel H. Saltz; Martin H. Schultz; Stan Eisenstat; Kay Crowley

High-performance multiprocessor architectures differ both in the number of processors and in the delay costs for synchronization and communication. In order to obtain good performance on a given architecture for a given problem, adequate parallelization, good load balance, and an appropriate choice of granularity are essential. We discuss the implementation of a parallel version of PCGPAK for both shared memory architectures and hypercubes. Our parallel implementation is sufficiently efficient to allow us to complete the solution of our test problems on 16 processors of the Encore Multimax/320 in an amount of time that is a small multiple of that required by a single head of a Cray X/MP, despite the fact that the peak performance of the Multimax processors is not even close to the supercomputer range. We illustrate the effectiveness of our approach on a number of model problems from reservoir engineering and mathematics.
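
To make the kernel structure concrete, here is a minimal preconditioned conjugate gradient solver with a diagonal (Jacobi) preconditioner on a 1-D Laplacian. This is not PCGPAK: it is a sketch showing which operations (sparse matrix-vector products, preconditioner solves, inner products, and vector updates) dominate each iteration and therefore must be parallelized.

```c
#include <stdio.h>
#include <math.h>

/* Minimal preconditioned CG on A = tridiag(-1, 2, -1), b = 1, with
 * the diagonal preconditioner M = diag(A) = 2I. Each iteration: one
 * matvec, one preconditioner solve, two inner products, three axpys. */

#define N 50

static void matvec(const double *x, double *y) {    /* y = A x */
    for (int i = 0; i < N; i++)
        y[i] = 2*x[i] - (i > 0 ? x[i-1] : 0) - (i < N-1 ? x[i+1] : 0);
}

int main(void) {
    double b[N], x[N] = {0}, r[N], z[N], p[N], q[N];
    for (int i = 0; i < N; i++) b[i] = 1.0;

    matvec(x, q);
    for (int i = 0; i < N; i++) { r[i] = b[i] - q[i]; z[i] = r[i] / 2.0; p[i] = z[i]; }
    double rz = 0; for (int i = 0; i < N; i++) rz += r[i] * z[i];

    for (int it = 0; it < 200; it++) {
        matvec(p, q);
        double pq = 0; for (int i = 0; i < N; i++) pq += p[i] * q[i];
        double alpha = rz / pq;
        for (int i = 0; i < N; i++) { x[i] += alpha * p[i]; r[i] -= alpha * q[i]; }
        double rnorm = 0; for (int i = 0; i < N; i++) rnorm += r[i] * r[i];
        if (sqrt(rnorm) < 1e-10) { printf("converged in %d iterations\n", it + 1); break; }
        for (int i = 0; i < N; i++) z[i] = r[i] / 2.0;   /* z = M^{-1} r */
        double rznew = 0; for (int i = 0; i < N; i++) rznew += r[i] * z[i];
        double beta = rznew / rz; rz = rznew;
        for (int i = 0; i < N; i++) p[i] = z[i] + beta * p[i];
    }
    printf("x[N/2] = %.6f\n", x[N / 2]);
    return 0;
}
```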


International Conference on Supercomputing | 1989

The doconsider loop

Joel H. Saltz; Ravi Mirchandaney; Kathleen Crowley

There exist many problems in which substantial parallelism is available but cannot be exploited using doall or doacross loops [10], [4]. doall loops do not impose any ordering on loop iterations, while doacross loops impose a partial execution order, in the sense that some iterations are forced to wait for the partial or complete execution of some previous iterations. We propose a new type of loop, the doconsider loop, which allows loop iterations to be ordered in new ways that preserve dependency relations and increase concurrency. Often, these index reorderings can be done at very low cost and can have substantial benefits.
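
A sketch of the kind of reordering involved: the loop below carries dependences through an array pred[] known only at run time. A single inspection pass assigns each iteration a wavefront number (the depth of its dependence chain); iterations within a wavefront are mutually independent, so each wavefront becomes a doall. The example loop and names are ours, not the paper's.

```c
#include <stdio.h>

/* Runtime reordering of a loop with cross-iteration dependences:
 * iteration i reads the value produced by iteration pred[i]. An
 * inspection pass computes wavefront levels; iterations at the same
 * level are independent and could execute concurrently. */

#define N 10

int main(void) {
    int pred[N] = {-1, -1, 0, -1, 1, 2, 2, 3, 5, 4}; /* -1: no dep  */
    double w[N], y[N];
    for (int i = 0; i < N; i++) w[i] = 1.0;

    /* inspect: wavefront = length of the dependence chain.
     * Since pred[i] < i here, wf[pred[i]] is already computed. */
    int wf[N], maxwf = 0;
    for (int i = 0; i < N; i++) {
        wf[i] = (pred[i] < 0) ? 0 : wf[pred[i]] + 1;
        if (wf[i] > maxwf) maxwf = wf[i];
    }

    /* execute wavefront by wavefront; each inner loop is a doall */
    for (int level = 0; level <= maxwf; level++)
        for (int i = 0; i < N; i++)
            if (wf[i] == level)
                y[i] = (pred[i] < 0 ? 0.0 : y[pred[i]]) + w[i];

    for (int i = 0; i < N; i++)
        printf("i=%d wavefront=%d y=%g\n", i, wf[i], y[i]);
    return 0;
}
```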


Parallel Computing | 1988

Towards developing robust algorithms for solving partial differential equations on MIMD machines

Joel H. Saltz; Vijay K. Naik

Methods are proposed for efficient computation of numerical algorithms on a wide variety of MIMD machines. These techniques reorganize the data dependency patterns so that processor utilization is improved. The model problem examined finds the time-accurate solution to a parabolic partial differential equation discretized in space and implicitly marched forward in time. The algorithms investigated are extensions of Jacobi and SOR. The extensions consist of iterating over a window of several timesteps, allowing efficient overlap of computation with communication. The methods suggested here increase the degree to which work can be performed while data are communicated between processors. The effect of the window size and of domain partitioning on system performance is examined both analytically and experimentally by implementing the algorithm on a simulated multiprocessor system.
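
A minimal sketch of the windowed idea, under our own parameters: the backward-Euler heat equation is solved by Jacobi sweeps, but each sweep covers a window of W timesteps at once, with timestep n reading the current (not yet converged) iterate of timestep n-1 as its right-hand side. On a multiprocessor, the W per-timestep sweeps of one iteration provide independent work that can be overlapped with communication.

```c
#include <stdio.h>

/* Windowed Jacobi for backward-Euler on u_t = u_xx with zero boundary
 * values: each system is (1+2r) u[n][i] - r(u[n][i-1]+u[n][i+1]) =
 * u[n-1][i], r = dt/dx^2. Instead of converging timestep n before
 * starting n+1, every sweep updates all W timesteps in the window. */

#define M 20        /* interior grid points */
#define W 4         /* timesteps in window  */
#define SWEEPS 400  /* Jacobi sweeps        */

int main(void) {
    double r = 0.5;                  /* dt/dx^2 */
    double u[W + 1][M + 2] = {{0}};  /* u[0] = initial condition */
    double unew[M + 2] = {0};
    for (int i = 1; i <= M; i++) u[0][i] = 1.0;

    for (int s = 0; s < SWEEPS; s++)
        for (int n = 1; n <= W; n++) {        /* whole window per sweep */
            for (int i = 1; i <= M; i++)      /* Jacobi within step n   */
                unew[i] = (u[n-1][i] + r * (u[n][i-1] + u[n][i+1]))
                          / (1 + 2*r);
            for (int i = 1; i <= M; i++) u[n][i] = unew[i];
        }

    for (int n = 1; n <= W; n++)
        printf("step %d: u[mid] = %.6f\n", n, u[n][M / 2]);
    return 0;
}
```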


ACM Symposium on Parallel Algorithms and Architectures | 1989

Run-time parallelization and scheduling of loops

Doug Baxter; Ravi Mirchandaney; Joel H. Saltz

This paper extends the class of problems that can be effectively compiled by parallelizing compilers. This is accomplished with the doconsider construct, which would allow these compilers to parallelize many problems in which substantial loop-level parallelism is available but cannot be detected by standard compile-time analysis. The authors describe and experimentally analyze the mechanisms used to parallelize the work required for these types of loops. In each of these methods, a new loop structure is produced by modifying the loop to be parallelized. Also presented are the rules by which these loop transformations may be automated so that they can be included in language compilers. The main application area of our research involves problems in scientific computation and engineering. The workload used in the experiments includes a mixture of real problems as well as synthetically generated inputs. From extensive tests on the Encore Multimax/320, the authors have concluded that for the types of workloads investigated, self-execution almost always performs better than pre-scheduling. Further, the improvement in performance that accrues from global topological sorting of indices, as opposed to the less expensive local sorting, is not very significant in the case of self-execution.
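
Self-execution, at its simplest, means processors claim iterations from a shared counter at run time instead of following an assignment fixed in advance by pre-scheduling. The pthreads sketch below shows only that claiming mechanism and assumes independent iterations; the paper's loops additionally enforce cross-iteration dependences through sorted indices, which is omitted here.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

/* Self-execution via a shared atomic counter: each thread repeatedly
 * claims the next unclaimed iteration, so faster threads naturally
 * take more work. Pre-scheduling would instead fix each thread's
 * iterations before the loop starts. */

#define NITER 16
#define NTHREADS 4

atomic_int next_iter;               /* next unclaimed iteration */
double result[NITER];

static void *worker(void *arg) {
    (void)arg;
    int i;
    while ((i = atomic_fetch_add(&next_iter, 1)) < NITER)
        result[i] = (double)i * i;  /* stand-in for the loop body */
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (int k = 0; k < NTHREADS; k++)
        pthread_create(&t[k], NULL, worker, NULL);
    for (int k = 0; k < NTHREADS; k++)
        pthread_join(t[k], NULL);
    for (int i = 0; i < NITER; i++) printf("%g ", result[i]);
    printf("\n");
    return 0;
}
```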


Journal of Parallel and Distributed Computing | 1990

Krylov methods preconditioned with incompletely factored matrices on the CM-2

Harry Berryman; Joel H. Saltz; William Gropp; Ravi Mirchandaney

In the work presented here, we measured the performance of the components of the key iterative kernel of a preconditioned Krylov space iterative linear system solver. In some sense, these numbers can be regarded as best-case timings for these kernels. We timed sweeps over meshes, sparse triangular solves, and inner products on a large three-dimensional model problem over a cube-shaped domain discretized with a seven-point template. The performance of the CM-2 is highly dependent on the use of very specialized programs, which mapped a regular problem domain onto the processor topology in a careful manner and used the optimized local NEWS communications network. We also document a rather dramatic deterioration in performance when these ideal conditions no longer apply. A synthetic workload generator was developed to produce and solve a parameterized family of increasingly irregular problems.


Computers and Biomedical Research | 1989

Supercomputers and biological sequence comparison algorithms

Nolan G. Core; Elizabeth W. Edmiston; Joel H. Saltz; Roger M. Smith

Comparison of biological (DNA or protein) sequences provides insight into molecular structure, function, and homology and is increasingly important as the available databases become larger and more numerous. One method of increasing the speed of the calculations is to perform them in parallel. We present the results of initial investigations using two dynamic programming algorithms on the Intel iPSC hypercube and the Connection Machine as well as an inexpensive, heuristically-based algorithm on the Encore Multimax.
