Publication

Featured research published by Ravi Mirchandaney.


Journal of Parallel and Distributed Computing | 1990

Run-time scheduling and execution of loops on message passing machines

Joel H. Saltz; Kathleen Crowley; Ravi Mirchandaney; Harry Berryman

We examine the effectiveness of optimizations aimed at allowing distributed-memory machines to efficiently compute inner loops over globally defined data structures. Our optimizations are specifically targeted toward loops in which some array references are made through a level of indirection. Unstructured mesh codes and sparse matrix solvers are examples of programs with kernels of this sort. Experimental data that quantify the performance obtainable using the methods discussed here are included.
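A loop with a level of indirection, such as y[i] += x[ia[i]], has a communication pattern that is unknown until the index array ia is available at runtime. A minimal sketch of the inspector/executor split behind such runtime optimizations (the helper names here are illustrative, not the paper's code):

```python
# Sketch of runtime preprocessing for a loop with indirect references,
# e.g. y[i] += x[ia[i]]: the access pattern is known only at runtime.

def inspector(ia, my_iters, owner):
    """Pre-scan the indirection array once to build a gather schedule:
    which elements of x must be fetched from which processor."""
    schedule = {}
    for i in my_iters:
        j = ia[i]
        schedule.setdefault(owner(j), set()).add(j)
    return schedule

def executor(ia, x, y, my_iters, fetched):
    """Run the loop body, reusing values prefetched according to the
    schedule instead of re-deriving the pattern on every sweep."""
    for i in my_iters:
        j = ia[i]
        y[i] += fetched[j] if j in fetched else x[j]
    return y
```

The schedule is built once and amortized over many executions of the loop, which is what makes the approach pay off for iterative unstructured-mesh and sparse-matrix kernels.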


IEEE Transactions on Computers | 1989

Analysis of the effects of delays on load sharing

Ravi Mirchandaney; Donald F. Towsley; John A. Stankovic

The authors study the performance characteristics of simple load-sharing algorithms for distributed systems. In the systems under consideration, it is assumed that nonnegligible delays are encountered in transferring tasks from one node to another and in gathering remote state information. Because of these delays, the state information gathered by the load-sharing algorithms is out of date by the time the load-sharing decisions are taken. The authors analyze the effects of these delays on the performance of three algorithms, called forward, reverse, and symmetric. They formulate queueing-theoretic models for each of the algorithms operating in a homogeneous system under the assumption that the task arrival process at each node is Poisson and the service times and task transfer times are exponentially distributed. Each of the models is solved using the matrix-geometric solution technique, and the important performance metrics are derived and studied.
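The key point is that transfer decisions are made on cached, out-of-date state. A sketch of the forward policy's decision step under that assumption (illustrative parameters, not the paper's exact model):

```python
import random

def forward_probe(local_q, threshold, probe_limit, cached_q):
    """Forward policy sketch: on a job arrival, if the local queue
    length exceeds `threshold`, probe up to `probe_limit` remote nodes
    and transfer the job to the first whose reported queue length is
    below the threshold. `cached_q` holds the possibly stale state
    gathered earlier, so the decision can already be wrong by the time
    the transfer completes."""
    if local_q <= threshold:
        return None                        # keep the job locally
    candidates = list(cached_q)
    random.shuffle(candidates)             # probe in random order
    for node in candidates[:probe_limit]:
        if cached_q[node] < threshold:     # decision on out-of-date info
            return node
    return None                            # all probes failed; keep the job
```

The reverse policy inverts this: lightly loaded nodes probe for work instead of heavily loaded nodes probing for idle servers.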


Journal of Parallel and Distributed Computing | 1990

Adaptive load sharing in heterogeneous distributed systems

Ravi Mirchandaney; Donald F. Towsley; John A. Stankovic

In this paper, we study the performance characteristics of simple load sharing algorithms for heterogeneous distributed systems. We assume that nonnegligible delays are encountered in transferring jobs from one node to another. We analyze the effects of these delays on the performance of two threshold-based algorithms called Forward and Reverse. We formulate queueing-theoretic models for each of the algorithms operating in heterogeneous systems under the assumption that the job arrival process at each node is Poisson and the service times and job transfer times are exponentially distributed. The models are solved using the matrix-geometric solution technique. These models are used to study the effects of different parameters and algorithm variations on the mean job response time: e.g., the effects of varying the thresholds, the impact of changing the probe limit, the impact of biasing the probing, and the optimal response times over a large range of loads and delays. Wherever relevant, the results of the models are compared with the M/M/1 model, representing no load balancing (hereafter referred to as NLB), and the M/M/K model, which is an achievable lower bound (hereafter referred to as LB).
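The NLB and LB baselines bracketing the algorithms' performance are cheap to compute from standard queueing formulas. A sketch (the parameter values below are illustrative, not from the paper):

```python
from math import factorial

def mm1_response(lam, mu):
    """Mean response time of an M/M/1 queue (requires lam < mu)."""
    return 1.0 / (mu - lam)

def mmk_response(lam, mu, k):
    """Mean response time of an M/M/K queue via the Erlang C formula."""
    a = lam / mu                           # offered load in Erlangs
    rho = a / k                            # per-server utilization (< 1)
    pk = a ** k / factorial(k)
    erlang_c = pk / ((1 - rho) * sum(a ** n / factorial(n)
                                     for n in range(k)) + pk)
    return 1.0 / mu + erlang_c / (k * mu - lam)

# K independent M/M/1 queues (no load balancing, NLB) versus one
# pooled M/M/K server group (the achievable lower bound, LB):
k, mu, lam_total = 4, 1.0, 3.2
nlb = mm1_response(lam_total / k, mu)      # each node serves lam/K alone
lb = mmk_response(lam_total, mu, k)        # perfect sharing
```

Any load sharing algorithm with transfer delays lands between these two curves; the gap between them is what probing tries to close.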


international conference on supercomputing | 1988

Principles of runtime support for parallel processors

Ravi Mirchandaney; Joel H. Saltz; R. M. Smith; D. M. Nico; Kay Crowley

There exists substantial data-level parallelism in scientific problems. The PARTY runtime system is an attempt to obtain efficient parallel implementations for scientific computations, particularly those where the data dependencies are manifest only at runtime. This can preclude compiler-based detection of certain types of parallelism. The automated system is structured as follows: an appropriate level of granularity is first selected for the computations. A directed acyclic graph representation of the program is generated, on which various aggregation techniques may be employed in order to generate efficient schedules. These schedules are then mapped onto the target machine. We describe some initial results from experiments conducted on the Intel Hypercube and the Encore Multimax that indicate the usefulness of our approach.
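One simple aggregation technique of the kind that can be applied to such a task DAG is chain coalescing: fusing a node with its only successor when that successor has no other predecessor, which removes communication without losing any parallelism. A sketch (this particular heuristic is illustrative; the paper does not specify its aggregation rules):

```python
from collections import defaultdict

def coalesce_chains(deps):
    """Aggregate a task DAG by merging linear chains. `deps` maps each
    task to the list of tasks it depends on. A task with exactly one
    successor, where that successor has exactly one predecessor, is
    fused into the same cluster as its successor."""
    succs = defaultdict(list)
    for n, ps in deps.items():
        for p in ps:
            succs[p].append(n)
    cluster = {n: n for n in deps}         # union-find style cluster map
    def find(n):
        while cluster[n] != n:
            n = cluster[n]
        return n
    for p in deps:
        if len(succs[p]) == 1:
            (c,) = succs[p]
            if len(deps[c]) == 1:          # c's sole predecessor is p
                cluster[find(c)] = find(p) # fuse c into p's cluster
    groups = defaultdict(set)
    for n in deps:
        groups[find(n)].add(n)
    return list(groups.values())
```

Coarsening granularity this way trades scheduling flexibility for fewer, larger messages, which is the central tension the PARTY system navigates.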


international conference on distributed computing systems | 1989

Adaptive load sharing in heterogeneous systems

Ravi Mirchandaney; Donald F. Towsley; John A. Stankovic

The performance characteristics of simple load-sharing algorithms are studied for heterogeneous distributed systems. It is assumed that non-negligible delays are encountered in transferring jobs from one node to another and in gathering remote state information. The effects of these delays on the performance of two algorithms called Forward and Reverse are analyzed. Queueing-theoretic models are formulated for each of the algorithms operating in heterogeneous systems under the assumption that the job arrival process at each node is Poisson and the service times and job transfer times are exponentially distributed. The models are solved using the matrix-geometric solution technique. The models are tested with regard to the effects of varying thresholds, the impact of changing the probe limit, and the determination of the optimal response times over a large range of loads and delays. Wherever relevant, the results of the models are compared with the M/M/1, random assignment, and M/M/K models.


international conference on supercomputing | 1989

The doconsider loop

Joel H. Saltz; Ravi Mirchandaney; Kathleen Crowley

There exist many problems in which substantial parallelism is available but where the parallelism cannot be exploited using doall or doacross loops [10] [4]. doall loops do not impose any ordering on loop iterations, while doacross loops impose a partial execution order in the sense that some of the iterations are forced to wait for the partial or complete execution of some previous iterations. We propose a new type of loop, i.e., doconsider. The doconsider loop allows loop iterations to be ordered in new ways that preserve dependency relations and increase concurrency. Often, these sorts of index reorderings can be done at very low cost and can have substantial benefits.
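One low-cost reordering of the kind a doconsider loop permits is grouping iterations into wavefronts: every iteration's dependences fall in strictly earlier groups, so each group can run fully concurrently. A sketch (illustrative, assuming each iteration depends only on earlier iterations):

```python
def doconsider_order(deps):
    """Group loop iterations into wavefronts. deps[i] lists the earlier
    iterations that iteration i must wait for; iterations in the same
    returned group have no dependences on one another and may run
    concurrently, unlike the partial order a doacross loop imposes."""
    wave = {}
    for i in range(len(deps)):             # deps refer only to earlier i
        wave[i] = 1 + max((wave[j] for j in deps[i]), default=-1)
    groups = {}
    for i, w in wave.items():
        groups.setdefault(w, []).append(i)
    return [groups[w] for w in sorted(groups)]
```

For example, four iterations where 1 waits on 0 and 3 waits on 1 and 2 yield the schedule [[0, 2], [1], [3]]: iterations 0 and 2 run together, which neither the original sequential order nor a doacross pipeline exposes.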


acm symposium on parallel algorithms and architectures | 1989

Run-time parallelization and scheduling of loops

Doug Baxter; Ravi Mirchandaney; Joel H. Saltz

This paper extends the class of problems that can be effectively compiled by parallelizing compilers. This is accomplished with the doconsider construct, which would allow these compilers to parallelize many problems in which substantial loop-level parallelism is available but cannot be detected by standard compile-time analysis. The authors describe and experimentally analyze mechanisms used to parallelize the work required for these types of loops. In each of these methods, a new loop structure is produced by modifying the loop to be parallelized. Also presented are the rules by which these loop transformations may be automated so that they can be included in language compilers. The main application area of our research involves problems in scientific computations and engineering. The workload used in the experiments includes a mixture of real problems as well as synthetically generated inputs. From extensive tests on the Encore Multimax/320, the authors have reached the conclusion that for the types of workloads we have investigated, self-execution almost always performs better than pre-scheduling. Further, the improvement in performance that accrues as a result of global topological sorting of indices, as opposed to the less expensive local sorting, is not very significant in the case of self-execution.
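The self-execution versus pre-scheduling distinction can be made concrete with a shared counter: workers claim the next iteration dynamically, so faster workers simply take more work, whereas pre-scheduling fixes each worker's share before the loop starts. A minimal sketch (illustrative, not the paper's implementation):

```python
import threading

def self_schedule(n_iters, n_workers, body):
    """Self-execution sketch: each worker repeatedly claims the next
    unclaimed iteration from a shared counter under a lock (standing in
    for a hardware fetch-and-add), then runs the loop body on it.
    Pre-scheduling would instead assign each worker a fixed block of
    iterations up front, risking imbalance if bodies vary in cost."""
    next_iter = [0]
    lock = threading.Lock()
    def worker():
        while True:
            with lock:                     # atomic fetch-and-increment
                i = next_iter[0]
                next_iter[0] += 1
            if i >= n_iters:
                return
            body(i)
    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
```

The counter adds a small per-iteration synchronization cost, which is why the comparison against pre-scheduling is an empirical question rather than an obvious win.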


Journal of Parallel and Distributed Computing | 1990

Krylov methods preconditioned with incompletely factored matrices on the CM-2

Harry Berryman; Joel H. Saltz; William Gropp; Ravi Mirchandaney

In the work presented here, we measured the performance of the components of the key iterative kernel of a preconditioned Krylov space iterative linear system solver. In some sense, these numbers can be regarded as best-case timings for these kernels. We timed sweeps over meshes, sparse triangular solves, and inner products on a large three-dimensional model problem over a cube-shaped domain discretized with a seven-point template. The performance of the CM-2 is highly dependent on the use of very specialized programs. These programs mapped a regular problem domain onto the processor topology in a careful manner and used the optimized local NEWS communications network. We also document rather dramatic deterioration in performance when these ideal conditions no longer apply. A synthetic work load generator was developed to produce and solve a parameterized family of increasingly irregular problems.
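The iterative kernel being timed is the body of a preconditioned conjugate gradient step: one sparse matrix-vector sweep, one preconditioner application, and a few inner products per iteration. A sketch of that structure on a toy problem (a 1-D 3-point stencil and a Jacobi preconditioner stand in here for the paper's 3-D 7-point template and incomplete factorization):

```python
def pcg(apply_A, b, apply_Minv, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient: apply_A(v) is the sparse
    matrix-vector sweep, apply_Minv(r) applies the preconditioner's
    inverse to a residual. Returns the approximate solution of A x = b."""
    x = [0.0] * len(b)
    r = list(b)                            # residual b - A x with x = 0
    z = apply_Minv(r)
    p = list(z)
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for _ in range(max_iter):
        Ap = apply_A(p)                    # the mesh sweep
        alpha = rz / sum(pi * qi for pi, qi in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break
        z = apply_Minv(r)                  # the preconditioner solve
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

# Toy stand-ins: tridiag(-1, 2, -1) for the 7-point operator, and a
# diagonal (Jacobi) solve for the incomplete factorization.
n = 8
def apply_A(v):
    return [2 * v[i] - (v[i - 1] if i > 0 else 0.0)
                     - (v[i + 1] if i < n - 1 else 0.0) for i in range(n)]
def apply_Minv(r):
    return [ri / 2.0 for ri in r]
```

On the CM-2, the incomplete-factorization preconditioner replaces the trivial diagonal solve with the sparse triangular solves whose irregular data flow the paper identifies as the hard-to-vectorize component.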


international conference on parallel processing | 1990

The Preprocessed Doacross Loop

Joel H. Saltz; Ravi Mirchandaney


Computer Performance and Reliability | 1987

The Effect of Communication Delays on the Performance of Load Balancing Policies in Distributed Systems.

Donald F. Towsley; Ravi Mirchandaney

Collaboration


Dive into Ravi Mirchandaney's collaboration.

Top Co-Authors
Donald F. Towsley

University of Massachusetts Amherst
