Network


Latest external collaboration at the country level.

Hotspot


Dive into the research topics where Ramesh Subramonian is active.

Publication


Featured research published by Ramesh Subramonian.


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 1993

LogP: towards a realistic model of parallel computation

David E. Culler; Richard M. Karp; David A. Patterson; Abhijit Sahay; Klaus E. Schauser; Eunice E. Santos; Ramesh Subramonian; Thorsten von Eicken

A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding development of techniques that yield performance across a range of current and future parallel machines. This paper offers a new parallel machine model, called LogP, that reflects the critical technology trends underlying parallel computers. It is intended to serve as a basis for developing fast, portable parallel algorithms and to offer guidelines to machine designers. Such a model must strike a balance between detail and simplicity in order to reveal important bottlenecks without making analysis of interesting problems intractable. The model is based on four parameters that specify abstractly the computing bandwidth, the communication bandwidth, the communication delay, and the efficiency of coupling communication and computation. Portable parallel algorithms typically adapt to the machine configuration in terms of these parameters. The utility of the model is demonstrated through examples that are implemented on the CM-5.
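In the LogP model these four quantities are the latency L of a small message, the per-message processor overhead o, the gap g enforced between consecutive messages at a processor (whose reciprocal is the per-processor communication bandwidth), and the number of processors P. The sketch below, with hypothetical parameter values chosen purely for illustration, shows how simple LogP-style communication-time estimates are formed from these parameters.

```python
# Illustrative sketch: how the four LogP parameters combine into simple
# communication-time estimates. Parameter values below are hypothetical.
#   L: upper bound on the latency of a small message through the network
#   o: overhead a processor spends sending or receiving one message
#   g: gap, the minimum interval between consecutive messages at a processor
#      (1/g is the available per-processor communication bandwidth)
#   P: number of processor/memory modules (listed for completeness)

def one_message_time(L, o):
    """Send overhead + network latency + receive overhead."""
    return o + L + o

def k_messages_time(k, L, o, g):
    """One processor sends k small messages to another: successive
    injections are limited by max(o, g); the last message still needs
    L to cross the network and o to be received."""
    return (k - 1) * max(o, g) + o + L + o

if __name__ == "__main__":
    L, o, g, P = 6.0, 2.0, 4.0, 32        # hypothetical machine parameters
    print(one_message_time(L, o))          # 10.0
    print(k_messages_time(5, L, o, g))     # 26.0
```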


SIAM Journal on Computing | 1992

Work-optimal asynchronous algorithms for shared memory parallel computers

Charles U. Martel; Arvin Park; Ramesh Subramonian

This paper develops shared memory algorithms for asynchronous processor systems that require the same expected work as the best PRAM algorithms. These algorithms operate efficiently under general asynchronous processor behavior (where individual processor speeds are allowed to vary widely over time). This paper achieves these results by employing a methodology that uses randomization to schedule subtasks of a parallel program. The resulting algorithms allow processors to (i) have arbitrary asynchronous behavior, (ii) have fail-stop errors, (iii) join a computation at any time, and (iv) have no unique identifiers. This paper develops a performance metric for asynchronous parallel computations, called work, which is the total number of instructions (including busy-waiting instructions) performed by a collection of parallel processors during a computation. The main result is to compute any associative function of n variables with O(n) expected work, using up to n/(log n log* n) asynchronous processors, an...
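As a rough illustration of the work measure and of randomized subtask scheduling (a toy simulation, not the paper's algorithm), the sketch below models asynchronous processors by a random interleaving; every attempted step, including wasted attempts on already-completed subtasks, is charged to the total work.

```python
# Toy simulation of the "work" measure under asynchrony (an illustration,
# not the paper's algorithm). A random interleaving models processors whose
# speeds vary arbitrarily; at each step one processor attempts one randomly
# chosen subtask, and every attempt, useful or wasted, is charged as work.
import random

def simulate(n_subtasks=1000, n_procs=8, seed=0):
    rng = random.Random(seed)
    done = [False] * n_subtasks
    remaining = n_subtasks
    work = [0] * n_procs                   # instructions charged to each processor

    while remaining > 0:
        pid = rng.randrange(n_procs)       # arbitrary (asynchronous) schedule
        i = rng.randrange(n_subtasks)      # randomized subtask selection
        if not done[i]:
            done[i] = True                 # "execute" subtask i
            remaining -= 1
        work[pid] += 1                     # wasted attempts still count as work

    return sum(work)

if __name__ == "__main__":
    # This naive picker needs roughly n * ln(n) attempts (coupon collecting);
    # the paper's scheduling achieves O(n) expected work.
    print("total work:", simulate())
```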


Journal of Algorithms | 1994

On the Complexity of Certified Write-All Algorithms

Charles U. Martel; Ramesh Subramonian



ACM Symposium on Parallel Algorithms and Architectures | 1992

Designing synchronous algorithms for asynchronous processors

Ramesh Subramonian



Foundations of Software Technology and Theoretical Computer Science | 1992

On the Complexity of Certified Write All Algorithms

Charles U. Martel; Ramesh Subramonian



Proceedings ICCI '92: Fourth International Conference on Computing and Information | 1992

Writing sequential programs for parallel processors: implementation experience

Ramesh Subramonian



Communications of the ACM | 1996

LogP: Towards a Realistic Model of Parallel Computation

David E. Culler; Richard M. Karp; David A. Patterson; Abhijit Sahay; Klaus E. Schauser; Eunice E. Santos; Ramesh Subramonian; Thorsten von Eicken

An asynchronous PRAM allows processors to run at different and unpredictable speeds. Thus a fundamental problem in designing asynchronous PRAM algorithms is constructing a synchronization primitive which determines that a set of tasks has been completed. The certified write-all problem (CWA) is: given an array A[1..n] and a flag f which are both initialized to zero, set all elements of A to one, and then set f to one. A solution to the certified write-all problem can be used as a synchronization primitive in a wide variety of settings. This paper investigates the complexity of CWA algorithms by presenting several new algorithms and lower bound proofs. We present a new randomized CWA algorithm which uses expected O(n) work using up to n/log n processors. We show that this algorithm is optimal in both work and processor utilization by proving an Ω(n + p log n) lower bound on the expected work done by a p processor randomized CWA algorithm. Our CWA algorithm uses concurrent reads and concurrent writes. We show that this is necessary by proving that no concurrent read exclusive write (CREW) asynchronous PRAM can solve the CWA problem. However, for a fail-stop PRAM, where processors operate synchronously until they fail, we present a randomized CREW CWA algorithm. This algorithm also uses O(n) expected work using up to n/log n CREW fail-stop processors.
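The following is a minimal sketch of the certified write-all problem itself (simulated processors and a naive strategy, not the work-optimal algorithm analyzed here): processors write ones into random cells of A, verify that every cell is set, and only then certify by setting f. All writers store the same value, so the concurrent reads and writes are benign.

```python
# Minimal sketch of the certified write-all problem (not the work-optimal
# algorithm from the paper). Processors, simulated by a random interleaving,
# write 1s into random cells of A, verify completion, and only then set f.
import random

def certified_write_all(n=256, n_procs=8, seed=1):
    rng = random.Random(seed)
    A = [0] * n
    f = 0

    active = set(range(n_procs))
    while active:
        pid = rng.choice(tuple(active))    # arbitrary (asynchronous) schedule
        if f == 1:                         # some processor already certified
            active.discard(pid)            # this processor can stop
            continue
        A[rng.randrange(n)] = 1            # concurrent writes of the same value
        if all(A):                         # concurrent-read verification pass
            f = 1                          # certify only once A is all ones
    assert f == 1 and all(A)
    return f

if __name__ == "__main__":
    certified_write_all()
```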


Archive | 1990

How to emulate synchrony

Ramesh Subramonian; Charles U. Martel

The PRAM model has proven to be a fertile ground for algorithm development. However, it assumes that processors operate synchronously, whereas most shared-memory multiprocessors are asynchronous and are likely to remain so. This has motivated the development of simulations of PRAM programs on asynchronous PRAMs. However, such simulations induce either a time or work penalty. Avoiding this penalty has meant designing specifically asynchronous algorithms. To date, the design of these asynchronous algorithms has been ad hoc and non-intuitive. We show how many algorithms, designed and analyzed assuming synchrony, can be easily and systematically converted so that the same work and time bounds are maintained under arbitrary asynchrony. The existence of lower bounds indicates that there exist problems for which the same work and time bounds cannot be maintained. However, this paper shows that in far more cases than hitherto thought possible, asynchrony does not induce a time or work penalty. We suggest a radically new approach to the problem of cache coherence. We show how appropriate architectural support motivates the design of algorithms which are immune to cache incoherence.
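One way to picture such a conversion (a toy sketch of the general idea, not necessarily the construction used here) is to replace the global barrier between rounds of a synchronous algorithm with per-value completion checks: a processor waits until the previous-round values it depends on have actually been written. The sketch below applies this to prefix sums by recursive doubling, with asynchronous processors simulated by a random interleaving.

```python
# Toy sketch (an illustration of the general idea, not necessarily the
# construction used in the report): replace the barrier between rounds with
# per-value completion checks. Prefix sums by recursive doubling; processors
# are simulated by a random interleaving, so their relative speeds vary.
import random

def async_prefix_sums(x, n_procs=4, seed=2):
    rng = random.Random(seed)
    n = len(x)
    rounds = max(1, (n - 1).bit_length())      # ceil(log2 n) doubling rounds
    # val[t][i] is written exactly once; None means "round t not done for i".
    val = [list(x)] + [[None] * n for _ in range(rounds)]

    # Each processor owns a block of indices and walks it round by round.
    blocks = [list(range(p, n, n_procs)) for p in range(n_procs)]
    cursor = {p: (1, 0) for p in range(n_procs) if blocks[p]}  # (round, pos)
    busy_waits = 0

    while cursor:
        p = rng.choice(list(cursor))           # arbitrary processor speeds
        t, k = cursor[p]
        i = blocks[p][k]
        j = i - (1 << (t - 1))                 # dependence on the previous round
        if j >= 0 and val[t - 1][j] is None:
            busy_waits += 1                    # input not written yet: wait
            continue
        val[t][i] = val[t - 1][i] + (val[t - 1][j] if j >= 0 else 0)
        k += 1
        if k == len(blocks[p]):                # block finished for this round
            t, k = t + 1, 0
        if t > rounds:
            del cursor[p]                      # this processor is done
        else:
            cursor[p] = (t, k)

    return val[rounds], busy_waits

if __name__ == "__main__":
    sums, waits = async_prefix_sums(list(range(1, 17)))
    assert sums == [i * (i + 1) // 2 for i in range(1, 17)]
    print(sums, "busy-waits:", waits)
```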


International Conference on Parallel Processing | 1990

Asynchronous PRAM Algorithms for List Ranking and Transitive Closure

Charles U. Martel; Ramesh Subramonian

An Asynchronous PRAM allows processors to run at different and unpredictable speeds. Thus a fundamental problem in designing asynchronous PRAM algorithms is constructing a synchronization primitive which determines that a set of tasks has been completed. The Certified Write All Problem (CWA) is: given an array A[1..n] and a flag f which are both initialized to zero, set all elements of A to one, and then set f to one. A solution to the Certified Write All problem can be used as a synchronization primitive in a wide variety of settings. This paper investigates the complexity of CWA algorithms by presenting several new algorithms and lower bound proofs.


Communications of the ACM | 1996

A practical model of parallel computation

David E. Culler; Richard M. Karp; David A. Patterson; Abhijit Sahay; Eunice E. Santos; Klaus E. Schauser; Ramesh Subramonian; Thorsten von Eicken

The PRAM model has proven to be a fertile ground for algorithm development. However, it assumes that processors operate synchronously, whereas shared-memory multiprocessors are asynchronous and are likely to remain so. This has motivated development of asynchronous PRAM models. Observing the dependence constraints implicit in a computation allows the author to design a large class of synchronous algorithms which run correctly on asynchronous processors without degradation in either time or work bounds. Of great practical interest is his use of randomization for dynamic load-balancing, dealing with asynchrony and providing transparent parallelization. This allows him to give a sequential specification for a parallel algorithm. This, in turn, allows him to write sequential programs for parallel processors. He discusses his experience implementing asynchronous PRAM algorithms on shared memory multiprocessors.
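A toy rendering of that idea (an illustration under stated assumptions, not the system described in the paper): the algorithm is given as an ordinary sequential loop over independent tasks, and the parallel execution draws tasks in randomized order, which balances load across processors of unpredictable speed while producing exactly the result of the sequential specification.

```python
# Illustration (hypothetical example, not the paper's system): a sequential
# specification of a parallel algorithm, executed by simulated processors
# that draw independent tasks in a randomized order for load balancing.
import random

def spec_sequential(xs):
    """Sequential specification: square every element."""
    out = [None] * len(xs)
    for i in range(len(xs)):
        out[i] = xs[i] * xs[i]
    return out

def run_parallel(xs, n_procs=4, seed=3):
    """Simulated parallel execution of the same specification."""
    rng = random.Random(seed)
    out = [None] * len(xs)
    pool = list(range(len(xs)))
    rng.shuffle(pool)                      # randomization = dynamic load balancing
    loads = [0] * n_procs                  # tasks executed by each processor
    while pool:
        p = rng.randrange(n_procs)         # whichever processor is free next
        i = pool.pop()
        out[i] = xs[i] * xs[i]             # loop body taken from the specification
        loads[p] += 1
    return out, loads

if __name__ == "__main__":
    xs = list(range(20))
    par, loads = run_parallel(xs)
    assert par == spec_sequential(xs)      # identical to the sequential spec
    print("per-processor task counts:", loads)
```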

Collaboration


Dive into Ramesh Subramonian's collaboration.

Top Co-Authors

Abhijit Sahay
University of California

Eunice E. Santos
University of Texas at El Paso

Arvin Park
University of California