Suvas Vajracharya
University of Colorado Boulder
Publications
Featured research published by Suvas Vajracharya.
international parallel processing symposium | 1994
Dirk Grunwald; Suvas Vajracharya
Barrier algorithms are central to the performance of many parallel algorithms on scalable, high-performance architectures. Numerous barrier algorithms have been proposed and studied for non-uniform memory access (NUMA) architectures, but less work has been done for cache-only memory architectures (COMA), also called attraction-memory architectures, such as the KSR-1. We present two new barrier algorithms that offer the best performance we have recorded on the KSR-1 distributed-cache multiprocessor. We discuss the trade-offs and the performance of seven algorithms on two architectures. The new barrier algorithms adapt well to a hierarchical caching memory model and take advantage of the parallel communication offered by most multiprocessor interconnection networks. Performance results are shown for a 256-processor KSR-1 and a 20-processor Sequent Symmetry.
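The paper's two KSR-1 barrier algorithms are not reproduced in the abstract, but the baseline such work is usually measured against is the classic centralized sense-reversing barrier. As a point of reference only, a minimal C11 sketch of that baseline (NTHREADS and the busy-wait policy are assumptions, not details from the paper) might look like this:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define NTHREADS 8   /* hypothetical thread count, not from the paper */

static atomic_int  count = NTHREADS;   /* threads yet to arrive this episode */
static atomic_bool sense = false;      /* global phase flag                  */
static _Thread_local bool local_sense = true;  /* phase this thread awaits   */

static void barrier_wait(void)
{
    if (atomic_fetch_sub(&count, 1) == 1) {
        /* Last arrival: reset the counter, then flip the global sense,
         * releasing every spinning thread at once. */
        atomic_store(&count, NTHREADS);
        atomic_store(&sense, local_sense);
    } else {
        while (atomic_load(&sense) != local_sense)
            ;  /* spin until the last thread flips the sense */
    }
    local_sense = !local_sense;  /* prepare for the next barrier episode */
}

static void *worker(void *arg)
{
    long id = (long)arg;
    printf("thread %ld before barrier\n", id);
    barrier_wait();              /* no thread passes until all have arrived */
    printf("thread %ld after barrier\n", id);
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return 0;
}
```

Hierarchical and tree barriers of the kind the paper studies improve on this baseline by replacing the single shared counter, which becomes a contention hot spot on NUMA and COMA machines, with per-cluster counters that match the memory hierarchy.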
conference on high performance computing (supercomputing) | 1997
Suvas Vajracharya; Dirk Grunwald
The order in which loop iterations are executed can have a large impact on the number of cache misses an application incurs. A new loop order that preserves the semantics of the old order but has better cache data reuse improves the performance of the application. Several compiler techniques exist to transform loops so that the order of iterations reduces cache misses. We introduce a run-time method that determines the order through a dependence-driven execution, in which the iteration space is traversed by following the dependence arcs between iterations.
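As an illustration of the idea (an assumed setting, not the paper's implementation), consider a doubly nested loop whose iteration (i, j) depends on (i-1, j) and (i, j-1). Following those dependence arcs, every iteration on an anti-diagonal of the iteration space becomes ready at once, so a dependence-driven execution discovers a wavefront order in place of the textual row-major order:

```c
/* Sketch: wavefront traversal of a 2D iteration space with dependence
 * vectors (1,0) and (0,1). N, M, and run_iteration are hypothetical. */
#include <stdio.h>

#define N 4
#define M 4

static void run_iteration(int i, int j)
{
    printf("iteration (%d,%d)\n", i, j);  /* loop body would go here */
}

int main(void)
{
    /* Anti-diagonal d = i + j contains only iterations whose predecessors
     * lie on earlier diagonals, so the whole diagonal is ready at once;
     * this is the order a dependence-driven scheduler finds at run time. */
    for (int d = 0; d <= (N - 1) + (M - 1); d++)
        for (int i = 0; i < N; i++) {
            int j = d - i;
            if (j >= 0 && j < M)
                run_iteration(i, j);
        }
    return 0;
}
```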
languages and compilers for parallel computing | 1996
Suvas Vajracharya; Dirk Grunwald
This paper proposes an efficient run-time system to schedule general nested loops on multiprocessors. The work extends existing one-dimensional loop scheduling strategies such as static scheduling, affinity scheduling, and various dynamic scheduling methods. The extensions are twofold. First, multiple independent loops, as found in different branches of parbegin/parend constructs, can be scheduled simultaneously. Second, multidimensional loops with dependencies and conditionals can be aggressively scheduled. The ability to schedule multidimensional loops with dependencies is made possible by providing a dependence vector as an input to the scheduler. Based on this application-specific input, a continuation-passing run-time system using non-blocking threads efficiently orchestrates the parallelism on shared-memory MIMD and DSM multicomputers. The run-time system uses a dependence-driven execution, which is similar to data-driven and message-driven execution in that it is asynchronous. This asynchrony allows a high degree of parallelism.
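A minimal sketch of the dependence-counting bookkeeping such a scheduler might use (hypothetical names throughout; the real system pops ready iterations concurrently with non-blocking threads, whereas this sequential version shows only the enabling logic): each iteration carries a count of unsatisfied predecessors derived from the supplied dependence vectors, and its continuation is enqueued when the count reaches zero.

```c
#include <stdio.h>

#define N 4
#define M 4

typedef struct { int i, j; } Iter;

static int  pending[N][M];     /* unsatisfied-dependence counters      */
static Iter worklist[N * M];   /* ready iterations (continuations)     */
static int  head, tail;

static void enable(int i, int j)   /* a predecessor finished: decrement */
{
    if (i < 0 || i >= N || j < 0 || j >= M) return;
    if (--pending[i][j] == 0)
        worklist[tail++] = (Iter){ i, j };
}

int main(void)
{
    /* Dependence vectors (1,0) and (0,1): iteration (i,j) waits on
     * (i-1,j) and (i,j-1), so interior iterations start at count 2. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++)
            pending[i][j] = (i > 0) + (j > 0);

    worklist[tail++] = (Iter){ 0, 0 };   /* only (0,0) is initially ready */

    while (head < tail) {
        Iter it = worklist[head++];
        printf("run (%d,%d)\n", it.i, it.j);   /* loop body would go here */
        enable(it.i + 1, it.j);                /* satisfy successors      */
        enable(it.i, it.j + 1);
    }
    return 0;
}
```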
hawaii international conference on system sciences | 1996
Clive F. Baillie; Dirk Grunwald; Suvas Vajracharya
We have taken a Grand Challenge 3D multigrid code, QGMG, initially developed on the Cray C-90 and subsequently parallelized for MPPs, and implemented it using the DUDE object-oriented runtime system, which combines both task and data parallelism. The QGMG code is a challenging application for two reasons. First, as in all multigrid solvers, the most straightforward implementation requires that most of the processors idle at barrier synchronizations. Second, the QGMG code is an example of an application that requires both task and data parallelism: two multigrids (task parallelism) must be solved, and each multigrid solver contains data parallelism. To address these challenges, DUDE loosens the requirement that all processes must wait at barriers, and it provides integrated task and data parallelism. We describe the QGMG code and the DUDE object-oriented runtime system in detail, explaining how we parallelized this Grand Challenge application.
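As a rough illustration of why combining the two forms of parallelism helps (a plain-pthreads sketch, not DUDE's object-oriented API), two independent solver tasks can run concurrently so that neither idles at the other's synchronization points, while each task's inner loop is the data-parallel part a runtime like DUDE would subdivide further:

```c
#include <pthread.h>
#include <stdio.h>

#define N 1000   /* hypothetical grid size */

typedef struct { const char *name; double grid[N]; } Solver;

static void *solve(void *arg)
{
    Solver *s = arg;
    for (int i = 0; i < N; i++)        /* data-parallel region: a real   */
        s->grid[i] *= 0.5;             /* runtime would split this loop  */
    printf("%s done\n", s->name);
    return NULL;
}

int main(void)
{
    static Solver a = { .name = "multigrid-A" }, b = { .name = "multigrid-B" };
    pthread_t ta, tb;

    /* The two multigrids are independent tasks, so neither waits at the
     * other's barriers; processors idled in one can serve the other. */
    pthread_create(&ta, NULL, solve, &a);
    pthread_create(&tb, NULL, solve, &b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return 0;
}
```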
languages and compilers for parallel computing | 1996
Suvas Vajracharya; Dirk Grunwald
parallel and distributed processing techniques and applications | 1999
Suvas Vajracharya; Peter H. Beckman; Steve Karmesin; Katarzyna Keahey; R. R. Oldehoeft; Craig Edward Rasmussen
Archive | 1997
Suvas Vajracharya; Dirk Grunwald
parallel and distributed processing techniques and applications | 1997
Suvas Vajracharya; Dirk Grunwald
Archive | 1995
Clive F. Baillie; Dirk Grunwald; Suvas Vajracharya
Archive | 1995
Dirk Grunwald; Suvas Vajracharya