Suvas Vajracharya
University of Colorado Boulder
Publications
Featured research published by Suvas Vajracharya.
international parallel processing symposium | 1994
Dirk Grunwald; Suvas Vajracharya
Barrier algorithms are central to the performance of many parallel algorithms on scalable, high-performance architectures. Numerous barrier algorithms have been proposed and studied for non-uniform memory access (NUMA) architectures, but less work has been done for cache-only memory architectures (COMA), also called attraction-memory architectures, such as the KSR-1. We present two new barrier algorithms that offer the best performance we have recorded on the KSR-1 distributed-cache multiprocessor. We discuss the trade-offs and the performance of seven algorithms on two architectures. The new barrier algorithms adapt well to a hierarchical caching memory model and take advantage of the parallel communication offered by most multiprocessor interconnection networks. Performance results are shown for a 256-processor KSR-1 and a 20-processor Sequent Symmetry.
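The paper's two KSR-1 barrier algorithms are not reproduced in the abstract, but the baseline such work is usually measured against is the classic centralized sense-reversing barrier. As a point of reference only, a minimal C11 sketch of that baseline (NTHREADS and the busy-wait policy are assumptions, not details from the paper) might look like this:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define NTHREADS 8   /* hypothetical thread count, not from the paper */

static atomic_int  count = NTHREADS;   /* threads yet to arrive this episode */
static atomic_bool sense = false;      /* global phase flag                  */
static _Thread_local bool local_sense = true;  /* phase this thread awaits   */

static void barrier_wait(void)
{
    if (atomic_fetch_sub(&count, 1) == 1) {
        /* Last arrival: reset the counter, then flip the global sense,
         * releasing every spinning thread at once. */
        atomic_store(&count, NTHREADS);
        atomic_store(&sense, local_sense);
    } else {
        while (atomic_load(&sense) != local_sense)
            ;  /* spin until the last thread flips the sense */
    }
    local_sense = !local_sense;  /* prepare for the next barrier episode */
}

static void *worker(void *arg)
{
    long id = (long)arg;
    printf("thread %ld before barrier\n", id);
    barrier_wait();              /* no thread passes until all have arrived */
    printf("thread %ld after barrier\n", id);
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return 0;
}
```

Hierarchical and tree barriers of the kind the paper studies improve on this baseline by replacing the single shared counter, which becomes a contention hot spot on NUMA and COMA machines, with per-cluster counters that match the memory hierarchy.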
conference on high performance computing (supercomputing) | 1997
Suvas Vajracharya; Dirk Grunwald
The order in which loop iterations are executed can have a large impact on the number of cache misses an application incurs. A new loop order that preserves the semantics of the old order but has better cache data reuse improves the performance of the application. Several compiler techniques exist to transform loops so that the order of iterations reduces cache misses. We introduce a run-time method that determines the order through a dependence-driven execution, in which the iteration space is traversed by following the dependence arcs between iterations.
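As an illustration of the idea (an assumed setting, not the paper's implementation), consider a doubly nested loop whose iteration (i, j) depends on (i-1, j) and (i, j-1). Following those dependence arcs, every iteration on an anti-diagonal of the iteration space becomes ready at once, so a dependence-driven execution discovers a wavefront order in place of the textual row-major order:

```c
/* Sketch: wavefront traversal of a 2D iteration space with dependence
 * vectors (1,0) and (0,1). N, M, and run_iteration are hypothetical. */
#include <stdio.h>

#define N 4
#define M 4

static void run_iteration(int i, int j)
{
    printf("iteration (%d,%d)\n", i, j);  /* loop body would go here */
}

int main(void)
{
    /* Anti-diagonal d = i + j contains only iterations whose predecessors
     * lie on earlier diagonals, so the whole diagonal is ready at once;
     * this is the order a dependence-driven scheduler finds at run time. */
    for (int d = 0; d <= (N - 1) + (M - 1); d++)
        for (int i = 0; i < N; i++) {
            int j = d - i;
            if (j >= 0 && j < M)
                run_iteration(i, j);
        }
    return 0;
}
```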
languages and compilers for parallel computing | 1996
Suvas Vajracharya; Dirk Grunwald
This paper proposes an efficient run-time system to schedule general nested loops on multiprocessors. The work extends existing one-dimensional loop scheduling strategies such as static scheduling, affinity scheduling, and various dynamic scheduling methods. The extensions are twofold. First, multiple independent loops, as found in different branches of parbegin/parend constructs, can be scheduled simultaneously. Second, multidimensional loops with dependencies and conditionals can be aggressively scheduled. The ability to schedule multidimensional loops with dependencies is made possible by providing a dependence vector as an input to the scheduler. Based on this application-specific input, a continuation-passing run-time system using non-blocking threads efficiently orchestrates the parallelism on shared-memory MIMD and DSM multicomputers. The run-time system uses a dependence-driven execution, which is similar to data-driven and message-driven execution in that it is asynchronous. This asynchrony allows a high degree of parallelism.
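A minimal sketch of the dependence-counting bookkeeping such a scheduler might use (hypothetical names throughout; the real system pops ready iterations concurrently with non-blocking threads, whereas this sequential version shows only the enabling logic): each iteration carries a count of unsatisfied predecessors derived from the supplied dependence vectors, and its continuation is enqueued when the count reaches zero.

```c
#include <stdio.h>

#define N 4
#define M 4

typedef struct { int i, j; } Iter;

static int  pending[N][M];     /* unsatisfied-dependence counters      */
static Iter worklist[N * M];   /* ready iterations (continuations)     */
static int  head, tail;

static void enable(int i, int j)   /* a predecessor finished: decrement */
{
    if (i < 0 || i >= N || j < 0 || j >= M) return;
    if (--pending[i][j] == 0)
        worklist[tail++] = (Iter){ i, j };
}

int main(void)
{
    /* Dependence vectors (1,0) and (0,1): iteration (i,j) waits on
     * (i-1,j) and (i,j-1), so interior iterations start at count 2. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++)
            pending[i][j] = (i > 0) + (j > 0);

    worklist[tail++] = (Iter){ 0, 0 };   /* only (0,0) is initially ready */

    while (head < tail) {
        Iter it = worklist[head++];
        printf("run (%d,%d)\n", it.i, it.j);   /* loop body would go here */
        enable(it.i + 1, it.j);                /* satisfy successors      */
        enable(it.i, it.j + 1);
    }
    return 0;
}
```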
hawaii international conference on system sciences | 1996
Clive F. Baillie; Dirk Grunwald; Suvas Vajracharya
We have taken a Grand Challenge 3D multigrid code, QGMG, initially developed on the Cray C-90 and subsequently parallelized for MPPs, and implemented it using the DUDE object-oriented runtime system, which combines both task and data parallelism. The QGMG code is a challenging application for two reasons. First, as in all multigrid solvers, the most straightforward implementation requires that most of the processors idle at barrier synchronizations. Second, the QGMG code is an example of an application that requires both task and data parallelism: two multigrids (task parallelism) must be solved, and each multigrid solver contains data parallelism. To address these challenges, DUDE loosens the requirement that all processes must wait at barriers, and it provides integrated task and data parallelism. We describe the QGMG code and the DUDE object-oriented runtime system in detail, explaining how we parallelized this Grand Challenge application.
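As a rough illustration of why combining the two forms of parallelism helps (a plain-pthreads sketch, not DUDE's object-oriented API), two independent solver tasks can run concurrently so that neither idles at the other's synchronization points, while each task's inner loop is the data-parallel part a runtime like DUDE would subdivide further:

```c
#include <pthread.h>
#include <stdio.h>

#define N 1000   /* hypothetical grid size */

typedef struct { const char *name; double grid[N]; } Solver;

static void *solve(void *arg)
{
    Solver *s = arg;
    for (int i = 0; i < N; i++)        /* data-parallel region: a real   */
        s->grid[i] *= 0.5;             /* runtime would split this loop  */
    printf("%s done\n", s->name);
    return NULL;
}

int main(void)
{
    static Solver a = { .name = "multigrid-A" }, b = { .name = "multigrid-B" };
    pthread_t ta, tb;

    /* The two multigrids are independent tasks, so neither waits at the
     * other's barriers; processors idled in one can serve the other. */
    pthread_create(&ta, NULL, solve, &a);
    pthread_create(&tb, NULL, solve, &b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return 0;
}
```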
languages and compilers for parallel computing | 1996
Suvas Vajracharya; Dirk Grunwald
parallel and distributed processing techniques and applications | 1999
Suvas Vajracharya; Peter H. Beckman; Steve Karmesin; Katarzyna Keahey; R. R. Oldehoeft; Craig Edward Rasmussen
Archive | 1997
Suvas Vajracharya; Dirk Grunwald
parallel and distributed processing techniques and applications | 1997
Suvas Vajracharya; Dirk Grunwald
Archive | 1995
Clive F. Baillie; Dirk Grunwald; Suvas Vajracharya
Archive | 1995
Dirk Grunwald; Suvas Vajracharya