Silvius Rus
Texas A&M University
Publication
Featured research published by Silvius Rus.
Languages and Compilers for Parallel Computing | 2001
Ping An; Alin Jula; Silvius Rus; Steven Saunders; Timmie G. Smith; Gabriel Tanase; Nathan Thomas; Nancy M. Amato; Lawrence Rauchwerger
The Standard Template Adaptive Parallel Library (STAPL) is a parallel library designed as a superset of the ANSI C++ Standard Template Library (STL). It is sequentially consistent for functions with the same name, and executes on uni- or multi-processor systems that utilize shared or distributed memory. STAPL is implemented using simple parallel extensions of C++ that currently provide an SPMD model of parallelism, and it supports nested parallelism. The library is intended to be general purpose, but it emphasizes irregular programs so that parallelism can be exploited in applications that use dynamically linked data structures, such as particle transport calculations, molecular dynamics, geometric modeling, and graph algorithms. STAPL provides several different algorithms for some library routines and selects among them adaptively at run time. STAPL can replace STL automatically by invoking a preprocessing translation phase. In the applications studied, the performance of translated code was within 5% of the results obtained using STAPL directly. STAPL also provides functionality that allows the user to further optimize the code and achieve additional performance gains. We present results obtained using STAPL for a molecular dynamics code and a particle transport code.
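As a rough illustration of the drop-in idea, the sketch below uses standard C++17 parallel algorithms as an analogy: a reduction is parallelized by swapping in a call with the same shape and semantics. This is not STAPL's actual pContainer/pAlgorithm interface, only the substitution pattern its translation phase automates.

    // Analogy only: standard C++17 parallel algorithms, not STAPL's API.
    #include <algorithm>
    #include <execution>
    #include <iostream>
    #include <numeric>
    #include <vector>

    int main() {
        std::vector<double> v(1'000'000, 1.0);

        // Sequential STL reduction.
        double seq = std::accumulate(v.begin(), v.end(), 0.0);

        // Parallel counterpart selected by an execution policy; STAPL's
        // preprocessing phase performs this kind of substitution on STL
        // calls automatically.
        double par = std::reduce(std::execution::par, v.begin(), v.end(), 0.0);

        std::cout << seq << ' ' << par << '\n';
    }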
International Journal of Parallel Programming | 2003
Silvius Rus; Lawrence Rauchwerger; Jay Hoeflinger
We present a novel Hybrid Analysis (HA) technology that efficiently and seamlessly integrates static and run-time analysis of memory references into a single framework, one capable of performing data dependence analysis and of generating the information needed by most associated memory-related optimizations. We use HA to perform automatic parallelization by extracting run-time assertions from any loop and generating appropriate run-time tests, which range from a low-cost scalar comparison to a full, reference-by-reference run-time analysis. Moreover, we can order the run-time tests in increasing order of complexity (overhead) and thus incur only the minimum necessary overhead. We accomplish this both by extending compile-time interprocedural analysis techniques and by incorporating speculative run-time techniques when necessary. Our solution bridges "free" compile-time techniques with exhaustive run-time techniques through a continuum of simple-to-complex solutions. We have implemented our framework in the Polaris compiler by introducing an innovative intermediate representation called RT_LMAD and a run-time library that operates on it. Based on the experimental results obtained to date, we expect to automatically parallelize most, and possibly all, of the PERFECT codes.
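A minimal, self-contained sketch of the staging idea follows; the toy loop and test names are illustrative, not the RT_LMAD representation or its run-time library.

    // Loop under test: for i in [0, n): a[i + k] = a[i] + 1.
    #include <cstdio>
    #include <unordered_set>
    #include <vector>

    // Stage 1: O(1) scalar comparison extracted at compile time. Either
    // the write region [k, n + k) and read region [0, n) are disjoint,
    // or every iteration touches only its own element.
    bool cheap_test(long n, long k) {
        return k == 0 || k >= n || k <= -n;
    }

    // Stage 2: exact reference-by-reference analysis; expensive, but
    // often still cheaper than giving up on parallel execution.
    bool trace_test(long n, long k) {
        std::unordered_set<long> writes;
        for (long i = 0; i < n; ++i) writes.insert(i + k);
        for (long j = 0; j < n; ++j)
            if (writes.count(j) && k != 0)   // j read here, written by iteration j - k
                return false;
        return true;
    }

    int main() {
        long n = 8, k = 8;                   // regions disjoint: stage 1 succeeds
        std::vector<double> a(n + k, 1.0);
        if (cheap_test(n, k) || trace_test(n, k)) {
            for (long i = 0; i < n; ++i)     // safe to run concurrently
                a[i + k] = a[i] + 1;         // (e.g., under OpenMP)
        } else {
            for (long i = 0; i < n; ++i)     // sequential fallback
                a[i + k] = a[i] + 1;
        }
        std::printf("a[%ld] = %g\n", n, a[n]);
    }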
Lecture Notes in Computer Science | 2003
George S. Almasi; Charles J. Archer; José G. Castaños; Manish Gupta; Xavier Martorell; José E. Moreira; William Gropp; Silvius Rus; Brian R. Toonen
The BlueGene/L computer uses system-on-a-chip integration and a highly scalable 65,536-node cellular architecture to deliver 360 Tflops of peak computing power. Efficient operation of the machine requires a fast, scalable, and standards-compliant MPI library. In this paper, we discuss our efforts to port the MPICH2 library to BlueGene/L.
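For context, the port must still run ordinary standards-conforming MPI codes unchanged. The program below is generic MPI-1 point-to-point messaging, shown only to fix ideas; it contains nothing BlueGene/L-specific.

    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);

        int rank = 0, size = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int token = 42;
        if (size >= 2) {
            if (rank == 0) {
                MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            } else if (rank == 1) {
                MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                std::printf("rank 1 received %d\n", token);
            }
        }

        MPI_Finalize();
        return 0;
    }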
International Conference on Supercomputing | 2007
Silvius Rus; Maikel Pennings; Lawrence Rauchwerger
Sensitivity Analysis (SA) is a novel compiler technique that complements, and integrates with, static automatic parallelization analysis for the cases when relevant program behavior is input sensitive. In this paper we show how SA can extract all the input-dependent, statically unavailable conditions under which loops can be dynamically parallelized. SA generates a sequence of sufficient conditions which, when evaluated dynamically in order of their complexity, can each validate the dynamic parallel execution of the corresponding loop. For example, SA can first attempt to validate parallelization by checking simple conditions related to loop bounds. If such simple conditions cannot be met, then validating dynamic parallelization may require evaluating conditions related to the entire memory reference trace of a loop, thus decreasing the benefits of parallel execution. We have implemented Sensitivity Analysis in the Polaris compiler and evaluated its performance on 22 industry-standard benchmark codes running on two multicore systems. In most cases we obtained speedups superior to those of the Intel Ifort compiler, because SA lets us complement static analysis with minimum-cost dynamic analysis and extract most of the available coarse-grained parallelism.
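The staging can be made concrete on the classic input-sensitive pattern "for i: a[idx[i]] += f(i)", which is parallel only when idx holds no duplicate subscripts. The predicates below are an illustrative sketch, not the tests the compiler actually generates.

    #include <algorithm>
    #include <cstdio>
    #include <functional>
    #include <vector>

    // Cheapest sufficient condition: strictly increasing subscripts,
    // verifiable in one O(n) scan with no extra memory.
    bool strictly_increasing(const std::vector<int>& idx) {
        return std::adjacent_find(idx.begin(), idx.end(),
                                  std::greater_equal<int>()) == idx.end();
    }

    // Fallback: exact duplicate check over the whole subscript trace.
    bool no_duplicates(std::vector<int> idx) {   // takes a copy to sort
        std::sort(idx.begin(), idx.end());
        return std::adjacent_find(idx.begin(), idx.end()) == idx.end();
    }

    int main() {
        std::vector<int> idx = {3, 0, 7, 5, 2};  // not monotone, yet duplicate-free
        bool ok = strictly_increasing(idx) || no_duplicates(idx);
        std::printf("parallelizable: %s\n", ok ? "yes" : "no");
    }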
International Conference on Parallel Architectures and Compilation Techniques | 2004
Silvius Rus; Dongmin Zhang; Lawrence Rauchwerger
We introduce a framework for the analysis of memory reference sets addressed by induction variables without closed forms. This framework relies on a new data structure, the Value Evolution Graph (VEG), which models the global flow of values taken by induction variables with and without closed forms. We describe the application of our framework to array data-flow analysis, privatization, and dependence analysis. This results in the automatic parallelization of loops that contain arrays addressed by induction variables without closed forms. We implemented this framework in the Polaris research compiler. We present experimental results on a set of codes from the PERFECT, SPEC, and NCSA benchmark suites.
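The loop below shows, in miniature, the pattern the VEG targets: "top" has no closed form because its increment is conditional on the data, yet its evolution is provably monotone, so no two writes to a[] can collide. The code is illustrative; the analysis itself operates on compiler IR.

    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<double> src = {1.5, -2.0, 3.0, -0.5, 4.0};
        std::vector<double> a(src.size());

        int top = 0;                   // induction variable without a closed form
        for (std::size_t i = 0; i < src.size(); ++i) {
            if (src[i] > 0.0) {
                a[top] = src[i];       // each write lands at a fresh location
                ++top;                 // monotone evolution: 0 <= top <= i + 1
            }
        }
        std::printf("packed %d positive values\n", top);   // prints 3
    }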
Languages and Compilers for Parallel Computing | 2005
Silvius Rus; Guobin He; Lawrence Rauchwerger
Static Single Assignment (SSA) has become the intermediate program representation of choice in most modern compilers because it enables efficient data-flow analysis of scalars and thus leads to better scalar optimizations. Unfortunately, not much progress has been made in applying the same techniques to array data-flow analysis, a very important and potentially powerful technology. In this paper we propose to improve the applicability of previous efforts in array SSA through the use of a symbolic memory access descriptor that can aggregate the accesses to the elements of an array over large, interprocedural program contexts. We then show the power of our new representation by using it to implement a basic data-flow algorithm, reaching definitions. Finally, we apply this analysis to array constant propagation and show performance improvements (speedups) on benchmark codes.
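The client analysis can be pictured on a toy example (illustrative C++, not the Polaris intermediate representation): once the aggregated descriptor proves that a constant definition covers every later read, the read loop folds to a constant.

    #include <cstdio>

    int main() {
        double a[100];

        for (int i = 0; i < 100; ++i)
            a[i] = 0.0;                // one definition covering a[0:99]

        double s = 0.0;
        for (int i = 0; i < 100; ++i)
            s += a[i];                 // every read is reached only by the
                                       // constant definition above
        // With the aggregated access descriptor, reaching definitions
        // proves a[0:99] == 0.0 here, so this loop folds to s = 0.0.
        std::printf("%g\n", s);
    }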
Languages and Compilers for Parallel Computing | 2008
Silvius Rus; Maikel Pennings; Lawrence Rauchwerger
Sensitivity Analysis (SA) is a novel compiler technique that complements, and integrates with, static automatic parallelization analysis for the cases when program behavior is input sensitive. SA can extract all the input-dependent, statically unavailable conditions under which loops can be dynamically parallelized. SA generates a sequence of sufficient conditions which, when evaluated dynamically in order of their complexity, can each validate the dynamic parallel execution of the corresponding loop. While SA's principles are fairly simple, implementing it in a real compiler and obtaining good experimental results on benchmark codes is a difficult task. In this paper we present some of the most important implementation issues we had to overcome in order to achieve a fairly successful automatic parallelizer. We present techniques related to validating dependence-removing transformations, e.g., privatization or pushback parallelization, and to the static and dynamic evaluation of complex conditions for loop parallelization. We also address multi-version and parallel code generation, as well as the use of speculative parallelization when other, less costly options fail. We present a summary table of the contributions of our techniques to the successful parallelization of 22 industry benchmark codes, and we report the speedups and parallel coverage obtained for these codes on two multicore-based systems, comparing them to results obtained by the Ifort compiler.
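One of those transformations, pushback parallelization, can be sketched with OpenMP as follows: each thread appends its matches to a private buffer, and the buffers are concatenated in thread order to reproduce the sequential result. This is illustrative source code, not the compiler's generated output.

    #include <omp.h>
    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<double> src(1000);
        for (int i = 0; i < 1000; ++i) src[i] = (i % 3) - 1.0;

        std::vector<std::vector<double>> part;
        #pragma omp parallel
        {
            #pragma omp single
            part.resize(omp_get_num_threads());   // implicit barrier follows

            std::vector<double>& mine = part[omp_get_thread_num()];
            #pragma omp for schedule(static)      // contiguous chunks, in order
            for (int i = 0; i < 1000; ++i)
                if (src[i] > 0.0) mine.push_back(src[i]);  // private pushback
        }

        std::vector<double> out;                  // ordered concatenation
        for (const auto& p : part)
            out.insert(out.end(), p.begin(), p.end());
        std::printf("%zu positives\n", out.size());
    }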
IEEE International Conference on High Performance Computing, Data, and Analytics | 2004
Silvius Rus; Dongmin Zhang; Lawrence Rauchwerger
We introduce a framework for the analysis of memory reference sets addressed by induction variables without closed forms. This framework relies on a new data structure, the Value Evolution Graph (VEG), which models the global flow of scalar and array values within a program. We describe the application of our framework to array data-flow analysis, privatization, and dependence analysis. This results in the automatic parallelization of loops that contain arrays indexed by induction variables without closed forms. We implemented this framework in the Polaris research compiler. We present experimental results on a set of codes from the PERFECT, SPEC, and NCSA benchmark suites.
Archive | 2005
Silvius Rus; Lawrence Rauchwerger