
Publication


Featured research published by Steven J. Deitz.


Conference on High Performance Computing (Supercomputing) | 2000

A Comparative Study of the NAS MG Benchmark across Parallel Languages and Architectures

Bradford L. Chamberlain; Steven J. Deitz; Lawrence Snyder

Hierarchical algorithms such as multigrid applications form an important cornerstone for scientific computing. In this study, we take a first step toward evaluating parallel language support for hierarchical applications by comparing implementations of the NAS MG benchmark in several parallel programming languages: Co-Array Fortran, High Performance Fortran, Single Assignment C, and ZPL. We evaluate each language in terms of its portability, its performance, and its ability to express the algorithm clearly and concisely. Experimental platforms include the Cray T3E, IBM SP, SGI Origin, Sun Enterprise 5500, and a high-performance Linux cluster. Our findings indicate that while it is possible to achieve good portability, performance, and expressiveness, most languages currently fall short in at least one of these areas. We find a strong correlation between expressiveness and a language’s support for a global view of computation, and we identify key factors for achieving portable performance in multigrid applications.
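
To make the "hierarchical" structure concrete, the following C sketch shows a 1D fine-to-coarse restriction step of the kind multigrid methods apply at every level. The full-weighting stencil (1/4, 1/2, 1/4) is the standard 1D choice; NAS MG itself works in 3D, and this fragment is illustrative rather than drawn from the benchmark.

#include <stdio.h>

/* Full-weighting restriction in 1D: map a fine grid of 2*nc + 1 points
 * onto a coarse grid of nc + 1 points, the level-to-level step at the
 * heart of multigrid methods such as NAS MG. */
static void restrict_to_coarse(const double *fine, double *coarse, int nc) {
    coarse[0]  = fine[0];                 /* boundary points pass through */
    coarse[nc] = fine[2 * nc];
    for (int i = 1; i < nc; i++)
        coarse[i] = 0.25 * fine[2*i - 1] + 0.5 * fine[2*i]
                  + 0.25 * fine[2*i + 1];
}

int main(void) {
    double fine[9] = {0, 1, 4, 9, 16, 25, 36, 49, 64};  /* 2*4 + 1 points */
    double coarse[5];                                    /* 4 + 1 points  */
    restrict_to_coarse(fine, coarse, 4);
    for (int i = 0; i <= 4; i++)
        printf("coarse[%d] = %g\n", i, coarse[i]);
    return 0;
}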


International Conference on Supercomputing | 2001

Eliminating redundancies in sum-of-product array computations

Steven J. Deitz; Bradford L. Chamberlain; Lawrence Snyder

Array programming languages such as Fortran 90, High Performance Fortran and ZPL are well-suited to scientific computing because they free the scientist from the responsibility of managing burdensome low-level details that complicate programming in languages like C and Fortran 77. However, these details are critical to performance, thus necessitating aggressive compilation techniques for their optimization. In this paper, we present a new compiler optimization called Array Subexpression Elimination (ASE) that lets a programmer take advantage of the expressiveness afforded by array languages while achieving portability and performance. We design a set of micro-benchmarks that model an important class of computations known as stencils, and we report on our implementation of this optimization in the context of this micro-benchmark suite. Our results include a 125% improvement on one of these benchmarks and a 50% average speedup across the suite. We also show a 32% speedup on the ZPL port of the NAS MG benchmark and a 29% speedup over the hand-optimized Fortran version. Further, compilation time is only negligibly affected.
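
The redundancy ASE targets is easy to see in scalar form. The following C sketch is illustrative only (the optimization itself operates on ZPL array statements, not hand-written loops): a four-point sum stencil computed naively, then with the shared pair sums factored out, cutting the additions per point from three to two.

#include <stdio.h>

#define N 12

/* Four-point sum stencil, B[i] = A[i] + A[i+1] + A[i+2] + A[i+3]:
 * three additions per point when written naively. */
static void stencil_naive(const double *A, double *B) {
    for (int i = 0; i + 3 < N; i++)
        B[i] = A[i] + A[i+1] + A[i+2] + A[i+3];
}

/* The same stencil after eliminating the shared subexpression
 * A[i] + A[i+1], which adjacent stencil applications both need.
 * Computing each pair sum once leaves two additions per point, the
 * kind of rewrite ASE performs inside the compiler. */
static void stencil_ase(const double *A, double *B) {
    double pair[N - 1];                  /* pair[i] = A[i] + A[i+1] */
    for (int i = 0; i + 1 < N; i++)
        pair[i] = A[i] + A[i+1];
    for (int i = 0; i + 3 < N; i++)
        B[i] = pair[i] + pair[i+2];
}

int main(void) {
    double A[N], B1[N] = {0}, B2[N] = {0};
    for (int i = 0; i < N; i++) A[i] = i * i;
    stencil_naive(A, B1);
    stencil_ase(A, B2);
    for (int i = 0; i + 3 < N; i++)
        printf("B[%d] = %g (naive) vs %g (ASE)\n", i, B1[i], B2[i]);
    return 0;
}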


The Journal of Supercomputing | 2002

High-level Language Support for User-defined Reductions

Steven J. Deitz; Bradford L. Chamberlain; Lawrence Snyder

The optimized handling of reductions on parallel supercomputers or clusters of workstations is critical to high performance because reductions are common in scientific codes and a potential source of bottlenecks. Yet in many high-level languages, a mechanism for writing efficient reductions remains surprisingly absent. Further, when such mechanisms do exist, they often do not provide the flexibility a programmer needs to achieve a desirable level of performance. In this paper, we present a new language construct for arbitrary reductions that lets a programmer achieve a level of performance equal to that achievable with the highly flexible but low-level combination of Fortran and MPI. We have implemented this construct in the ZPL language and evaluate it in the context of the initialization of the NAS MG benchmark. We show a 45-fold speedup over the same code written in ZPL without this construct. In addition, performance on a large number of processors surpasses that achieved by the NAS implementation, showing that our mechanism provides programmers with the needed flexibility.
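
For reference, the low-level baseline such a construct is measured against looks roughly like the C + MPI sketch below, which builds a user-defined (min, max) reduction with MPI_Op_create. The paper's experiments use Fortran + MPI and a ZPL construct; this example is illustrative rather than taken from them.

#include <mpi.h>
#include <stdio.h>

/* A user-defined reduction that combines a (min, max) pair in a single
 * pass, expressed the low-level way: a combining function registered
 * with MPI_Op_create over a derived datatype. */
typedef struct { double lo, hi; } MinMax;

static void minmax_op(void *in, void *inout, int *len, MPI_Datatype *dt) {
    (void)dt;                            /* one datatype; unused here */
    MinMax *a = in, *b = inout;
    for (int i = 0; i < *len; i++) {
        if (a[i].lo < b[i].lo) b[i].lo = a[i].lo;
        if (a[i].hi > b[i].hi) b[i].hi = a[i].hi;
    }
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Datatype pair_t;                 /* one MinMax = two doubles */
    MPI_Type_contiguous(2, MPI_DOUBLE, &pair_t);
    MPI_Type_commit(&pair_t);

    MPI_Op op;
    MPI_Op_create(minmax_op, /* commutative = */ 1, &op);

    MinMax local = { rank * 1.5, rank * 1.5 }, global;
    MPI_Allreduce(&local, &global, 1, pair_t, op, MPI_COMM_WORLD);

    if (rank == 0)
        printf("min = %g, max = %g\n", global.lo, global.hi);

    MPI_Op_free(&op);
    MPI_Type_free(&pair_t);
    MPI_Finalize();
    return 0;
}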


High-Level Parallel Programming Models and Supportive Environments | 2004

Abstractions for dynamic data distribution

Steven J. Deitz; Bradford L. Chamberlain; Lawrence Snyder

Processor layout and data distribution are important to performance-oriented parallel computation, yet high-level language support that helps programmers address these issues is often inadequate. This paper presents a trio of abstract high-level language constructs (grids, distributions, and regions) that let programmers manipulate processor layout and data distribution. Grids abstract processor sets, regions abstract index sets, and distributions abstract mappings from index sets to processor sets; each of these is a first-class concept, supporting dynamic data reallocation and redistribution as well as dynamic manipulation of the processor set. This paper illustrates uses of these constructs in the solutions to several motivating parallel programming problems.
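
For contrast, here is a hedged C + MPI sketch of the bookkeeping these constructs encapsulate: factoring the processor set into a 2D grid and block-mapping an index set onto it by hand. The sizes are illustrative, and uneven divisions are ignored for brevity.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int nprocs, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* "Grid": lay the processor set out as a 2D Cartesian grid. */
    int dims[2] = {0, 0}, periods[2] = {0, 0}, coords[2];
    MPI_Dims_create(nprocs, 2, dims);
    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);
    MPI_Cart_coords(cart, rank, 2, coords);

    /* "Distribution": block-map an n x n "region" onto the grid.
     * Changing the layout at run time means redoing all of this and
     * moving the data, which is what first-class grids, distributions,
     * and regions are meant to encapsulate. Assumes dims divide n. */
    const int n = 1024;
    int rows = n / dims[0], cols = n / dims[1];
    int row0 = coords[0] * rows, col0 = coords[1] * cols;
    printf("rank %d owns rows [%d..%d) x cols [%d..%d)\n",
           rank, row0, row0 + rows, col0, col0 + cols);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}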


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2003

The design and implementation of a parallel array operator for the arbitrary remapping of data

Steven J. Deitz; Bradford L. Chamberlain; Sung-Eun Choi; Lawrence Snyder

Gather and scatter are data redistribution functions of long-standing importance to high performance computing. In this paper, we present a highly general array operator with powerful gather and scatter capabilities unmatched by other array languages. We discuss an efficient parallel implementation, introducing three new optimizations (schedule compression, dead array reuse, and direct communication) that reduce the costs associated with the operator's wide applicability. In our implementation of this operator in ZPL, we demonstrate performance comparable to the hand-coded Fortran + MPI versions of the NAS FT and CG benchmarks.
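
In serial terms, the operator's semantics reduce to indexed gather and scatter loops like the C sketch below. The paper's contribution lies in the parallel case, where the index map may name elements owned by other processors and the optimizations above come into play; none of that machinery is reproduced here.

#include <stdio.h>

#define N 8

/* Serial core of a general remap: gather reads through an index map,
 * scatter writes through one. */
static void gather(double *dst, const double *src, const int *idx, int n) {
    for (int i = 0; i < n; i++)
        dst[i] = src[idx[i]];
}

static void scatter(double *dst, const double *src, const int *idx, int n) {
    for (int i = 0; i < n; i++)
        dst[idx[i]] = src[i];
}

int main(void) {
    double a[N] = {10, 11, 12, 13, 14, 15, 16, 17}, b[N];
    int reverse[N] = {7, 6, 5, 4, 3, 2, 1, 0};
    gather(b, a, reverse, N);            /* b becomes a reversed      */
    scatter(a, b, reverse, N);           /* writes the reversal back  */
    for (int i = 0; i < N; i++)
        printf("%g ", a[i]);             /* a is back in its original order */
    printf("\n");
    return 0;
}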


High Performance Computing Systems and Applications | 2002

Compiler support for automatic checkpointing

Sung-Eun Choi; Steven J. Deitz

Checkpointing is a key technology for applications on large cluster computer systems. As cluster sizes grow, component failures will become a normal part of operation, and applications will have to deal more directly with repeated failures during program runs. We describe automatic checkpointing in the ZPL compiler and its advantages over traditional library- or system-based approaches, which have no information about application behavior. We show that even naive compiler-inserted checkpoints can significantly reduce the size of the checkpoint recovery data, by up to 73% in our application suite. We also introduce the notion of checkpoint ranges: regions of code within which each processor may perform its local checkpoint at any time. The compiler guarantees that these local checkpoints form a globally consistent checkpoint without global coordination by ensuring that no messages are in flight during the checkpoint range. Checkpoint ranges help further alleviate any additional network congestion caused by checkpointing.
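
A hand-written C stand-in for what the compiler inserts automatically is sketched below: at a safe point, each process saves only its live data and restores it on restart. The file name and data layout are illustrative, not those used by the ZPL compiler.

#include <stdio.h>
#include <stdlib.h>

/* Dump only the live data the compiler identified, rather than the
 * whole process image; the compiler places this call at a point where
 * no messages are in flight. */
static void checkpoint(const char *path, const double *live, size_t n) {
    FILE *f = fopen(path, "wb");
    if (!f) { perror("checkpoint"); exit(1); }
    fwrite(&n, sizeof n, 1, f);
    fwrite(live, sizeof *live, n, f);
    fclose(f);
}

/* Returns the number of values recovered, or 0 for a fresh start. */
static size_t restore(const char *path, double *live, size_t max) {
    FILE *f = fopen(path, "rb");
    if (!f) return 0;
    size_t n = 0;
    if (fread(&n, sizeof n, 1, f) != 1 || n > max) { fclose(f); return 0; }
    n = fread(live, sizeof *live, n, f);
    fclose(f);
    return n;
}

int main(void) {
    double state[4] = {1, 2, 3, 4}, back[4];
    checkpoint("rank0.ckpt", state, 4);
    size_t n = restore("rank0.ckpt", back, 4);
    if (n > 0)
        printf("restored %zu values, first = %g\n", n, back[0]);
    return 0;
}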


International Parallel and Distributed Processing Symposium | 2006

Iterators in Chapel

Mackale Joyner; Bradford L. Chamberlain; Steven J. Deitz

A long-held tenet of software engineering is that algorithms and data structures should be specified orthogonally in order to minimize the impact that changes to one will have on the other. Unfortunately, this principle is often not well-supported in scientific and parallel codes due to the lack of abstractions for factoring iteration away from computation in traditional scientific languages. The result is a fragile situation in which complex loop nests are used to express parallelism and maximize performance, yet must be maintained individually as the algorithm and data structures evolve. In this paper, we introduce the iterator concept in the Chapel parallel programming language, designed to address this problem by providing a means for factoring iteration away from computation. The paper illustrates iterators using several examples, compares our approach with those taken in other languages, and describes our implementation in the Chapel compiler.
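
C has no yield statement, but the separation the paper advocates can be approximated with an explicit iterator object, as in the hedged sketch below: the traversal order (here red-black, as in Gauss-Seidel style solvers) lives in the iterator, and the loop body never mentions it. This is an analogue only, not Chapel's mechanism.

#include <stdio.h>

/* Iterator state for a red-black traversal of 0..n-1: all even
 * indices first, then all odd indices. Swapping in a different
 * iterator changes the order without touching the loop body. */
typedef struct { int n, phase, i; } RedBlackIter;

static RedBlackIter rb_begin(int n) {
    return (RedBlackIter){ .n = n, .phase = 0, .i = 0 };
}

/* Stores the next index in *out and returns 1, or returns 0 when done. */
static int rb_next(RedBlackIter *it, int *out) {
    while (it->phase < 2) {
        if (it->i < it->n) { *out = it->i; it->i += 2; return 1; }
        it->phase++;
        it->i = 1;                       /* odd pass starts at index 1 */
    }
    return 0;
}

int main(void) {
    RedBlackIter it = rb_begin(8);
    int i;
    while (rb_next(&it, &i))
        printf("%d ", i);                /* body is order-agnostic */
    printf("\n");                        /* prints: 0 2 4 6 1 3 5 7 */
    return 0;
}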


IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2011

Translating Chapel to Use FREERIDE: A Case Study in Using an HPC Language for Data-Intensive Computing

Bin Ren; Gagan Agrawal; Bradford L. Chamberlain; Steven J. Deitz

In the last few years, the growing significance of data-intensive computing has been closely tied to the emergence and popularity of new programming paradigms for this class of applications, including Map-Reduce, and new high-level languages for data-intensive computing. The ultimate goal of these efforts has been to achieve parallelism with as little effort as possible, while supporting high efficiency and scalability. While these are also the goals the parallel language/compiler community has tried to meet for the past several decades, languages and programming systems for data-intensive computing have largely been developed in isolation from developments in general parallel programming. Such independent development in the two areas, i.e., data-intensive computing and high-productivity languages, leads to the following questions: I) Are HPC languages suitable for expressing data-intensive computations? If so, II.a) what are the issues in using them for effective parallel programming? If not, II.b) what characteristics of data-intensive computations force the need for separate language support? This paper addresses these questions through a case study. In particular, we study the suitability of Chapel for expressing data-intensive computations. We also examine the compilation techniques required to invoke a data-intensive middleware directly from Chapel's compilation system. The data-intensive middleware we use in this effort is FREERIDE, which was developed at Ohio State. We show how certain transformations enable efficient invocation of the FREERIDE functions from the Chapel compiler. Our experiments show that, after certain optimizations, the performance of the version of the Chapel compiler that invokes FREERIDE functions is quite comparable to that of hand-written data-intensive applications.
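
FREERIDE's programming model centers on a generalized reduction: each processing unit folds its share of the data into a local reduction object, and the objects are then merged. The C sketch below is a self-contained caricature of that structure (a histogram) and does not reproduce FREERIDE's actual API; the names and phases are illustrative only.

#include <stdio.h>

#define BINS  4
#define NPROC 2                          /* simulated processing units */

/* The "reduction object" each unit accumulates into. */
typedef struct { long bin[BINS]; } Reduction;

/* Local phase: fold one unit's share of the data into its object. */
static void local_reduce(Reduction *r, const int *data, int n) {
    for (int i = 0; i < n; i++)
        r->bin[data[i] % BINS]++;
}

/* Global phase: merge per-unit reduction objects. */
static void combine(Reduction *out, const Reduction *in) {
    for (int b = 0; b < BINS; b++)
        out->bin[b] += in->bin[b];
}

int main(void) {
    int data[8] = {3, 1, 4, 1, 5, 9, 2, 6};
    Reduction part[NPROC] = {0}, total = {0};
    for (int p = 0; p < NPROC; p++)
        local_reduce(&part[p], data + p * 4, 4);
    for (int p = 0; p < NPROC; p++)
        combine(&total, &part[p]);
    for (int b = 0; b < BINS; b++)
        printf("bin %d: %ld\n", b, total.bin[b]);
    return 0;
}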


USENIX Conference on Hot Topics in Parallelism | 2010

User-defined distributions and layouts in Chapel: philosophy and framework

Bradford L. Chamberlain; Steven J. Deitz; David Iten; Sung-Eun Choi


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2006

Global-view abstractions for user-defined reductions and scans

Steven J. Deitz; David Callahan; Bradford L. Chamberlain; Lawrence Snyder

Collaboration


Dive into Steven J. Deitz's collaborations.

Top Co-Authors

Sung-Eun Choi (Los Alamos National Laboratory)
Bin Ren (Ohio State University)