
Publications


Featured research published by Bradford L. Chamberlain.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2007

Parallel Programmability and the Chapel Language

Bradford L. Chamberlain; David Callahan; Hans P. Zima

In this paper we consider productivity challenges for parallel programmers and explore ways that parallel language design might help improve end-user productivity. We offer a candidate list of desirable qualities for a parallel programming language, and describe how these qualities are addressed in the design of the Chapel language. In doing so, we provide an overview of Chapel's features and how they help address parallel productivity. We also survey current techniques for parallel programming and describe ways in which we consider them to fall short of our idealized productive programming model.
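
To make the concern concrete, here is a rough Python/NumPy sketch (not Chapel syntax; all names are invented for illustration) of the gap between the fragmented, per-worker view common in SPMD programming and the global view the paper argues a language should provide:

```python
# A rough Python/NumPy illustration (not Chapel) of "fragmented" versus
# "global-view" programming. In the fragmented style each worker owns a
# chunk and manages its own index arithmetic; in the global view the
# programmer writes one whole-array statement.
import numpy as np

n = 1_000_000
a = np.random.rand(n)
b = np.random.rand(n)

# Fragmented, SPMD-style view: explicit decomposition and index bookkeeping.
P = 4
chunks = [(p * n // P, (p + 1) * n // P) for p in range(P)]
c_frag = np.empty(n)
for lo, hi in chunks:
    c_frag[lo:hi] = a[lo:hi] + 2.0 * b[lo:hi]

# Global view: one logical statement over the whole index space.
c_global = a + 2.0 * b

assert np.allclose(c_frag, c_global)
```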


ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming | 2008

Software transactional memory for large scale clusters

Robert L. Bocchino; Vikram S. Adve; Bradford L. Chamberlain

While there has been extensive work on the design of software transactional memory (STM) for cache coherent shared memory systems, there has been no work on the design of an STM system for very large scale platforms containing potentially thousands of nodes. In this work, we present Cluster-STM, an STM designed for high performance on large-scale commodity clusters. Our design addresses several novel issues posed by this domain, including aggregating communication, managing locality, and distributing transactional metadata onto the nodes. We also re-evaluate several STM design choices previously studied for cache-coherent machines and conclude that, in some cases, different choices are appropriate on clusters. Finally, we show that our design scales well up to 512 processors. This is because on a cluster, the main barrier to STM scalability is the remote communication overhead imposed by the STM operations, and our design aggregates most of that communication with the communication of the underlying data.
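
As a loose illustration of the transaction lifecycle such a system manages, here is a toy single-process STM sketch in Python. All names are hypothetical; the real Cluster-STM additionally distributes the transactional metadata across nodes and aggregates the remote communication these operations would incur:

```python
# Toy STM sketch: buffered writes, versioned reads, commit-time locking.
# Single-process and hypothetical; Cluster-STM distributes this metadata
# across cluster nodes.
import threading

class Store:
    def __init__(self):
        self.data = {}
        self.locks = {}      # per-key locks (the "transactional metadata")
        self.versions = {}   # per-key version numbers for validation

    def lock_for(self, key):
        return self.locks.setdefault(key, threading.Lock())

class Txn:
    def __init__(self, store):
        self.store, self.reads, self.writes = store, {}, {}

    def read(self, key):
        if key in self.writes:               # read-your-own-writes
            return self.writes[key]
        self.reads[key] = self.store.versions.get(key, 0)
        return self.store.data.get(key)

    def write(self, key, val):
        self.writes[key] = val               # buffer until commit

    def commit(self):
        held = [self.store.lock_for(k) for k in sorted(self.writes)]
        for lk in held:                      # fixed order avoids deadlock
            lk.acquire()
        try:
            # validate: abort if anything we read has since changed
            for k, ver in self.reads.items():
                if self.store.versions.get(k, 0) != ver:
                    return False
            for k, v in self.writes.items():
                self.store.data[k] = v
                self.store.versions[k] = self.store.versions.get(k, 0) + 1
            return True
        finally:
            for lk in held:
                lk.release()

store = Store()
t = Txn(store)
t.write("x", 42)
assert t.commit() and store.data["x"] == 42
```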


International Parallel and Distributed Processing Symposium | 2013

Exploring Traditional and Emerging Parallel Programming Models Using a Proxy Application

Ian Karlin; Abhinav Bhatele; Jeff Keasler; Bradford L. Chamberlain; Jonathan D. Cohen; Zachary DeVito; Riyaz Haque; Dan Laney; Edward A. Luke; Felix Wang; David F. Richards; Martin Schulz; Charles H. Still

Parallel machines are becoming more complex with increasing core counts and more heterogeneous architectures. However, the commonly used parallel programming models, C/C++ with MPI and/or OpenMP, make it difficult to write source code that is easily tuned for many targets. Newer language approaches attempt to ease this burden by providing optimization features such as automatic load balancing, overlap of computation and communication, message-driven execution, and implicit data layout optimizations. In this paper, we compare several implementations of LULESH, a proxy application for shock hydrodynamics, to determine strengths and weaknesses of different programming models for parallel computation. We focus on four traditional (OpenMP, MPI, MPI+OpenMP, CUDA) and four emerging (Chapel, Charm++, Liszt, Loci) programming models. In evaluating these models, we focus on programmer productivity, performance and ease of applying optimizations.
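
For readers unfamiliar with proxy applications, the sketch below gives the flavor of the kind of kernel a proxy distills from a production code; it is a hypothetical Python toy, not LULESH itself:

```python
# A hypothetical, miniature proxy kernel (not LULESH): proxy apps reduce a
# production code to a representative update like this so competing
# programming models can be compared on the same computation.
import numpy as np

def step(e, p, v, dt):
    # toy "hydro" update: energy change driven by pressure and volume gradient
    return e - dt * p * np.gradient(v)

e = np.ones(1000)
p = np.full(1000, 0.5)
v = np.linspace(1.0, 2.0, 1000)
e = step(e, p, v, dt=1e-3)
assert e.shape == (1000,)
```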


Computational Science and Engineering | 1998

The case for high-level parallel programming in ZPL

Bradford L. Chamberlain; Sung-Eun Choi; E.C. Lewis; Lawrence Snyder; W.D. Weathersby; Calvin Lin

Message-passing programs are efficient, but fall short on convenience and portability. ZPL is a high-level language that offers competitive performance and portability, as well as programming conveniences lacking in low-level approaches. ZPL runs on a variety of parallel and sequential computers. We describe the problems with message passing and show how ZPL simplifies the task of programming parallel computers without sacrificing efficiency.
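
The convenience gap can be illustrated in NumPy (standing in for ZPL; the computation is invented): the same three-point average written with explicit index bookkeeping, as message-passing codes require, and as a single whole-array expression:

```python
# Low-level style: explicit loop bounds and index arithmetic; distributed
# message-passing versions additionally insert boundary (halo) exchanges.
import numpy as np

a = np.random.rand(10)
out_low = np.empty(8)
for i in range(1, 9):
    out_low[i - 1] = (a[i - 1] + a[i] + a[i + 1]) / 3.0

# High-level array style: one statement, no visible index management.
out_high = (a[:-2] + a[1:-1] + a[2:]) / 3.0

assert np.allclose(out_low, out_high)
```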


Conference on High Performance Computing (Supercomputing) | 2000

A Comparative Study of the NAS MG Benchmark across Parallel Languages and Architectures

Bradford L. Chamberlain; Steven J. Deitz; Lawrence Snyder

Hierarchical algorithms such as multigrid applications form an important cornerstone for scientific computing. In this study, we take a first step toward evaluating parallel language support for hierarchical applications by comparing implementations of the NAS MG benchmark in several parallel programming languages: Co-Array Fortran, High Performance Fortran, Single Assignment C, and ZPL. We evaluate each language in terms of its portability, its performance, and its ability to express the algorithm clearly and concisely. Experimental platforms include the Cray T3E, IBM SP, SGI Origin, Sun Enterprise 5500, and a high-performance Linux cluster. Our findings indicate that while it is possible to achieve good portability, performance, and expressiveness, most languages currently fall short in at least one of these areas. We find a strong correlation between expressiveness and a language’s support for a global view of computation, and we identify key factors for achieving portable performance in multigrid applications.
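
For orientation, the following compact sketch shows the hierarchical structure that multigrid codes like NAS MG exercise: smooth, restrict the residual to a coarser grid, recurse, then prolongate the correction back. It is a 1-D Python toy with invented parameters; NAS MG itself is 3-D:

```python
# 1-D multigrid V-cycle sketch for -u'' = f (unit grid spacing).
import numpy as np

def smooth(u, f, iters=2):
    for _ in range(iters):                       # Jacobi relaxation
        u[1:-1] = 0.5 * (u[:-2] + u[2:] + f[1:-1])
    return u

def v_cycle(u, f):
    u = smooth(u, f)
    if len(u) <= 3:                              # coarsest grid: just smooth
        return u
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:])   # residual
    coarse = v_cycle(np.zeros(len(u) // 2 + 1), r[::2].copy())
    e = np.zeros_like(u)
    e[::2] = coarse                              # prolongation: inject ...
    e[1:-1:2] = 0.5 * (e[:-2:2] + e[2::2])       # ... and interpolate
    return smooth(u + e, f)

n = 65                                           # 2**6 + 1 points
u = v_cycle(np.zeros(n), np.full(n, 1.0 / n**2))
```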


High-Level Parallel Programming Models and Supportive Environments | 1998

ZPL's WYSIWYG performance model

Bradford L. Chamberlain; Calvin Lin; Sung-Eun Choi; Lawrence Snyder; E.C. Lewis; W.D. Weathersby

ZPL is a parallel array language designed for high performance scientific and engineering computations. Unlike other parallel languages, ZPL is founded on a machine model (the CTA) that accurately abstracts contemporary MIMD parallel computers. This makes it possible to correlate ZPL programs with machine behavior. As a result, programmers can reason about how code will perform on a typical parallel machine and thereby make informed decisions between alternative programming solutions. The paper describes ZPL's performance model and its syntactic cues for conveying operation cost. The what-you-see-is-what-you-get (WYSIWYG) nature of ZPL operations is demonstrated on the IBM SP-2, Intel Paragon, SGI Power Challenge, and Cray T3E. Additionally, the model is used to evaluate two algorithms for matrix multiplication. Experiments show that the performance model correctly predicts the faster solution on all four platforms for a range of problem sizes.
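
A rough NumPy rendering of the syntactic cues in question (ZPL's actual syntax differs; the point is that each form carries a visible cost class):

```python
# In ZPL's WYSIWYG model, the syntax of an operation signals its cost:
# plain element-wise operations need no communication, @-shifted
# references imply point-to-point (neighbor) communication, and
# reductions imply global communication.
import numpy as np

a = np.random.rand(1024)
b = np.random.rand(1024)

c = a + b           # element-wise: communication-free in ZPL
d = a[:-1] + a[1:]  # shifted reference (ZPL's "@"): neighbor communication
s = a.sum()         # reduction: global communication
```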


ACM SIGAPL APL Quote Quad | 1998

Regions: an abstraction for expressing array computation

Bradford L. Chamberlain; E. Christopher Lewis; Calvin Lin; Lawrence Snyder

Most array languages, including Fortran 90, Matlab, and APL, provide support for referencing arrays by extending the traditional array subscripting construct found in scalar languages. We present an alternative to subscripting that exploits the concept of regions---an index set representation that can be named, manipulated with high-level operators, and syntactically separated from array references. This paper develops the concept of region-based programming and describes its benefits in the context of an idealized array language called RL. We show that regions simplify programming, reduce the likelihood of errors, and enable code reuse. Furthermore, we describe how regions accentuate the locality of array expressions and how this locality is important when targeting parallel computers. We also show how the concepts of region-based programming have been used in ZPL, a fully-implemented practical parallel programming language in use by scientists and engineers. In addition, we contrast region-based programming with the array reference constructs of other array languages.
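
A small Python sketch of the region idea, using hypothetical helpers rather than RL or ZPL syntax: the index set is named once, manipulated with an operator, and kept out of the array references themselves:

```python
# Regions as first-class index sets: named, transformed by operators,
# and syntactically separate from array subscripting. Helper names are
# invented for illustration.
import numpy as np

def shift(r, k):
    """Translate a region by k (roughly ZPL's 'at' (@) operator)."""
    return slice(r.start + k, r.stop + k)

n = 100
interior = slice(1, n - 1)        # the region, named once

a = np.random.rand(n)
b = np.zeros(n)

# The statement reads as math over 'interior'; the subscript arithmetic
# lives in the region operator, not in each array reference.
b[interior] = (a[shift(interior, -1)] + a[shift(interior, 1)]) / 2.0
```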


International Parallel and Distributed Processing Symposium | 2012

Performance Portability with the Chapel Language

Albert Sidelnik; Saeed Maleki; Bradford L. Chamberlain; María J. Garzarán; David A. Padua

It has been widely shown that high-throughput computing architectures such as GPUs offer large performance gains compared with their traditional low-latency counterparts for many applications. The downside to these architectures is that the current programming models present numerous challenges to the programmer: lower-level languages, loss of portability across different architectures, explicit data movement, and challenges in performance optimization. This paper presents novel methods and compiler transformations that increase programmer productivity by enabling users of the language Chapel to provide a single code implementation that the compiler can then use to target not only conventional multiprocessors, but also high-throughput and hybrid machines. Rather than resorting to different parallel libraries or annotations for a given parallel platform, this work leverages a language that has been designed from first principles to address the challenge of programming for parallelism and locality. This also has the advantage of providing portability across different parallel architectures. Finally, this work presents experimental results from the Parboil benchmark suite which demonstrate that codes written in Chapel achieve performance comparable to the original versions implemented in CUDA on both GPUs and multicore platforms.
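
A hedged Python analogue of the single-source idea (an illustration of the principle, not Chapel's actual mechanism): one kernel written against an abstract array backend, retargeted by swapping that backend:

```python
# One target-independent kernel; the backend module decides where it runs.
# NumPy serves the CPU here; a GPU array library with the same interface
# (e.g., CuPy, if available) could be passed instead.
import numpy as np

def saxpy(xp, alpha, x, y):
    # xp is the array backend; the kernel body never names a target
    return alpha * x + y

x = np.arange(1_000_000, dtype=np.float64)
y = np.ones_like(x)
z = saxpy(np, 2.0, x, y)   # CPU run; saxpy(cupy, ...) would target a GPU
assert z[3] == 2.0 * 3 + 1
```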


International Conference on Supercomputing | 2001

Eliminating redundancies in sum-of-product array computations

Steven J. Deitz; Bradford L. Chamberlain; Lawrence Snyder

Array programming languages such as Fortran 90, High Performance Fortran, and ZPL are well-suited to scientific computing because they free the scientist from the responsibility of managing burdensome low-level details that complicate programming in languages like C and Fortran 77. However, these low-level details are critical to performance, necessitating aggressive compilation techniques for their optimization. In this paper, we present a new compiler optimization called Array Subexpression Elimination (ASE) that lets a programmer take advantage of the expressibility afforded by array languages while achieving enviable portability and performance. We design a set of micro-benchmarks that model an important class of computations known as stencils, and we report on our implementation of this optimization in the context of this micro-benchmark suite. Our results include a 125% improvement on one of these benchmarks and a 50% average speedup across the suite. We also show a 32% speedup on the ZPL port of the NAS MG Parallel Benchmark and a 29% speedup over the hand-optimized Fortran version. Further, compilation time is only negligibly affected.
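
The redundancy ASE targets can be seen in a small NumPy sketch (the stencil is invented for illustration): in the 4-point stencil b[i] = a[i] + a[i+1] + a[i+2] + a[i+3], the pair sums a[i] + a[i+1] recur across neighboring output points, so computing each pair once cuts the work from roughly three additions per point to two:

```python
# Array Subexpression Elimination, by hand, on a 4-point stencil.
import numpy as np

a = np.random.rand(12)
n = len(a)

# Naive: three additions per output point.
b_naive = np.array([a[i] + a[i+1] + a[i+2] + a[i+3] for i in range(n - 3)])

# ASE-style: each pair sum is computed once and reused by two outputs.
pair = a[:-1] + a[1:]         # pair[i] = a[i] + a[i+1]
b_ase = pair[:-2] + pair[2:]  # b[i] = pair[i] + pair[i+2]

assert np.allclose(b_naive, b_ase)
```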


Conference on High Performance Computing (Supercomputing) | 1997

Portable Performance of Data Parallel Languages

Ton Ngo; Lawrence Snyder; Bradford L. Chamberlain

A portable program yields consistent performance on different platforms. We study the portable performance of three NAS benchmarks compiled with three commercial HPF compilers on the IBM SP2. Each benchmark is evaluated using both DO loops and F90 constructs. Baseline comparisons are provided by Fortran/MPI and ZPL. The HPF results show some scalable performance but indicate a considerable portability problem. First, relying on the compiler alone for extensive analysis and optimization leads to unpredictable performance. Second, differences in parallelization strategies often require compiler-specific customization. The results suggest that the foremost criterion for portability is a concise performance model.
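
The "DO loop" versus "F90 construct" distinction can be rendered as a Python analogue (invented example): the same computation as an explicit loop, whose parallelism a compiler must discover, and as a whole-array expression, where the parallelism is explicit in the notation:

```python
# Loop style vs. array style: the two variants each benchmark was
# evaluated with, approximated in Python.
import numpy as np

a = np.random.rand(1000)
b = np.random.rand(1000)

c_loop = np.empty(1000)
for i in range(1000):           # DO-loop style: dependence analysis needed
    c_loop[i] = a[i] * b[i]

c_array = a * b                 # F90 array syntax: parallelism is explicit

assert np.allclose(c_loop, c_array)
```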

Collaboration


Dive into Bradford L. Chamberlain's collaborations.

Top Co-Authors

Sung-Eun Choi (Los Alamos National Laboratory)
Calvin Lin (University of Texas at Austin)
W.D. Weathersby (University of Texas at Austin)
E.C. Lewis (University of Texas at Austin)
Hans P. Zima (California Institute of Technology)
Jeff Keasler (Lawrence Livermore National Laboratory)