Arch D. Robison | Researchain

Publication

Featured research published by Arch D. Robison.

International Parallel and Distributed Processing Symposium | 2008

Optimization via Reflection on Work Stealing in TBB

Arch D. Robison; Michael Voss; Alexey Kukanov

Intel® Threading Building Blocks (Intel® TBB) is a C++ library for parallel programming. Its templates for generic parallel loops are built upon nested parallelism and a work-stealing scheduler. This paper discusses optimizations where the high-level algorithm inspects or biases stealing. Two optimizations are discussed in detail. The first dynamically optimizes grain size based on observed stealing. The second improves prior work that exploits cache locality by biased stealing. This paper shows that in a task stealing environment, deferring task spawning can improve performance in some contexts. Performance results for simple kernels are presented.
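To make the notion of grain size concrete, here is a minimal sequential sketch in C++ of recursive range splitting. It is not Intel TBB code and it does no stealing: `split_range` and `RangeBody` are hypothetical names, and a comment marks the point where a work-stealing scheduler would make one half available to other threads.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical callback type: processes the iterations in [begin, end).
using RangeBody = std::function<void(std::size_t, std::size_t)>;

// Halve [begin, end) recursively until a piece has at most `grain`
// iterations (or cannot be split further), then run `body` on it.
void split_range(std::size_t begin, std::size_t end, std::size_t grain,
                 const RangeBody& body) {
    assert(begin <= end);
    const std::size_t size = end - begin;
    if (size == 0) return;
    if (size <= grain || size == 1) {
        body(begin, end);
        return;
    }
    const std::size_t mid = begin + size / 2;
    // In a work-stealing scheduler, one half would be spawned as a task
    // that an idle thread may steal; this sketch runs both halves in order.
    split_range(begin, mid, grain, body);
    split_range(mid, end, grain, body);
}
```

A smaller grain yields more, smaller pieces (more chances to balance load, but more per-piece overhead); a larger grain does the opposite, which is why choosing it dynamically is attractive.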

Computing in Science and Engineering | 2013

Composable Parallel Patterns with Intel Cilk Plus

Arch D. Robison

Intel Cilk Plus extends C and C++ to enable writing composable deterministic parallel software that can exploit both the thread and vector parallelism commonly available in modern hardware.

Proceedings of the 2001 joint ACM-ISCOPE conference on Java Grande | 2001

Impact of economics on compiler optimization

Arch D. Robison

Compile-time program optimizations are similar to poetry: more are written than are actually published in commercial compilers. Hard economic reality is that many interesting optimizations have too narrow an audience to justify their cost in a general-purpose compiler, and custom compilers are too expensive to write. An alternative is to allow programmers to define their own compile-time optimizations. This has already happened accidentally for C++, albeit imperfectly, in the form of template metaprogramming. This paper surveys the problems, the accidental success, and what directions future research might take to circumvent current economic limitations of monolithic compilers.
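As a small illustration of the "accidental" mechanism mentioned in this abstract, the following C++ sketch uses template metaprogramming to specialize exponentiation for an exponent known at compile time. The `Power` template is a hypothetical example and is not taken from the paper.

```cpp
// Power<N>::of(x) computes x^N by repeated squaring.
// The recursion on N is resolved by the compiler, not at run time.
template <unsigned N>
struct Power {
    static constexpr double of(double x) {
        // Use the low bit of N now, then recurse on N / 2 with x squared.
        return (N % 2 ? x : 1.0) * Power<N / 2>::of(x * x);
    }
};

// Base case: x^0 == 1.
template <>
struct Power<0> {
    static constexpr double of(double) { return 1.0; }
};
```

Because the exponent is a template argument, an optimizing compiler can typically reduce a call such as `Power<5>::of(x)` to a fixed sequence of multiplications with no loop, which is the kind of programmer-defined specialization a general-purpose compiler would be unlikely to ship.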

Symposium on Computer Arithmetic | 2005

N-bit unsigned division via n-bit multiply-add

Arch D. Robison

Integer division on modern processors is expensive compared to multiplication. Previous algorithms for performing unsigned division by an invariant divisor, via reciprocal approximation, suffer in the worst case from a common requirement for n+1 bit multiplication, which typically must be synthesized from n-bit multiplication and extra arithmetic operations. This paper presents, and proves, a hybrid of previous algorithms that replaces n+1 bit multiplication with a single fused multiply-add operation on n-bit operands, thus reducing any n-bit unsigned division to the upper n bits of a multiply-add, followed by a single right shift. An additional benefit is that the prerequisite calculations are simple and fast. On the Itanium® 2 processor, the technique is advantageous for as few as two quotients that share a common run-time divisor.
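The idea in this abstract can be illustrated for n = 32 with a minimal C++ sketch. It is not the paper's own code: the names `Divider`, `make_divider`, and `divide` are hypothetical, and the rule used here for choosing the constants (round the scaled reciprocal up when that is accurate enough, otherwise round it down and reuse it as the addend) is an assumption based on the standard reciprocal-approximation analysis. The 64-bit product stands in for a hardware multiply-add that returns the upper n bits.

```cpp
#include <cassert>
#include <cstdint>

// Precomputed constants for one divisor (hypothetical names, n = 32).
struct Divider {
    std::uint32_t a;   // 32-bit multiplier
    std::uint32_t b;   // 32-bit addend
    unsigned shift;    // final right shift, floor(log2(d))
};

Divider make_divider(std::uint32_t d) {
    assert(d != 0);
    unsigned p = 0;
    while ((d >> p) > 1) ++p;                       // p = floor(log2(d))
    if ((d & (d - 1)) == 0)                         // d is a power of two:
        return {0xFFFFFFFFu, 0xFFFFFFFFu, p};       // high word of a*x+b is x
    const std::uint64_t scaled = std::uint64_t{1} << (32 + p);
    const std::uint64_t t = scaled / d;             // floor(2^(32+p) / d)
    const std::uint64_t r = scaled % d;
    if (d - r <= (std::uint64_t{1} << p))           // rounding up is safe
        return {static_cast<std::uint32_t>(t + 1), 0u, p};
    // Otherwise round down and compute t * (x + 1) as t * x + t.
    return {static_cast<std::uint32_t>(t), static_cast<std::uint32_t>(t), p};
}

// floor(x / d): upper 32 bits of a*x + b, then one right shift.
std::uint32_t divide(std::uint32_t x, const Divider& dv) {
    const std::uint64_t mad = std::uint64_t{dv.a} * x + dv.b;  // cannot overflow 64 bits
    return static_cast<std::uint32_t>(mad >> 32) >> dv.shift;
}
```

For example, `divide(100, make_divider(7))` yields 14. The divisor 7 is a case where the round-up reciprocal alone would need a 33-bit multiplier, so the sketch falls back to the round-down constant with a nonzero addend.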

Concurrency and Computation: Practice and Experience | 2005

Using MPI with C# and the Common Language Infrastructure

Jeremiah Willcock; Andrew Lumsdaine; Arch D. Robison

We describe two different libraries for using the Message Passing Interface (MPI) with the C# programming language and the Common Language Infrastructure (CLI). The first library provides C# bindings that closely match the original MPI library specification. The second library presents a fully object‐oriented interface to MPI and exploits modern language features of C#. The interfaces described here use the P/Invoke feature of the CLI to dispatch to a native implementation of MPI, such as LAM/MPI or MPICH. Performance results using the Shared Source CLI demonstrate only a small performance overhead.

Proceedings of the 2010 Workshop on Parallel Programming Patterns | 2010

Three layer cake for shared-memory programming

Arch D. Robison; Ralph E. Johnson

There are many different styles of parallel programming for shared-memory hardware. Each style has strengths, but can conflict with other styles. How can we use a variety of these styles in one program and minimize their conflict and maximize performance, readability, and flexibility? This paper surveys the relative advantages and disadvantages of three styles (SIMD, fork join, and message passing), shows how to compose them hierarchically, and advises how to choose what goes at each level in the hierarchy.

Archive | 2010