Arch D. Robison
Intel
Publications
Featured research published by Arch D. Robison.
International Parallel and Distributed Processing Symposium | 2008
Arch D. Robison; Michael Voss; Alexey Kukanov
Intel® Threading Building Blocks (Intel® TBB) is a C++ library for parallel programming. Its templates for generic parallel loops are built upon nested parallelism and a work-stealing scheduler. This paper discusses optimizations where the high-level algorithm inspects or biases stealing. Two optimizations are discussed in detail. The first dynamically optimizes grain size based on observed stealing. The second improves prior work that exploits cache locality by biased stealing. This paper shows that in a task stealing environment, deferring task spawning can improve performance in some contexts. Performance results for simple kernels are presented.
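The recursive decomposition that generic parallel-loop templates of this kind rely on can be sketched with the standard library alone. The helper `parallel_for_range` below is hypothetical and is not the TBB API: it splits a half-open index range in two until a piece holds at most `grain` indices, runs one half in a new asynchronous task and the other in the calling thread, then joins. It uses a fixed grain size and `std::async` (which runs each split as if on a new thread) instead of a work-stealing scheduler or the paper's stealing-driven grain-size adaptation, so it only illustrates the shape of nested fork-join parallelism and the role of grain size.

```cpp
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical helper (not the TBB API): applies body(b, e) to disjoint
// subranges covering [begin, end), assuming begin <= end. The range is split
// recursively until a subrange holds at most `grain` indices. One half runs in
// a new asynchronous task, the other in the calling thread, then both join.
template <typename Body>
void parallel_for_range(std::size_t begin, std::size_t end, std::size_t grain,
                        const Body& body) {
    if (grain == 0) grain = 1;  // a zero grain would never stop the recursion
    if (end - begin <= grain) {
        body(begin, end);  // leaf: run this subrange serially
        return;
    }
    std::size_t mid = begin + (end - begin) / 2;
    auto left = std::async(std::launch::async, [&] {
        parallel_for_range(begin, mid, grain, body);
    });
    parallel_for_range(mid, end, grain, body);  // right half in this thread
    left.get();  // join, propagating any exception from the left half
}

// Example use: each index is written by exactly one leaf, so no locking is needed.
std::vector<long> squares(std::size_t n, std::size_t grain) {
    std::vector<long> out(n);
    parallel_for_range(0, n, grain, [&](std::size_t b, std::size_t e) {
        for (std::size_t i = b; i < e; ++i) out[i] = static_cast<long>(i * i);
    });
    return out;
}
```

A smaller grain exposes more parallelism but pays more splitting and task overhead per index; a fixed value like the one passed here is exactly the tuning burden that the first optimization in the abstract aims to remove.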
Computing in Science and Engineering | 2013
Arch D. Robison
Intel Cilk Plus extends C and C++ to enable writing composable deterministic parallel software that can exploit both the thread and vector parallelism commonly available in modern hardware.
Proceedings of the 2001 joint ACM-ISCOPE conference on Java Grande | 2001
Arch D. Robison
Compile-time program optimizations are similar to poetry: more are written than are actually published in commercial compilers. Hard economic reality is that many interesting optimizations have too narrow an audience to justify their cost in a general-purpose compiler, and custom compilers are too expensive to write. An alternative is to allow programmers to define their own compile-time optimizations. This has already happened accidentally for C++, albeit imperfectly, in the form of template metaprogramming. This paper surveys the problems, the accidental success, and what directions future research might take to circumvent current economic limitations of monolithic compilers.
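The kind of user-defined compile-time optimization attributed here to template metaprogramming can be illustrated with a small C++17 sketch; the names `power` and `Factorial` are illustrative and do not come from the paper. `power<N>` picks a repeated-squaring strategy for a constant exponent while the template is instantiated, so `power<8>(x)` expands into a chain of three multiplications rather than a run-time loop over the exponent. `Factorial` is a classic metaprogram whose value is computed entirely by the compiler.

```cpp
// Requires C++17 (if constexpr).

// x^N for a compile-time exponent N, by repeated squaring.
// The branch for each N is selected at compile time, so no exponent test
// or loop remains at run time.
template <unsigned N>
constexpr double power(double x) {
    if constexpr (N == 0) {
        return 1.0;
    } else if constexpr (N == 1) {
        return x;
    } else if constexpr (N % 2 == 0) {
        const double half = power<N / 2>(x);  // x^(N/2), reused for both factors
        return half * half;
    } else {
        return x * power<N - 1>(x);  // odd exponent: peel off one factor
    }
}

// Classic template metaprogram: the value is computed entirely by the compiler.
template <unsigned N>
struct Factorial {
    static constexpr unsigned long long value = N * Factorial<N - 1>::value;
};

template <>
struct Factorial<0> {
    static constexpr unsigned long long value = 1;
};

static_assert(Factorial<10>::value == 3628800ULL, "evaluated at compile time");
static_assert(power<10>(2.0) == 1024.0, "power is usable in constant expressions");
```

The "optimization" here lives in the library rather than in the compiler: the template author encodes the squaring strategy once, and every instantiation inherits it, which is the accidental extensibility the abstract describes.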
Symposium on Computer Arithmetic | 2005
Arch D. Robison
Integer division on modern processors is expensive compared to multiplication. Previous algorithms for performing unsigned division by an invariant divisor, via reciprocal approximation, suffer in the worst case from a common requirement for (n+1)-bit multiplication, which typically must be synthesized from n-bit multiplication and extra arithmetic operations. This paper presents, and proves, a hybrid of previous algorithms that replaces (n+1)-bit multiplication with a single fused multiply-add operation on n-bit operands, thus reducing any n-bit unsigned division to the upper n bits of a multiply-add, followed by a single right shift. An additional benefit is that the prerequisite calculations are simple and fast. On the Itanium® 2 processor, the technique is advantageous for as few as two quotients that share a common run-time divisor.
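A minimal sketch of the multiply-add idea for n = 32, written from the description above rather than from the paper's text, so the constant selection below is our own derivation and may differ in detail from the published algorithm. For a divisor d that is not a power of two, let s = floor(log2 d) and m = floor(2^(32+s) / d). If d - (2^(32+s) mod d) <= 2^s, the rounded-up multiplier m + 1 with addend 0 is exact for every 32-bit x; otherwise the rounded-down multiplier m with addend m (that is, m(x + 1)) is exact. Either way the quotient is (multiplier * x + addend) >> (32 + s), where multiplier, x, and addend all fit in 32 bits and the intermediate fits in 64. Powers of two reduce to multiplier 1, addend 0, shift s. The names `InvariantDivisor`, `make_divisor`, and `divide` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Precomputed constants for dividing 32-bit unsigned values by a fixed d.
// Hypothetical names; multiplier and addend both fit in 32 bits.
struct InvariantDivisor {
    std::uint64_t multiplier;  // m + 1 (round-up case) or m (round-down case)
    std::uint64_t addend;      // 0 (round-up case) or m (round-down case)
    unsigned shift;            // 32 + s for general d, s for d == 2^s
};

InvariantDivisor make_divisor(std::uint32_t d) {
    assert(d != 0);
    // s = floor(log2 d); computed on a 64-bit value so shifting by 32 is defined.
    unsigned s = 0;
    while ((static_cast<std::uint64_t>(d) >> (s + 1)) != 0) ++s;

    if ((d & (d - 1)) == 0)  // power of two: a plain right shift suffices
        return {1, 0, s};

    const std::uint64_t p = std::uint64_t{1} << (32 + s);  // 2^(32+s), s <= 31
    const std::uint64_t m = p / d;  // floor(2^(32+s) / d), below 2^32
    const std::uint64_t r = p % d;  // nonzero because d is not a power of two
    if (d - r <= (std::uint64_t{1} << s))
        return {m + 1, 0, 32 + s};  // round-up multiplier: error small enough for all x
    return {m, m, 32 + s};          // round-down: computes floor(m * (x + 1) / 2^(32+s))
}

// One 32x32 -> 64-bit multiply-add followed by one right shift.
// The sum stays below 2^64 because multiplier, x, and addend are all below 2^32.
std::uint32_t divide(std::uint32_t x, const InvariantDivisor& inv) {
    const std::uint64_t t = inv.multiplier * x + inv.addend;
    return static_cast<std::uint32_t>(t >> inv.shift);
}
```

For d = 3 this yields the familiar multiplier 0xAAAAAAAB with shift 33 and addend 0, while d = 7 needs the round-down form (multiplier and addend both 0x92492492, shift 34), which is the case where the older round-up approach would have needed a 33-bit multiplier. `make_divisor` runs once per divisor, so its cost is amortized over every quotient that shares it.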
Concurrency and Computation: Practice and Experience | 2005
Jeremiah Willcock; Andrew Lumsdaine; Arch D. Robison
We describe two different libraries for using the Message Passing Interface (MPI) with the C# programming language and the Common Language Infrastructure (CLI). The first library provides C# bindings that closely match the original MPI library specification. The second library presents a fully object-oriented interface to MPI and exploits modern language features of C#. The interfaces described here use the P/Invoke feature of the CLI to dispatch to a native implementation of MPI, such as LAM/MPI or MPICH. Performance results using the Shared Source CLI demonstrate only a small performance overhead.
Proceedings of the 2010 Workshop on Parallel Programming Patterns | 2010
Arch D. Robison; Ralph E. Johnson
There are many different styles of parallel programming for shared-memory hardware. Each style has strengths, but can conflict with other styles. How can we use a variety of these styles in one program and minimize their conflict and maximize performance, readability, and flexibility? This paper surveys the relative advantages and disadvantages of three styles (SIMD, fork join, and message passing), shows how to compose them hierarchically, and advises how to choose what goes at each level in the hierarchy.
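One possible hierarchical composition puts fork-join at the upper level and a SIMD-style loop at the leaves. The sketch below assumes that layering purely for illustration and does not reproduce the paper's advice on which style belongs at which level; `dot_leaf` and `dot_fork_join` are hypothetical names, and `std::async` stands in for a real task scheduler.

```cpp
#include <cstddef>
#include <cstdint>
#include <future>

// Leaf level (SIMD style): a simple, branch-free loop over contiguous data.
// Integer reductions like this can typically be auto-vectorized by an
// optimizing compiler, with no threading concerns inside the loop.
std::int64_t dot_leaf(const std::int32_t* a, const std::int32_t* b, std::size_t n) {
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += static_cast<std::int64_t>(a[i]) * b[i];
    return sum;
}

// Upper level (fork-join style): split the range, compute the halves in
// parallel, and combine the partial sums at the join.
std::int64_t dot_fork_join(const std::int32_t* a, const std::int32_t* b,
                           std::size_t n, std::size_t leaf_size) {
    if (n <= leaf_size || n < 2)  // n < 2 guards against leaf_size == 0
        return dot_leaf(a, b, n);
    const std::size_t half = n / 2;
    auto left = std::async(std::launch::async, dot_fork_join, a, b, half, leaf_size);
    const std::int64_t right = dot_fork_join(a + half, b + half, n - half, leaf_size);
    return left.get() + right;
}
```

Keeping each style confined to its own level means the leaf loop never sees threads and the fork-join layer never sees individual elements, which is one concrete way to limit the conflicts between styles that the abstract mentions.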
Archive | 2010
Alexey Kukanov; Arch D. Robison
Archive | 2007
Arch D. Robison; Paul M. Petersen
Archive | 2005
Arch D. Robison
Archive | 2000
Arch D. Robison