
Publication


Featured research published by Eric Holk.


Irregular Applications: Architectures and Algorithms | 2015

Dynamic parallelism for simple and efficient GPU graph algorithms

Peter Zhang; Eric Holk; John Matty; Samantha Misurda; Marcin Zalewski; Jonathan Chu; Scott McMillan; Andrew Lumsdaine

Dynamic parallelism allows GPU kernels to launch additional kernels at runtime directly from the GPU. In this paper we show that dynamic parallelism enables relatively simple high-performance graph algorithms for GPUs. We present breadth-first search (BFS) and single-source shortest paths (SSSP) algorithms that use dynamic parallelism to adapt to the irregular and data-driven nature of these problems. Our approach results in simple code that closely follows the high-level description of the algorithms but yields performance competitive with the current state of the art.
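The paper's approach can be illustrated with a minimal CPU-side sketch of level-synchronous BFS (in Python rather than the authors' CUDA code; all names here are illustrative). The inner loop over a vertex's neighbors is the work that, with dynamic parallelism, a parent kernel would hand off to a child kernel launched directly from the GPU:

```python
def bfs_levels(adj, source):
    """Level-synchronous BFS. With dynamic parallelism, each frontier
    vertex could launch a child kernel to expand its neighbors; here
    that expansion is an ordinary inner loop."""
    dist = {source: 0}
    frontier = [source]
    while frontier:
        next_frontier = []
        for v in frontier:        # parent kernel: one thread per frontier vertex
            for w in adj[v]:      # child kernel: expand v's neighbor list
                if w not in dist:
                    dist[w] = dist[v] + 1
                    next_frontier.append(w)
        frontier = next_frontier
    return dist
```

Because each child launch covers exactly one vertex's neighbor list, the work naturally adapts to irregular degree distributions, which is the data-driven behavior the abstract describes.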


Conference on Object-Oriented Programming Systems, Languages, and Applications | 2014

Region-based memory management for GPU programming languages: enabling rich data structures on a spartan host

Eric Holk; Ryan R. Newton; Jeremy G. Siek; Andrew Lumsdaine

Graphics processing units (GPUs) can effectively accelerate many applications, but their applicability has been largely limited to problems whose solutions can be expressed neatly in terms of linear algebra. Indeed, most GPU programming languages limit the user to simple data structures, typically only multidimensional rectangular arrays of scalar values. Many algorithms are more naturally expressed using higher-level language features, such as algebraic data types (ADTs) and first-class procedures, yet building these structures in a manner suitable for a GPU remains a challenge. We present a region-based memory management approach that enables rich data structures in Harlan, a language for data parallel computing. Regions enable rich data structures by providing a uniform representation for pointers on both the CPU and GPU and by providing a means of transferring entire data structures between CPU and GPU memory. We demonstrate Harlan's increased expressiveness on several example programs and show that Harlan performs well on more traditional data-parallel problems.
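The key idea, a uniform pointer representation valid on both CPU and GPU, can be sketched with a toy arena allocator (a Python illustration of the concept, not Harlan's implementation; the class and method names are made up for this sketch):

```python
class Region:
    """A bump-allocated arena. A 'pointer' is a region-relative offset,
    so it remains valid when the entire buffer is copied between CPU
    and GPU memory: one contiguous transfer moves the whole structure."""
    def __init__(self, size):
        self.buf = bytearray(size)
        self.top = 0

    def alloc(self, nbytes):
        off = self.top
        self.top += nbytes
        assert self.top <= len(self.buf), "region exhausted"
        return off                # offset, not a raw address

    def transfer(self):
        # stand-in for a single cudaMemcpy of the live portion of the region
        return bytes(self.buf[:self.top])
```

Because nothing in the structure stores an absolute address, a linked structure allocated inside one region can cross the CPU/GPU boundary without pointer rewriting.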


IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2013

GPU Programming in Rust: Implementing High-Level Abstractions in a Systems-Level Language

Eric Holk; Milinda Pathirage; Arun Chauhan; Andrew Lumsdaine; Nicholas D. Matsakis

Graphics processing units (GPUs) have the potential to greatly accelerate many applications, yet programming models remain too low level. Many language-based solutions to date have addressed this problem by creating embedded domain-specific languages that compile to CUDA or OpenCL. These targets are meant for human programmers and thus are less than ideal compilation targets. LLVM recently gained a compilation target for PTX, NVIDIA's low-level virtual instruction set for GPUs. This lower-level representation is more expressive than CUDA and OpenCL, making it easier to support advanced language features such as abstract data types or even certain closures. We demonstrate the effectiveness of this approach by extending the Rust programming language with support for GPU kernels. At the most basic level, our extensions provide functionality that is similar to that of CUDA. However, our approach seamlessly integrates with many of Rust's features, making it easy to build a library of ergonomic abstractions for data parallel computing. This approach provides the expressiveness of a high-level GPU language like Copperhead or Accelerate, yet also gives the programmer the power to create new abstractions when those we have provided are insufficient.


Functional High Performance Computing | 2015

Meta-programming and auto-tuning in the search for high performance GPU code

Michael Vollmer; Bo Joel Svensson; Eric Holk; Ryan R. Newton

Writing high performance GPGPU code is often difficult and time-consuming, potentially requiring laborious manual tuning of low-level details. Despite these challenges, the cost of ignoring GPUs in high performance computing is increasingly large. Auto-tuning is a potential solution to the problem of tedious manual tuning. We present a framework for auto-tuning GPU kernels that are expressed in an embedded DSL and that expose compile-time parameters for tuning. Our framework allows kernels to be polymorphic over the search strategy that will tune them, and allows search strategies to be implemented in the same meta-language as the kernel-generation code (Haskell). Further, we show how to use functional programming abstractions to enforce regular (hyper-rectangular) search spaces. We also evaluate several common search strategies on a variety of kernels, and demonstrate that the framework can tune both EDSL and ordinary CUDA code.
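The shape of such a tuner, exhaustive search as the simplest strategy over a hyper-rectangular space, can be sketched as follows (a Python illustration under assumed names, not the paper's Haskell framework; `make_kernel` and `measure` are hypothetical callbacks):

```python
import itertools

def autotune(make_kernel, param_space, measure):
    """Exhaustive search over a hyper-rectangular space: the cross
    product of each tunable parameter's candidate values. `measure`
    returns a cost for a compiled kernel (e.g. measured runtime);
    the lowest-cost configuration wins."""
    best_cfg, best_cost = None, float("inf")
    for point in itertools.product(*param_space.values()):
        cfg = dict(zip(param_space, point))
        cost = measure(make_kernel(**cfg))
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg
```

Making the kernel polymorphic over the tuner then amounts to passing a different search function in place of this exhaustive one, e.g. random search or hill climbing over the same `param_space`.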


Proceedings of the First ACM SIGPLAN Workshop on Functional Art, Music, Modeling & Design | 2013

Visualizing the Turing tarpit

Jason Hemann; Eric Holk

Minimal programming languages like Jot generate limited interest outside of the community of language enthusiasts. This is unfortunate, because the simplicity of these languages endows them with an inherent beauty and provides deep insight into the nature of computation. We present a way of visualizing the behavior of many Jot programs at once, providing interesting images and also hinting at somewhat non-obvious relationships between programs. In the same way that research on fractals has yielded new mathematical insights, research into visualizations such as those presented here could produce new perspectives on the structure and nature of computation. A gallery containing the visualizations presented herein can be found at http://tarpit.github.io/TarpitGazer.


Languages and Compilers for Parallel Computing | 2015

An Embedded DSL for High Performance Declarative Communication with Correctness Guarantees in C++

Nilesh Mahajan; Eric Holk; Arun Chauhan; Andrew Lumsdaine

High performance programming using explicit communication calls requires considerable expertise to optimize. Tuning for performance often involves using asynchronous calls, running the risk of introducing bugs and making the program harder to debug. Techniques to prove desirable program properties, such as deadlock freedom, invariably incur significant performance overheads. We have developed a domain-specific language, embedded in C++, called Kanor that enables programmers to specify communication declaratively in the Bulk Synchronous Parallel (BSP) style. Deadlock freedom is guaranteed for well-formed Kanor programs. We start with operational semantics for a subset of Kanor and prove deadlock freedom and determinism properties based on those semantics. We then show how the declarative nature of Kanor allows us to detect and optimize communication patterns.
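The BSP intuition behind this guarantee can be sketched in a few lines (a Python model of one superstep, not Kanor's C++ syntax; `rule` and `superstep` are names invented for this sketch): communication is specified as a relation from senders to (receiver, value) pairs, and the runtime performs all transfers at once at the end of the step.

```python
def superstep(nprocs, rule):
    """One BSP communication step. `rule(sender)` declaratively yields
    (receiver, value) pairs; all transfers complete together at the end
    of the step, so no interleaving of sends and receives can deadlock."""
    inbox = {p: [] for p in range(nprocs)}
    for sender in range(nprocs):
        for receiver, value in rule(sender):
            inbox[receiver].append(value)
    return inbox
```

For example, a ring shift among four processes is just `lambda s: [((s + 1) % 4, s)]`; because the pattern is data, a runtime can inspect it and specialize the transfer (the optimization opportunity the abstract mentions).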


Functional High Performance Computing | 2015

Converting data-parallelism to task-parallelism by rewrites: purely functional programs across multiple GPUs

Bo Joel Svensson; Michael Vollmer; Eric Holk; Trevor L. McDonell; Ryan R. Newton

High-level domain-specific languages for array processing on the GPU are increasingly common, but they typically only run on a single GPU. As computational power is distributed across more devices, languages must target multiple devices simultaneously. To this end, we present a compositional translation that fissions data-parallel programs in the Accelerate language, allowing subsequent compiler and runtime stages to map computations onto multiple devices for improved performance---even programs that begin as a single data-parallel kernel.
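At its core, fission splits one data-parallel computation into independent per-device tasks and recombines the results; a minimal sketch (in Python with invented names, not the Accelerate transformation itself, which rewrites programs at the AST level):

```python
def fission_map(f, xs, ndevices):
    """Split one data-parallel map into `ndevices` independent chunks
    (run sequentially here; a runtime could schedule each chunk on a
    different GPU) and concatenate the partial results."""
    n = len(xs)
    bounds = [n * d // ndevices for d in range(ndevices + 1)]
    chunks = [xs[bounds[d]:bounds[d + 1]] for d in range(ndevices)]
    results = [[f(x) for x in chunk] for chunk in chunks]  # one task per device
    return [y for part in results for y in part]
```

Because a map has no cross-element dependencies, the split is semantics-preserving for any chunking, which is what lets the rewrite stay purely functional.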


Parallel Computing | 2011

Declarative Parallel Programming for GPUs.

Eric Holk; William E. Byrd; Nilesh Mahajan; Jeremiah Willcock; Arun Chauhan; Andrew Lumsdaine


Scheme and Functional Programming | 2012

miniKanren, live and untagged: quine generation via relational interpreters (programming pearl)

William E. Byrd; Eric Holk; Daniel P. Friedman


Practical Aspects of Declarative Languages | 2011

Kanor: a declarative language for explicit communication

Eric Holk; William E. Byrd; Jeremiah Willcock; Torsten Hoefler; Arun Chauhan; Andrew Lumsdaine

Collaboration


Eric Holk's most frequent co-authors:

Andrew Lumsdaine (Indiana University Bloomington)
Arun Chauhan (Indiana University Bloomington)
Jeremiah Willcock (Indiana University Bloomington)
Nilesh Mahajan (Indiana University Bloomington)
Bo Joel Svensson (Chalmers University of Technology)
Trevor L. McDonell (University of New South Wales)
Daniel P. Friedman (Indiana University Bloomington)