Mike Houston | Researchain

Publication

Featured research published by Mike Houston.

international conference on computer graphics and interactive techniques | 2004

Brook for GPUs: stream computing on graphics hardware

Ian Buck; Tim Foley; Daniel Reiter Horn; Jeremy Sugerman; Kayvon Fatahalian; Mike Houston; Pat Hanrahan

In this paper, we present Brook for GPUs, a system for general-purpose computation on programmable graphics hardware. Brook extends C to include simple data-parallel constructs, enabling the use of the GPU as a streaming co-processor. We present a compiler and runtime system that abstracts and virtualizes many aspects of graphics hardware. In addition, we present an analysis of the effectiveness of the GPU as a compute engine compared to the CPU, to determine when the GPU can outperform the CPU for a particular algorithm. We evaluate our system with five applications, the SAXPY and SGEMV BLAS operators, image segmentation, FFT, and ray tracing. For these applications, we demonstrate that our Brook implementations perform comparably to hand-written GPU code and up to seven times faster than their CPU counterparts.

international conference on computer graphics and interactive techniques | 2002

Chromium: a stream-processing framework for interactive rendering on clusters

Greg Humphreys; Mike Houston; Ren Ng; Randall J. Frank; Sean Ahern; P. D. Kirchner; James T. Klosowski

We describe Chromium, a system for manipulating streams of graphics API commands on clusters of workstations. Chromium's stream filters can be arranged to create sort-first and sort-last parallel graphics architectures that, in many cases, support the same applications while using only commodity graphics accelerators. In addition, these stream filters can be extended programmatically, allowing the user to customize the stream transformations performed by nodes in a cluster. Because our stream processing mechanism is completely general, any cluster-parallel rendering algorithm can be either implemented on top of or embedded in Chromium. In this paper, we give examples of real-world applications that use Chromium to achieve good scalability on clusters of workstations, and describe other potential uses of this stream processing technology. By completely abstracting the underlying graphics architecture, network topology, and API command processing semantics, we allow a variety of applications to run in different environments.

conference on high performance computing (supercomputing) | 2006

Sequoia: programming the memory hierarchy

Kayvon Fatahalian; Daniel Reiter Horn; Timothy J. Knight; Larkhoon Leem; Mike Houston; Ji Young Park; Mattan Erez; Manman Ren; Alex Aiken; William J. Dally; Pat Hanrahan

We present Sequoia, a programming language designed to facilitate the development of memory hierarchy aware parallel programs that remain portable across modern machines featuring different memory hierarchy configurations. Sequoia abstractly exposes hierarchical memory in the programming model and provides language mechanisms to describe communication vertically through the machine and to localize computation to particular memory locations within it. We have implemented a complete programming system, including a compiler and runtime systems for Cell processor-based blade systems and distributed memory clusters, and demonstrate efficient performance running Sequoia programs on both of these platforms.

Journal of Computational Chemistry | 2009

Accelerating molecular dynamic simulation on graphics processing units

Mark S. Friedrichs; Peter Eastman; Vishal Vaidyanathan; Mike Houston; Scott M. LeGrand; Adam L. Beberg; Daniel L. Ensign; Christopher M. Bruns; Vijay S. Pande

We describe a complete implementation of all‐atom protein molecular dynamics running entirely on a graphics processing unit (GPU), including all standard force field terms, integration, constraints, and implicit solvent. We discuss the design of our algorithms and important optimizations needed to fully take advantage of a GPU. We evaluate its performance, and show that it can be more than 700 times faster than a conventional implementation running on a single CPU core.

Communications of The ACM | 2008

A closer look at GPUs

Kayvon Fatahalian; Mike Houston

As the line between GPUs and CPUs begins to blur, it's important to understand what makes GPUs tick.

conference on high performance computing (supercomputing) | 2005

ClawHMMER: A Streaming HMMer-Search Implementation

Daniel Reiter Horn; Mike Houston; Pat Hanrahan

The proliferation of biological sequence data has motivated the need for an extremely fast probabilistic sequence search. One method for performing this search involves evaluating the Viterbi probability of a hidden Markov model (HMM) of a desired sequence family for each sequence in a protein database. However, one of the difficulties with current implementations is the time required to search large databases. Many current and upcoming architectures offering large amounts of compute power are designed with data-parallel execution and streaming in mind. We present a streaming algorithm for evaluating an HMM’s Viterbi probability and refine it for the specific HMM used in biological sequence search. We implement our streaming algorithm in the Brook language, allowing us to execute the algorithm on graphics processors. We demonstrate that this streaming algorithm on graphics processors can outperform available CPU implementations. We also demonstrate this implementation running on a 16 node graphics cluster.

ieee visualization | 2003

Fast volume segmentation with simultaneous visualization using programmable graphics hardware

Anthony J. Sherbondy; Mike Houston; Sandy Napel

Segmentation of structures from measured volume data, such as anatomy in medical imaging, is a challenging data-dependent task. In this paper, we present a segmentation method that leverages the parallel processing capabilities of modern programmable graphics hardware in order to run significantly faster than previous methods. In addition, collocating the algorithm computation with the visualization on the graphics hardware circumvents the need to transfer data across the system bus, allowing for faster visualization and interaction. This algorithm is unique in that it utilizes sophisticated graphics hardware functionality (i.e., floating point precision, render to texture, computational masking, and fragment programs) to enable fast segmentation and interactive visualization.

international conference on parallel architectures and compilation techniques | 2010

Twin peaks: a software platform for heterogeneous computing on general-purpose and graphics processors

Jayanth Gummaraju; Ben Sander; Laurent Morichetti; Benedict R. Gaster; Mike Houston; Bixia Zheng

Modern processors are evolving into hybrid, heterogeneous processors with both CPU and GPU cores used for general-purpose computation. Several languages such as Brook, CUDA, and more recently OpenCL are being developed to fully harness the potential of these processors. These languages typically involve the control code running on the CPU and the performance-critical, data-parallel kernel code running on the GPUs. In this paper, we present Twin Peaks, a software platform for heterogeneous computing that executes code originally targeted for GPUs efficiently on CPUs as well. This permits a more balanced execution between the CPU and GPU, and enables portability of code between these architectures and to CPU-only environments. We propose several techniques in the runtime system to efficiently utilize the caches and functional units present in CPUs. Using OpenCL as a canonical language for heterogeneous computing, and running several experiments on real hardware, we show that our techniques enable GPGPU-style code to execute efficiently on multi-core CPUs with minimal runtime overhead. These results also show that for maximum performance, it is beneficial for applications to utilize both CPUs and GPUs as accelerator targets. Categories and Subject Descriptors: D.1.3 [Programming Techniques]: Concurrent Programming. General Terms: Design, Experimentation, Performance. Keywords: GPGPU, Multicore, OpenCL, Programmability, Runtime.

acm sigplan symposium on principles and practice of parallel programming | 2007

Compilation for explicitly managed memory hierarchies

Timothy J. Knight; Ji Young Park; Manman Ren; Mike Houston; Mattan Erez; Kayvon Fatahalian; Alex Aiken; William J. Dally; Pat Hanrahan

We present a compiler for machines with an explicitly managed memory hierarchy and suggest that a primary role of any compiler for such architectures is to manipulate and schedule a hierarchy of bulk operations at varying scales of the application and of the machine. We evaluate the performance of our compiler using several benchmarks running on a Cell processor.

interactive 3d graphics and games | 2003

Non-invasive interactive visualization of dynamic architectural environments

Christopher Niederauer; Mike Houston; Maneesh Agrawala; Greg Humphreys

We present a system for interactively producing exploded views of 3D architectural environments such as multi-story buildings. These exploded views allow viewers to simultaneously see the internal and external structures of such environments. To create an exploded view we analyze the geometry of the environment to locate individual stories. We then use clipping planes and multipass rendering to separately render each story of the environment in exploded form. Our system operates at the graphics driver level and therefore can be applied to existing OpenGL applications, such as first-person multi-player video games, without modification. The resulting visualization allows users to understand the global structure of architectural environments and to observe the actions of dynamic characters and objects interacting within such environments.
