Publication


Featured research published by Blake A. Hechtman.


high-performance computer architecture | 2014

QuickRelease: A throughput-oriented approach to release consistency on GPUs

Blake A. Hechtman; Shuai Che; Derek R. Hower; Yingying Tian; Bradford M. Beckmann; Mark D. Hill; Steven K. Reinhardt; David A. Wood

Graphics processing units (GPUs) have specialized throughput-oriented memory systems optimized for streaming writes with scratchpad memories to capture locality explicitly. Expanding the utility of GPUs beyond graphics encourages designs that simplify programming (e.g., using caches instead of scratchpads) and better support irregular applications with finer-grain synchronization. Our hypothesis is that, like CPUs, GPUs will benefit from caches and coherence, but that CPU-style “read for ownership” (RFO) coherence is inappropriate to maintain support for regular streaming workloads. This paper proposes QuickRelease (QR), which improves on conventional GPU memory systems in two ways. First, QR uses a FIFO to enforce the partial order of writes so that synchronization operations can complete without frequent cache flushes. Thus, non-synchronizing threads in QR can re-use cached data even when other threads are performing synchronization. Second, QR partitions the resources required by reads and writes to reduce the penalty of writes on read performance. Simulation results across a wide variety of general-purpose GPU workloads show that QR achieves a 7% average performance improvement compared to a conventional GPU memory system. Furthermore, for emerging workloads with finer-grain synchronization, QR achieves up to 42% performance improvement compared to a conventional GPU memory system without the scalability challenges of RFO coherence. To this end, QR provides a throughput-oriented solution to provide fine-grain synchronization on GPUs.
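The write-FIFO idea in the abstract above can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the paper's hardware design: the class `ToyQuickReleaseCore`, its method names, and the dictionary standing in for global memory are all hypothetical. It shows a release draining pending writes in FIFO order while the separate read cache is left intact.

```python
from collections import deque


class ToyQuickReleaseCore:
    """Toy model (hypothetical names): a write FIFO kept apart from a read cache."""

    def __init__(self, memory):
        self.memory = memory        # shared dict standing in for global memory
        self.read_cache = {}        # cached read data; a release never flushes it
        self.write_fifo = deque()   # pending writes, kept in program order

    def load(self, addr):
        # The newest pending write to addr wins, so a thread sees its own stores.
        for pending_addr, value in reversed(self.write_fifo):
            if pending_addr == addr:
                return value
        if addr not in self.read_cache:
            self.read_cache[addr] = self.memory.get(addr, 0)
        return self.read_cache[addr]

    def store(self, addr, value):
        # Simplification: drop any cached copy so later loads cannot go stale.
        self.read_cache.pop(addr, None)
        self.write_fifo.append((addr, value))

    def release(self):
        # Drain writes oldest-first; every earlier write becomes visible
        # before the release completes. The read cache is not touched.
        drained = 0
        while self.write_fifo:
            addr, value = self.write_fifo.popleft()
            self.memory[addr] = value
            drained += 1
        return drained


memory = {}
core = ToyQuickReleaseCore(memory)
core.load("a")            # brings "a" into the read cache
core.store("data", 42)
core.store("flag", 1)
visible_before = dict(memory)   # nothing is globally visible yet
drained = core.release()
```

After `release()` returns, `memory` holds both writes and `"a"` is still in `core.read_cache`, which mirrors the abstract's claim that cached data can be re-used across synchronization.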


international symposium on computer architecture | 2013

Exploring memory consistency for massively-threaded throughput-oriented processors

Blake A. Hechtman; Daniel J. Sorin

We re-visit the issue of hardware consistency models in the new context of massively-threaded throughput-oriented processors (MTTOPs). A prominent example of an MTTOP is a GPGPU, but other examples include Intel's MIC architecture and some recent academic designs. MTTOPs differ from CPUs in many significant ways, including their ability to tolerate latency, their memory system organization, and the characteristics of the software they run. We compare implementations of various hardware consistency models for MTTOPs in terms of performance, energy-efficiency, hardware complexity, and programmability. Our results show that the choice of hardware consistency model has a surprisingly minimal impact on performance and thus the decision should be based on hardware complexity, energy-efficiency, and programmability. For many MTTOPs, it is likely that even a simple implementation of sequential consistency is attractive.
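As a reminder of what sequential consistency guarantees to the programmer, the sketch below enumerates every interleaving of a two-thread "store buffering" litmus test. It is an illustration only and is not taken from the paper; the function `sc_outcomes` and the operation encoding are hypothetical. Under sequential consistency each thread's operations appear in program order within a single global order, so at least one of the two loads must observe the other thread's store.

```python
from itertools import combinations


def sc_outcomes(thread0, thread1):
    """Run every interleaving that preserves each thread's program order."""
    n0, n1 = len(thread0), len(thread1)
    outcomes = set()
    # Choose which global steps belong to thread 0; the rest go to thread 1.
    for slots in combinations(range(n0 + n1), n0):
        mem, regs = {}, {}
        i0 = i1 = 0
        for step in range(n0 + n1):
            if step in slots:
                kind, a, b = thread0[i0]
                i0 += 1
            else:
                kind, a, b = thread1[i1]
                i1 += 1
            if kind == "store":
                mem[a] = b                # store value b to address a
            else:
                regs[b] = mem.get(a, 0)   # load address a into register b
        outcomes.add(tuple(sorted(regs.items())))
    return outcomes


# Store-buffering litmus test: each thread writes one flag, then reads the other.
t0 = [("store", "x", 1), ("load", "y", "r0")]
t1 = [("store", "y", 1), ("load", "x", "r1")]
result = sc_outcomes(t0, t1)
both_zero = (("r0", 0), ("r1", 0))
```

Here `both_zero` is absent from `result`. Weaker models that let stores wait in a buffer past later loads can produce that outcome, which is one reason the programmability of sequential consistency is considered attractive.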


international symposium on performance analysis of systems and software | 2013

Evaluating cache coherent shared virtual memory for heterogeneous multicore chips

Blake A. Hechtman; Daniel J. Sorin

Although current homogeneous chips tightly couple the cores with cache-coherent shared virtual memory (CCSVM), this is not the communication paradigm used by any current heterogeneous chip. In this paper, we present a CCSVM design for a CPU/GPU chip, as well as an extension of the pthreads programming model, called xthreads, for programming this heterogeneous multicore chip (HMC). We experimentally compare CCSVM/xthreads to a state-of-the-art CPU/GPU chip from AMD that runs OpenCL software. CCSVM's more efficient communication enables far better performance and far fewer DRAM accesses.


architectural support for programming languages and operating systems | 2014

Heterogeneous-race-free memory models

Derek R. Hower; Blake A. Hechtman; Bradford M. Beckmann; Benedict R. Gaster; Mark D. Hill; Steven K. Reinhardt; David A. Wood


Archive | 2014

METHOD FOR MEMORY CONSISTENCY AMONG HETEROGENEOUS COMPUTER COMPONENTS

Derek R. Hower; Mark D. Hill; David A. Wood; Steven K. Reinhardt; Benedict R. Gaster; Blake A. Hechtman; Bradford M. Beckmann


Archive | 2013

HIERARCHICAL WRITE-COMBINING CACHE COHERENCE

Blake A. Hechtman; Bradford M. Beckmann


Archive | 2015

WRITE COMBINING CACHE MICROARCHITECTURE FOR SYNCHRONIZATION EVENTS

Blake A. Hechtman; Bradford M. Beckmann


Archive | 2012

The Limits of Concurrency in Cache Coherence

Blake A. Hechtman; Daniel J. Sorin


arXiv: Distributed, Parallel, and Cluster Computing | 2016

TREES: A CPU/GPU Task-Parallel Runtime with Explicit Epoch Synchronization

Blake A. Hechtman; Andrew D. Hilton; Daniel J. Sorin


Archive | 2014

RUNTIME FOR AUTOMATICALLY LOAD-BALANCING AND SYNCHRONIZING HETEROGENEOUS COMPUTER SYSTEMS WITH SCOPED SYNCHRONIZATION

Blake A. Hechtman; Derek R. Hower

Collaboration


Top Co-Authors

Derek R. Hower

University of Wisconsin-Madison

David A. Wood

University of Wisconsin-Madison

Mark D. Hill

University of Wisconsin-Madison

Shuai Che

Advanced Micro Devices