Publication


Featured research published by John M. Danskin.


High-Performance Computer Architecture | 2016

Selective GPU caches to eliminate CPU-GPU HW cache coherence

Neha Agarwal; David W. Nellans; Eiman Ebrahimi; Thomas F. Wenisch; John M. Danskin; Stephen W. Keckler

Cache coherence is ubiquitous in shared memory multiprocessors because it provides a simple, high-performance memory abstraction to programmers. Recent work suggests extending hardware cache coherence between CPUs and GPUs to help support programming models with tightly coordinated sharing between CPU and GPU threads. However, implementing hardware cache coherence is particularly challenging in systems with discrete CPUs and GPUs that may not be produced by a single vendor. Instead, we propose selective caching, wherein we disallow GPU caching of any memory that would require coherence updates to propagate between the CPU and GPU, thereby decoupling the GPU from vendor-specific CPU coherence protocols. We propose several architectural improvements to offset the performance penalty of selective caching: aggressive request coalescing, CPU-side coherent caching for GPU-uncacheable requests, and a CPU-GPU interconnect optimization to support variable-size transfers. Moreover, current GPU workloads access many read-only memory pages; we exploit this property to allow promiscuous GPU caching of these pages, relying on page-level protection, rather than hardware cache coherence, to ensure correctness. These optimizations bring a selective caching GPU implementation to within 93% of a hardware cache-coherent implementation without the need to integrate CPUs and GPUs under a single hardware coherence protocol.
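
The caching rule described in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the `Page` descriptor, its two fields, and the `gpu_may_cache` function are hypothetical names introduced only for illustration, and the real mechanism operates in hardware rather than software. The sketch assumes that a read-only page can always be cached by the GPU (page-level protection prevents writes), and that any other page is cacheable only when no coherence updates would have to cross between the CPU and GPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    """Hypothetical page descriptor; both fields are illustrative assumptions."""
    read_only: bool                # page is protected against writes
    needs_cpu_gpu_coherence: bool  # caching it would require CPU-GPU coherence updates


def gpu_may_cache(page: Page) -> bool:
    """Decide whether the GPU may cache lines from this page (sketch only)."""
    if page.read_only:
        # Page-level protection, not hardware coherence, guarantees correctness:
        # no writes can occur, so a cached copy cannot become stale.
        return True
    # For writable pages, caching is allowed only if it never requires
    # coherence updates to propagate between the CPU and the GPU.
    return not page.needs_cpu_gpu_coherence


if __name__ == "__main__":
    shared_writable = Page(read_only=False, needs_cpu_gpu_coherence=True)
    shared_read_only = Page(read_only=True, needs_cpu_gpu_coherence=True)
    private_writable = Page(read_only=False, needs_cpu_gpu_coherence=False)
    for page in (shared_writable, shared_read_only, private_writable):
        print(page, "->", "cacheable" if gpu_may_cache(page) else "uncacheable")
```

Under these assumptions, only the writable page that would need coherence traffic is left uncacheable on the GPU; requests to such pages are the ones the abstract's other optimizations (request coalescing, CPU-side caching, variable-size transfers) are meant to speed up.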


IEEE Micro | 2017

Ultra-Performance Pascal GPU and NVLink Interconnect

Denis Foley; John M. Danskin

This article introduces Nvidia's high-performance Pascal GPU. GP100 features in-package high-bandwidth memory, support for efficient FP16 operations, unified memory, and instruction preemption, and incorporates Nvidia's NVLink I/O for high-bandwidth connections between GPUs and between GPUs and CPUs.


IEEE Hot Chips Symposium | 2016

Pascal GPU with NVLink

John M. Danskin; Denis Foley

This article consists only of a collection of slides from the authors' conference presentation.


Archive | 2001

Modified method and apparatus for improved occlusion culling in graphics systems

Edward Colton Greene; Douglas A. Voorhies; Paolo E. Sabella; John M. Danskin; James M. Van Dyke


Archive | 2006

Parallel array architecture for a graphics processor

John M. Danskin; John S. Montrym; John Erik Lindholm; Steven E. Molnar; Mark J. French


Archive | 2003

Occlusion culling method and apparatus for graphics systems

Edward Colton Greene; Douglas A. Voorhies; Paolo E. Sabella; John M. Danskin; James M. Van Dyke


Archive | 2001

Multi-mode texture compression algorithm

John M. Danskin; Gary M. Tarolli; Murali Sundaresan


Archive | 2004

Real-time display post-processing using programmable hardware

Duncan A. Riach; John M. Danskin; Jonah M. Alben; Michael A. Ogrinc; Anthony Michael Tamasi


Archive | 2003

Multiple data buffers for processing graphics data

Rui M. Bastos; John M. Danskin; Matthew N. Papakipos


Archive | 2007

Delayed frame buffer merging with compression

Jonah M. Alben; John M. Danskin; Henry Packard Moreton
