Publication


Featured research published by Lars Nyland.


IEEE International Conference on High Performance Computing, Data, and Analytics | 2016

Enabling efficient preemption for SIMT architectures with lightweight context switching

Zhen Lin; Lars Nyland; Huiyang Zhou

Context switching is a key technique enabling preemption and time-multiplexing for CPUs. However, for single-instruction multiple-thread (SIMT) processors such as high-end graphics processing units (GPUs), it is challenging to support context switching because the massive number of threads leads to a huge amount of architectural state to be swapped during a context switch. The architectural state of SIMT processors includes registers, shared memory, SIMT stacks and barrier states. Recent work presents thread-block-level preemption on SIMT processors to avoid context switching overhead. However, because the execution time of a thread block (TB) is highly dependent on the kernel program, the response time of preemption cannot be guaranteed, and some TB-level preemption techniques cannot be applied to all kernel functions. In this paper, we propose three complementary ways to reduce and compress the architectural state to achieve lightweight context switching on SIMT processors. Experiments show that our approaches can reduce the register context size by 91.5% on average. Based on lightweight context switching, we enable instruction-level preemption on SIMT processors with compiler and hardware co-design. With our proposed schemes, the preemption latency is reduced by 59.7% on average compared to the naive approach.
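To give a rough sense of scale for the 91.5% figure quoted in the abstract, the arithmetic can be sketched as follows. This is only an illustration: the register file size (64K 32-bit registers per multiprocessor) and the helper name `remaining_context_bytes` are assumptions made here, not values or names taken from the paper.

```python
def remaining_context_bytes(full_bytes: int, reduction_pct: float) -> float:
    """Return the bytes left to save after shrinking a context by reduction_pct percent."""
    if not 0.0 <= reduction_pct <= 100.0:
        raise ValueError("reduction_pct must be between 0 and 100")
    return full_bytes * (1.0 - reduction_pct / 100.0)


# Assumed register file size: 64K registers of 4 bytes each (hypothetical, not from the paper).
REGISTER_FILE_BYTES = 64 * 1024 * 4

# 91.5 is the average register context reduction reported in the abstract.
remaining = remaining_context_bytes(REGISTER_FILE_BYTES, 91.5)
print(f"{remaining / 1024:.2f} KiB of {REGISTER_FILE_BYTES // 1024} KiB would still be saved")
```

Under that assumed size, only a small fraction of the register file would need to be swapped per context switch; the actual savings depend on the kernel and on the hardware in question.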


Archive | 2007

Virtual architecture and instruction set for parallel thread computing

John R. Nickolls; Henry Packard Moreton; Lars Nyland; Ian Buck; Richard Craig Johnson; Robert Steven Glanville; Jayant B. Kolhe


Archive | 2006

Atomic memory operators in a parallel processor

Ian Buck; John R. Nickolls; Michael C. Shebanow; Lars Nyland


Archive | 2008

Indirect Function Call Instructions in a Synchronous Parallel Thread Processor

Brett W. Coon; John R. Nickolls; Lars Nyland; Peter C. Mills; John Erik Lindholm


Archive | 2008

Systems and methods for coalescing memory accesses of parallel threads

Lars Nyland; John R. Nickolls; Gentaro Hirota; Tanmoy Mandal


Archive | 2011

Lock Mechanism to Enable Atomic Updates to Shared Memory

Brett W. Coon; John R. Nickolls; Lars Nyland; Peter C. Mills


Archive | 2010

Cooperative thread array reduction and scan operations

Brian Fahs; Ming Y. Siu; Brett W. Coon; John R. Nickolls; Lars Nyland


Archive | 2010

Architecture and instructions for accessing multi-dimensional formatted surface memory

John R. Nickolls; Brian Fahs; Lars Nyland; John Erik Lindholm; Richard Craig Johnson


Archive | 2014

System, method, and computer program product for collecting execution statistics for graphics processing unit workloads

Gregory Paul Smith; Lars Nyland


Archive | 2008

Systems and methods for voting among parallel threads

John R. Nickolls; Lars Nyland; Peter C. Mills; Jeremy Sugerman; Timothy Foley; Brian Fahs; Michael Garland; David Luebke

Collaboration


Dive into Lars Nyland's collaborations.

Top Co-Authors

Ian Buck

University of Virginia

Gentaro Hirota

University of North Carolina at Chapel Hill
