John D. Leidel | Researchain

Publication

Featured research published by John D. Leidel.

IEEE International Conference on High Performance Computing, Data and Analytics | 2012

CHOMP: A Framework and Instruction Set for Latency Tolerant, Massively Multithreaded Processors

John D. Leidel; Kevin R. Wadleigh; Joe Bolding; Tony Brewer; Dean E. Walker

Given the recent advent of the multicore era [1], we find that parallel application performance is no longer solely gated by an architecture's core arithmetic unit performance. Memory bandwidth has failed to grow at the same rate as effective core density. This paper presents a framework for constructing tightly coupled, chip-multithreading [CMT] processors that contain specific features well-suited to hiding latency to main memory and executing highly concurrent applications. This framework, deemed the “Convey Hybrid OpenMP” or CHOMP architecture, is built around a RISC instruction set that permits the hardware and software runtime mechanisms to participate in efficient scheduling of concurrent application workloads regardless of the distribution and type of instructions utilized. In this manner, all instructions in CHOMP have the ability to participate in the concurrency algorithms present in the hardware scheduler that drive context switch events. This, coupled with a set of hardware-supported extended memory semantic instructions, means that the CHOMP architecture is well suited to executing applications that access memory using non-unit stride or irregular access patterns. Furthermore, the CHOMP architecture and framework contain specific logic and instruction set support that allows application-level, dynamic power gating of individual register files and function pipes.

IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2013

Toward a Scalable Heterogeneous Runtime System for the Convey MX Architecture

John D. Leidel; Joe Bolding; Geoffrey Rogers

Given the recent advent of the multicore era [1], research efforts in the area of high performance, low latency runtime systems have increased significantly. This research has given birth to new low-overhead scheduling techniques, small-memory footprint parallel execution units and kernel-free contextual environments. This paper presents a framework and runtime system for a truly heterogeneous approach to low-latency, high performance runtime techniques on the Convey MX-100 platform and CHOMP micro-architecture [14]. This framework, deemed the Convey Lightweight Runtime [CLR], is designed to provide high performance, programming-model agnostic parallel library support to the massively parallel CHOMP infrastructure. This work explores the fundamental design requirements and implementation details behind constructing the CLR system as a truly heterogeneous low-level runtime system for a wide array of parallel programming model targets.

IEEE Micro | 2017

In-Memory Intelligence

Tim Finkbeiner; Glen E. Hush; Troy D. Larsen; Perry V. Lea; John D. Leidel; Troy A. Manning

Recent activity in near-data processing has built or proposed systems that can exploit technologies such as 3D stacks, in-situ computing, or dataflow devices. However, little effort has been applied to exploiting the natural parallelism and throughput of DRAM. This article details research from Micron Technology in the area of processing in memory as a form of memory-centric computing. In-Memory Intelligence (IMI) attempts to place a massive array of bit-serial computing elements on pitch with the memory array, as close to the information as possible. This contrasts with near-memory devices that rely on some form of storage but must communicate with that storage via a fast, low-latency interface. Initial simulations and models show stair-step improvements in performance and power for various applications. Such technology allows DRAM to provide functionality in a heterogeneous system to alleviate the pressures of the von Neumann barrier.

Archive | 2015