Publication


Featured research published by Charles D. Wait.


International Conference on Parallel Architectures and Compilation Techniques | 2004

A High-Performance SIMD Floating Point Unit for BlueGene/L: Architecture, Compilation, and Algorithm Design

Leonardo R. Bachega; Siddhartha Chatterjee; Kenneth Dockser; John A. Gunnels; Manish Gupta; Fred G. Gustavson; Christopher A. Lapkowski; Gary K. Liu; Mark P. Mendell; Charles D. Wait; T. J. Christopher Ward

We describe the design, implementation, and evaluation of a dual-issue SIMD-like extension of the PowerPC 440 floating-point unit (FPU) core. This extended FPU is targeted at both IBM's massively parallel BlueGene/L machine and more pervasive embedded platforms. It has several novel features, such as a computational crossbar and cross-load/store instructions, which enhance the performance of numerical codes. We further discuss the hardware-software co-design that was essential to fully realize the performance benefits of the FPU when constrained by the memory bandwidth limitations and high penalties for misaligned data access imposed by the memory hierarchy on a BlueGene/L node. We describe several novel compiler and algorithmic techniques to take advantage of this architecture. Using both hand-optimized and compiled code for key linear algebraic kernels, we validate the architectural design choices, evaluate the success of the compiler, and quantify the effectiveness of the novel algorithm design techniques. Preliminary performance data show that the algorithm-compiler-hardware combination delivers a significant fraction of peak floating-point performance for compute-bound kernels such as matrix multiplication, and delivers a significant fraction of peak memory bandwidth for memory-bound kernels such as daxpy, while being largely insensitive to data alignment.


Archive | 2008

Structural Power Reduction in Multithreaded Processor

Stephen Joseph Schwinn; Matthew R. Tubbs; Charles D. Wait


Archive | 2008

Dynamic Merging of Pipeline Stages in an Execution Pipeline to Reduce Power Consumption

Stephen Joseph Schwinn; Matthew R. Tubbs; Charles D. Wait


Archive | 2011

Floating Point Execution Unit with Fixed Point Functionality

Mark J. Hickey; Adam J. Muff; Matthew R. Tubbs; Charles D. Wait


Archive | 2008

Processing Unit Incorporating Issue Rate-Based Predictive Thermal Management

Stephen Joseph Schwinn; Matthew R. Tubbs; Charles D. Wait


Archive | 2013

Fault Tolerant Stability Critical Execution Checking Using Redundant Execution Pipelines

Mark J. Hickey; Adam J. Muff; Matthew R. Tubbs; Charles D. Wait


Archive | 2008

Pre-loading Context States by Inactive Hardware Thread in Advance of Context Switch

Mark J. Hickey; Stephen Joseph Schwinn; Matthew R. Tubbs; Charles D. Wait


Archive | 2010

Programmable Integrated Processor Blocks

Mark J. Hickey; Eric O. Mejdrich; Adam J. Muff; Paul E. Schardt; Robert A. Shearer; Matthew R. Tubbs; Charles D. Wait


Archive | 2008

Data Dependent Instruction Decode

Mark J. Hickey; Adam J. Muff; Matthew R. Tubbs; Charles D. Wait


Archive | 2006

Rounding Floating Point Division Results

Charles D. Wait
