Thomas W. Fox | Researchain

Publication

Featured research published by Thomas W. Fox.

international symposium on microarchitecture | 2012

The IBM Blue Gene/Q Compute Chip

Ruud A. Haring; Martin Ohmacht; Thomas W. Fox; Michael Karl Gschwind; David L. Satterfield; Krishnan Sugavanam; Paul W. Coteus; Philip Heidelberger; Matthias A. Blumrich; Robert W. Wisniewski; Alan Gara; George Liang-Tai Chiu; Peter A. Boyle; Norman H. Christ; Changhoan Kim

Blue Gene/Q aims to build a massively parallel high-performance computing system out of power-efficient processor chips, resulting in power-efficient, cost-efficient, and floor-space-efficient systems. Focusing on reliability during design helps with scaling to large systems and lowers the total cost of ownership. This article examines the architecture and design of the Compute chip, which combines processors, memory, and communication functions on a single chip.

IBM Journal of Research and Development | 2015

Active Memory Cube: A processing-in-memory architecture for exascale systems

Ravi Nair; Samuel F. Antao; Carlo Bertolli; Pradip Bose; José R. Brunheroto; Tong Chen; Chen-Yong Cher; Carlos H. Andrade Costa; J. Doi; Constantinos Evangelinos; Bruce M. Fleischer; Thomas W. Fox; Diego S. Gallo; Leopold Grinberg; John A. Gunnels; Arpith C. Jacob; P. Jacob; Hans M. Jacobson; Tejas Karkhanis; Choon Young Kim; Jaime H. Moreno; John Kevin Patrick O'Brien; Martin Ohmacht; Yoonho Park; Daniel A. Prener; Bryan S. Rosenburg; Kyung Dong Ryu; Olivier Sallenave; Mauricio J. Serrano; Patrick Siegl

Many studies point to the difficulty of scaling existing computer architectures to meet the needs of an exascale system (i.e., capable of executing 10^18 floating-point operations per second), consuming no more than 20 MW in power, by around the year 2020. This paper outlines a new architecture, the Active Memory Cube, which reduces the energy of computation significantly by performing computation in the memory module, rather than moving data through large memory hierarchies to the processor core. The architecture leverages a commercially demonstrated 3D memory stack called the Hybrid Memory Cube, placing sophisticated computational elements on the logic layer below its stack of dynamic random-access memory (DRAM) dies. The paper also describes an Active Memory Cube tuned to the requirements of a scientific exascale system. The computational elements have a vector architecture and are capable of performing a comprehensive set of floating-point and integer instructions, predicated operations, and gather-scatter accesses across memory in the Cube. The paper outlines the software infrastructure used to develop applications and to evaluate the architecture, and describes results of experiments on application kernels, along with performance and power projections.

IBM Journal of Research and Development | 2003

An innovative low-power high-performance programmable signal processor for digital communications

Jaime H. Moreno; Victor Zyuban; Uzi Shvadron; Fredy D. Neeser; Jeff H. Derby; Malcolm Scott Ware; Krishnan K. Kailas; Ayal Zaks; Amir Geva; Shay Ben-David; Sameh W. Asaad; Thomas W. Fox; Daniel Littrell; Marina Biberstein; Dorit Naishlos; Hillery C. Hunter

We describe an innovative, low-power, high-performance, programmable signal processor (DSP) for digital communications. The architecture of this processor is characterized by its explicit design for low-power implementations, its innovative ability to jointly exploit instruction-level parallelism and data-level parallelism to achieve high performance, its suitability as a target for an optimizing high-level language compiler, and its explicit replacement of hardware resources by compile-time practices. We describe the methodology used in the development of the processor, highlighting the techniques deployed to enable application/architecture/compiler/implementation co-development, and the optimization approach and metric used for power-performance evaluation and tradeoff analysis. We summarize the salient features of the architecture, provide a brief description of the hardware organization, and discuss the compiler techniques used to exercise these features. We also summarize the simulation environment and associated software development tools. Coding examples from two representative kernels in the digital communications domain are also provided. The resulting methodology, architecture, and compiler represent an advance of the state of the art in the area of low-power, domain-specific microprocessors.

international solid-state circuits conference | 2011

A 4R2W register file for a 2.3GHz wire-speed POWER™ processor with double-pumped write operation

Gary S. Ditlow; Robert K. Montoye; Salvatore N. Storino; Sherman M. Dance; Sebastian Ehrenreich; Bruce M. Fleischer; Thomas W. Fox; Kyle M. Holmes; Junichi Mihara; Yutaka Nakamura; Shohji Onishi; Robert Shearer; Dieter Wendel; Leland Chang

In multi-ported register files, memory cell size grows quadratically with the total number of ports due to wordline and bitline wiring. Reducing the number of physical access ports in a memory cell can thus lead to significant area and power savings as well as latency improvement. Double-pumped register files operate access ports twice in a single clock period to reduce area by halving the number of physical ports in the memory cell, a technique often confined to low-frequency applications. Replication of a memory cell in separate arrays halves the number of physical read ports in each copy. In this work, double-pumped write ports and replicated read ports are applied to a 4R2W register file in a high-performance microprocessor product [1]. This paper describes detailed implementation and measured hardware characteristics of this array and demonstrates a fast error correction scheme. The techniques used balance high efficiency and low latency and thus differ from previous work, in which double-pumped ports perform a write followed by a read in a very large register file [2] or where write ports are double-pumped without cell-level read port reduction [3].

great lakes symposium on vlsi | 2004

Design methodology for semi custom processor cores

Victor Zyuban; Sameh W. Asaad; Thomas W. Fox; Anne-Marie Haen; Daniel Littrell; Jaime H. Moreno

We describe a semi-custom design methodology for embedded processor cores that was prototyped through the development of a low power high performance DSP core. When compared to the standard ASIC design flow, this methodology enables significant improvement in the speed and power; such benefits are obtained without compromising the generality and flexibility that characterizes the ASIC-based design techniques. Our methodology achieves fast turn-around time in the process from RTL description to post-PD timing results, and exhibits stable convergence on timing; these characteristics enable the application of optimizations spanning multiple levels of the design hierarchy. Such optimizations proved to be much more effective than those that focus only on a single design stage.

Archive | 2004

Programmable graphics processing engine

Bruce D. D'Amora; Thomas W. Fox

Archive | 2012

Sequential location accesses in an active memory device

Bruce M. Fleischer; Thomas W. Fox; Hans M. Jacobson; Ravi Nair

Archive | 2000