Thomas W. Fox
IBM
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Thomas W. Fox.
international symposium on microarchitecture | 2012
Ruud A. Haring; Martin Ohmacht; Thomas W. Fox; Michael Karl Gschwind; David L. Satterfield; Krishnan Sugavanam; Paul W. Coteus; Philip Heidelberger; Matthias A. Blumrich; Robert W. Wisniewski; Alan Gara; George Liang-Tai Chiu; Peter A. Boyle; Norman H. Chist; Changhoan Kim
Blue Gene/Q aims to build a massively parallel high-performance computing system out of power-efficient processor chips, resulting in power-efficient, cost-efficient, and floor-space- efficient systems. Focusing on reliability during design helps with scaling to large systems and lowers the total cost of ownership. This article examines the architecture and design of the Compute chip, which combines processors, memory, and communication functions on a single chip.
Ibm Journal of Research and Development | 2015
Ravi Nair; Samuel F. Antao; Carlo Bertolli; Pradip Bose; José R. Brunheroto; Tong Chen; Chen-Yong Cher; Carlos H. Andrade Costa; J. Doi; Constantinos Evangelinos; Bruce M. Fleischer; Thomas W. Fox; Diego S. Gallo; Leopold Grinberg; John A. Gunnels; Arpith C. Jacob; P. Jacob; Hans M. Jacobson; Tejas Karkhanis; Choon Young Kim; Jaime H. Moreno; John Kevin Patrick O'Brien; Martin Ohmacht; Yoonho Park; Daniel A. Prener; Bryan S. Rosenburg; Kyung Dong Ryu; Olivier Sallenave; Mauricio J. Serrano; Patrick Siegl
Many studies point to the difficulty of scaling existing computer architectures to meet the needs of an exascale system (i.e., capable of executing
Ibm Journal of Research and Development | 2003
Jaime H. Moreno; Victor Zyuban; Uzi Shvadron; Fredy D. Neeser; Jeff H. Derby; Malcolm Scott Ware; Krishnan K. Kailas; Ayal Zaks; Amir Geva; Shay Ben-David; Sameh W. Asaad; Thomas W. Fox; Daniel Littrell; Marina Biberstein; Dorit Naishlos; Hillery C. Hunter
10^{18}
international solid-state circuits conference | 2011
Gary S. Ditlow; Robert K. Montoye; Salvatore N. Storino; Sherman M. Dance; Sebastian Ehrenreich; Bruce M. Fleischer; Thomas W. Fox; Kyle M. Holmes; Junichi Mihara; Yutaka Nakamura; Shohji Onishi; Robert Shearer; Dieter Wendel; Leland Chang
floating-point operations per second), consuming no more than 20 MW in power, by around the year 2020. This paper outlines a new architecture, the Active Memory Cube, which reduces the energy of computation significantly by performing computation in the memory module, rather than moving data through large memory hierarchies to the processor core. The architecture leverages a commercially demonstrated 3D memory stack called the Hybrid Memory Cube, placing sophisticated computational elements on the logic layer below its stack of dynamic random-access memory (DRAM) dies. The paper also describes an Active Memory Cube tuned to the requirements of a scientific exascale system. The computational elements have a vector architecture and are capable of performing a comprehensive set of floating-point and integer instructions, predicated operations, and gather-scatter accesses across memory in the Cube. The paper outlines the software infrastructure used to develop applications and to evaluate the architecture, and describes results of experiments on application kernels, along with performance and power projections.
great lakes symposium on vlsi | 2004
Victor Zyuban; Sameh W. Asaad; Thomas W. Fox; Anne-Marie Haen; Daniel Littrell; Jaime H. Moreno
We describe an innovative, low-power, high-performance, programmable signal processor (DSP) for digital communications. The architecture of this processor is characterized by its explicit design for low-power implementations, its innovative ability to jointly exploit instruction-level parallelism and data-level parallelism to achieve high performance, its suitability as a target for an optimizing high-level language compiler, and its explicit replacement of hardware resources by compile-time practices. We describe the methodology used in the development of the processor, highlighting the techniques deployed to enable application/architecture/compiler/implementation co-development, and the optimization approach and metric used for power-performance evaluation and tradeoff analysis. We summarize the salient features of the architecture, provide a brief description of the hardware organization, and discuss the compiler techniques used to exercise these features. We also summarize the simulation environment and associated software development tools. Coding examples from two representative kernels in the digital communications domain are also provided. The resulting methodology, architecture, and compiler represent an advance of the state of the art in the area of low-power, domain-specific microprocessors.
Archive | 2004
Bruce D. D'Amora; Thomas W. Fox
In multi-ported register files, memory cell size grows quadratically with the total number of ports due to wordline and bitline wiring. Reducing the number of physical access ports in a memory cell can thus lead to significant area and power savings as well as latency improvement. Double-pumped register files operate access ports twice in a single clock period to reduce area by halving the number of physical ports in the memory cell — a technique often confined to low-frequency applications. Replication of a memory cell in separate arrays halves the number of physical read ports in each copy. In this work, double-pumped write ports and replicated read ports are applied to a 4R2W register file in a highperformance microprocessor product [1]. This paper describes detailed implementation and measured hardware characteristics of this array and demonstrates a fast error correction scheme. The techniques used balance high efficiency and low latency and thus differ from previous work, in which double-pumped ports perform a write followed by a read in a very large register file [2] or where write ports are double-pumped without cell-level read port reduction [3].
Archive | 2012
Bruce M. Fleischer; Thomas W. Fox; Hans M. Jacobson; Ravi Nair
We describe a semi-custom design methodology for embedded processor cores that was prototyped through the development of a low power high performance DSP core. When compared to the standard ASIC design flow, this methodology enables significant improvement in the speed and power; such benefits are obtained without compromising the generality and flexibility that characterizes the ASIC-based design techniques. Our methodology achieves fast turn-around time in the process from RTL description to post-PD timing results, and exhibits stable convergence on timing; these characteristics enable the application of optimizations spanning multiple levels of the design hierarchy. Such optimizations proved to be much more effective than those that focus only on a single design stage.
Archive | 2000
Thomas W. Fox; Mark Ernest Van Nostrand
Archive | 2000
Gordon Clyde Fossum; Thomas W. Fox
Archive | 2003
Shay Ben-David; Jeffrey Haskell Derby; Thomas W. Fox; Fredy D. Neeser; Jamie H. Moreno; Uzi Shvadron; Ayal Zaks