Harm Peter Hofstee | Researchain

Publication

Featured research published by Harm Peter Hofstee.

IBM Journal of Research and Development | 2005

Introduction to the Cell multiprocessor

James Allan Kahle; Michael Norman Day; Harm Peter Hofstee; Charles Ray Johns; T. R. Maeurer; David Shippy

This paper provides an introductory overview of the Cell multiprocessor. Cell represents a revolutionary extension of conventional microprocessor architecture and organization. The paper discusses the history of the project, the program objectives and challenges, the design concept, the architecture and programming models, and the implementation.

IEEE Micro | 2006

Synergistic Processing in Cell's Multicore Architecture

Michael Karl Gschwind; Harm Peter Hofstee; Brian Flachs; M. Hopkin; Y. Watanabe; T. Yamazaki

Eight synergistic processor units (SPUs) enable the Cell Broadband Engine's breakthrough performance. The SPU architecture implements a novel, pervasively data-parallel architecture combining scalar and SIMD processing on a wide data path. A large number of SPUs per chip provides high thread-level parallelism. The streamlined architecture provides an efficient multithreaded execution environment for both scalar and SIMD threads and represents a reaffirmation of the RISC principle of combining leading-edge architecture and compiler optimizations. These design decisions have enabled the Cell BE to deliver unprecedented supercomputer-class compute power for consumer applications.
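To make "scalar and SIMD processing on a wide data path" concrete, the sketch below models one 128-bit register as four 32-bit lanes. A SIMD add updates all four lanes at once; a scalar add is modeled as the same wide operation whose result is read from a single lane. Treating lane 0 as the scalar "preferred slot", the big-endian packing, and every function name here are illustrative assumptions, not the SPU instruction set.

```python
import struct

LANES = 4            # four 32-bit lanes in one 128-bit register
MASK32 = 0xFFFFFFFF

def pack(values):
    """Pack four signed 32-bit ints into a 16-byte (128-bit) register image, big-endian."""
    assert len(values) == LANES
    return struct.pack(">4i", *values)

def unpack(reg):
    """Return the four signed 32-bit lanes of a 16-byte register image."""
    return list(struct.unpack(">4i", reg))

def _wrap(x):
    """Wrap an integer to signed 32-bit, as a fixed-width hardware adder would."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def simd_add(a, b):
    """Lane-wise add: one operation produces four independent 32-bit results."""
    return pack([_wrap(x + y) for x, y in zip(unpack(a), unpack(b))])

def scalar_add(a, b):
    """Scalar add modeled on the same wide data path: compute all lanes, keep lane 0 (assumed preferred slot)."""
    return unpack(simd_add(a, b))[0]
```

For example, `unpack(simd_add(pack([1, 2, 3, 4]), pack([10, 20, 30, 40])))` yields `[11, 22, 33, 44]`, while `scalar_add` on the same inputs returns only `11`. The point of the model is that no separate scalar register file or ALU is needed; scalar code simply uses one lane of the SIMD hardware.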

IEEE Journal of Solid-state Circuits | 2006

The microarchitecture of the synergistic processor for a Cell processor

Brian Flachs; Shigehiro Asano; Sang Hoo Dhong; Harm Peter Hofstee; Gilles Gervais; Roy Kim; T. Le; Peichun Liu; Jens Leenstra; John Samuel Liberty; Brad W. Michael; Hwa-Joon Oh; Silvia Melitta Mueller; Osamu Takahashi; A. Hatakeyama; Yukio Watanabe; Naoka Yano; Daniel Alan Brokenshire; Mohammad Peyravian; Vandung To; E. Iwata

This paper describes an 11-FO4 streaming data processor in the IBM 90-nm SOI-low-k process. The dual-issue, four-way SIMD processor emphasizes achievable performance per area and power. Software controls most aspects of data movement and instruction flow to improve memory system performance and core performance density. The design minimizes instruction latency while providing fine-grained clock control to reduce power.
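One common way to exploit software-controlled data movement is double buffering: start the transfer of the next block into local storage before computing on the current one, so transfer and compute overlap. The sketch below is a generic illustration of that technique, not the paper's method. A single-worker thread pool stands in for a DMA engine, and `BLOCK`, `fetch`, and `process` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4  # hypothetical block size, in elements

def fetch(main_memory, start):
    """Stand-in for a DMA 'get': copy one block from main memory into a local buffer."""
    return list(main_memory[start:start + BLOCK])

def process(block):
    """Stand-in for compute on data already resident in local storage."""
    return sum(x * x for x in block)

def double_buffered_sum_of_squares(main_memory):
    starts = list(range(0, len(main_memory), BLOCK))
    if not starts:
        return 0
    total = 0
    with ThreadPoolExecutor(max_workers=1) as dma:
        pending = dma.submit(fetch, main_memory, starts[0])  # prime the first buffer
        for i in range(len(starts)):
            current = pending.result()  # wait until buffer i has arrived
            if i + 1 < len(starts):
                # issue the next transfer before computing, so the two overlap
                pending = dma.submit(fetch, main_memory, starts[i + 1])
            total += process(current)
    return total
```

The result is identical to a plain loop (`double_buffered_sum_of_squares(list(range(10)))` equals `sum(x * x for x in range(10))`); only the ordering of transfers relative to compute changes. On real hardware the benefit comes from hiding memory latency, which this Python model only imitates.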

International Symposium on Microarchitecture | 1998

Designing for a gigahertz [guTS integer processor]

Harm Peter Hofstee; Sang Hoo Dhong; David Meltzer; Kevin J. Nowka; Joel Abraham Silberman; J.I. Burns; Stephen D. Posluszny; Osamu Takahashi

At the IEEE International Solid-State Circuits Conference this February, the IBM Austin Research Laboratory presented an experimental 64-bit integer processor called guTS (gigahertz unit Test Site). The goal of the guTS project was to demonstrate that circuit techniques, and circuit-centric design, could significantly increase the performance of microprocessors, thus providing headroom for future performance growth beyond contributions from microarchitecture and CMOS technology. To clearly distinguish the design contributions of this project from innovations in CMOS technology, we chose a fabrication technology that was in production in 1997. The guTS processor is a full-custom, nearly 100% dynamic design. Its single-issue core implements 96 instructions from the integer subset of the PowerPC instruction set architecture, and covers in excess of 90% of instructions executed in typical code. Address translation, floating-point, and I/O-related instructions are omitted. All instructions, including loads and stores, execute in one cycle. We measured core speeds in excess of a gigahertz. We focus here on the circuit-centric design approach that enabled the gigahertz result. This approach requires designers to operate across the boundaries of microarchitecture, logic, circuit, and physical design. We explain why developments in CMOS technology increasingly favor this approach.
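The claim that a 96-instruction subset "covers in excess of 90% of instructions executed" is a statement about dynamic coverage: the share of executed instructions, weighted by how often each opcode runs, that fall inside the implemented subset. The sketch below computes that ratio for a trace. The trace and the supported set in the usage example are invented for illustration and are not guTS data.

```python
from collections import Counter

def dynamic_coverage(trace, supported):
    """Fraction of executed (dynamic) instructions whose opcode is in the supported subset."""
    counts = Counter(trace)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for op, n in counts.items() if op in supported)
    return covered / total
```

For a made-up trace of five `add`, three `lwz`, one `stw`, and one `fadd`, with floating point omitted from the supported set, `dynamic_coverage` returns `0.9`. Static coverage (the fraction of distinct opcodes implemented) would be a different and usually less meaningful number.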

IBM Journal of Research and Development | 2000

Custom circuit design as a driver of microprocessor performance

D. H. Allen; Sang Hoo Dhong; Harm Peter Hofstee; Jens Leenstra; Kevin J. Nowka; D. L. Stasiak; Dieter Wendel

This paper presents a survey of some of the most aggressive custom designs for CMOS processor products and prototypes in IBM. We argue that microprocessor performance growth, which has traditionally been driven primarily by CMOS technology and microarchitectural improvements, can receive a substantial contribution from improvements in circuit design and physical organization. We predict that in future microprocessor designs the floorplan and wire plan will be as important as the microarchitecture, more control logic will be structured and become indistinguishable from dataflow elements, and more circuits will be designed and analyzed at the level of single transistors and wires.

IBM Journal of Research and Development | 2013

Big data text-oriented benchmark creation for Hadoop

Anne E. Gattiker; Fadi H. Gebara; Harm Peter Hofstee; Jer Hayes; Anthony N. Hylick

Massive-scale Big Data analytics is representative of a new class of workloads that justifies a rethinking of how computing systems should be optimized. This paper addresses the need for a set of benchmarks that system designers can use to measure the quality of their designs and that customers can use to evaluate competing system offerings with respect to commonly performed text-oriented workflows in Hadoop™. Existing benchmarks such as HiBench need additions in terms of both scale and relevance. We describe a methodology for creating a petascale data-size text-oriented benchmark that includes representative Big Data workflows and can be used to test total system performance, with demands balanced across storage, network, and computation. Creating such a benchmark requires meeting unique challenges associated with the data size and its often unstructured nature. To be useful, the benchmark also needs to be sufficiently generic to be accepted by the community at large. Here, we focus on a text-oriented Hadoop workflow that consists of three common tasks: categorizing text documents, identifying significant documents within each category, and analyzing significant documents for new topic creation.
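The three-task workflow can be illustrated at toy scale in memory. The sketch below is not the paper's benchmark and does not use Hadoop. It assumes keyword-based categorization, keyword density as the "significance" score, and frequent non-keyword terms as new-topic candidates; the `CATEGORIES` lists and all function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical category keyword lists; a real benchmark would use a trained classifier.
CATEGORIES = {
    "hardware": {"processor", "cache", "circuit"},
    "data": {"hadoop", "query", "storage"},
}

def tokenize(text):
    return [w.strip(".,;:!?").lower() for w in text.split()]

def categorize(docs):
    """Task 1: assign each document to the category whose keywords it mentions most."""
    by_cat = defaultdict(list)
    for doc_id, text in docs.items():
        words = Counter(tokenize(text))
        scores = {c: sum(words[k] for k in kws) for c, kws in CATEGORIES.items()}
        best = max(scores, key=scores.get)
        if scores[best] > 0:  # documents matching no category are left out
            by_cat[best].append(doc_id)
    return by_cat

def significant(docs, by_cat, top_k=1):
    """Task 2: keep the top_k documents per category, ranked by keyword density."""
    result = {}
    for cat, ids in by_cat.items():
        def density(doc_id):
            words = tokenize(docs[doc_id])
            return sum(w in CATEGORIES[cat] for w in words) / max(len(words), 1)
        result[cat] = sorted(ids, key=density, reverse=True)[:top_k]
    return result

def candidate_topics(docs, sig, n=3):
    """Task 3: frequent non-keyword terms in significant documents suggest new topics."""
    known = set().union(*CATEGORIES.values())
    out = {}
    for cat, ids in sig.items():
        counts = Counter(w for d in ids for w in tokenize(docs[d])
                         if w not in known and len(w) > 3)
        out[cat] = [w for w, _ in counts.most_common(n)]
    return out
```

In a Hadoop setting each task would presumably become one or more MapReduce jobs over a distributed corpus; the sketch only shows the data flow between the three tasks, not the storage, network, and compute balance the paper is concerned with.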

IBM Journal of Research and Development | 2007

Microarchitecture and implementation of the synergistic processor in 65-nm and 90-nm SOI

Brian Flachs; S. Asano; Sang Hoo Dhong; Harm Peter Hofstee; Gilles Gervais; Roy Moonseuk Kim; T. N. Le; P. Liu; Jens Leenstra; John Samuel Liberty; Brad W. Michael; H.-J. Oh; Stefan Mueller; Osamu Takahashi; K. Hirairi; A. Kawasumii; H. Murakami; H. Noro; S. Onishi; J. Pille; J. Silberman; S. Yong; A. Hatakeyama; Y. Watanabe; Naoka Yano; Daniel Alan Brokenshire; Mohammad Peyravian; V. To; Eiji Iwata

This paper describes the architecture and implementation of the original gaming-oriented synergistic processor element (SPE) in both 90-nm and 65-nm silicon-on-insulator (SOI) technology and introduces a new SPE implementation targeted at the high-performance computing community. The Cell Broadband Engine™ processor contains eight SPEs. The dual-issue, four-way single-instruction multiple-data processor is designed to achieve high performance per area and power and is optimized to process streaming data, simulate physical phenomena, and render objects digitally. Most aspects of data movement and instruction flow are controlled by software to improve the performance of the memory system and the core performance density. The SPE was designed as an 11-FO4 (fan-out-of-4-inverter-delay) processor using 20.9 million transistors within 14.8 mm² in the IBM 90-nm SOI low-k process. CMOS (complementary metal-oxide semiconductor) static gates implement the majority of the logic. Dynamic circuits are used in critical areas and occupy 19% of the non-static random access memory (SRAM) area. Instruction set architecture, microarchitecture, and physical implementation are tightly coupled to achieve a compact and power-efficient design. Correct operation has been observed at up to 5.6 GHz and 7.3 GHz in 90-nm and 65-nm SOI technology, respectively.
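The figures in this abstract support some simple derived arithmetic. An 11-FO4 design means one clock cycle spans about 11 fan-out-of-4 inverter delays, so dividing the cycle time at a given frequency by 11 gives the FO4 delay that frequency would imply. The sketch below computes that quantity and the transistor density. It assumes the observed maximum frequencies correspond exactly to 11 FO4 per cycle; since those measurements depend on voltage and temperature conditions not stated here, the implied FO4 values are a rough indication only.

```python
# Figures taken from the abstract above.
TRANSISTORS = 20.9e6
AREA_MM2 = 14.8
FO4_PER_CYCLE = 11
OBSERVED_GHZ = {"90-nm SOI": 5.6, "65-nm SOI": 7.3}

def cycle_time_ps(freq_ghz):
    """Clock period 1/f, converted to picoseconds."""
    return 1e3 / freq_ghz

def implied_fo4_ps(freq_ghz, fo4_per_cycle=FO4_PER_CYCLE):
    """FO4 delay implied if one cycle were exactly fo4_per_cycle FO4 delays."""
    return cycle_time_ps(freq_ghz) / fo4_per_cycle

# Transistors per square millimetre for the 90-nm SPE.
density_per_mm2 = TRANSISTORS / AREA_MM2

for node, ghz in OBSERVED_GHZ.items():
    print(f"{node}: cycle {cycle_time_ps(ghz):.1f} ps, implied FO4 {implied_fo4_ps(ghz):.2f} ps")
print(f"density: {density_per_mm2 / 1e6:.2f} M transistors/mm^2")
```

At 5.6 GHz the cycle is about 178.6 ps, or roughly 16.2 ps per FO4; at 7.3 GHz it is about 137.0 ps, or roughly 12.5 ps per FO4. The 90-nm SPE packs about 1.41 million transistors per mm².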

IBM Journal of Research and Development | 2013

True hardware random number generation implemented in the 32-nm SOI POWER7+ processor

John Samuel Liberty; A. Barrera; David William Boerstler; T. B. Chadwick; S. R. Cottier; Harm Peter Hofstee; J. A. Rosser; M. L. Tsai

This paper provides a description of the hardware random number generator that is implemented on the IBM POWER7+™ processor. We discuss the underlying mechanism using basic ring oscillator circuits implemented in standard digital logic circuits. The source of entropy is based on sampling phase jitter in the ring oscillators, and the rate of phase jitter accumulation is measured. We show that the design is simple and robust yet able to generate a high rate of random bits while using a minimum of logic area. The design is very resistant to physical manipulation, producing solid entropy values under environmental conditions that exceed the requirements of the surrounding circuitry. With a design-specific mechanism to correct for ring oscillator sample bias, the output shows a very high entropy rate, which is validated.
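Raw samples from a jitter-based source are often biased toward 0 or 1, which is why the output needs bias correction. The POWER7+ design uses its own design-specific mechanism that is not described here; the sketch below instead uses the classic von Neumann extractor as a well-known stand-in, plus a crude per-bit Shannon entropy estimate. The simulated source (a seeded pseudo-random generator biased to about 80% ones) is an assumption for illustration and does not model ring-oscillator physics.

```python
import math
import random

def von_neumann(bits):
    """Von Neumann debiasing: read non-overlapping pairs; 01 -> 0, 10 -> 1, drop 00 and 11.
    Removes bias only if the input bits are independent."""
    out = []
    for i in range(0, len(bits) - 1, 2):
        a, b = bits[i], bits[i + 1]
        if a != b:
            out.append(a)
    return out

def shannon_entropy_per_bit(bits):
    """Entropy estimated from the 0/1 frequency alone; blind to correlations between bits."""
    if not bits:
        return 0.0
    p1 = sum(bits) / len(bits)
    if p1 in (0.0, 1.0):
        return 0.0
    return -(p1 * math.log2(p1) + (1 - p1) * math.log2(1 - p1))

# Hypothetical biased source standing in for raw oscillator samples.
rng = random.Random(1)
raw = [1 if rng.random() < 0.8 else 0 for _ in range(20000)]
clean = von_neumann(raw)
```

With this input, the raw stream's estimated entropy is near 0.72 bits per bit, while the debiased stream is close to 1 bit per bit, at the cost of discarding most samples (only about 16% of input bits survive). That throughput loss is one reason a hardware design might prefer a different bias-correction mechanism, and a frequency-only entropy estimate like this one is far weaker than the validation a production generator requires.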

Archive | 2000