Peter Hofstee
IBM
Publication
Featured research published by Peter Hofstee.
international solid-state circuits conference | 1998
Joel Abraham Silberman; Naoaki Aoki; David William Boerstler; Jeffrey L. Burns; Sang Hoo Dhong; Axel Essbaum; Uttam Shyamalindu Ghoshal; David F. Heidel; Peter Hofstee; Kyung Tek Lee; David Meltzer; Hung Ngo; Kevin J. Nowka; Stephen D. Posluszny; Osamu Takahashi; Ivan Vo; Brian Zoric
This 64 b single-issue integer processor, comprised of about one million transistors, is fabricated in a 0.15 μm effective channel length, six-metal-layer CMOS technology. Intended as a vehicle to explore circuit, clocking, microarchitecture, and methodology options for high-frequency processors, the processor prototype implements 60 fixed-point compare, logical, arithmetic, and rotate-merge-mask instructions of the PowerPC instruction-set architecture with single-cycle latency. The processor executes programs written in this instruction subset from cache with a 1 ns cycle. In addition, the prototype implements 36 PowerPC load/store instructions that execute as single-cycle operations (zero wait cycles) with 1.15 ns latency. Full data forwarding and full at-speed scan testing are supported.
international conference on computer graphics and interactive techniques | 2008
Bernard Frischer; Dean Abernathy; Gabriele Guidi; Joel Myers; Cassie Thibodeau; Antonio Salvemini; Pascal Müller; Peter Hofstee; Barry L. Minor
Rome Reborn (www.romereborn.virginia.edu) is an international initiative, started in 1996 and based at the Institute for Advanced Technology in the Humanities (IATH; see www.iath.virginia.edu), to create 3D urban models illustrating the development of ancient Rome from the first settlement in the late Bronze Age (ca. 1,000 B.C.) to the depopulation of the city in the early Middle Ages (ca. A.D. 550). Other institutional partners have included the Politecnico di Milano, UCLA, the Université de Caen, and the Ausonius Institute at the Université de Bordeaux-III. Commercial rights to Rome Reborn have been exclusively licensed to Past Perfect Productions s.r.l., a corporation based in Rome, Italy (http://www.pastperfectproductions.com/).
international solid-state circuits conference | 2000
Peter Hofstee; Naoaki Aoki; David William Boerstler; Paula Kristine Coulman; Sang Hoo Dhong; Brian Flachs; N. Kojima; O. Kwon; Kyung Tek Lee; David Meltzer; Kevin J. Nowka; J. Park; J. Peter; Stephen D. Posluszny; M. Shapiro; Joel Abraham Silberman; Osamu Takahashi; B. Weinberger
This 64 b single-issue PowerPC processor contains 19M transistors and is fabricated in 0.12 μm L_eff six-layer copper interconnect CMOS. Nominal processor clock frequency is 1.0 GHz. At the fast end of the process distribution the processor reaches 1.15 GHz (1.87 V, 101 °C, 112 W). As in a previous design, nearly the entire processor is implemented using delayed-reset and self-resetting dynamic circuit macros. New contributions include: (1) A fully pipelined, four execution-stage IEEE double-precision floating-point unit (FPU) with fused multiply-add. (2) Sum-addressed memory management units (MMUs) and 64 kB 2-cycle caches. (3) Support for the full 64 b PowerPC instruction set. (4) Dynamic PLA-based control. (5) A microarchitecture and floorplan that balances critical paths. (6) Delayed-reset dynamic circuits that support stress testing (burn-in). (7) Improved clock generation and distribution.
design automation conference | 2000
Stephen D. Posluszny; Naoaki Aoki; David William Boerstler; Paula Kristine Coulman; Sang Hoo Dhong; B. Flachs; Peter Hofstee; N. Kojima; O. Kwon; K. Lee; David Meltzer; Kevin J. Nowka; J. Park; J. Peter; Joel Abraham Silberman; Osamu Takahashi; P. Villarrubial
This paper presents a design methodology emphasizing early and quick timing closure for high-frequency microprocessor designs. This methodology was used to design a gigahertz-class PowerPC microprocessor with 19 million transistors. Characteristics of “Timing Closure by Design” are 1) logic partitioned on timing boundaries, 2) predictable control structures (PLAs), 3) static interfaces for dynamic circuits, 4) low-skew clock distribution, 5) a deterministic method of macro placement, 6) simplified timing analysis, and 7) a refinement method of chip integration with early timing analysis.
IEEE Computer Architecture Letters | 2016
Minghua Li; Guancheng Chen; Qijun Wang; Yonghua Lin; Peter Hofstee; Per Stenström; Dian Zhou
Hardware prefetching on IBM's latest POWER8 processor can significantly improve the performance of many applications, but it can also cause performance loss for others. The IBM POWER8 processor provides one of the most sophisticated hardware prefetching designs, which supports 225 different configurations. Finding the optimal or near-optimal hardware prefetching configuration for a specific application is therefore a major challenge. We present a dynamic prefetching tuning scheme in this paper, named prefetch automatic tuner (PATer). PATer uses a prediction model based on machine learning to dynamically tune the prefetch configuration based on the values of hardware performance monitoring counters (PMCs). By developing a two-phase prefetching selection algorithm and a prediction accuracy optimization algorithm in this tool, we identify a set of key hardware prefetch configurations that matter most to performance, as well as a set of PMCs that maximize the machine learning prediction accuracy. We show that PATer is able to accelerate the execution of diverse workloads by up to 1.4×.
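The idea of mapping PMC readings to a prefetch configuration can be illustrated with a minimal sketch. This is not PATer's actual model: the 1-nearest-neighbor classifier, the two-element feature vectors, and the configuration labels (`deep_prefetch`, `prefetch_off`, `default`) are all hypothetical, chosen only to show the shape of a counter-driven tuning loop.

```python
import math

# Hypothetical training data: (normalized PMC feature vector, best configuration).
# Features and labels are illustrative assumptions, not POWER8 specifics.
TRAINING = [
    ((0.90, 0.10), "deep_prefetch"),
    ((0.80, 0.20), "deep_prefetch"),
    ((0.20, 0.85), "prefetch_off"),
    ((0.10, 0.90), "prefetch_off"),
    ((0.50, 0.50), "default"),
]


def predict_config(pmc_sample, training=TRAINING):
    """Return the label of the training point closest to pmc_sample (1-NN)."""
    best_label, best_dist = None, math.inf
    for features, label in training:
        dist = math.dist(features, pmc_sample)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def tune(pmc_stream):
    """Yield a new configuration only when the prediction changes."""
    current = None
    for sample in pmc_stream:
        predicted = predict_config(sample)
        if predicted != current:
            current = predicted
            yield current
```

In a real tuner, each yielded label would trigger a write to the platform's prefetch control setting; that step is omitted here because it is hardware-specific.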
international conference on parallel architectures and compilation techniques | 2016
Zhen Jia; Chao Xue; Guancheng Chen; Jianfeng Zhan; Lixin Zhang; Yonghua Lin; Peter Hofstee
Much research is devoted to tuning big data analytics in modern data centers, since even a small percentage of performance improvement immediately translates to huge cost savings because of the large scale. Simultaneous multithreading (SMT) receives great interest from data center communities, as it has the potential to boost the performance of big data analytics by increasing processor resource utilization. For example, emerging processor architectures such as POWER8 support up to 8-way multithreading. However, as different big data workloads have disparate architectural characteristics, identifying the most efficient SMT configuration to achieve the best performance is challenging in terms of both complex application behaviors and processor architectures. In this paper, we specifically focus on auto-tuning the SMT configuration for Spark-based big data workloads on POWER8. However, our methodology could be generalized and extended to other programming software stacks and other architectures. We propose a prediction-based dynamic SMT threading (PBDST) framework to adjust the thread count in SMT cores on POWER8 processors by using versatile machine learning algorithms. Its innovation lies in adopting online SMT configuration predictions, derived from microarchitecture-level profiling, to regulate the thread counts so as to achieve nearly optimal performance. Moreover, it is implemented at the Spark software stack layer and is transparent to user applications. After evaluating a large set of machine learning algorithms, we choose the most efficient ones to perform online predictions. The experimental results demonstrate that our approach can achieve up to 56.3% performance improvement and an average performance gain of 16.2% in comparison with the default configuration (SMT8, the maximum SMT configuration) on our system.
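A minimal sketch of the two pieces the abstract describes, a predictor that picks a thread count from a profile and a comparison against the SMT8 default, might look as follows. The threshold rule, the `cache_miss_rate` metric, and the runtimes are hypothetical stand-ins for the learned models and measurements in the paper, and performance is assumed to be the reciprocal of runtime.

```python
SMT_LEVELS = (1, 2, 4, 8)  # threads per core; SMT8 is the default per the abstract


def predict_smt(profile):
    """Toy stand-in for a learned predictor (hypothetical threshold rule)."""
    # Assumption: heavy cache-miss pressure favors fewer threads per core.
    if profile["cache_miss_rate"] > 0.30:
        return 2
    if profile["cache_miss_rate"] > 0.15:
        return 4
    return 8


def improvement_over_default(runtime_default, runtime_tuned):
    """Relative performance improvement in percent (performance = 1 / runtime)."""
    return (runtime_default / runtime_tuned - 1.0) * 100.0
```

For example, a job that takes 10 s at SMT8 and 8 s at the predicted level shows a 25% improvement under this definition.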
international conference on parallel processing | 2009
Martti Forsell; Peter Hofstee; Ahmed Jerraya; Chris R. Jesshope; Uzi Vishkin; Jesper Larsson Träff
The last session of the HPPC 2009 workshop was dedicated to a panel discussion between the invited speakers and three additional, selected panelists. The theme of the panel was originally suggested by Uzi Vishkin, and developed with the moderator. A preamble was given in advance to the five panelists, and provoked an intensive and determined discussion. The panelists were given the chance to briefly summarize their view- and standpoints after the panel.
ACM Queue | 2007
Peter Hofstee; Michael Vizard
Today we’re going to talk about system on a chip and some of the design issues that go with that, and more importantly, some of the newer trends, such as the work that IBM is doing around the Cell processor to advance the whole system-on-a-chip processor. To that end, we’ve invited Peter Hofstee, Chief Scientist for the Cell processor project that is being funded by IBM, Toshiba, and Sony, to talk to us today about how the whole system-on-a-chip marketplace might change with the advent of the Cell processor, and what technology is driving that.
international symposium on vlsi technology systems and applications | 2001
Osamu Takahashi; Sang Hoo Dhong; Peter Hofstee; J. Silberman
Future high-performance computing systems, including some embedded systems, require not only high-speed circuit design techniques but also power-conscious design, so that the whole system can be optimized for the highest achievable performance. A fresh look at high-speed circuits with power in mind is needed.
Journal of Parallel and Distributed Computing | 2018
Raphael Polig; Kubilay Atasu; Heiner Giefers; Christoph Hagleitner; Laura Chiticariu; Frederick R. Reiss; Huaiyu Zhu; Peter Hofstee
Unstructured text data is being generated at an unprecedented rate in the form of Twitter feeds, machine logs or medical records. The analysis of this data is an important step to gaining significant insight regarding innovation, security and decision-making. The performance of traditional compute systems struggles to keep up with the rapid data growth and the expected high quality of information extraction. To cope with this situation, a compilation framework is presented that can transform text analytics queries into a hardware description. Deployed on an FPGA, the queries can be executed 60 times faster on average compared to a multi-threaded software implementation. The performance has been evaluated on two generations of high-end server systems including two generations of FPGAs, demonstrating the performance gains from advanced technology.