Publication


Featured research published by Peter W. Cook.


International Symposium on Microarchitecture | 2000

Power-aware microarchitecture: design and modeling challenges for next-generation microprocessors

David M. Brooks; Pradip Bose; Stanley E. Schuster; Hans M. Jacobson; Prabhakar Kudva; Alper Buyuktosunoglu; John-David Wellman; Victor Zyuban; Manish Gupta; Peter W. Cook

The ability to estimate power consumption during early-stage definition and trade-off studies is a key new methodology enhancement. Opportunities for saving power can be exposed via microarchitecture-level modeling, particularly through clock-gating and dynamic adaptation. In this paper we describe the approach of using energy-enabled performance simulators in early design. We examine some of the emerging paradigms in processor design and comment on their inherent power-performance characteristics.


IEEE Computer | 2003

Dynamically tuning processor resources with adaptive processing

David H. Albonesi; Rajeev Balasubramonian; S.G. Dropsho; Sandhya Dwarkadas; Eby G. Friedman; Michael C. Huang; Volkan Kursun; Grigorios Magklis; Michael L. Scott; Greg Semeraro; Pradip Bose; Alper Buyuktosunoglu; Peter W. Cook; Stanley E. Schuster

By using adaptive processing to dynamically tune major microprocessor resources, developers can achieve greater energy efficiency with reasonable hardware and software overhead while avoiding undue performance loss. Adaptive processors require few additional transistors. Further, because adaptation occurs only in response to infrequent trigger events, the decision logic can be placed into a low-leakage state until such events occur.


IEEE Journal of Solid-State Circuits | 1990

Second-generation RISC floating point with multiply-add fused

Erdem Hokenek; Robert K. Montoye; Peter W. Cook

A 440,000-transistor second-generation RISC (reduced instruction set computer) floating-point chip is described. The pipeline latency is only two cycles, and a double-precision result is produced every cycle. System throughput and accuracy are increased by using a floating-point multiply-add-fused unit, which carries out a double-precision accumulate as a two-cycle pipelined execution with only one rounding error. While the cycle time (40 ns) is competitive with other CMOS RISC systems, the floating-point performance stretches to the range of bipolar RISC systems (7.4-13 MFLOPS LINPACK). Leading zero anticipation makes the two-cycle pipeline possible by nearly eliminating the additional postnormalization time, and it allows for reduced overall system latency. Partial decode shifters allow complete time sharing for the multiply and data alignment. Improved design techniques for logarithmic addition and higher order counters for multiplication complete this second-generation RISC floating-point unit design.


PACS '00 Proceedings of the First International Workshop on Power-Aware Computer Systems-Revised Papers | 2000

An Adaptive Issue Queue for Reduced Power at High Performance

Alper Buyuktosunoglu; Stanley E. Schuster; David M. Brooks; Pradip Bose; Peter W. Cook; David H. Albonesi

Increasing power dissipation has become a major constraint for future performance gains in the design of microprocessors. In this paper, we present the circuit design of an issue queue for a superscalar processor that leverages transmission gate insertion to provide dynamic low-cost configurability of size and speed. A novel circuit structure dynamically gathers statistics of issue queue activity over intervals of instruction execution. These statistics are then used to change the size of an issue queue organization on-the-fly to improve issue queue energy and performance. When applied to a fixed, full-size issue queue structure, the result is up to a 70% reduction in energy dissipation. The complexity of the additional circuitry to achieve this result is almost negligible. Furthermore, self-timed techniques embedded in the adaptive scheme can provide a 56% decrease in cycle time of the CAM array read of the issue queue when we change the adaptive issue queue size from 32 entries (largest possible) to 8 entries (smallest possible in our design).


Symposium on Asynchronous Circuits and Systems | 2002

Synchronous interlocked pipelines

Hans M. Jacobson; Prabhakar Kudva; Pradip Bose; Peter W. Cook; Stanley E. Schuster; Eric G. Mercer; Chris J. Myers

Locality principles are becoming paramount in controlling advancement of data through pipelined systems. Achieving fine-grained power down and progressive pipeline stalls at the local stage level is therefore becoming increasingly important to enable lower dynamic power consumption while keeping introduced switching noise under control as well as avoiding global distribution of timing-critical stall signals. It has long been known that the interlocking properties of asynchronous pipelined systems have the potential to provide such benefits. However, it has not been understood how such interlocking can be achieved in synchronous pipelines. This paper presents a novel technique based on local clock gating and synchronous handshake protocols that achieves stage-level interlocking characteristics in synchronous pipelines similar to those of asynchronous pipelines. The presented technique is directly applicable to traditional synchronous pipelines and works equally well for two-phase clocked pipelines based on transparent latches, as well as one-phase clocked pipelines based on master-slave latches.


Great Lakes Symposium on VLSI | 2001

A circuit level implementation of an adaptive issue queue for power-aware microprocessors

Alper Buyuktosunoglu; David H. Albonesi; Stanley E. Schuster; David M. Brooks; Pradip Bose; Peter W. Cook

Increasing power dissipation has become a major constraint for future performance gains in the design of microprocessors. In this paper, we present the circuit design of an issue queue for a superscalar processor that leverages transmission gate insertion to provide dynamic low-cost configurability of size and speed. A novel circuit structure dynamically gathers statistics of issue queue activity over intervals of instruction execution. These statistics are then used to change the size of an issue queue organization on-the-fly to improve issue queue energy and performance. When applied to a fixed, full-size issue queue structure, the result is up to a 70% reduction in energy dissipation. The complexity of the additional circuitry to achieve this result is almost negligible. Furthermore, self-timed techniques embedded in the adaptive scheme can provide a 56% decrease in cycle time of the CAM array read of the issue queue when we change the adaptive issue queue size from 32 entries (largest possible) to 8 entries (smallest possible in our design).


International Solid-State Circuits Conference | 2000

Asynchronous interlocked pipelined CMOS circuits operating at 3.3-4.5 GHz

Stanley E. Schuster; William Robert Reohr; Peter W. Cook; David F. Heidel; Michael Immediato; Keith A. Jenkins

Chip performance, power, noise, and clock synchronization are becoming formidable challenges as microprocessor performance moves into the GHz regime and beyond. Interlocked pipelined CMOS (IPCMOS), an asynchronous clocking technique, helps address these challenges. This paper shows how a typical block (e.g., Block D) is interlocked with all the blocks with which it interacts. In the forward direction, dedicated Valid signals emulate the worst-case path through each driving block and thus determine when data can be latched within the typical block. In the reverse direction, Acknowledge signals indicate that data has been received by the subsequent blocks and that new data may be processed within the typical block. In this interlocked approach local clocks are generated only when there is an operation to perform.


IEEE Journal of Solid-State Circuits | 1998

Measurement and modeling of on-chip transmission line effects in a 400 MHz microprocessor

Phillip J. Restle; Keith A. Jenkins; A. Deutsch; Peter W. Cook

On-chip interconnect delays are becoming an increasingly important factor for high-performance microprocessors. Consequently, critical on-chip wiring must be carefully optimized to reduce and control interconnect delays, and accurate interconnect modeling has become more important. This paper shows the importance of including transmission line effects in interconnect modeling of the on-chip clock distribution of a 400 MHz CMOS microprocessor. Measurements of clock waveforms on the microprocessor showing 30 ps skew were made using an electron beam prober. Waveforms from a test chip are also shown to demonstrate the importance of transmission line effects.


International Symposium on Low Power Electronics and Design | 2002

Tradeoffs in power-efficient issue queue design

Alper Buyuktosunoglu; David H. Albonesi; Pradip Bose; Peter W. Cook; Stanley E. Schuster

A major consumer of microprocessor power is the issue queue. Several microprocessors, including the Alpha 21264 and POWER4™, use a compacting latch-based issue queue design which has the advantage of simplicity of design and verification. The disadvantage of this structure, however, is its high power dissipation. In this paper, we explore different issue queue power optimization techniques that vary not only in their performance and power characteristics, but in how much they deviate from the baseline implementation. By developing and comparing techniques that build incrementally on the baseline design, as well as those that achieve higher power savings through a more significant redesign effort, we quantify the extra benefit the higher design cost techniques provide over their more straightforward counterparts.


International Electron Devices Meeting | 1993

SOI for a 1-volt CMOS technology and application to a 512 Kb SRAM with 3.5 ns access time

Ghavam G. Shahidi; Tak H. Ning; Terry I. Chappell; J.H. Comfort; Barbara Alane Chappell; Robert L. Franch; Carl J. Anderson; Peter W. Cook; Stanley E. Schuster; M.G. Rosenfield; Michael R. Polcari; Robert H. Dennard; Bijan Davari

In this paper, a CMOS technology that is optimum for low-voltage (in the 1-volt range) applications is presented. Thin but undepleted SOI is used as the substrate, which gives low junction capacitance and no body effect. Furthermore, floating-body effects cause a reduction of subthreshold slope at high drain bias. This lowers the high-V_DS threshold to be used, which increases the current drive without significant increase in the off-current. This technology was applied to a high-performance 512 Kb SRAM. Access time of 3.5 ns at 1 V was obtained.
