Publication


Featured research published by Naoaki Aoki.


international solid-state circuits conference | 1998

A 1.0-GHz single-issue 64-bit PowerPC integer processor

Joel Abraham Silberman; Naoaki Aoki; David William Boerstler; Jeffrey L. Burns; Sang Hoo Dhong; Axel Essbaum; Uttam Shyamalindu Ghoshal; David F. Heidel; Peter Hofstee; Kyung Tek Lee; David Meltzer; Hung Ngo; Kevin J. Nowka; Stephen D. Posluszny; Osamu Takahashi; Ivan Vo; Brian Zoric

This 64 b single-issue integer processor, comprising about one million transistors, is fabricated in a 0.15 μm effective channel length, six-metal-layer CMOS technology. Intended as a vehicle to explore circuit, clocking, microarchitecture, and methodology options for high-frequency processors, the processor prototype implements 60 fixed-point compare, logical, arithmetic, and rotate-merge-mask instructions of the PowerPC instruction-set architecture with single-cycle latency. The processor executes programs written in this instruction subset from cache with a 1 ns cycle. In addition, the prototype implements 36 PowerPC load/store instructions that execute as single-cycle operations (zero wait cycles) with 1.15 ns latency. Full data forwarding and full at-speed scan testing are supported.


international solid-state circuits conference | 2000

A 1 GHz single-issue 64 b PowerPC processor

Peter Hofstee; Naoaki Aoki; David William Boerstler; Paula Kristine Coulman; Sang Hoo Dhong; Brian Flachs; N. Kojima; O. Kwon; Kyung Tek Lee; David Meltzer; Kevin J. Nowka; J. Park; J. Peter; Stephen D. Posluszny; M. Shapiro; Joel Abraham Silberman; Osamu Takahashi; B. Weinberger

This 64 b single-issue PowerPC processor contains 19M transistors and is fabricated in 0.12 μm L_eff six-layer copper interconnect CMOS. Nominal processor clock frequency is 1.0 GHz. At the fast end of the process distribution the processor reaches 1.15 GHz (1.87 V, 101 °C, 112 W). As in a previous design, nearly the entire processor is implemented using delayed-reset and self-resetting dynamic circuit macros. New contributions include: (1) a fully pipelined, four execution-stage IEEE double-precision floating-point unit (FPU) with fused multiply-add; (2) sum-addressed memory management units (MMUs) and 64 kB 2-cycle caches; (3) support for the full 64 b PowerPC instruction set; (4) dynamic PLA-based control; (5) a microarchitecture and floorplan that balances critical paths; (6) delayed-reset dynamic circuits that support stress testing (burn-in); and (7) improved clock generation and distribution.


design automation conference | 2000

“Timing closure by design,” a high frequency microprocessor design methodology

Stephen D. Posluszny; Naoaki Aoki; David William Boerstler; Paula Kristine Coulman; Sang Hoo Dhong; B. Flachs; Peter Hofstee; N. Kojima; O. Kwon; K. Lee; David Meltzer; Kevin J. Nowka; J. Park; J. Peter; Joel Abraham Silberman; Osamu Takahashi; P. Villarrubia

This paper presents a design methodology emphasizing early and quick timing closure for high-frequency microprocessor designs. This methodology was used to design a gigahertz-class PowerPC microprocessor with 19 million transistors. Characteristics of “Timing Closure by Design” are 1) logic partitioned on timing boundaries, 2) predictable control structures (PLAs), 3) static interfaces for dynamic circuits, 4) low-skew clock distribution, 5) a deterministic method of macro placement, 6) simplified timing analysis, and 7) a refinement method of chip integration with early timing analysis.


IEEE Journal of Solid-state Circuits | 1999

A 1-GHz logic circuit family with sense amplifiers

Osamu Takahashi; Naoaki Aoki; Joel Abraham Silberman; Sang Hoo Dhong

This paper describes a newly developed logic circuit family based on dual-rail bit lines and sense amplifiers that is used extensively in a 1.0-GHz, single-issue, 64-bit PowerPC integer processor, gigahertz unit test site (guTS). The family consists of an incrementor, a count-leading-zero, a rotator, and a read-only memory. Each macro consists of a leaf-cell array, dual-rail bit lines, a row of sense amplifiers, a control block, and peripheral circuits. A common read-out scheme sensing the differential voltage of dual-rail bit lines is used. The hardware was fabricated in a 0.25-μm drawn channel length, six-metal-layer (Al) CMOS technology (1.8-V nominal VDD). Wafer testing was performed using a probe card. The macros were tested cycle by cycle by scanning the input data to the read/write address latches and data latches, and scanning the result out from the output receiving latches. Functional testing was performed on guTS macros at frequencies up to 1.0 GHz at 25 °C with nominal VDD (1.1 GHz for the ROM).


symposium on vlsi circuits | 1998

1 GHz logic circuits with sense amplifiers

Osamu Takahashi; Naoaki Aoki; J. Silberman; Sang Hoo Dhong

This paper describes a logic circuit family that is used extensively in a 1.0 GHz single-issue 64-bit PowerPC integer processor. The family consists of an incrementor, a count-leading-zero, a rotator, and a ROM. Each macro consists of a leaf-cell array, dual-rail bit lines, a row of sense amplifiers, a control block, and peripheral circuits. A common read-out scheme of sensing the differential voltage of dual-rail bit lines is used. The hardware is fabricated in a 0.25 μm mask channel length, 6-metal-layer (Al) CMOS technology (1.8 V nom. VDD).


symposium on vlsi circuits | 2000

A 1.6 ns access, 1 GHz two-way set-predicted and sum-indexed 64-kByte data cache

Joel Abraham Silberman; Naoaki Aoki; Nobuo Kojima; Sang Hoo Dhong

A 64-kByte cache exploits combined address generation and word line decoding in the SRAM array, translation array, and directory. In place of a late select, set selection in the two-way associative cache is accomplished in the decode path by accessing a stored prediction from a sum-indexed array built into the decoder.
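
The abstract does not spell out the decoder logic, so the following is only a minimal sketch of the general sum-addressed idea, not the circuit from the paper. It illustrates how a row decoder can test whether base + offset equals a given row index without first running a carry-propagate adder: each bit position checks locally that the carry it needs agrees with the carry its lower neighbor would produce. The function names, the 4-bit width in the usage lines, and the software formulation itself are assumptions made for illustration.

```python
def sum_matches_index(a: int, b: int, k: int, width: int) -> bool:
    """Return True when (a + b) mod 2**width == k, without adding a and b.

    Hypothetical helper: every bit position is checked independently,
    which is what lets hardware fold the addition into the decoder.
    """
    mask = (1 << width) - 1
    a, b, k = a & mask, b & mask, k & mask

    # Carry-in each bit would need for its sum bit to equal the bit of k.
    required_carry_in = a ^ b ^ k

    # Carry-out each bit would produce if its sum bit really equals k's bit.
    carry_out = (a & b) | ((a | b) & ~k)

    # Bit i receives the carry-out of bit i-1; bit 0 receives no carry.
    implied_carry_in = (carry_out << 1) & mask

    return required_carry_in == implied_carry_in


def select_row(base: int, offset: int, width: int) -> int:
    """Model a decoder: exactly one row's match line should fire."""
    rows = [k for k in range(1 << width)
            if sum_matches_index(base, offset, k, width)]
    assert len(rows) == 1, "a sum-addressed decoder selects exactly one row"
    return rows[0]


if __name__ == "__main__":
    # Illustrative 4-bit example: base 0x3 plus offset 0x1.
    print(select_row(0x3, 0x1, 4))
```

In hardware, each row evaluates its own match in parallel with a constant k, so the comparison depth does not grow with a carry chain. The loop over rows in `select_row` exists only to model that parallel evaluation in software.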


Archive | 2000

Set-associative cache memory having a built-in set prediction array

Naoaki Aoki; Sang Hoo Dhong; Nobuo Kojima; Joel Abraham Silberman


Archive | 2000

4 to 2 adder

Naoaki Aoki; Sang Hoo Dhong; Nobuo Kojima; Ohsang Kwon


Archive | 1999

Processor cycle time independent pipeline cache and method for pipelining data from a cache

Naoaki Aoki; Sang Hoo Dhong; Nobuo Kojima; Joel Abraham Silberman


Archive | 1999

Method and apparatus for generating true/complement signals

Naoaki Aoki; Sang Hoo Dhong; Nobuo Kojima; Joel Abraham Silberman
