Donald A. Priore
Advanced Micro Devices
Publication
Featured research published by Donald A. Priore.
International Solid-State Circuits Conference | 1992
Daniel W. Dobberpuhl; Richard T. Witek; Randy L. Allmon; Robert Anglin; David Bertucci; Sharon M. Britton; Linda Chao; Robert A. Conrad; Daniel E. Dever; Bruce A. Gieseke; Soha Hassoun; Gregory W. Hoeppner; Kathryn Kuchler; Maureen Ladd; Burton M. Leary; Liam Madden; Edward J. McLellan; Derrick R. Meyer; James Montanaro; Donald A. Priore; Vidya Rajagopalan; Sridhar Samudrala; Sribalan Santhanam
A RISC (reduced-instruction-set computer)-style microprocessor operating at up to 200 MHz implements a 64-b architecture that provides a huge linear address space without bottlenecks that would impede highly concurrent implementations. Fully pipelined and capable of issuing two instructions per clock cycle, this implementation can execute up to 400 M operations per second. The chip includes an 8-kB I-cache, an 8-kB D-cache, two associated translation buffers, a four-entry 32-B/entry write buffer, a pipelined 64-b integer execution unit with a 32-entry register file, and a pipelined floating-point unit with an additional 32 registers. The pin interface includes integral support for an external secondary cache. The package is a 431-pin PGA with 140 pins dedicated to VDD/VSS. The chip is fabricated in 0.75-μm n-well CMOS with three layers of metallization. The die measures 16.8 × 13.9 mm² and contains 1.68 M transistors. Power dissipation is 30 W from a 3.3-V supply at 200 MHz.
International Solid-State Circuits Conference | 1997
B.A. Gieseke; R.L. Allmon; D.W. Bailey; B.J. Benschneider; S.M. Britton; J.D. Clouser; H.R. Fair; J.A. Farrell; M.K. Gowan; C.L. Houghton; J.B. Keller; T.H. Lee; D.L. Leibholz; S.C. Lowell; M.D. Matson; R.J. Matthew; V. Peng; M.D. Quinn; Donald A. Priore; M.J. Smith; K.E. Wilcox
A six-issue, four-fetch, out-of-order execution, 600 MHz Alpha microprocessor achieves an estimated 40 SpecInt95, 60 SpecFP95, and 1800 MB/s on McCalpin Stream. The 16.7 × 18.8 mm² die contains 15.2 M transistors and dissipates an estimated 72 W. It is implemented in 2.0 V, 6-metal, 0.35 μm CMOS with CMP planarization. The chip is in a 587-pin ceramic IPGA with 198 pins for VDD/VSS that includes a CuW heat slug for low thermal resistance between die and detachable heat sink. An on-chip PLL performs frequency multiplication of a differential PECL reference and synchronizes I/O by phase-aligning a CPU clock to the reference.
International Solid-State Circuits Conference | 1995
Bradley J. Benschneider; Andrew J. Black; William J. Bowhill; Sharon M. Britton; Daniel E. Dever; Dale R. Donchin; Robert J. Dupcak; Richard Fromm; Mary K. Gowan; Paul E. Gronowski; Michael Kantrowitz; Marc E. Lamere; Sharad Mehta; Jeanne E. Meyer; R.O. Mueller; Andy Olesin; Ronald P. Preston; Donald A. Priore; Sribalan Santhanam; Michael J. Smith; Gilbert M. Wolrich
This 300 MHz quad-issue custom VLSI implementation of the Alpha architecture delivers 1200 MIPS (peak), 600 MFLOPS (peak), 341 SPECint92, and 512 SPECfp92. The 16.5 mm × 18.1 mm die contains 9.3 M transistors and dissipates 50 W at 300 MHz. It is fabricated in a 3.3 V, four-layer metal, 0.5 μm CMOS process. The upper metal layers (metal-3 and metal-4) are primarily used for power, ground, and clock distribution. The chip supports 3.3 V/5.0 V interfaces and is packaged in a 499-pin ceramic IPGA. It contains an 8-kbyte instruction cache; an 8-kbyte, dual-ported data cache; and a 96-kbyte, unified, second-level, 3-way set-associative, fully pipelined, writeback cache. This paper describes the circuit and implementation techniques that were used to attain the 300 MHz operating frequency.
International Conference on Computer Design | 1998
Mark D. Matson; Dan Bailey; Shane L. Bell; Larry L. Biro; Steve Butler; John D. Clouser; Jim Farrell; Mike Gowan; Donald A. Priore; Kathryn Wilcox
The circuit techniques used to implement a 600 MHz, out-of-order, superscalar RISC Alpha microprocessor are described. Innovative logic and circuit design created a chip that attains 30+ SpecInt95 and 50+ SpecFP95, and supports a secondary cache bandwidth of 6.4 GB/s. Microarchitectural techniques were used to optimize latencies and cycle time, while a variety of static and dynamic design methods balanced critical path delays against power consumption. The chip relies heavily on full custom design and layout to meet speed and area goals. An extensive CAD suite guaranteed the integrity of the design.
IEEE Journal of Solid-State Circuits | 2015
Kathryn Wilcox; Robert Cole; Harry R. Fair; Kevin Gillespie; Aaron Grenat; Carson Henrion; Ravi Jotwani; Stephen Kosonocky; Benjamin Munger; Samuel Naffziger; Robert S. Orefice; Sanjay Pant; Donald A. Priore; Ravinder Rachala; Jonathan White
This work describes the physical design implementation of the AMD “Steamroller” module and the adaptive clocking system, both integral pieces of the AMD Kaveri APU SoC, which was implemented in a 28 nm high-K metal gate bulk CMOS process. The Steamroller module occupies 29.47 mm² and contains 236 million transistors. Various aspects of the core design are covered, including the power and timing methodologies as well as the design challenges of moving from 32 nm SOI to 28 nm bulk CMOS. Adaptive clocking, one of the key features used for core power efficiency, is described in detail.
International Solid-State Circuits Conference | 2014
Kevin Gillespie; Harry R. Fair; Carson Henrion; Ravi Jotwani; Stephen Kosonocky; Robert S. Orefice; Donald A. Priore; Jonathan White; Kathryn Wilcox
The AMD two-core x86-64 CPU module, codenamed “Steamroller”, contains 236 million transistors implemented in 28 nm high-κ metal gate (HKMG) bulk CMOS using 12 levels of metal. It is designed to operate from 0.8 to 1.45 V. The CPU module occupies 29.47 mm², which includes two independent integer cores, two instruction decode units, and shared instruction fetch, floating-point, and 2 MB 16-way L2 cache units (Fig. 5.5.7). Along with the second instruction decode unit, this design includes a larger shared 96 KB 3-way instruction cache and a 10 KB L2 branch target buffer for improved single-threaded performance and multi-threaded throughput compared to a previous 32 nm AMD x86-64 CPU codenamed “Bulldozer” [1].
Digital Technical Journal | 1992
Daniel W. Dobberpuhl; Richard T. Witek; Randy L. Allmon; Robert Anglin; David Bertucci; Sharon M. Britton; Linda Chao; Robert A. Conrad; Daniel E. Dever; Bruce A. Gieseke; Soha Hassoun; Gregory W. Hoeppner; Kathryn Kuchler; Maureen Ladd; Burton M. Leary; Liam Madden; Edward J. McLellan; Derrick R. Meyer; James Montanaro; Donald A. Priore; Vidya Rajagopalan; Sridhar Samudrala; Sribalan Santhanam
Archive | 2010
Kevin Gillespie; Timothy J. Correia; Donald A. Priore
Archive | 2010
Edward J. McLellan; Magiting M. Talisayon; Donald A. Priore
Archive | 2008
Russell Schreiber; Keith Kasprak; Donald A. Priore