Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where David Finan is active.

Publication


Featured researches published by David Finan.


IEEE Journal of Solid-state Circuits | 2008

An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS

Sriram R. Vangal; Jason Howard; Gregory Ruhl; Saurabh Dighe; Howard Wilson; James W. Tschanz; David Finan; Arvind Singh; Tiju Jacob; Shailendra Jain; Vasantha Erraguntla; Clark Roberts; Yatin Hoskote; Nitin Borkar; Shekhar Borkar

This paper describes an integrated network-on-chip architecture containing 80 tiles arranged as an 8x10 2-D array of floating-point cores and packet-switched routers, both designed to operate at 4 GHz. Each tile has two pipelined single-precision floating-point multiply accumulators (FPMAC) which feature a single-cycle accumulation loop for high throughput. The on-chip 2-D mesh network provides a bisection bandwidth of 2 Terabits/s. The 15-FO4 design employs mesochronous clocking, fine-grained clock gating, dynamic sleep transistors, and body-bias techniques. In a 65-nm eight-metal CMOS process, the 275 mm2 custom design contains 100 M transistors. The fully functional first silicon achieves over 1.0 TFLOPS of performance on a range of benchmarks while dissipating 97 W at 4.27 GHz and 1.07 V supply.


international solid-state circuits conference | 2010

A 48-Core IA-32 message-passing processor with DVFS in 45nm CMOS

Jason Howard; Saurabh Dighe; Yatin Hoskote; Sriram R. Vangal; David Finan; Gregory Ruhl; David Jenkins; Howard Wilson; Nitin Borkar; Gerhard Schrom; Fabric Pailet; Shailendra Jain; Tiju Jacob; Satish Yada; Sraven Marella; Praveen Salihundam; Vasantha Erraguntla; Michael Konow; Michael Riepen; Guido Droege; Joerg Lindemann; Matthias Gries; Thomas Apel; Kersten Henriss; Tor Lund-Larsen; Sebastian Steibl; Shekhar Borkar; Vivek De; Rob F. Van der Wijngaart; Timothy G. Mattson

Current developments in microprocessor design favor increased core counts over frequency scaling to improve processor performance and energy efficiency. Coupling this architectural trend with a message-passing protocol helps realize a data-center-on-a-die. The prototype chip (Figs. 5.7.1 and 5.7.7) described in this paper integrates 48 Pentium™ class IA-32 cores [1] on a 6×4 2D-mesh network of tiled core clusters with high-speed I/Os on the periphery. The chip contains 1.3B transistors. Each core has a private 256KB L2 cache (12MB total on-die) and is optimized to support a message-passing-programming model whereby cores communicate through shared memory. A 16KB message-passing buffer (MPB) is present in every tile, giving a total of 384KB on-die shared memory, for increased performance. Power is kept at a minimum by transmitting dynamic, fine-grained voltage-change commands over the network to an on-die voltage-regulator controller (VRC). Further power savings are achieved through active frequency scaling at the tile granularity. Memory accesses are distributed over four on-die DDR3 controllers for an aggregate peak memory bandwidth of 21GB/s at 4× burst. Additionally, an 8-byte bidirectional system interface (SIF) provides 6.4GB/s of I/O bandwidth. The die area is 567mm2 and is implemented in 45nm high-к metal-gate CMOS [2].


international solid-state circuits conference | 2007

An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS

Sriram R. Vangal; Jason Howard; Gregory Ruhl; Saurabh Dighe; Howard Wilson; James W. Tschanz; David Finan; Priya Iyer; Arvind Singh; Tiju Jacob; Shailendra Jain; Sriram Venkataraman; Yatin Hoskote; Nitin Borkar

A 275mm2 network-on-chip architecture contains 80 tiles arranged as a 10 times 8 2D array of floating-point cores and packet-switched routers, operating at 4GHz. The 15-F04 design employs mesochronous clocking, fine-grained clock gating, dynamic sleep transistors, and body-bias techniques. The 65nm 100M transistor die is designed to achieve a peak performance of 1.0TFLOPS at 1V while dissipating 98W.


IEEE Journal of Solid-state Circuits | 2005

Area-efficient linear regulator with ultra-fast load regulation

Peter Hazucha; Tanay Karnik; Bradley Bloechel; Colleen Parsons; David Finan; Shekhar Borkar

We demonstrate a fully integrated linear regulator for multisupply voltage microprocessors implemented in a 90 nm CMOS technology. Ultra-fast single-stage load regulation achieves a 0.54-ns response time at 94% current efficiency. For a 1.2-V input voltage and 0.9-V output voltage the regulator enables a 90 mV/sub P-P/ output droop for a 100-mA load step with only a small on-chip decoupling capacitor of 0.6 nF. By using a PMOS pull-up transistor in the output stage we achieved a small regulator area of 0.008 mm/sup 2/ and a minimum dropout voltage of 0.2 V for 100 mA of output current. The area for the 0.6-nF MOS capacitor is 0.090 mm/sup 2/.


international solid-state circuits conference | 2007

Adaptive Frequency and Biasing Techniques for Tolerance to Dynamic Temperature-Voltage Variations and Aging

J. Tschanz; Nam Sung Kim; Saurabh Dighe; Jason Howard; Gregory Ruhl; S. Vanga; S. Narendra; Yatin Hoskote; Howard Wilson; C. Lam; M. Shuman; Dinesh Somasekhar; Stephen H. Tang; David Finan; Tanay Karnik; Nitin Borkar; Nasser A. Kurd; Vivek De

Temperature, voltage, and current sensors monitor the operation of a TCP/IP offload accelerator engine fabricated in 90nm CMOS, and a control unit dynamically changes frequency, voltage, and body bias for optimum performance and energy efficiency. Fast response to droops and temperature changes is enabled by a multi-PLL clocking unit and on-chip body bias. Adaptive techniques are also used to compensate performance degradation due to device aging, reducing the aging guardband.


IEEE Journal of Solid-state Circuits | 2003

A TCP offload accelerator for 10 Gb/s Ethernet in 90-nm CMOS

Yatin Hoskote; Bradley Bloechel; Greg Dermer; Vasantha Erraguntla; David Finan; Jason Howard; D. Klowden; Siva G. Narendra; Gregory Ruhl; J. Tschanz; Sriram R. Vangal; V. Veeramachaneni; Howard Wilson; Jianping Xu; Nitin Borkar

This programmable engine is designed to offload TCP inbound processing at wire speed for 10-Gb/s Ethernet, supporting 64-byte minimum packet size. This prototype chip employs a high-speed core and a specialized instruction set. It includes hardware support for dynamically reordering out-of-order packets. In a 90-nm CMOS process, the 8-mm/sup 2/ experimental chip has 460 K transistors. First silicon has been validated to be fully functional and achieves 9.64-Gb/s packet processing performance at 1.72 V and consumes 6.39 W.


international solid-state circuits conference | 2003

A 5 GHz floating point multiply-accumulator in 90 nm dual V/sub T/ CMOS

Sriram R. Vangal; Yatin Hoskote; Dinesh Somasekhar; Vasantha Erraguntla; Jason Howard; Greg Ruhl; V. Veeramachaneni; David Finan; Sanu K. Mathew; Nitin Borkar

A 32 b single-cycle floating point accumulator that uses base 32 and carry-save format with delayed addition is described. Combined algorithmic, logic and circuit techniques enable multiply-accumulate operation at 5 GHz. In a 90 nm 7M dual-V/sub T/ CMOS process, the 2 mm/sup 2/ prototype contains 230K transistors and dissipates 1.2 W at 5 GHz, 1.2 V and 25/spl deg/C.


symposium on vlsi circuits | 2004

An area-efficient, integrated, linear regulator with ultra-fast load regulation

Peter Hazucha; Tanay Karnik; Bradley Bloechel; Colleen Parsons; David Finan; Shekhar Borkar

We demonstrate a fully-integrated linear regulator for multi-supply-voltage microprocessors implemented in a 90 nm CMOS technology. Ultra-fast, single-stage load regulation achieves 0.54 ns response time at 94% current efficiency. This enables 10% peak-to-peak output noise for a 100 mA load step with only a small on-chip decoupling capacitor of 0.6 nF. A PMOS pull-up transistor in the output stage results in a small regulator area of 0.008 mm/sup 2/ and the 0.6 nF MOS capacitor area of 0.090 mm/sup 2/.


international solid-state circuits conference | 1994

FA 18.4: a phase-tolerant 3.8 GB/s data-communication router for a multiprocessor supercomputer backplane

Edmund A. Reese; Howard Wilson; D. Nedwek; J. Jex; M. Khaira; T. Burton; P. Nag; H. Kumar; Charles E. Dike; David Finan; M. Haycock

Recent parallel processor supercomputer designs use an active backplane of routers to form the interconnections between processing elements. Today, high-bandwidth interconnect systems capable of scaling to configurations with more than 500 processing nodes tend to use self-timed designs. This avoids clock distribution problems seen in large phase-sensitive synchronous systems. The BiCMOS routing component described in this paper employs 200 MHz clocked communication for large scalable parallel-processor supercomputer systems. This scheme eliminates need for clock edges to be phase-aligned across the clock distribution network. Additionally, router inputs accept data at any phase relationship to the receiving router internal clock.<<ETX>>


international solid-state circuits conference | 2003

A 10GHz TCP offload accelerator for 10Gb/s Ethernet in 90nm dual-V/sub T/ CMOS

Yatin Hoskote; Vasantha Erraguntla; David Finan; Jason Howard; Dan Klowden; Siva G. Narendra; Greg Ruhl; James W. Tschanz; Sriram R. Vangal; V. Veeramachaneni; Howard Wilson; Jianping Xu; Nitin Borkar

This prototype offloads TCP input processing on minimum packet sizes at wire speed for 10Gb/s Ethernet. The design employs a 10GHz core with a specialized instruction set and includes hardware support for dynamically reordering packets. In a 90nm dual-V/sub T/ CMOS process, the 8mm/sup 2/ chip has 260K transistors. Simulation predicts a power dissipation of 1.9W at 1.2V and 10GHz.

Collaboration


Dive into the David Finan's collaboration.

Researchain Logo
Decentralizing Knowledge