Publication


Featured research published by Dean L. Lewis.


high-performance computer architecture | 2010

An optimized 3D-stacked memory architecture by exploiting excessive, high-density TSV bandwidth

Dong Hyuk Woo; Nak Hee Seong; Dean L. Lewis; Hsien-Hsin S. Lee

Memory bandwidth has become a major performance bottleneck as more and more cores are integrated onto a single die, demanding more and more data from the system memory. Several prior studies have demonstrated that this memory bandwidth problem can be addressed by employing a 3D-stacked memory architecture, which provides a wide, high-frequency memory-bus interface. Although previous 3D proposals already provide as much bandwidth as a traditional L2 cache can consume, the dense through-silicon vias (TSVs) of 3D chip stacks can provide still more bandwidth. In this paper, we contend that we need to re-architect our memory hierarchy, including the L2 cache and DRAM interface, so that it can take full advantage of this massive bandwidth. Our technique, SMART-3D, is a new 3D-stacked memory architecture with a vertical L2 fetch/write-back network using a large array of TSVs. Simply stated, we leverage the TSV bandwidth to hide latency behind very large data transfers. We analyze the design trade-offs for the DRAM arrays, taking care to avoid compromising the DRAM density because of TSV placement. Moreover, we propose an efficient mechanism to manage the false sharing problem when implementing SMART-3D in a multi-socket system. For single-threaded memory-intensive applications, the SMART-3D architecture achieves speedups from 1.53 to 2.14 over planar designs and from 1.27 to 1.72 over prior 3D designs. We achieve similar speedups for multi-program and multi-threaded workloads on multi-core and multi-socket processors. Furthermore, SMART-3D can even lower the energy consumption in the L2 cache and 3D DRAM because it reduces the total number of row buffer misses.


international solid-state circuits conference | 2012

3D-MAPS: 3D Massively parallel processor with stacked memory

Dae Hyun Kim; Krit Athikulwongse; Michael B. Healy; Mohammad M. Hossain; Moongon Jung; Ilya Khorosh; Gokul Kumar; Young-Joon Lee; Dean L. Lewis; Tzu-Wei Lin; Chang Liu; Shreepad Panth; Mohit Pathak; Minzhen Ren; Guanhao Shen; Taigon Song; Dong Hyuk Woo; Xin Zhao; Joungho Kim; Ho Choi; Gabriel H. Loh; Hsien-Hsin S. Lee; Sung Kyu Lim

Several recent works have demonstrated the benefits of through-silicon-via (TSV) based 3D integration, but none of them involves a fully functioning multicore processor and memory stacking. 3D-MAPS (3D Massively Parallel Processor with Stacked Memory) is a two-tier 3D IC, where the logic die consists of 64 general-purpose processor cores running at 277 MHz, and the memory die contains 256 KB of SRAM. Fabrication is done using 130 nm GlobalFoundries device technology and Tezzaron TSV and bonding technology. Packaging is done by Amkor. This processor contains 33M transistors, 50K TSVs, and 50K face-to-face connections in a 5 mm × 5 mm footprint. The chip runs at 1.5 V and consumes up to 4 W, resulting in a power density of 16 W/cm². The core architecture is developed from scratch to benefit from single-cycle access to SRAM.
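The power-density figure quoted above follows directly from the peak power and the footprint. The short sketch below only re-derives that number; the function name is illustrative and not from the paper.

```python
# Figures quoted in the abstract above.
FOOTPRINT_SIDE_MM = 5.0   # 5 mm x 5 mm footprint
PEAK_POWER_W = 4.0        # the chip consumes up to 4 W


def power_density_w_per_cm2(power_w: float, side_mm: float) -> float:
    """Return the power density of a square die in W/cm^2."""
    side_cm = side_mm / 10.0          # 10 mm per cm
    return power_w / (side_cm * side_cm)


density = power_density_w_per_cm2(PEAK_POWER_W, FOOTPRINT_SIDE_MM)
print(f"{density:.1f} W/cm^2")
```

A 5 mm × 5 mm die covers 0.25 cm², so 4 W spread over it gives the 16 W/cm² stated in the abstract.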


international test conference | 2007

A scan-island based design enabling pre-bond testability in die-stacked microprocessors

Dean L. Lewis; Hsien-Hsin S. Lee

Die stacking is a promising new technology that enables integration of devices in the third dimension. Recent research thrusts in 3D-integrated microprocessor design have demonstrated significant improvements in both power consumption and performance. However, this technology is currently being held back due to the lack of test technology. Because processor functionality is partitioned across different silicon die layers, only partial circuitry exists on each layer pre-bond. In current 3D manufacturing, layers in the die stack are simply bonded together to form the complete processor; no testing is performed at the pre-bond stage. Such a strategy leads to an exponential decay in the yield of the final product and places an economic limit on the number of die that can be stacked. To overcome this limit, pre-bond test is a necessity. In this paper, we present a technique to enable pre-bond test in each layer. Further, we address several issues with integrating this new test hardware into the final design. Finally, we use a sample 3D floorplan based on the Alpha 21264 to show that our technique can be implemented at a minimal cost (0.2% area overhead). Our design for pre-bond testability enables the structural test necessary to continue 3D integration for microprocessors beyond a few layers.
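The yield argument in this abstract can be illustrated with a simple model. The sketch below is an assumption-laden illustration, not the paper's analysis: it assumes every die has the same independent yield, that each bonding step has its own independent yield, and that a pre-bond test catches every defective die. The numeric values (90% die yield, 99% bond yield) are hypothetical.

```python
def blind_stack_yield(die_yield: float, layers: int, bond_yield: float = 1.0) -> float:
    """Yield when untested dies are stacked: every die must happen to be good."""
    return (die_yield ** layers) * (bond_yield ** (layers - 1))


def known_good_stack_yield(layers: int, bond_yield: float = 1.0) -> float:
    """Yield when only dies that passed pre-bond test are stacked.

    Assumes perfect pre-bond test coverage, so only bonding can fail.
    """
    return bond_yield ** (layers - 1)


if __name__ == "__main__":
    DIE_YIELD = 0.90    # hypothetical per-die yield
    BOND_YIELD = 0.99   # hypothetical per-bond yield
    for layers in range(1, 9):
        blind = blind_stack_yield(DIE_YIELD, layers, BOND_YIELD)
        tested = known_good_stack_yield(layers, BOND_YIELD)
        print(f"{layers} layers: blind {blind:.3f}, pre-bond tested {tested:.3f}")
```

Under these assumptions the blind yield shrinks geometrically with the number of layers, while the pre-bond-tested yield is limited only by bonding, which is the economic limit the abstract refers to.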


custom integrated circuits conference | 2010

Design and analysis of 3D-MAPS: A many-core 3D processor with stacked memory

Michael B. Healy; Krit Athikulwongse; Rohan Goel; Mohammad M. Hossain; Dae Hyun Kim; Young-Joon Lee; Dean L. Lewis; Tzu-Wei Lin; Chang Liu; Moongon Jung; Brian Ouellette; Mohit Pathak; Hemant Sane; Guanhao Shen; Dong Hyuk Woo; Xin Zhao; Gabriel H. Loh; Hsien-Hsin S. Lee; Sung Kyu Lim

We describe the design and analysis of 3D-MAPS, a 64-core 3D-stacked memory-on-processor running at 277 MHz with 63 GB/s memory bandwidth, sent for fabrication using Tezzaron's 3D stacking technology. We also describe the design flow used to implement it using industrial 2D tools and custom add-ons to handle 3D specifics.


international conference on computer aided design | 2009

Pre-bond testable low-power clock tree design for 3D stacked ICs

Xin Zhao; Dean L. Lewis; Hsien-Hsin S. Lee; Sung Kyu Lim

Pre-bond testing of 3D stacked ICs involves testing individual dies before bonding. The overall yield of 3D ICs improves with pre-bond testability because designers can avoid stacking defective dies with good ones. However, pre-bond testability presents unique challenges to 3D clock tree design. First, each die needs a complete 2D clock tree for the pre-bond testing. In addition, the entire 3D stack needs a complete 3D clock tree for post-bond testing and normal operations. In the case of a two-die stack, a straightforward solution is to have two complete 2D clock trees connected with a single through-silicon via (TSV). We show that this solution suffers from long wirelength and high clock power consumption. Instead, our algorithm minimizes the overall wirelength and clock power consumption while providing the pre-bond testability and post-bond operability under given skew and slew constraints. Compared with the single-TSV solution, SPICE simulation results show that our multi-TSV approach significantly reduces the clock power by up to 15.9% for two-die and 29.7% for four-die stacks. In addition, the wirelength reduction is up to 24.4% and 42.0%, respectively.


2009 IEEE International Conference on 3D System Integration | 2009

Architectural evaluation of 3D stacked RRAM caches

Dean L. Lewis; Hsien-Hsin S. Lee

The first memristor, originally theorized by Dr. Leon Chua in 1971, was identified by a team at HP Labs in 2008. This new fundamental circuit element is unique in that its resistance changes as current passes through it, giving the device a memory of the past system state. The immediately obvious application of such a device is in a non-volatile memory, wherein high- and low-resistance states are used to store binary values. A memory array of memristors forms what is called a resistive RAM or RRAM. In this paper, we survey the memristors that have been produced by a number of different research teams and present a point-by-point comparison between DRAM and this new RRAM, based on both existent and expected near-term memristor devices. In particular, we consider the case of a die-stacked 3D memory that is integrated onto a logic die and evaluate which memory is best suited for the job. While still suffering a few shortcomings, RRAM proves itself a very interesting design alternative to well-established DRAM technologies.


ieee computer society annual symposium on vlsi | 2009

Testing Circuit-Partitioned 3D IC Designs

Dean L. Lewis; Hsien-Hsin S. Lee

3D integration is an emerging technology that allows for the vertical stacking of multiple silicon die. These stacked die are tightly integrated with through-silicon vias and promise significant power and area reductions by replacing long global wires with short vertical connections. This technology necessitates that neighboring logical blocks exist on different layers in the stack. However, such functional partitions disable intra-chip communication pre-bond and thus disrupt traditional test techniques. Previous work has described a general test architecture that enables pre-bond testability of an architecturally partitioned 3D processor and provided mechanisms for basic layer functionality. This work proposes new test methods for designs partitioned at the circuit level, in which the gates and transistors of individual circuits could be split across multiple die layers. We investigated a bit-partitioned adder unit and a port-split register file, which represents the most difficult circuit-partitioned design to test pre-bond but which is used widely in many circuits. Two layouts of each circuit, planar and 3D, are produced. Our experiments verify the performance and power results and examine the test coverage achieved.


IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2011

Low-Power Clock Tree Design for Pre-Bond Testing of 3-D Stacked ICs

Xin Zhao; Dean L. Lewis; Hsien-Hsin S. Lee; Sung Kyu Lim

Pre-bond testing of 3-D stacked integrated circuits (ICs) involves testing each individual die before bonding. The overall yield of 3-D ICs improves with pre-bond testability because manufacturers can avoid stacking defective dies with good ones. However, pre-bond testability presents unique challenges to 3-D clock tree design. First, each die needs a complete 2-D clock tree to enable pre-bond test. Second, the entire 3-D stack needs a complete 3-D clock tree for post-bond test and operation. In the case of a two-die stack, a straightforward solution is to have two complete 2-D clock trees connected with a single through-silicon-via (TSV). We show that this solution suffers from long wirelength (WL) and high clock power consumption. Our algorithm improves on this solution, minimizes the overall WL and clock power consumption, and provides both pre-bond testability and post-bond operability with minimum skew and constrained slew. Compared with the single-TSV solution, SPICE simulation results show that our multi-TSV approach significantly reduces the clock power by up to 15.9% for two-die and 29.7% for four-die stacks. In addition, the WL is reduced by up to 24.4% and 42.0%.


IEEE Transactions on Computers | 2015

Design and Analysis of 3D-MAPS (3D Massively Parallel Processor with Stacked Memory)

Dae Hyun Kim; Krit Athikulwongse; Michael B. Healy; Mohammad M. Hossain; Moongon Jung; Ilya Khorosh; Gokul Kumar; Young-Joon Lee; Dean L. Lewis; Tzu-Wei Lin; Chang Liu; Shreepad Panth; Mohit Pathak; Minzhen Ren; Guanhao Shen; Taigon Song; Dong Hyuk Woo; Xin Zhao; Joungho Kim; Ho Choi; Gabriel H. Loh; Hsien-Hsin S. Lee; Sung Kyu Lim

This paper describes the architecture, design, analysis, and simulation and measurement results of the 3D-MAPS (3D massively parallel processor with stacked memory) chip built with a 1.5 V, 130 nm process technology and a two-tier 3D stacking technology using 1.2 µm-diameter, 6 µm-height through-silicon vias (TSVs) and 3.4 µm-diameter face-to-face bond pads. 3D-MAPS consists of a core tier containing 64 cores and a memory tier containing 64 memory blocks. Each core communicates with its dedicated 4 KB SRAM block using face-to-face bond pads, which provide negligible data transfer delay between the core and the memory tiers. The maximum operating frequency is 277 MHz and the maximum memory bandwidth is 70.9 GB/s at 277 MHz. The peak measured memory bandwidth usage is 63.8 GB/s and the peak measured power is approximately 4 W based on eight parallel benchmarks.
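The bandwidth figures in this abstract can be cross-checked with simple arithmetic. The sketch below assumes each core transfers 4 bytes per cycle to its private SRAM block; that width is not stated in the abstract and is chosen here only because it reproduces the quoted 70.9 GB/s maximum.

```python
CORES = 64                     # from the abstract
CLOCK_HZ = 277e6               # 277 MHz maximum operating frequency
SRAM_KB_PER_CORE = 4           # each core has a dedicated 4 KB SRAM block
MEASURED_PEAK_GB_S = 63.8      # peak measured bandwidth usage
BYTES_PER_CORE_PER_CYCLE = 4   # ASSUMPTION: not stated in the abstract


def peak_bandwidth_gb_s(cores: int, bytes_per_cycle: int, clock_hz: float) -> float:
    """Aggregate bandwidth if every core moves bytes_per_cycle on every cycle."""
    return cores * bytes_per_cycle * clock_hz / 1e9


peak = peak_bandwidth_gb_s(CORES, BYTES_PER_CORE_PER_CYCLE, CLOCK_HZ)
utilization = MEASURED_PEAK_GB_S / peak
total_sram_kb = CORES * SRAM_KB_PER_CORE

print(f"theoretical peak: {peak:.1f} GB/s")
print(f"measured / peak:  {utilization:.0%}")
print(f"total SRAM:       {total_sram_kb} KB")
```

Under that assumption the theoretical peak matches the stated maximum, and the measured 63.8 GB/s corresponds to roughly 90% of it.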


international conference on computer design | 2011

Designing 3D test wrappers for pre-bond and post-bond test of 3D embedded cores

Dean L. Lewis; Shreepad Panth; Xin Zhao; Sung Kyu Lim; Hsien-Hsin S. Lee

3D integration is a promising new technology for tightly integrating multiple active silicon layers into a single chip stack. Both the integration of heterogeneous tiers and the partitioning of functional units across tiers lead to significant improvements in functionality, area, performance, and power consumption. Managing the complexity of 3D design is a significant challenge that will require a system-on-chip approach, but the application of SOC design to 3D necessitates extensions to current test methodology. In this paper, we propose extending test wrappers, a popular SOC DFT technique, into the third dimension. We develop an algorithm employing the Best Fit Decreasing and Kernighan-Lin Partitioning heuristics to produce 3D wrappers that minimize test time, maximize reuse of routing resources across test modes, and allow for different TAM bus widths in different test modes. On average the two variants of our algorithm reuse 93% and 92% of the test wrapper wires while delivering test times of just 0.06% and 0.32% above the minimum.
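For readers unfamiliar with the Best Fit Decreasing heuristic named above, the sketch below shows the generic textbook bin-packing version. It is not the paper's wrapper-design algorithm; the item sizes stand in for hypothetical scan-chain lengths and the capacity for a hypothetical wrapper-chain length limit.

```python
from typing import List


def best_fit_decreasing(sizes: List[int], capacity: int) -> List[List[int]]:
    """Pack items into bins of fixed capacity using Best Fit Decreasing.

    Items are considered largest first; each goes into the open bin that
    would be left with the least free space, or into a new bin if none fits.
    """
    bins: List[List[int]] = []
    free: List[int] = []  # remaining space in each bin, parallel to `bins`
    for size in sorted(sizes, reverse=True):
        if size > capacity:
            raise ValueError(f"item of size {size} exceeds capacity {capacity}")
        best = None
        for i, room in enumerate(free):
            # Candidate bins must fit the item; prefer the tightest fit.
            if size <= room and (best is None or room < free[best]):
                best = i
        if best is None:
            bins.append([size])
            free.append(capacity - size)
        else:
            bins[best].append(size)
            free[best] -= size
    return bins


# Hypothetical scan-chain lengths packed into chains of at most 8 cells.
print(best_fit_decreasing([5, 4, 3, 2, 2], capacity=8))
```

Sorting first places the large items while there is still room to choose, which is why this variant usually wastes less space than placing items in arrival order.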

Collaboration


Dive into Dean L. Lewis's collaborations.

Top Co-Authors

Hsien-Hsin S. Lee

Georgia Institute of Technology

Sung Kyu Lim

Georgia Institute of Technology

Xin Zhao

Georgia Institute of Technology

Dong Hyuk Woo

Georgia Institute of Technology

Chang Liu

Georgia Institute of Technology

Dae Hyun Kim

Georgia Institute of Technology

Gabriel H. Loh

Georgia Institute of Technology

Guanhao Shen

Georgia Institute of Technology

Krit Athikulwongse

Georgia Institute of Technology

Michael B. Healy

Georgia Institute of Technology
