James W. Tschanz
Intel
Publication
Featured research published by James W. Tschanz.
design automation conference | 2003
Shekhar Borkar; Tanay Karnik; Siva G. Narendra; James W. Tschanz; Ali Keshavarzi; Vivek De
Parameter variation in scaled technologies beyond 90nm will pose a major challenge for design of future high performance microprocessors. In this paper, we discuss process, voltage and temperature variations; and their impact on circuit and microarchitecture. Possible solutions to reduce the impact of parameter variations and to achieve higher frequency bins are also presented.
international solid-state circuits conference | 2002
James W. Tschanz; James Kao; Siva G. Narendra; Raj Nair; Dimitri A. Antoniadis; Anantha P. Chandrakasan; Vivek De
Measurements on a 150 nm CMOS test chip show that on-chip bidirectional adaptive body biasing compensates effectively for die-to-die parameter variation to meet both frequency and leakage requirements. An enhancement of this technique to correct for within-die variations triples the accepted die count in the highest frequency bin.
IEEE Journal of Solid-state Circuits | 2008
Sriram R. Vangal; Jason Howard; Gregory Ruhl; Saurabh Dighe; Howard Wilson; James W. Tschanz; David Finan; Arvind Singh; Tiju Jacob; Shailendra Jain; Vasantha Erraguntla; Clark Roberts; Yatin Hoskote; Nitin Borkar; Shekhar Borkar
This paper describes an integrated network-on-chip architecture containing 80 tiles arranged as an 8 × 10 2-D array of floating-point cores and packet-switched routers, both designed to operate at 4 GHz. Each tile has two pipelined single-precision floating-point multiply accumulators (FPMAC) which feature a single-cycle accumulation loop for high throughput. The on-chip 2-D mesh network provides a bisection bandwidth of 2 Terabits/s. The 15-FO4 design employs mesochronous clocking, fine-grained clock gating, dynamic sleep transistors, and body-bias techniques. In a 65-nm eight-metal CMOS process, the 275 mm² custom design contains 100 M transistors. The fully functional first silicon achieves over 1.0 TFLOPS of performance on a range of benchmarks while dissipating 97 W at 4.27 GHz and 1.07 V supply.
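The figures quoted in this abstract can be cross-checked with a little arithmetic. The sketch below is not taken from the paper: it assumes each FPMAC completes one multiply-accumulate per cycle and that a multiply-accumulate counts as two floating-point operations (a common convention, but an assumption here). The function and variable names are hypothetical.

```python
def peak_gflops(tiles: int, fpmacs_per_tile: int,
                flops_per_fpmac_cycle: int, freq_ghz: float) -> float:
    """Theoretical peak throughput in GFLOPS for a tiled FPMAC array."""
    return tiles * fpmacs_per_tile * flops_per_fpmac_cycle * freq_ghz


TILES = 80                   # from the abstract
FPMACS_PER_TILE = 2          # from the abstract
FLOPS_PER_FPMAC_CYCLE = 2    # ASSUMPTION: one multiply + one add per cycle

# Peak at the 4 GHz design target: 80 * 2 * 2 * 4.0 = 1280 GFLOPS.
peak_design_gflops = peak_gflops(
    TILES, FPMACS_PER_TILE, FLOPS_PER_FPMAC_CYCLE, 4.0)

# Peak at the measured 4.27 GHz operating point: about 1366.4 GFLOPS.
peak_measured_gflops = peak_gflops(
    TILES, FPMACS_PER_TILE, FLOPS_PER_FPMAC_CYCLE, 4.27)

# Lower bound on energy efficiency implied by "over 1.0 TFLOPS at 97 W":
# 1000 GFLOPS / 97 W, about 10.3 GFLOPS/W.
efficiency_floor_gflops_per_w = 1000.0 / 97.0
```

Under these assumptions, the reported "over 1.0 TFLOPS" at 4.27 GHz sits below the theoretical peak, as expected for sustained benchmark throughput.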
international solid-state circuits conference | 2007
Sriram R. Vangal; Jason Howard; Gregory Ruhl; Saurabh Dighe; Howard Wilson; James W. Tschanz; David Finan; Priya Iyer; Arvind Singh; Tiju Jacob; Shailendra Jain; Sriram Venkataraman; Yatin Hoskote; Nitin Borkar
A 275 mm² network-on-chip architecture contains 80 tiles arranged as a 10 × 8 2D array of floating-point cores and packet-switched routers, operating at 4 GHz. The 15-FO4 design employs mesochronous clocking, fine-grained clock gating, dynamic sleep transistors, and body-bias techniques. The 65 nm, 100M-transistor die is designed to achieve a peak performance of 1.0 TFLOPS at 1 V while dissipating 98 W.
international symposium on low power electronics and design | 2001
James W. Tschanz; Siva G. Narendra; Zhanping Chen; Shekhar Borkar; Manoj Sachdev; Vivek De
Flip-flops and latches are crucial elements of a design from both a delay and energy standpoint. We compare several styles of single edge-triggered flip-flops, including semidynamic and static with both implicit and explicit pulse generation. We present an implicit-pulsed, semidynamic flip-flop (ip-DCO) which has the fastest delay of any flip-flop considered, along with a large amount of negative setup time. However, an explicit-pulsed static flip-flop (ep-SFF) is the most energy-efficient and is ideal for the majority of critical paths in the design. In order to further reduce the power consumption, dual edge-triggered flip-flops are evaluated. It is shown that classic dual edge-triggered designs suffer from a large area penalty and reduced performance, prohibiting their use in critical paths. A new explicit-pulsed dual edge-triggered flip-flop is presented which provides the same performance as the single edge-triggered version with significantly less energy consumption in the flip-flop as well as in the clock distribution network.
IEEE Journal of Solid-state Circuits | 2011
Keith A. Bowman; James W. Tschanz; Shih-Lien Lu; Paolo A. Aseron; Muhammad M. Khellah; Arijit Raychowdhury; Bibiche M. Geuskens; Chris Wilkerson; Tanay Karnik; Vivek De
A 45 nm microprocessor core integrates resilient error-detection and recovery circuits to mitigate the clock frequency (FCLK) guardbands for dynamic parameter variations to improve throughput and energy efficiency. The core supports two distinct error-detection designs, allowing a direct comparison of the relative trade-offs. The first design embeds error-detection sequential (EDS) circuits in critical paths to detect late timing transitions. In addition to reducing the FCLK guardbands for dynamic variations, the embedded EDS design can exploit path-activation rates to operate the microprocessor faster than infrequently-activated critical paths. The second error-detection design offers a less-intrusive approach for dynamic timing-error detection by placing a tunable replica circuit (TRC) per pipeline stage to monitor worst-case delays. Although the TRCs require a delay guardband to ensure the TRC delay is always slower than critical-path delays, the TRC design captures most of the benefits from the embedded EDS design with less implementation overhead. Furthermore, while core min-delay constraints limit the potential benefits of the embedded EDS design, a salient advantage of the TRC design is the ability to detect a wider range of dynamic delay variation, as demonstrated through low supply voltage (VCC) measurements. Both error-detection designs interface with error-recovery techniques, enabling the detection and correction of timing errors from fast-changing variations such as high-frequency VCC droops. The microprocessor core also supports two separate error-recovery techniques to guarantee correct execution even if dynamic variations persist. The first technique requires clock control to replay errant instructions at 1/2 FCLK. In comparison, the second technique is a new multiple-issue instruction replay design that corrects errant instructions with a lower performance penalty and without requiring clock control. Silicon measurements demonstrate that resilient circuits enable a 41% throughput gain at equal energy or a 22% energy reduction at equal throughput, as compared to a conventional design when executing a benchmark program with a 10% VCC droop. In addition, the microprocessor includes a new adaptive clock control circuit that interfaces with the resilient circuits and a phase-locked loop (PLL) to track recovery cycles and adapt to persistent errors by dynamically changing FCLK for maximum efficiency.
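The trade-off between guardband reduction and recovery overhead described in this abstract can be illustrated with a toy model. This is a minimal sketch, not the authors' analysis: it assumes a baseline of one instruction per cycle and a fixed replay cost per timing error, and every number in it is hypothetical rather than taken from the paper.

```python
def effective_throughput(freq_ghz: float, error_rate: float,
                         replay_penalty_cycles: float) -> float:
    """Instructions per nanosecond for a 1-IPC core with error replay.

    Each instruction costs one cycle plus, on average,
    error_rate * replay_penalty_cycles extra recovery cycles.
    """
    cycles_per_instruction = 1.0 + error_rate * replay_penalty_cycles
    return freq_ghz / cycles_per_instruction


# HYPOTHETICAL operating points, chosen only for illustration.
F_MAX_GHZ = 3.0        # frequency with no guardband
GUARDBAND = 0.10       # conventional design gives up 10% of F_MAX
PENALTY_CYCLES = 20    # cycles lost per replayed instruction

# Conventional design: guardbanded clock, no timing errors.
conventional = effective_throughput(F_MAX_GHZ * (1 - GUARDBAND), 0.0, 0)

# Resilient design: full clock, rare errors corrected by replay.
resilient_rare = effective_throughput(F_MAX_GHZ, 1e-3, PENALTY_CYCLES)

# Same design when errors become frequent: replay cost dominates.
resilient_frequent = effective_throughput(F_MAX_GHZ, 5e-2, PENALTY_CYCLES)
```

In this model the resilient design wins while errors are rare and loses once recovery cycles dominate, which is the situation an adaptive clock control would respond to by lowering the clock frequency.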
IEEE Journal of Solid-state Circuits | 2003
James W. Tschanz; Siva G. Narendra; Raj Nair; Vivek De
Test chip measurements show that adaptive VCC is useful for reducing impacts of parameter variations on frequency, active power and leakage power of microprocessors. Using adaptive VCC together with adaptive VBS or WID-VBS is much more effective than using any of them individually.
IEEE Micro | 2006
Osman S. Unsal; James W. Tschanz; Keith A. Bowman; Vivek De; Xavier Vera; Antonio González; Oguz Ergin
Parameter variations, which are increasing along with advances in process technologies, affect both timing and power. Variability must be considered at both the circuit and microarchitectural design levels to keep pace with performance scaling and to keep power consumption within reasonable limits. This article presents an overview of the main sources of variability and surveys variation-tolerant circuit and microarchitectural approaches.
design automation conference | 2002
Tanay Karnik; Yibin Ye; James W. Tschanz; Liqiong Wei; Steven M. Burns; V. Govindarajulu; Vivek De; Shekhar Borkar
We describe various design automation solutions for design migration to a dual-Vt process technology. We include the results of a Lagrangian Relaxation based tool, iSTATS, and a heuristic iterative optimization flow. Joint dual-Vt allocation and sizing reduces total power by 10+% compared with Vt allocation alone, and by 25+% compared with pure sizing methods. The heuristic flow requires 5x larger computation runtime than iSTATS due to its iterative nature.
design automation conference | 2009
Keith A. Bowman; James W. Tschanz; Chris Wilkerson; Shih-Lien Lu; Tanay Karnik; Vivek De; Shekhar Borkar
Three circuit techniques for dynamic variation tolerance are presented: (i) Sensors with adaptive voltage and frequency circuits, (ii) Tunable replica circuits for timing-error prediction with error recovery, and (iii) Embedded error-detection sequential circuits with error recovery. These circuits mitigate the clock frequency guardbands for dynamic variations, thus improving microprocessor performance and energy-efficiency. These circuits are described with a focus on the different trade-offs in guardband reduction and design overhead. Opportunities for CAD to further enhance microprocessor performance and energy efficiency are offered.