Keith A. Bowman
Intel
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Keith A. Bowman.
IEEE Journal of Solid-state Circuits | 2011
Keith A. Bowman; James W. Tschanz; Shih-Lien Lu; Paolo A. Aseron; Muhammad M. Khellah; Arijit Raychowdhury; Bibiche M. Geuskens; Chris Wilkerson; Tanay Karnik; Vivek De
A 45 nm microprocessor core integrates resilient error-detection and recovery circuits to mitigate the clock frequency (FCLK) guardbands for dynamic parameter variations to improve throughput and energy efficiency. The core supports two distinct error-detection designs, allowing a direct comparison of the relative trade-offs. The first design embeds error-detection sequential (EDS) circuits in critical paths to detect late timing transitions. In addition to reducing the Fclk guardbands for dynamic variations, the embedded EDS design can exploit path-activation rates to operate the microprocessor faster than infrequently-activated critical paths. The second error-detection design offers a less-intrusive approach for dynamic timing-error detection by placing a tunable replica circuit (TRC) per pipeline stage to monitor worst-case delays. Although the TRCs require a delay guardband to ensure the TRC delay is always slower than critical-path delays, the TRC design captures most of the benefits from the embedded EDS design with less implementation overhead. Furthermore, while core min-delay constraints limit the potential benefits of the embedded EDS design, a salient advantage of the TRC design is the ability to detect a wider range of dynamic delay variation, as demonstrated through low supply voltage (VCC) measurements. Both error-detection designs interface with error-recovery techniques, enabling the detection and correction of timing errors from fast-changing variations such as high-frequency VCC droops. The microprocessor core also supports two separate error-recovery techniques to guarantee correct execution even if dynamic variations persist. The first technique requires clock control to replay errant instructions at 1/2FCLK. In comparison, the second technique is a new multiple-issue instruction replay design that corrects errant instructions with a lower performance penalty and without requiring clock control. Silicon measurements demonstrate that resilient circuits enable a 41% throughput gain at equal energy or a 22% energy reduction at equal throughput, as compared to a conventional design when executing a benchmark program with a 10% VCC droop. In addition, the microprocessor includes a new adaptive clock control circuit that interfaces with the resilient circuits and a phase-locked loop (PLL) to track recovery cycles and adapt to persistent errors by dynamically changing Fclk f°Γ maximum efficiency.
IEEE Micro | 2006
Osman S. Unsal; James W. Tschanz; Keith A. Bowman; Vivek De; Xavier Vera; Antonio González; Oguz Ergin
Parameter variations, which are increasing along with advances in process technologies, affect both timing and power. Variability must be considered at both the circuit and microarchitectural design levels to keep pace with performance scaling and to keep power consumption within reasonable limits. This article presents an overview of the main sources of variability and surveys variation-tolerant circuit and microarchitectural approaches
design automation conference | 2009
Keith A. Bowman; James W. Tschanz; Chris Wilkerson; Shih-Lien Lu; Tanay Karnik; Vivek De; Shekhar Borkar
Three circuit techniques for dynamic variation tolerance are presented: (i) Sensors with adaptive voltage and frequency circuits, (ii) Tunable replica circuits for timing-error prediction with error recovery, and (iii) Embedded error-detection sequential circuits with error recovery. These circuits mitigate the clock frequency guardbands for dynamic variations, thus improving microprocessor performance and energy-efficiency. These circuits are described with a focus on the different trade-offs in guardband reduction and design overhead. Opportunities for CAD to further enhance microprocessor performance and energy efficiency are offered.
IEEE Journal of Solid-state Circuits | 2011
Saurabh Dighe; Sriram R. Vangal; Paolo A. Aseron; Shasi Kumar; Tiju Jacob; Keith A. Bowman; Jason Howard; James W. Tschanz; Vasantha Erraguntla; Nitin Borkar; Vivek De; Shekhar Borkar
In this paper, we present measured within-die core-to-core Fmax and leakage variation data for an 80-core processor in 65 nm CMOS and 1) populate a parameterized energy/performance model to determine the most energy-efficient operating point for a workload; 2) examine impacts of per-core clock and power gating on optimal dynamic voltage-frequency-core scaling (DVFCS) operating points; and 3) compare improvements in energy efficiency achievable by variation-aware DVFCS and core mapping on Single-Voltage/Multiple-Frequency (SVMF), Multiple-Voltage/Single-Frequency (MVSF) and Multiple-Voltage/Multiple-Frequency (MVMF) designs. Variation-aware DVFS with optimal core mapping is shown to improve energy efficiency 6%-35% across a range of compute/communication activity workloads. A new dynamic thread hopping scheme boosts performance by 5%-10% or energy efficiency by 20%-60%.
international solid-state circuits conference | 2008
Keith A. Bowman; James W. Tschanz; Nam Sung Kim; Janice C. Lee; Chris Wilkerson; Shih Lien L. Lu; Tanay Karnik; Vivek De
Microprocessor clock frequency (FCLK) is traditionally determined based on maximum supply voltage (Vcc) droop and temperature specifications. Since typical usage patterns usually run at nominal Vcc and temperature, these infrequent dynamic variations severely limit FCLK. The concept of timing-error detection and correction in previous work by Ernst, D., et al, (2003) is extended and implemented in a test-chip in 65nm CMOS in Bai, P., et al, (2004) to explore the effectiveness of resilient circuits in eliminating Vcc and temperature FCLK guardbands as well as exploiting path-activation probabilities to maximize throughput (TP).
design automation conference | 2005
James W. Tschanz; Keith A. Bowman; Vivek De
Die-to-die and within-die variations impact the frequency and power of fabricated dies, affecting functionality, performance, and revenue. Variation-tolerant circuits and post-silicon tuning techniques are important for minimizing the impacts of these variations. This paper describes several circuit techniques that can be employed to ensure efficient circuit operation in the presence of ever-increasing variations.
international symposium on low power electronics and design | 2007
Keith A. Bowman; Alaa R. Alameldeen; Srikanth T. Srinivasan; Chris Wilkerson
A statistical performance simulator is developed to explore the impact of die-to-die (D2D) and within-die (WID) parameter variations on the distributions of maximum clock frequency (FMAX) and throughput for multi-core processors in a future 22nm technology. The simulator integrates a compact analytical throughput model, which captures the key dependencies of multi-core processors, into a statistical simulation framework that models the effects of D2D and WID parameter variations on critical path delays across a die. The salient contributions from this paper are: (1) Product-level variation analysis for multi-core processors must focus on throughput, rather than just FMAX, and (2) Multi-core processors are inherently more variation tolerant than single-core processors due to the larger impact of memory latency and bandwidth on overall throughput. To elucidate these two points, multi-core and single-core processors have a similar chip-level FMAX distribution (mean degradation of 9% and standard deviation of 5%) for multi-threaded applications. In contrast to single-core processors, memory latency and bandwidth constraints significantly limit the throughput dependency on FMAX in multi-core processors, thus reducing the throughput mean degradation and standard deviation by 50%. Since single-threaded applications running on a multi-core processor can execute on the fastest core, mean FMAX and throughput gains of 4% are achieved from the nominal design target.
international solid-state circuits conference | 2010
James W. Tschanz; Keith A. Bowman; Shih-Lien Lu; Paolo A. Aseron; Muhammad M. Khellah; Arijit Raychowdhury; Bibiche M. Geuskens; Chris Wilkerson; Tanay Karnik; Vivek De
Microprocessors experience a wide range of dynamic variations, including voltage droops, temperature changes, and device aging, which vary across applications and systems. The necessity of ensuring correct operation even under infrequent worst-case conditions results in clock frequency (FCLK) or supply voltage (VCC) guardbands that degrade performance and increase energy consumption. In this paper, a research microprocessor core is described with resilient and adaptive circuits to mitigate dynamic variation guardbands for maximizing throughput or minimizing energy. The resiliency features consist of embedded error-detection sequentials (EDS) [1-4] and tunable replica circuits (TRC) [5] in conjunction with error-recovery circuits to detect and correct timing errors. A new instruction-replay error-recovery technique is introduced to correct errant instructions with low performance cost and implementation overhead. In addition, the microprocessor contains an adaptive clock controller based on error statistics to operate at maximum efficiency across a range of dynamic variations.
international solid-state circuits conference | 2010
Saurabh Dighe; Sriram R. Vangal; Paolo A. Aseron; Shasi Kumar; Tiju Jacob; Keith A. Bowman; Jason Howard; James W. Tschanz; Vasantha Erraguntla; Nitin Borkar; Vivek De; Shekhar Borkar
Many-core processors with on-die network-on-chip (NoC) interconnects have emerged as viable architectures for Single-Instruction/Multiple-Data (SIMD) vector applications and parallel workloads, and have been implemented in 65nm CMOS with Dynamic Voltage-Frequency Scaling (DVFS). Chips with Single-Voltage/Single-Frequency (SVSF) for all cores running homogeneous threads as well as Multiple-Voltage/Multiple-Frequency (MVMF), running heterogeneous applications and using independent V/F control for each core, have been reported. Combination of DVFS with dynamic core-count scaling (or DVFCS) has been proposed to further improve performance & energy efficiency across varying workloads. With technology scaling, both leakage power and core-to-core variations in frequency (Fmax) & leakage due to within-die device parameter variations have become significant, thus creating the need for per-core power gating and variation-aware DVFCS. Recently, variation-aware core mapping has been investigated using high level architectural simulations and statistical variation models.
international solid-state circuits conference | 2010
Arijit Raychowdhury; Bibiche M. Geuskens; Jaydeep P. Kulkarni; James W. Tschanz; Keith A. Bowman; Tanay Karnik; Shih-Lien Lu; Vivek De; Muhammad M. Khellah
8T SRAM cell (Fig. 19.6.1) is commonly used in single-VCC microprocessor core for its performance critical low-level caches and multi-ported register-file arrays [1]. 8T cell offers fast read (RD) and write (WR), dual-port capability, and generally lower minimum Vcc (or VMIN) than the 6T cell. By using a decoupled single-ended RD port with domino-style hierarchical RD bit-line, 8T cell features fast RD evaluation path without causing access disturbance that limits RD VMIN in the 6T cell. Using the 8T cell in a half-select-free architecture eliminates pseudo-reads during partial writes, hence enabling WR VMIN optimization independent of RD.