Bibiche M. Geuskens
Intel
Publication
Featured research published by Bibiche M. Geuskens.
IEEE Journal of Solid-State Circuits | 2011
Keith A. Bowman; James W. Tschanz; Shih-Lien Lu; Paolo A. Aseron; Muhammad M. Khellah; Arijit Raychowdhury; Bibiche M. Geuskens; Chris Wilkerson; Tanay Karnik; Vivek De
A 45 nm microprocessor core integrates resilient error-detection and recovery circuits to mitigate the clock frequency (FCLK) guardbands for dynamic parameter variations to improve throughput and energy efficiency. The core supports two distinct error-detection designs, allowing a direct comparison of the relative trade-offs. The first design embeds error-detection sequential (EDS) circuits in critical paths to detect late timing transitions. In addition to reducing the FCLK guardbands for dynamic variations, the embedded EDS design can exploit path-activation rates to operate the microprocessor faster than infrequently-activated critical paths. The second error-detection design offers a less-intrusive approach for dynamic timing-error detection by placing a tunable replica circuit (TRC) per pipeline stage to monitor worst-case delays. Although the TRCs require a delay guardband to ensure the TRC delay is always slower than critical-path delays, the TRC design captures most of the benefits from the embedded EDS design with less implementation overhead. Furthermore, while core min-delay constraints limit the potential benefits of the embedded EDS design, a salient advantage of the TRC design is the ability to detect a wider range of dynamic delay variation, as demonstrated through low supply voltage (VCC) measurements. Both error-detection designs interface with error-recovery techniques, enabling the detection and correction of timing errors from fast-changing variations such as high-frequency VCC droops. The microprocessor core also supports two separate error-recovery techniques to guarantee correct execution even if dynamic variations persist. The first technique requires clock control to replay errant instructions at 1/2 FCLK. In comparison, the second technique is a new multiple-issue instruction replay design that corrects errant instructions with a lower performance penalty and without requiring clock control. Silicon measurements demonstrate that resilient circuits enable a 41% throughput gain at equal energy or a 22% energy reduction at equal throughput, as compared to a conventional design when executing a benchmark program with a 10% VCC droop. In addition, the microprocessor includes a new adaptive clock control circuit that interfaces with the resilient circuits and a phase-locked loop (PLL) to track recovery cycles and adapt to persistent errors by dynamically changing FCLK for maximum efficiency.
International Solid-State Circuits Conference | 2010
James W. Tschanz; Keith A. Bowman; Shih-Lien Lu; Paolo A. Aseron; Muhammad M. Khellah; Arijit Raychowdhury; Bibiche M. Geuskens; Chris Wilkerson; Tanay Karnik; Vivek De
Microprocessors experience a wide range of dynamic variations, including voltage droops, temperature changes, and device aging, which vary across applications and systems. The necessity of ensuring correct operation even under infrequent worst-case conditions results in clock frequency (FCLK) or supply voltage (VCC) guardbands that degrade performance and increase energy consumption. In this paper, a research microprocessor core is described with resilient and adaptive circuits to mitigate dynamic variation guardbands for maximizing throughput or minimizing energy. The resiliency features consist of embedded error-detection sequentials (EDS) [1-4] and tunable replica circuits (TRC) [5] in conjunction with error-recovery circuits to detect and correct timing errors. A new instruction-replay error-recovery technique is introduced to correct errant instructions with low performance cost and implementation overhead. In addition, the microprocessor contains an adaptive clock controller based on error statistics to operate at maximum efficiency across a range of dynamic variations.
International Solid-State Circuits Conference | 2010
Arijit Raychowdhury; Bibiche M. Geuskens; Jaydeep P. Kulkarni; James W. Tschanz; Keith A. Bowman; Tanay Karnik; Shih-Lien Lu; Vivek De; Muhammad M. Khellah
The 8T SRAM cell (Fig. 19.6.1) is commonly used in single-VCC microprocessor cores for performance-critical low-level caches and multi-ported register-file arrays [1]. The 8T cell offers fast read (RD) and write (WR), dual-port capability, and generally lower minimum VCC (VMIN) than the 6T cell. By using a decoupled single-ended RD port with a domino-style hierarchical RD bit-line, the 8T cell features a fast RD evaluation path without causing the access disturbance that limits RD VMIN in the 6T cell. Using the 8T cell in a half-select-free architecture eliminates pseudo-reads during partial writes, hence enabling WR VMIN optimization independent of RD.
IEEE Journal of Solid-State Circuits | 2014
Rinkle Jain; Bibiche M. Geuskens; Stephen T. Kim; Muhammad M. Khellah; Jaydeep P. Kulkarni; James W. Tschanz; Vivek De
A fully integrated switched-capacitor voltage regulator (SCVR) with on-die high-density MIM capacitors, distributed across a 14 KB register file (RF) load, is demonstrated in 22 nm tri-gate CMOS. The multi-conversion-ratio SCVR provides a wide output voltage range of 0.45-1 V from a fixed input voltage of 1.225 V. It achieves 63-84% conversion efficiency and supports a maximum load current density of 0.88 A/mm2. The area overhead of the dedicated SCVR on the load is 3.6%. Measured data is presented on various performance indices in detail. The tradeoffs learned among factors such as capacitance characteristics, conversion efficiency, and current density are delineated and correlated with theoretical estimates. Performance of the RF array shows comparable results when powered with the SCVR and with the external rail. The all-digital, modular design allows efficient spatial distribution across the load and hence robust power delivery. The extremely fast response time, on the order of a few nanoseconds, is targeted to benefit agile power management. This work evinces voltage regulator technology as a standard homogeneous CMOS component, which can proliferate DVFS domains for maximum energy and area benefits.
IEEE Transactions on Circuits and Systems | 2011
Keith A. Bowman; James W. Tschanz; Arijit Raychowdhury; Muhammad M. Khellah; Bibiche M. Geuskens; Shih-Lien Lu; Paolo A. Aseron; Tanay Karnik; Vivek De
A 45 nm microprocessor integrates an all-digital dynamic variation monitor (DVM) to continuously measure the impact of dynamic parameter variations on circuit-level performance to enhance silicon debug and adaptive clock control. The DVM consists of a tunable replica circuit, a time-to-digital converter, and multiplexers to measure circuit delay or frequency changes with less than a 1% measured resolution error while capturing clock-to-data correlations. In validating the DVM with microprocessor maximum clock frequency (FMAX) measurements, an on-die noise injector circuit induces a supply voltage (VCC) droop at a particular cycle in the test program. The FMAX measurement is then repeated for over a thousand iterations while shifting the droop placement to a different cycle per iteration. Silicon measurements demonstrate the DVM capability of tracking the worst case FMAX reduction to within 1% for a wide range of VCC droop profiles. Furthermore, silicon measurements reveal that FMAX is highly sensitive to the placement and magnitude of a high-frequency VCC droop during program execution, thus highlighting the value of the DVM for silicon debug. In addition, the DVM interfaces with an adaptive clock control circuit to dynamically adjust the clock frequency by changing the divide ratio in the phase-locked loop in response to persistent variations, enabling the microprocessor to adapt to the operating environment for maximum efficiency.
International Solid-State Circuits Conference | 2012
Jaydeep P. Kulkarni; Bibiche M. Geuskens; Tanay Karnik; Muhammad M. Khellah; James W. Tschanz; Vivek De
High-performance microprocessors and SoCs include multiple embedded memory arrays used as register files and low-level caches that typically share the same supply voltage as the core. The desire for wide voltage range operation to optimize power and performance dictates the need for SRAM arrays that can achieve both high performance and low minimum voltage of operation (VMIN). The 8T bitcell is commonly used in these applications because its decoupled read and write ports offer fast read (RD) and write (WR) operations with generally lower VMIN than the 6T bitcell. However, process variations result in mismatches between the pull-up and access devices limiting write VMIN, and/or between read port and keeper transistors limiting read VMIN. Traditional device up-sizing provides diminishing returns at a large area and power cost. In addition to cell upsizing, dynamic assist techniques have been used for VMIN reduction in 6T and 8T arrays - examples include temporary collapse of bitcell voltage for write VMIN reduction and boosting read and write wordlines requiring careful design of the embedded charge pump and the level shifters. In contrast, this paper describes a new capacitive-coupling (CC) write wordline boost which employs intrinsic coupling capacitance between write bitlines (WBL) and accessed write wordline (WWL) to boost WWL without the need for a charge pump or complex level shifters. The scheme has a built-in self-induced VCC collapse (SIC) allowing the cell voltage to partially collapse during the write operation, further improving write VMIN. The technique is implemented in a 12KB, 8T cell macro with cell area of 0.238μm2, fabricated in a 22nm CMOS technology.
Symposium on VLSI Circuits | 2010
Arijit Raychowdhury; Bibiche M. Geuskens; Keith A. Bowman; James W. Tschanz; Shih-Lien Lu; Tanay Karnik; Muhammad M. Khellah; Vivek De
Infrequent dynamic events like VCC droops and temperature changes result in the use of a static VCC guard-band. Measured data on a 16KB 8T array featuring tunable replica bits illustrate the opportunity of eliminating a majority of the static guard-band in memory arrays, resulting in lower operating VCC/power.
Custom Integrated Circuits Conference | 2010
Keith A. Bowman; James W. Tschanz; Arijit Raychowdhury; Muhammad M. Khellah; Bibiche M. Geuskens; Shih-Lien Lu; Paolo A. Aseron; Tanay Karnik; Vivek De
A 45nm microprocessor integrates an all-digital dynamic variation monitor (DVM), consisting of a tunable replica circuit with a time-to-digital converter, to measure the impact of dynamic variations on path-level delay or frequency. Measurements reveal the high sensitivity of the microprocessor maximum clock frequency (FMAX) to the placement and magnitude of a high-frequency supply voltage (VCC) droop and demonstrate the DVM capability of tracking FMAX changes to within 1% for a wide range of VCC droop profiles. Furthermore, the DVM interfaces with an adaptive clock control circuit to dynamically change the clock frequency in response to dynamic variations, enabling the microprocessor to operate at maximum efficiency.
IEEE Journal of Solid-State Circuits | 2011
Arijit Raychowdhury; Bibiche M. Geuskens; Keith A. Bowman; James W. Tschanz; Shih-Lien Lu; Tanay Karnik; Muhammad M. Khellah; Vivek De
Infrequent dynamic events like VCC droops and temperature changes result in the use of a static VCC guardband in 8T SRAM arrays. This paper proposes the use of tunable replica bits (TRBs) as a potential solution to mitigating a part of the VCC guardband. Measured data on a 16 KB 8T array featuring tunable replica bits illustrate a 9% reduction of the operating minimum VCC (VMIN) and correspondingly a 7.5% reduction in array power.
International Conference on Electrical and Control Engineering | 2006
Ataur R. Patwary; Hans J. Greub; Zhongfeng Wang; Bibiche M. Geuskens
As the leakage current keeps increasing in every generation of VLSI processing technology, appropriate selection of the local and global bit-line organization of Register Files (RFs) becomes an important design issue for low-power and high-performance applications. In this paper, several different bit-line organizations are proposed based on simulation results for designing low-power and high-performance RFs using 65 nm CMOS devices, while maintaining maximum robustness against noise.