David Michael Bull
University of Michigan
Publication
Featured research published by David Michael Bull.
international solid-state circuits conference | 2008
David T. Blaauw; Sudherssen Kalaiselvan; Kevin Lai; Wei-Hsiang Ma; Sanjay Pant; Shidhartha Das; David Michael Bull
We take advantage of these findings and propose a Razor II approach that introduces two components. First, instead of performing both error detection and correction in the FF, Razor II performs only detection in the FF, while correction is performed through architectural replay.
IEEE Journal of Solid-state Circuits | 2009
Shidhartha Das; Sanjay Pant; Wei Hsiang Ma; Sudherssen Kalaiselvan; Kevin Lai; David Michael Bull; David T. Blaauw
Traditional adaptive methods that compensate for PVT variations need safety margins and cannot respond to rapid environmental changes. In this paper, we present a design (RazorII) which implements a flip-flop with in situ detection and architectural correction of variation-induced delay errors. Error detection is based on flagging spurious transitions in the state-holding latch node. The RazorII flip-flop naturally detects logic and register SER. We implement a 64-bit processor in 0.13 µm technology which uses RazorII for SER tolerance and dynamic supply adaptation. RazorII based DVS allows elimination of safety margins and operation at the point of first failure of the processor. We tested and measured 32 different dies and obtained 33% energy savings over traditional DVS using RazorII for supply voltage control. We demonstrate SER tolerance on the RazorII processor through radiation experiments.
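The idea of lowering the supply until errors first appear can be illustrated with a toy control loop. This is only a sketch under stated assumptions: the error model, the failure voltage, the step size, and the names `observed_errors`, `dvs_step`, and `run_loop` are all hypothetical and are not taken from the paper.

```python
# Toy sketch of an error-rate-driven supply-voltage loop.
# Every name and number here is an illustrative assumption, not a value from the paper.

V_FIRST_FAILURE = 0.95  # hypothetical voltage below which timing errors appear


def observed_errors(vdd, cycles):
    """Toy error model: 1% of cycles flag an error below the failure point."""
    return cycles // 100 if vdd < V_FIRST_FAILURE else 0


def dvs_step(vdd, errors, cycles, target_rate=0.001, step=0.01,
             vmin=0.80, vmax=1.20):
    """Return the next supply voltage after one observation window."""
    rate = errors / cycles
    if rate > target_rate:
        vdd += step  # too many detected errors: raise the supply
    else:
        vdd -= step  # errors are rare: reclaim more margin
    # Clamp to the allowed range and round to avoid floating-point drift.
    return round(min(vmax, max(vmin, vdd)), 4)


def run_loop(vdd=1.20, windows=100, cycles=10_000):
    """Run the controller and return the voltage after each window."""
    trace = [vdd]
    for _ in range(windows):
        vdd = dvs_step(vdd, observed_errors(vdd, cycles), cycles)
        trace.append(vdd)
    return trace


if __name__ == "__main__":
    print(run_loop()[-4:])
```

In this toy model the voltage descends from its starting value and then hovers within one step of the assumed failure point, which is the behaviour that margin elimination relies on.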
international solid-state circuits conference | 2011
David Michael Bull; Shidhartha Das; Karthik Shivashankar; Ganesh S. Dasika; Krisztian Flautner; David T. Blaauw
Razor is a hybrid technique for dynamic detection and correction of timing errors. A combination of error detecting circuits and micro-architectural recovery mechanisms creates a system that is robust in the face of timing errors, and can be tuned to an efficient operating point by dynamically eliminating unused timing margins. Savings from margin reclamation can be realized as a per-device power-efficiency improvement, or as a parametric yield improvement for a batch of devices. In this paper, we apply Razor to a 32-bit ARM processor with a micro-architecture design that has balanced pipeline stages with critical memory access and clock-gating enable paths. The design is fabricated on a UMC 65 nm process, using industry-standard EDA tools, with a worst-case STA signoff of 724 MHz. Based on measurements on 87 samples from split-lots, we obtain 52% power reduction for the overall distribution at 1 GHz operation. We present error rate driven dynamic voltage and frequency scaling schemes where runtime adaptation to PVT variations and tolerance of fast transients is demonstrated. All Razor cells are augmented with a sticky error history bit, allowing precise diagnosis of timing errors over the execution of test vectors. We show potential for parametric yield improvement through energy-efficient operation using Razor.
IEEE Transactions on Very Large Scale Integration Systems | 2013
Paul N. Whatmough; Shidhartha Das; David Michael Bull; Izzat Darwazeh
In this paper, we present a novel circuit-level timing error mitigation technique, which aims to increase energy-efficiency of digital signal processing datapaths without loss of robustness. Timing errors are detected using Razor flip-flops on critical paths, and the error-rate feedback is used to control a dynamic voltage scaling control loop. In place of conventional Razor error correction by replay, we propose a new approach to bound the magnitude of intermittent timing errors at the circuit level. A timing guard-band is created by shaping the path delay distribution such that the critical paths correspond to a group of least-significant bit registers. These end-points are ensured to be critical by modifying the topology of the final stage carry-merge adder, and by using tool-based device sizing. Hence, timing violations lead to weakly correlated logical errors of small magnitude in a mean-squared-error sense. We examine this approach in a finite-impulse response (FIR) filter and a 2-D discrete cosine transform implementation, in 32-nm CMOS. Power saving compared to a conventional design at iso-frequency is 21%-23% at the typical corner, while retaining a voltage guard-band to protect against fast transient changes in switching activity and supply noise. The impact on minimum clock period is small (16%-20%), as it does not necessitate the use of ripple-carry adders and also requires only a bare minimum of additional design effort.
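A small numerical sketch may help show why confining timing errors to least-significant-bit registers keeps the mean squared error small. The error model below (one bit flipped in every tenth sample) and the function name `mse_with_bit_flips` are assumptions made purely for illustration and do not reproduce the paper's circuits or measurements.

```python
# Toy comparison of mean squared error for errors in different bit positions.
# The periodic bit-flip error model is a hypothetical stand-in for timing errors.


def mse_with_bit_flips(samples, bit, every=10):
    """MSE when `bit` is flipped in every `every`-th non-negative integer sample."""
    mask = 1 << bit
    total = 0
    for i, s in enumerate(samples):
        corrupted = s ^ mask if i % every == 0 else s
        total += (corrupted - s) ** 2  # a flip of bit b changes the value by 2**b
    return total / len(samples)


if __name__ == "__main__":
    data = list(range(1000))
    for b in (0, 4, 12):
        print(b, mse_with_bit_flips(data, b))
```

Because flipping bit b changes a value by 2^b, the squared error grows by a factor of four per bit position, so errors restricted to the lowest bits contribute far less than the same error rate in upper bits.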
international solid-state circuits conference | 2013
Paul N. Whatmough; Shidhartha Das; David Michael Bull
The unrelenting demands of wireless/multimedia DSP workloads necessitate specialized hardware to achieve higher performance and power efficiency. Razor systems offer even greater power efficiency by minimizing static supply voltage (VDD) guardbands for process/voltage/temperature (PVT) variation, while also providing a degree of resilience to general delay faults (e.g. SEUs). To date, Razor has only been demonstrated on silicon in the context of microprocessor pipelines [1][2]. Reported Algorithmic Noise Tolerance (ANT) circuits [3][4] operate at very high error rates, but rely on imbalanced ripple-carry adders and hence clock frequency (Fclk) is limited (50-88 MHz). ANT also requires additional datapaths for error detection/correction, which cannot be clock gated in the absence of errors, increasing baseline area and power. Combining Razor error detection with algorithm-level correction enables high-Fclk datapaths and low overheads. A 0.19 mm² 16-tap Razor FIR datapath is fabricated in 65 nm LP CMOS, with input and output SRAMs, tunable pulse-clock generator, BIST logic and an AHB slave on-chip bus interface (Fig. 24.5.1), demonstrating: 1) two distinct fixed-latency Razor error-correction techniques for real-time DSP datapaths: time-borrow tracking (TBT) and interpolation-based approximate error correction (AEC); 2) a Razor latch (RZL) circuit with reduced pessimism; 3) a 1 GHz datapath, an order of magnitude improvement over [3][4] due to elimination of ripple-carry adders; 4) energy efficiency improvement of up to 37%.
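As a rough illustration of what interpolation-based approximate correction could look like in software, the sketch below replaces each sample flagged as erroneous with the average of its nearest unflagged neighbours. The abstract does not specify the interpolation scheme, so the linear averaging, the fallback rules, and the name `interpolate_flagged` are assumptions.

```python
# Hypothetical software analogue of interpolation-based approximate error correction.
# The averaging rule is an assumption; the abstract does not describe the exact scheme.


def interpolate_flagged(samples, flags):
    """Replace each flagged sample with an estimate from unflagged neighbours."""
    out = list(samples)
    n = len(samples)
    for i in range(n):
        if not flags[i]:
            continue
        # Search outward for the nearest samples that were not flagged.
        left = next((samples[j] for j in range(i - 1, -1, -1) if not flags[j]), None)
        right = next((samples[j] for j in range(i + 1, n) if not flags[j]), None)
        if left is not None and right is not None:
            out[i] = (left + right) / 2
        elif left is not None:
            out[i] = left
        elif right is not None:
            out[i] = right
        # If every sample is flagged, the value is left unchanged.
    return out


if __name__ == "__main__":
    print(interpolate_flagged([0, 10, 999, 30, 40],
                              [False, False, True, False, False]))
```

The corrected value is only approximate, which is acceptable for signals that vary smoothly between samples; a hardware version with fixed latency would presumably limit how far it looks for neighbours.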
design automation conference | 2008
Ganesh S. Dasika; Shidhartha Das; Kevin Fan; Scott A. Mahlke; David Michael Bull
Hardware accelerators are common in embedded systems that have high performance requirements but must still operate within stringent energy constraints. To facilitate short time-to-market and reduced non-recurring engineering costs, automatic systems that can rapidly generate hardware bearing both power and performance in mind are extremely attractive. This paper proposes the BLADES (Better-than-worst-case Loop Accelerator Design) system for automatically designing self-tuning hardware accelerators that dynamically select their best operating frequency and voltage based on environmental conditions, silicon variation, and input data characteristics. Errors in operation are detected by Razor flip-flops, and recovery is initiated. The architecture efficiently supports detection, rollback, and recovery to provide a highly adaptable and configurable loop accelerator. The overhead of deploying Razor flip-flops is significantly reduced by automatically chaining primitive computation operations together. Results on a range of loop accelerators show average energy savings of 32% gained by voltage scaling below the nominal supply voltage.
international solid-state circuits conference | 2015
Paul N. Whatmough; Shidhartha Das; Zacharias Hadjilambrou; David Michael Bull
The current trend for System-on-Chip (SoC) compute subsystems is to improve energy efficiency, while operating at a similar power budget as previous generations. Reduced supply voltages and increased transistor density afford SoCs composed of multiple clusters of CPUs and additional specialized compute engines. However, this comes at the cost of both increasing current, and increasing current density, to the extent that these systems are ultimately constrained by power delivery. Pathological AC supply noise conditions may arise due to sporadic combinations of system and micro-architectural events, and these effectively limit the energy efficiency of the system, as sufficient voltage margin must be deployed to guarantee these conditions do not result in system failure.
design automation conference | 2011
Paul N. Whatmough; Shidhartha Das; David Michael Bull; Izzat Darwazeh
In this paper, we present a novel circuit-level timing error mitigation technique, which aims to increase energy-efficiency when applying a known in situ error-detection and correction technique, called Razor, to DSP datapaths. Timing errors are detected using Razor flip-flops at critical-path endpoints and the error-rate feedback is used to control a dynamic voltage scaling (DVS) control loop. We propose a new approach to bound the magnitude of intermittent timing errors at the circuit level by introducing a guard-band over which timing errors are safely mitigated. The guard-band is achieved by shaping the path delay distribution such that the critical paths correspond to a group of LSB result registers. These end-points are ensured to be critical by modifying the topology of the final stage carry-merge adder and by using tool-based device sizing. Hence, timing violations lead to weakly correlated logical errors of small magnitude in a mean-squared-error sense. We applied this approach to a digital filter in 32nm CMOS. Power saving compared to a conventional design was 23%, over worst-case process and temperature corners.
international symposium on low power electronics and design | 2015
Shidhartha Das; Paul N. Whatmough; David Michael Bull
Power delivery is a well-known challenge for high-end microprocessor systems. By comparison, mobile computing platforms typically consume order-of-magnitude lower currents, but economic and volume constraints limit the quality of the Power Delivery Network (PDN). In addition, the trend towards GHz+ operating frequencies and the ubiquity of low-power techniques such as clock-gating and power-gating make these systems susceptible to pathological AC transients. Consequently, mobile computing systems are ultimately limited by power delivery. In this paper, we present system-level PDN modeling, analysis and measurement results on a dual-core 64-bit ARM Cortex-A57 compute cluster in 28 nm CMOS. We present a comprehensive analysis of the PDN by characterizing the individual contribution of each constituent, i.e. the PCB, the package and the die. We present frequency- and time-domain simulation results and correlate them with measurements (both on-chip and off-chip). Our results demonstrate how complex software and micro-architectural interactions can trigger PDN resonances that ultimately lead to system failure.
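For readers unfamiliar with PDN resonance, a first-order estimate treats the package inductance and on-die capacitance as a simple LC tank with resonant frequency 1/(2π√(LC)). The sketch below applies that textbook formula; the component values and the function name `resonance_hz` are hypothetical and are not measurements from the paper.

```python
# First-order LC estimate of a PDN resonance frequency.
# The inductance and capacitance values below are hypothetical placeholders.
import math


def resonance_hz(inductance_h, capacitance_f):
    """Resonant frequency of an ideal LC tank, in hertz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


if __name__ == "__main__":
    package_l = 100e-12  # assumed effective package inductance: 100 pH
    die_c = 100e-9       # assumed on-die decoupling capacitance: 100 nF
    print(f"{resonance_hz(package_l, die_c) / 1e6:.1f} MHz")
```

Workload activity that toggles current demand near such a frequency excites the resonance most strongly, which is one way software behaviour can translate into large supply droops.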
IEEE Transactions on Circuits and Systems | 2014
Shidhartha Das; Ganesh S. Dasika; Karthik Shivashankar; David Michael Bull
Dynamic adaptation using Razor-based detection and correction of timing errors has demonstrated substantial improvements in performance and energy-efficiency in microprocessors. In this work, we apply Razor to hardware accelerators that find increasing application in System-on-Chip designs with high-performance requirements that must be delivered under stringent power budgets. We describe the implementation and silicon measurement results from a Razor-based hardware loop-accelerator (RZLA), implementing the Sobel edge-detection algorithm. Unlike in microprocessors, the RZLA pipeline is datapath-dominated with statically-scheduled control that has queue-based storage structures which are simply extended to support check-pointing and recovery. We exploit these characteristics typical of DSP and image-processing accelerators to implement Razor recovery in a manner that is amenable to RTL validation and verification. We show a low-overhead pulsed-latch based Razor Flip-flop (RFF) architecture that adds only a single extra transistor on clock to minimize clock power overhead. The RFF is deployed in conjunction with a level-sensitive latch-insertion based algorithm to address the minimum-delay constraint present in all Razor systems. This algorithm enables the use of 50% of the clock period for timing speculation leading to robust error detection and correction across a wide dynamic voltage- and frequency-scaling range. Fabricated in 65 nm CMOS, the RZLA reclaims voltage margins to demonstrate 34% energy-efficiency improvements on a per-device basis and 33% overall, for the entire batch of devices at 1 GHz operation.
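For reference, the Sobel operator that the accelerator implements can be sketched in a few lines of software. This uses the standard 3x3 Sobel kernels with the |Gx| + |Gy| magnitude approximation; whether the RZLA uses this approximation or another variant is not stated in the abstract, so treat that choice as an assumption.

```python
# Plain-Python sketch of Sobel edge detection on a 2-D list of pixel intensities.
# The |gx| + |gy| magnitude approximation is an assumed choice, not taken from the paper.

GX = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))    # horizontal gradient kernel
GY = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))    # vertical gradient kernel


def sobel(image):
    """Return gradient magnitudes for all interior pixels of `image`."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            gx = gy = 0
            for dr in range(3):
                for dc in range(3):
                    p = image[r + dr - 1][c + dc - 1]
                    gx += GX[dr][dc] * p
                    gy += GY[dr][dc] * p
            row.append(abs(gx) + abs(gy))
        out.append(row)
    return out


if __name__ == "__main__":
    step_edge = [[0, 0, 10, 10] for _ in range(4)]
    print(sobel(step_edge))
```

The loop nest has fixed bounds and no data-dependent control flow, which is the kind of statically scheduled, datapath-dominated computation the abstract describes.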