Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where John Sartori is active.

Publication


Featured research published by John Sartori.


Design, Automation, and Test in Europe | 2010

Scalable stochastic processors

Sriram Narayanan; John Sartori; Rakesh Kumar; Douglas L. Jones

Future microprocessors increasingly rely on an unreliable CMOS fabric due to aggressive scaling of voltage and frequency and shrinking design margins. Fortunately, many emerging applications can tolerate computational errors caused by hardware unreliability, at least during certain execution intervals. In this paper, we propose scalable stochastic processors, a computing platform for error-tolerant applications that is able to scale gracefully according to performance demands and power constraints while producing outputs that are, in the worst case, stochastically correct. Scalability is achieved by exposing to the application layer multiple functional units that differ in their architecture but share functionality. We show that a mobile video encoding application is able to achieve the lowest power consumption at any bitrate demand by dynamically switching between functional-unit architectures.
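The scaling mechanism described above can be pictured as a simple selection problem: pick the cheapest functional-unit variant that still meets the current demand. The Python sketch below is purely illustrative; the variant table, numbers, and selection rule are assumptions for exposition, not details taken from the paper.

```python
# Illustrative sketch (not from the paper): pick the lowest-power functional-unit
# variant that satisfies the current bitrate demand at an acceptable error rate.
from dataclasses import dataclass

@dataclass
class FUVariant:
    name: str
    power_mw: float        # average power of the unit at its operating point
    max_bitrate_kbps: int  # highest bitrate it can sustain
    error_rate: float      # expected fraction of erroneous results

VARIANTS = [
    FUVariant("exact",       power_mw=120.0, max_bitrate_kbps=8000, error_rate=0.00),
    FUVariant("overscaled",  power_mw=70.0,  max_bitrate_kbps=6000, error_rate=0.01),
    FUVariant("approximate", power_mw=45.0,  max_bitrate_kbps=4000, error_rate=0.04),
]

def pick_variant(bitrate_kbps: int, max_error: float) -> FUVariant:
    """Return the lowest-power variant meeting the bitrate and error constraints."""
    feasible = [v for v in VARIANTS
                if v.max_bitrate_kbps >= bitrate_kbps and v.error_rate <= max_error]
    if not feasible:
        return VARIANTS[0]  # fall back to the exact unit
    return min(feasible, key=lambda v: v.power_mw)

print(pick_variant(3500, max_error=0.05).name)  # -> "approximate"
```

At runtime the application would re-run this selection whenever the bitrate demand or error budget changes, which is the dynamic switching the abstract refers to.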


Asia and South Pacific Design Automation Conference | 2010

Slack redistribution for graceful degradation under voltage overscaling

Andrew B. Kahng; Seokhyeong Kang; Rakesh Kumar; John Sartori

Modern digital IC designs have a critical operating point, or “wall of slack”, that limits voltage scaling. Even with an error-tolerance mechanism, scaling voltage below a critical voltage - so-called overscaling - results in more timing errors than can be effectively detected or corrected. This limits the effectiveness of voltage scaling in trading off system reliability and power. We propose a design-level approach to trading off reliability and voltage (power) in, e.g., microprocessor designs. We increase the range of voltage values at which the (timing) error rate is acceptable; we achieve this through techniques for power-aware slack redistribution that shift the timing slack of frequently-exercised, near-critical timing paths in a power- and area-efficient manner. The resulting designs heuristically minimize the voltage at which the maximum allowable error rate is encountered, thus minimizing power consumption for a prescribed maximum error rate and allowing the design to fail more gracefully. Compared with baseline designs, we achieve a maximum of 32.8% and an average of 12.5% power reduction at an error rate of 2%. The area overhead of our techniques, as evaluated through physical implementation (synthesis, placement and routing), is no more than 2.7%.
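A rough way to picture power-aware slack redistribution is a greedy pass over timing paths weighted by how often they are exercised. The sketch below is a toy model under invented data structures and thresholds; it is not the paper's actual optimization flow.

```python
# Hedged sketch of power-aware slack redistribution: give extra timing margin to
# frequently-exercised near-critical paths (e.g., by upsizing their cells) so they
# keep meeting timing as voltage is lowered.  Fields and thresholds are assumptions.
from dataclasses import dataclass

@dataclass
class TimingPath:
    slack_ps: float        # timing slack at nominal voltage
    toggle_rate: float     # how often the path is exercised (0..1)
    upsize_cost_uw: float  # power cost of speeding the path up

def redistribute_slack(paths, near_critical_ps=50.0, power_budget_uw=500.0):
    """Greedily protect the most frequently exercised near-critical paths."""
    candidates = [p for p in paths if p.slack_ps < near_critical_ps]
    # Protect high-activity paths first; they dominate the observed error rate.
    candidates.sort(key=lambda p: p.toggle_rate, reverse=True)
    spent, protected = 0.0, []
    for p in candidates:
        if spent + p.upsize_cost_uw > power_budget_uw:
            break
        spent += p.upsize_cost_uw
        p.slack_ps += near_critical_ps  # model the added margin from upsizing
        protected.append(p)
    return protected, spent
```

The intuition is that rarely-exercised paths may be allowed to fail under overscaling, while the paths that dominate the error rate are made robust at modest power and area cost.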


High-Performance Computer Architecture | 2010

Designing a processor from the ground up to allow voltage/reliability tradeoffs

Andrew B. Kahng; Seokhyeong Kang; Rakesh Kumar; John Sartori

Current processor designs have a critical operating point that sets a hard limit on voltage scaling. Any scaling beyond the critical voltage results in exceeding the maximum allowable error rate, i.e., there are more timing errors than can be effectively and gainfully detected or corrected by an error-tolerance mechanism. This limits the effectiveness of voltage scaling as a knob for reliability/power tradeoffs. In this paper, we present power-aware slack redistribution, a novel design-level approach to allow voltage/reliability tradeoffs in processors. Techniques based on power-aware slack redistribution reapportion timing slack of the frequently-occurring, near-critical timing paths of a processor in a power- and area-efficient manner, such that we increase the range of voltages over which the incidence of operational (timing) errors is acceptable. This results in soft architectures — designs that fail gracefully, allowing us to perform reliability/power tradeoffs by reducing voltage up to the point that produces maximum allowable errors for our application. The goal of our optimization is to minimize the voltage at which a soft architecture encounters the maximum allowable error rate, thus maximizing the range over which voltage scaling is possible and minimizing power consumption for a given error rate. Our experiments demonstrate 23% power savings over the baseline design at an error rate of 1%. Observed power reductions are 29%, 29%, 19%, and 20% for error rates of 2%, 4%, 8%, and 16% respectively. Benefits are higher in the face of error recovery using Razor. Area overhead of our techniques is up to 2.7%.
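One way to state the optimization the abstract describes, in notation chosen here rather than taken from the paper: let ER(V) be the timing-error rate at supply voltage V and ER_max the largest rate the recovery mechanism can absorb.

```latex
% Illustrative formulation (notation ours, not the paper's): find the lowest
% voltage whose error rate is still recoverable; dynamic power falls roughly
% quadratically with that voltage.
V^{*} \;=\; \min \{\, V \;:\; ER(V) \le ER_{\max} \,\}, \qquad
P_{\mathrm{dyn}}(V) \;\approx\; \alpha\, C\, V^{2} f
```

Slack redistribution lowers V* by flattening the error-rate-versus-voltage curve, so each additional millivolt of headroom translates into roughly quadratic dynamic-power savings.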


Design, Automation, and Test in Europe | 2011

On the efficacy of NBTI mitigation techniques

Tuck Boon Chan; John Sartori; Puneet Gupta; Rakesh Kumar

Negative Bias Temperature Instability (NBTI) has become an important reliability issue in modern semiconductor processes. Recent work has attempted to address NBTI-induced degradation at the architecture level. However, such work has relied on device-level analytical models that, we argue, are limited in their flexibility to model the impact of architecture-level techniques on NBTI degradation. In this paper, we propose a flexible numerical model for NBTI degradation that can be adapted to better estimate the impact of architecture-level techniques on NBTI degradation. Our model is a numerical solution to the reaction-diffusion equations describing NBTI degradation that has been parameterized to model the impact of dynamic voltage scaling, averaging effects across logic paths, power gating, and activity management. We use this model to understand the effectiveness of different classes of architecture-level techniques that have been proposed to mitigate the effects of NBTI. We show that the potential benefits from these techniques are, for the most part, smaller than what has been previously suggested, and that guardbanding may still be an efficient way to deal with aging.
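For reference, the standard reaction-diffusion formulation of NBTI is written below in common textbook notation; the paper's parameterization (for dynamic voltage scaling, path averaging, power gating, and activity management) may differ in detail.

```latex
% Standard reaction-diffusion (R-D) model of NBTI, common notation;
% the paper's numerical model parameterizes this for architecture-level knobs.
\frac{dN_{IT}}{dt} \;=\; k_F\,(N_0 - N_{IT}) \;-\; k_R\, N_{IT}\, N_H(0,t),
\qquad
\frac{\partial N_H}{\partial t} \;=\; D_H \frac{\partial^2 N_H}{\partial x^2}
```

Here N_IT is the interface-trap density, N_0 the initial Si-H bond density, N_H the hydrogen concentration in the gate dielectric, and D_H its diffusivity; the resulting threshold-voltage shift is proportional to q N_IT / C_ox and grows as a fractional power of stress time (roughly t^(1/6) in the H2-diffusion-limited regime).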


IEEE Transactions on Multimedia | 2013

Branch and Data Herding: Reducing Control and Memory Divergence for Error-Tolerant GPU Applications

John Sartori; Rakesh Kumar

Control and memory divergence between threads within the same execution bundle, or warp, have been shown to cause significant performance bottlenecks for GPU applications. In this paper, we exploit the observation that many GPU applications exhibit error tolerance to propose branch and data herding. Branch herding eliminates control divergence by forcing all threads in a warp to take the same control path. Data herding eliminates memory divergence by forcing each thread in a warp to load from the same memory block. To safely and efficiently support branch and data herding, we propose a static analysis and compiler framework to prevent exceptions when control and data errors are introduced, a profiling framework that aims to maximize performance while maintaining acceptable output quality, and hardware optimizations to improve the performance benefits of exploiting error tolerance through branch and data herding. Our software implementation of branch herding on NVIDIA GeForce GTX 480 improves performance by up to 34% (13%, on average) for a suite of NVIDIA CUDA SDK and Parboil benchmarks. Our hardware implementation of branch herding improves performance by up to 55% (30%, on average). Data herding improves performance by up to 32% (25%, on average). Observed output quality degradation is minimal for several applications that exhibit error tolerance, especially for visual computing applications.
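The core of branch herding can be illustrated with a warp-level majority vote; the functions below are a purely illustrative software model (written in Python rather than CUDA) and are not the authors' implementation.

```python
# Illustrative software model of branch and data herding (not the authors' code).

def herd_branch(per_thread_taken: list[bool]) -> bool:
    """Majority vote over the warp's branch outcomes; minority threads follow the
    majority direction, trading a controlled error for eliminating divergence."""
    votes_taken = sum(per_thread_taken)
    return votes_taken * 2 >= len(per_thread_taken)

def herd_memory_block(per_thread_block_ids: list[int]) -> int:
    """Pick the most common target block so the warp issues one coalesced load."""
    return max(set(per_thread_block_ids), key=per_thread_block_ids.count)

warp = [True] * 28 + [False] * 4   # 32 threads, 4 of which would diverge
assert herd_branch(warp) is True   # the 4 minority threads are herded onto the taken path
```

In the paper, a static analysis and profiling framework decides where such herding is safe and where the resulting output-quality loss stays within acceptable bounds.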


Compilers, Architecture, and Synthesis for Embedded Systems | 2011

Stochastic computing: Embracing errors in architecture and design of processors and applications

John Sartori; Joseph Sloan; Rakesh Kumar

As device sizes shrink, device-level manufacturing challenges have led to increased variability in physical circuit characteristics. Exponentially increasing circuit density has not only brought about concerns in the reliable manufacturing of circuits, but has also exaggerated variations in dynamic circuit behavior. The resulting uncertainty in performance, power, and reliability imposed by compounding static and dynamic non-determinism threatens to halt the continuation of Moore's law, which has arguably been the primary driving force behind technology and innovation for decades. As the marginal benefits of technology scaling continue to languish, a new vision for stochastic computing has begun to emerge. Rather than hiding variations under expensive guardbands, designers have begun to relax traditional correctness constraints and deliberately expose hardware variability to higher levels of the compute stack, thus tapping into potentially significant performance and energy benefits, while exploiting software and hardware error resilience to tolerate errors. In this paper, we present our vision for design, architecture, compiler, and application-level stochastic computing techniques that embrace errors in order to ensure the continued viability of semiconductor scaling.


Design, Automation, and Test in Europe | 2009

Distributed peak power management for many-core architectures

John Sartori; Rakesh Kumar

Recently proposed techniques for peak power management involve centralized decision-making and assume quick evaluation of the various power management states. These techniques do not prevent instantaneous power from exceeding the peak power budget, but instead trigger corrective action when the budget has been exceeded. Similarly, they are not suitable for many-core architectures (processors with tens or possibly hundreds of cores on the same die) due to an exponential explosion in the number of global power management states. In this paper, we look at a hierarchical and a gradient ascent-based technique for decentralized peak power management for many-core architectures. The proposed techniques prevent power from exceeding the peak power budget and enable the placement of several more cores on a die than what the power budget would normally allow. We show up to 47% (33% on average) improvements in throughput for a given power budget. Our techniques outperform the static oracle by 22%.
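One way to picture the gradient-ascent flavor of decentralized peak power management: each core locally nudges its power state toward higher throughput while lightweight information from its neighbors keeps the shared budget respected. The sketch below is an invented toy model, not the paper's hierarchical or gradient-ascent protocol.

```python
# Toy model of a decentralized, gradient-ascent-style power management step
# (illustrative assumptions only; the paper's techniques differ in detail).
def step_core(power_state: int, local_throughput, neighbor_headroom: float,
              max_state: int) -> int:
    """Raise the core's power state only if the marginal throughput gain is positive
    and the locally visible budget headroom allows it; otherwise back off."""
    gain = local_throughput(power_state + 1) - local_throughput(power_state)
    if gain > 0 and neighbor_headroom > 0 and power_state < max_state:
        return power_state + 1   # climb the throughput gradient
    if neighbor_headroom < 0 and power_state > 0:
        return power_state - 1   # shed power before the budget would be exceeded
    return power_state
```

Because every decision is local and the budget is never allowed to be exceeded in the first place, such a scheme avoids both the global state explosion and the after-the-fact corrective action that the abstract criticizes in centralized approaches.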


Design Automation Conference | 2010

Recovery-driven design: a power minimization methodology for error-tolerant processor modules

Andrew B. Kahng; Seokhyeong Kang; Rakesh Kumar; John Sartori

Conventional CAD methodologies optimize a processor module for correct operation, and prohibit timing violations during nominal operation. In this paper, we propose recovery-driven design, a design approach that optimizes a processor module for a target timing error rate instead of correct operation. We show that significant power benefits are possible from a recovery-driven design flow that deliberately allows errors caused by voltage overscaling to occur during nominal operation, while relying on an error recovery technique to tolerate these errors. We present a detailed evaluation and analysis of such a CAD methodology that minimizes the power of a processor module for a target error rate. We demonstrate power benefits of up to 25%, 19%, 22%, 24%, 20%, 28%, and 20% versus traditional P&R at error rates of 0.125%, 0.25%, 0.5%, 1%, 2%, 4%, and 8%, respectively. Coupling recovery-driven design with an error recovery technique enables increased efficiency and additional power savings.
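The recovery-driven flow can be caricatured as a loop that keeps trading timing slack for power until the predicted error rate reaches the budget the recovery hardware can absorb; the cell model and error estimator below are placeholders, not the actual CAD methodology.

```python
# Caricature of a recovery-driven optimization loop (placeholder models, not the
# paper's flow): downsize cells to save power until the predicted timing-error
# rate at the target voltage reaches the recovery mechanism's budget.
def recovery_driven_downsize(cells, estimate_error_rate, target_error_rate: float):
    """cells: iterable of objects with .power_uw and .downsize() -> saved power (uW).
    estimate_error_rate: callable returning the design's predicted error rate."""
    saved = 0.0
    for cell in sorted(cells, key=lambda c: c.power_uw, reverse=True):
        if estimate_error_rate() >= target_error_rate:
            break                     # error budget exhausted; stop trading slack
        saved += cell.downsize()      # smaller, slower cell: less power, less slack
    return saved
```

The key inversion relative to a conventional flow is the stopping condition: the loop terminates at a target error rate rather than at zero timing violations.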


IEEE Transactions on Very Large Scale Integration (VLSI) Systems | 2013

Enhancing the Efficiency of Energy-Constrained DVFS Designs

Andrew B. Kahng; Seokhyeong Kang; Rakesh Kumar; John Sartori

The proliferation of embedded systems and mobile devices has created an increasing demand for low-energy hardware. Dynamic voltage and frequency scaling (DVFS) is a popular energy reduction technique that allows a hardware design to reduce average power consumption while still enabling the design to meet a high-performance target when necessary. To conserve energy, many DVFS-based embedded and mobile devices often spend a large fraction of their lifetimes in a low-power mode. However, DVFS designs produced by conventional multimode CAD flows tend to have significant energy overheads when operating outside of the peak performance mode, even when they are operating in a low-power mode. A dedicated core can be added for low-energy operation, but has a high cost in terms of area and leakage. In this paper, we explore the DVFS design space to identify the factors that affect DVFS efficiency. Based on our insights, we propose two design-level techniques to enhance the energy efficiency of DVFS for energy-constrained systems. First, we present a context-aware DVFS design flow that considers the intrinsic characteristics of the hardware design, as well as the operating scenario (including the relative amounts of time spent in different modes, the range of performance scalability, and the target efficiency metric), to optimize the design for maximum energy efficiency. We also present a selective replication-based DVFS design methodology that identifies hardware modules for which context-aware multimode design may be inefficient and creates dedicated module replicas for different operating modes for such modules. We show that context-aware design can reduce average power by up to 20% over a conventional multimode design flow. Selective replication can reduce average power by an additional 4%. We also use the generated insights to identify microarchitectural decisions that impact DVFS efficiency. We show that the benefits from the proposed design-level techniques increase when microarchitectural transformations are allowed.
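A compact way to see why mode residency matters in a context-aware flow (notation invented here): if the design spends fraction alpha_m of its lifetime in mode m at voltage/frequency (V_m, f_m), the quantity worth optimizing is the residency-weighted power rather than the peak-mode power alone.

```latex
% Residency-weighted objective for multi-mode (DVFS) design; notation ours.
P_{\mathrm{avg}} \;=\; \sum_{m} \alpha_m \, P_m(V_m, f_m),
\qquad \sum_{m} \alpha_m = 1
```

A flow tuned only for the peak-performance mode can therefore be far from optimal when the low-power modes carry most of the weight alpha_m, which is the situation the abstract describes for mobile devices.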


High-Performance Computer Architecture | 2012

Power balanced pipelines

John Sartori; Ben Ahrens; Rakesh Kumar

Since the onset of pipelined processors, balancing the delay of the microarchitectural pipeline stages such that each microarchitectural pipeline stage has an equal delay has been a primary design objective, as it maximizes instruction throughput. Unfortunately, this causes significant energy inefficiency in processors, as each microarchitectural pipeline stage gets the same amount of time to complete, irrespective of its size or complexity. For power-optimized processors, the inefficiency manifests itself as a significant imbalance in power consumption of different microarchitectural pipestages. In this paper, rather than balancing processor pipelines for delay, we propose the concept of power balanced pipelines - i.e., processor pipelines in which different delays are assigned to different microarchitectural pipestages to reduce the power disparity between the stages while guaranteeing the same processor frequency/performance. A specific implementation of the concept uses cycle time stealing [19] to deliberately redistribute cycle time from low-power pipeline stages to power-hungry stages, relaxing their timing constraints and allowing them to operate at reduced voltages or use smaller, less leaky cells. We present several static and dynamic techniques for power balancing and demonstrate that balancing pipeline power rather than delay can result in 46% processor power reduction with no loss in processor throughput for a full FabScalar processor over a power-optimized baseline. Benefits are comparable over a FabScalar baseline where static cycle time stealing is used to optimize achieved frequency. Power savings increase at lower operating frequencies. To the best of our knowledge, this is the first such work on microarchitecture-level power reduction that guarantees the same performance.
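Cycle-time stealing in the power-balancing sense can be written as a small constrained problem (formulation ours, not the paper's): choose per-stage time budgets T_i that keep the overall pipeline timing fixed while letting power-hungry stages run with relaxed constraints.

```latex
% Illustrative formulation of power balancing via cycle-time stealing (ours):
\min_{T_1,\dots,T_N} \; \sum_{i=1}^{N} P_i(T_i)
\quad \text{s.t.} \quad \sum_{i=1}^{N} T_i \;=\; N \, T_{\mathrm{clk}},
\qquad T_i \ge d_i
```

Here P_i(T_i) is stage i's power when given time budget T_i (more time allows a lower voltage or smaller, less leaky cells) and d_i is the stage's minimum achievable delay; throughput is preserved because the time budgets still sum to N clock periods.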

Collaboration


Dive into John Sartori's collaborations.

Top Co-Authors

Seokhyeong Kang
Ulsan National Institute of Science and Technology

Puneet Gupta
University of California

Aashish Pant
University of California

Amrut Kapare
University of Minnesota