Abhishek A. Sinkar
University of Wisconsin-Madison
Publication
Featured research published by Abhishek A. Sinkar.
international symposium on microarchitecture | 2010
Erika Gunadi; Abhishek A. Sinkar; Nam Sung Kim; Mikko H. Lipasti
Bias temperature instability, hot-carrier injection, and gate-oxide wear-out will cause severe lifetime degradation in the performance and reliability of future CMOS devices. The design guard band needed to counter these effects will be too expensive, largely because of the worst-case behavior induced by uneven utilization of devices on the chip. To mitigate these effects over a chip’s lifetime, this paper proposes Colt, a simple yet holistic scheme that balances the utilization of devices in a processor by equalizing the duty cycle ratio of circuits’ internal nodes and the usage frequency of devices. Colt relies on alternating true- and complement-mode operations to equalize the duty cycle ratio of signals (and thus the utilization of devices) in most data-path and storage devices. Colt also employs a pseudorandom indexing scheme to balance the usage of entries in storage structures, which often exhibit highly uneven entry utilization. Finally, an operand-swapping scheme equalizes utilization of the left and right operand data paths. The proposed mechanisms impose trivial overhead in area, complexity, power, and performance, while recapturing 27% of aging-induced performance degradation and improving mean time to failure by an estimated 40%.
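The duty-cycle argument behind alternating true- and complement-mode operation can be illustrated with a small Python sketch. It assumes a hypothetical storage node whose logical value is stored inverted during odd-numbered epochs and re-inverted on read; the epoch granularity, the all-zero workload, and the function names (`stored_bits`, `read_back`, `duty_cycle`) are illustrative assumptions, not details of Colt itself.

```python
from typing import List

def stored_bits(values: List[int], alternate: bool) -> List[int]:
    """Bit physically held by the node in each epoch.

    Hypothetical model: when alternation is enabled, odd epochs run in
    complement mode, so the node holds the inverted logical value.
    """
    return [v ^ 1 if alternate and epoch % 2 == 1 else v
            for epoch, v in enumerate(values)]

def read_back(stored: List[int], alternate: bool) -> List[int]:
    """Undo the inversion so consumers see the original logical values."""
    return stored_bits(stored, alternate)  # inverting twice restores the value

def duty_cycle(bits: List[int]) -> float:
    """Fraction of epochs in which the node holds logic 1."""
    return sum(bits) / len(bits)

# A node that logically holds 0 all the time (e.g. a rarely used high-order bit).
values = [0] * 100
print(duty_cycle(stored_bits(values, alternate=False)))  # node never leaves 0
print(duty_cycle(stored_bits(values, alternate=True)))   # time split evenly between 0 and 1
print(read_back(stored_bits(values, True), True) == values)
```

The balancing only holds when the logical value is not itself correlated with epoch parity; the sketch does not model how or when Colt actually switches modes.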
high-performance computer architecture | 2013
Ulya R. Karpuzcu; Abhishek A. Sinkar; Nam Sung Kim; Josep Torrellas
While Near-Threshold Voltage Computing (NTC) is a promising approach to push back the manycore power wall, it suffers from a high sensitivity to parameter variations. One possible way to cope with variations is to use multiple on-chip voltage (Vdd) domains. However, this paper finds that such an approach is energy inefficient. Consequently, for NTC, we propose a manycore organization that has a single Vdd domain and relies on multiple frequency domains to tackle variation. We call it EnergySmart. For this approach to be competitive, it has to be paired with effective core assignment strategies and also support fine-grain (i.e., short-interval) DVFS. This paper shows that, at NTC, a simple chip with a single Vdd domain can deliver a higher performance per watt than one with multiple Vdd domains.
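The intuition for pairing a single Vdd domain with multiple frequency domains can be sketched as follows. Under a shared Vdd, variation leaves each core with a different maximum safe frequency, and a frequency domain must be clocked at the slowest of its cores. The per-core frequencies are hypothetical, the summed clock rate is only a crude throughput proxy, and neither the core assignment strategies nor fine-grain DVFS described above are modeled; `aggregate_clock` is an illustrative name.

```python
from typing import Sequence

def aggregate_clock(core_fmax: Sequence[float], cores_per_domain: int) -> float:
    """Sum of core clock rates when contiguous groups of cores share a clock.

    Each frequency domain runs at the minimum safe frequency of its cores,
    because every core in the domain must meet timing at that clock.
    """
    if cores_per_domain <= 0:
        raise ValueError("cores_per_domain must be positive")
    total = 0.0
    for start in range(0, len(core_fmax), cores_per_domain):
        domain = core_fmax[start:start + cores_per_domain]
        total += min(domain) * len(domain)
    return total

# Hypothetical per-core maximum frequencies (GHz) at one shared Vdd.
fmax = [1.0, 0.8, 0.9, 1.1]
for size in (4, 2, 1):
    print(size, round(aggregate_clock(fmax, size), 3))
```

With these numbers, going from one chip-wide clock to two and then four frequency domains raises the aggregate from 3.2 to 3.4 to 3.8, without splitting the Vdd domain.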
IEEE Transactions on Very Large Scale Integration Systems | 2014
Abhishek A. Sinkar; Hamid Reza Ghasemi; Michael J. Schulte; Ulya R. Karpuzcu; Nam Sung Kim
Per-core voltage domains can improve performance under a power constraint. Most commercial processors, however, only have a single voltage domain for all processor cores. This is because splitting the single voltage domain into per-core voltage domains and powering them with multiple off-chip voltage regulators (VRs) incur a high cost for the platform and package designs. Although using on-chip switching VRs can be an alternative solution, integrating high-quality inductors for VRs with cores has been a technical challenge. In this paper, we propose a cost-effective power delivery technique to support per-core voltage domains. Our technique is based on the observations that: 1) core-to-core (C2C) voltage variations are relatively small for most execution intervals when the voltages/frequencies are optimized to maximize performance under a power constraint and 2) per-core power-gating devices augmented with feedback control circuitry can serve as low-cost VRs that can provide high efficiency in situations like 1). Our experimental results show that processors using our technique can achieve power efficiency as high as those using the per-core on-chip switching VRs at a much lower cost.
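Observation 1) matters because a power-gating device with feedback control plausibly behaves like a linear (series-pass) regulator, whose ideal efficiency is V_out / V_in. The sketch below assumes that behavior, with one shared off-chip rail set to the highest requested core voltage and each core's gating device dropping the excess. Quiescent current and control-circuit overhead are ignored, and the voltages, currents, and function names are hypothetical.

```python
from typing import Sequence

def linear_vr_efficiency(v_in: float, v_out: float) -> float:
    """Ideal efficiency of a series-pass (linear) regulator.

    Input and output currents are equal, so P_out / P_in = V_out / V_in
    (quiescent and control-circuit current are ignored).
    """
    if not 0.0 < v_out <= v_in:
        raise ValueError("require 0 < v_out <= v_in")
    return v_out / v_in

def shared_rail_efficiency(core_v: Sequence[float], core_i: Sequence[float]) -> float:
    """Current-weighted efficiency when one rail is set to the highest
    requested core voltage and each core's gating device drops the excess."""
    v_rail = max(core_v)
    total_i = sum(core_i)
    return sum(i * linear_vr_efficiency(v_rail, v)
               for v, i in zip(core_v, core_i)) / total_i

# Hypothetical operating points: four cores drawing 10 A each.
small_spread = [1.00, 0.98, 0.97, 0.95]
large_spread = [1.00, 0.70, 0.70, 0.70]
print(shared_rail_efficiency(small_spread, [10.0] * 4))  # ~0.975
print(shared_rail_efficiency(large_spread, [10.0] * 4))  # ~0.775
```

When core-to-core voltage differences stay small, the linear drop wastes little power; a large spread is where a switching regulator per core would pay off.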
design automation conference | 2012
Hamid Reza Ghasemi; Abhishek A. Sinkar; Michael J. Schulte; Nam Sung Kim
Per-core voltage domains can improve performance under a power constraint. Most commercial processors, however, only have one chip-wide voltage domain because splitting the voltage domain into per-core voltage domains and powering them with multiple off-chip voltage regulators (VRs) incurs a high cost for the platform and package designs. Although using on-chip switching VRs can be an alternative solution, integrating high-quality inductors and cores on the same chip has been a technical challenge. In this paper, we propose a cost-effective power delivery technique to support per-core voltage domains. Our technique is based on the observations that (i) core-to-core voltage variations are relatively small for most execution intervals when the voltages/frequencies are optimized to maximize performance under a power constraint and (ii) per-core power-gating devices augmented with small circuits can serve as low-cost VRs that can provide high efficiency in situations like (i). Our experimental results show that processors using our technique can achieve power efficiency as high as those using per-core on-chip switching VRs at much lower cost.
design, automation, and test in europe | 2012
Abhishek A. Sinkar; Hao Wang; Nam Sung Kim
Modern multi-core processors use power management techniques such as dynamic voltage and frequency scaling (DVFS) and clock gating (CG), which cause the processor to operate in various performance and power states depending on runtime workload characteristics. A voltage regulator (VR) designed to deliver power to the processor at its highest performance level can suffer significantly degraded efficiency when the processor operates in deep power-saving states. In this paper, we propose VR optimization techniques that improve the energy efficiency of the processor + VR system by exploiting the workload-dependent P- and C-state residency of real processors. Our experimental results for static VR optimization show energy reductions of up to 19%, 20%, and 4% for workstation, mobile, and server multi-core processors, respectively. We also investigate how dynamically changing VR parameters affects energy efficiency compared with static optimization.
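The residency-weighted view of VR efficiency reduces to a short calculation: the energy drawn by the VR from its input is the sum over P- and C-states of residency × duration × load power ÷ VR efficiency at that load. The state names, residencies, powers, and efficiencies below are hypothetical placeholders, not measurements from the paper.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class PowerState:
    name: str
    residency: float      # fraction of time spent in this state
    load_power_w: float   # power delivered to the processor in this state
    vr_efficiency: float  # VR efficiency at that load, in (0, 1]

def vr_input_energy_j(states: Sequence[PowerState], duration_s: float) -> float:
    """Energy drawn by the VR from its input over duration_s."""
    if abs(sum(s.residency for s in states) - 1.0) > 1e-9:
        raise ValueError("residencies must sum to 1")
    return sum(s.residency * duration_s * s.load_power_w / s.vr_efficiency
               for s in states)

# Hypothetical mobile-like profile: mostly idle in a deep C-state.
profile = [
    PowerState("P0", residency=0.2, load_power_w=30.0, vr_efficiency=0.90),
    PowerState("C6", residency=0.8, load_power_w=1.0, vr_efficiency=0.50),
]
load_j = sum(s.residency * s.load_power_w for s in profile)  # over a 1 s window
print(load_j, vr_input_energy_j(profile, duration_s=1.0))
```

In this made-up profile the deep C-state delivers only 0.8 J to the processor yet loses another 0.8 J in the VR, more than the roughly 0.67 J lost while in P0, which is why light-load efficiency weighted by residency is worth optimizing.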
international symposium on low power electronics and design | 2009
Nam Sung Kim; Jun Seomun; Abhishek A. Sinkar; Jungseob Lee; Tae Hee Han; Ken Choi; Youngsoo Shin
Manufactured dies exhibit a large spread of maximum frequency and leakage power due to process variations, which have been increasing with technology scaling. Reducing this spread is very important for maximizing the frequency and yield of power-constrained designs, because otherwise many dies that do not satisfy frequency or power constraints would be discarded. In this paper, we propose two optimization methods that improve the maximum operating frequency and the yield using power gates that already exist in many power-constrained designs. In the first method, we consider multi-core designs in which each core can be independently power-gated. When the cores show different frequencies due to within-die variations, the strength of the power gate in each core is adjusted to equalize their maximum operating frequencies. This allows faster cores to consume less active leakage power, reducing total power consumption well below the power constraint in a globally clocked design. We subsequently increase the global supply voltage for higher overall frequency until the power constraint is reached. In our experiments assuming multi-core processors with 2-16 cores, the maximum operating frequency was improved by 4-23%. In the second method, we take leaky-but-fast dies (which would otherwise be discarded) and adjust the strength of their power gates so that they operate in an acceptable power and frequency region. The problem is extended to designs employing a frequency binning strategy, where we have the additional objective of maximizing the number of dies in higher frequency bins. In our experiments with ISCAS benchmark circuits, most of the otherwise-discarded fast-but-leaky dies were recovered using the second method.
international conference on vlsi design | 2009
Chunhua Yao; Kewal K. Saluja; Abhishek A. Sinkar
A complete Built-In Self-Test (BIST) solution based on a word-oriented Random Access Scan architecture (WOR-BIST) is proposed. Our WOR-BIST scheme significantly reduces test power consumption by reducing switching activity during scan operations. We also provide a greedy algorithm to reduce test data volume and test application time. We performed logic simulation of the test vectors to show the scheme's impact on average and peak power during testing, and we implemented the scheme to demonstrate its impact on chip area and timing performance. Application of our scheme to large ISCAS and ITC benchmark circuits shows that it is superior in area, power, and performance to conventional multiple serial scan.
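Why random-access scan cuts switching activity can be illustrated by counting scan-cell toggles. This is a generic, simplified model assumed for illustration: a single serial chain versus word-addressable writes that update only the words whose contents change. It does not reproduce the WOR-BIST architecture or its greedy test-data reduction algorithm, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

def serial_scan_toggles(state: Sequence[int], vector: Sequence[int]) -> int:
    """Cell toggles while shifting `vector` into a serial chain holding `state`.

    The last bit is shifted in first, so after len(vector) cycles the chain
    holds `vector` exactly.
    """
    chain = list(state)
    toggles = 0
    for bit in reversed(vector):
        shifted = [bit] + chain[:-1]
        toggles += sum(a != b for a, b in zip(chain, shifted))
        chain = shifted
    return toggles

def word_access_cost(state: Sequence[int], vector: Sequence[int],
                     word_bits: int) -> Tuple[int, int]:
    """(word writes, cell toggles) when only changed words are rewritten."""
    writes = toggles = 0
    for i in range(0, len(vector), word_bits):
        old, new = list(state[i:i + word_bits]), list(vector[i:i + word_bits])
        if old != new:
            writes += 1
            toggles += sum(a != b for a, b in zip(old, new))
    return writes, toggles

# Hypothetical 8-cell design: consecutive test vectors differ in one bit.
prev = [0, 1, 0, 1, 0, 1, 0, 1]
nxt = [0, 1, 0, 1, 0, 1, 0, 0]
print(serial_scan_toggles(prev, nxt), word_access_cost(prev, nxt, word_bits=4))
```

Here serial shifting ripples the pattern through every cell for 49 toggles, while a word-oriented write updates one 4-bit word and flips a single cell.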
international symposium on quality electronic design | 2010
Abhishek A. Sinkar; Nam Sung Kim
Power-gating (PG) techniques have been widely used in modern digital ICs to reduce standby leakage power during idle periods. The virtual supply voltage (VVDD) of a power-gated IC, however, is a function of the strength of the PG device and the total current flowing through it. Thus, the VVDD level becomes susceptible to 1) negative bias temperature instability (NBTI) degradation, which weakens the PG device over time, and 2) temporal temperature variation, which affects the active leakage current (and thus the total current) of the IC. To account for NBTI degradation, the PG device must be upsized so that it guarantees a minimum VVDD level that prevents timing failures over the chip's lifetime. Moreover, the PG device is also sized for the worst-case voltage drop, caused in part by a large active leakage current at high temperature. However, upsizing the PG device to cover both effects leads to a higher VVDD (and thus higher active leakage power) than necessary at low temperature and/or early in the chip's lifetime. To minimize the resulting increase in active leakage power, we propose two techniques that adjust the strength of a PG device at runtime based on its usage and the IC's temperature. Both techniques are applied to an experimental setup modeling the total current consumption of an IC in 32nm technology, and their efficacy is demonstrated in the presence of within-die spatial process and temperature variations. Averaged over 100 die samples, they reduce active leakage power by up to 10% early in the chip's lifetime.
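The sizing argument can be made concrete with the relation VVDD = VDD - I * R_PG. The sketch below assumes a hypothetical PG device built from identical parallel segments that can be enabled individually (R_PG = R_seg / n). NBTI aging is represented as a higher R_seg and high temperature as a higher total current. All numbers and function names are illustrative, not the paper's circuit.

```python
import math

def vvdd(vdd: float, current_a: float, r_seg_ohm: float, n_on: int) -> float:
    """Virtual supply after the drop across n_on identical parallel PG segments."""
    return vdd - current_a * r_seg_ohm / n_on

def min_segments(vdd: float, v_min: float, current_a: float, r_seg_ohm: float) -> int:
    """Fewest enabled segments that keep VVDD >= v_min (hypothetical segmented gate)."""
    headroom = vdd - v_min
    if headroom <= 0:
        raise ValueError("v_min must be below vdd")
    # The small tolerance absorbs floating-point error before rounding up.
    return max(1, math.ceil(current_a * r_seg_ohm / headroom - 1e-9))

VDD, V_MIN = 1.0, 0.9
cases = {
    "cool, fresh": (10.0, 0.10),
    "hot, fresh": (12.0, 0.10),   # more active leakage -> more current
    "cool, aged": (10.0, 0.12),   # NBTI-weakened segments
    "hot, aged": (12.0, 0.12),
}
for label, (i_a, r_seg) in cases.items():
    n = min_segments(VDD, V_MIN, i_a, r_seg)
    print(label, n, round(vvdd(VDD, i_a, r_seg, n), 4))
```

A design-time sizing that covers the hot, aged corner keeps 15 segments on at all times, which in the cool, fresh case lifts VVDD to about 0.933 instead of the 0.9 floor reached with 10 segments; that extra VVDD is the active leakage a runtime-adjusted gate avoids.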
vlsi test symposium | 2009
Lin Xie; Azadeh Davoodi; Kewal K. Saluja; Abhishek A. Sinkar
Effects of fluctuations in circuit timing due to process and environmental variations are becoming increasingly important as we move into sub-45nm technology. Since the delay of each gate depends on its input vector, the timing yield (the probability that the circuit meets the given timing constraint) varies with different primary input patterns. Traditional timing yield estimation approaches assume a worst-case delay model for each gate over all of its input vectors, which introduces considerable pessimism. To overcome this problem, this paper proposes a Monte Carlo based approach that obtains a much tighter lower bound on circuit timing yield than existing timing yield estimation techniques. Specifically, our approach builds multiple input-vector-dependent, variation-aware delay models for each logic gate and considers the impact of both static and dynamic false paths, which are carefully selected from the likely timing-critical paths under variability. The simulation results demonstrate a gradual improvement in the estimated timing yield and show that the timing yield computed using traditional worst-case delay models is highly pessimistic.
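For reference, a plain Monte Carlo timing-yield estimator is sketched below. It assumes independent Gaussian gate-delay variation and a fixed set of structural paths. Unlike the approach described above, it uses one input-independent delay per gate (as a worst-case model would) and does not prune false paths, so it illustrates only the sampling step; `mc_timing_yield` and the circuit are hypothetical.

```python
import random
from typing import List, Sequence

def mc_timing_yield(nominal: Sequence[float], paths: Sequence[Sequence[int]],
                    sigma: float, t_clk: float,
                    n_samples: int = 10_000, seed: int = 0) -> float:
    """Estimate P(max path delay <= t_clk) by Monte Carlo sampling.

    nominal[g] is the nominal delay of gate g; each path is a list of gate
    indices. Gates shared between paths reuse the same sample in a trial,
    which captures the correlation that shared gates introduce.
    """
    rng = random.Random(seed)
    passed = 0
    for _ in range(n_samples):
        delays: List[float] = [max(0.0, d * (1.0 + rng.gauss(0.0, sigma)))
                               for d in nominal]
        worst = max(sum(delays[g] for g in path) for path in paths)
        if worst <= t_clk:
            passed += 1
    return passed / n_samples

# Hypothetical 3-gate circuit with two paths sharing gate 1.
print(mc_timing_yield([1.0, 1.0, 1.0], [[0, 1], [1, 2]], sigma=0.1, t_clk=2.2))
```

An input-vector-aware version would replace the single `nominal[g]` with a delay chosen per gate from the vector being applied, and would drop paths that cannot be sensitized.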
international conference on computer aided design | 2013
Hao Wang; Abhishek A. Sinkar; Nam Sung Kim
Recent studies on near-threshold computing (NTC) investigated the optimum supply voltage that yields the minimum energy per operation (Emin) and proposed various optimization techniques at the device, circuit, and architecture levels to further minimize Emin. However, most of these studies overlooked the significance of (i) the energy consumption of off-chip memory accesses, (ii) the energy loss of voltage regulators (VRs), and (iii) the cost of chip area in the NTC environment. In this paper, we first demonstrate the increasing significance of (i) and (ii) in the NTC environment with a comprehensive set of device-, circuit-, and architecture-level models. Second, we explore technology optimization to improve the trade-off between platform energy and chip area considering (iii) in the NTC environment. The experimental results show that our optimized technology achieves 4% to 21% energy reduction under various chip area constraints, significantly improving the trade-off between platform energy and chip area for a wide range of parallel benchmarks.
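The existence of Emin, and how platform-level energy can shift it, shows up in a normalized toy model: dynamic energy C * V^2, leakage energy equal to a constant leakage current times V times the cycle time, and an alpha-power-law delay t proportional to V / (V - Vth)^alpha. The constants, the constant leakage current, and the lumped `p_platform` term (standing in for platform components, such as off-chip memory, that draw power for as long as the computation runs) are assumptions for illustration, not the models used in the paper.

```python
from typing import Tuple

def energy_per_op(v: float, vth: float = 0.3, alpha: float = 1.5,
                  leak: float = 0.05, p_platform: float = 0.0) -> float:
    """Normalized energy per operation at supply voltage v (toy model)."""
    if v <= vth:
        raise ValueError("v must exceed vth in the alpha-power delay model")
    t = v / (v - vth) ** alpha      # alpha-power-law delay, normalized
    e_dyn = v * v                   # C * V^2 with C normalized to 1
    e_leak = leak * v * t           # constant leakage current * V * delay
    e_plat = p_platform * t         # platform power drawn for the whole delay
    return e_dyn + e_leak + e_plat

def v_min_energy(v_lo: float = 0.31, v_hi: float = 1.2, steps: int = 890,
                 **model) -> Tuple[float, float]:
    """Grid-search the supply voltage that minimizes energy per operation."""
    best_v, best_e = v_lo, float("inf")
    for i in range(steps + 1):
        v = v_lo + (v_hi - v_lo) * i / steps
        e = energy_per_op(v, **model)
        if e < best_e:
            best_v, best_e = v, e
    return best_v, best_e

print(v_min_energy())                 # core-only optimum
print(v_min_energy(p_platform=0.2))   # with a hypothetical platform power term
```

In this toy model the core-only optimum sits at roughly 0.45, just above Vth, while the platform term, which grows as the slower near-threshold operation runs longer, pushes the energy-optimal voltage up to roughly 0.70.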