Shrikanth Ganapathy
Polytechnic University of Catalonia
Publications
Featured research published by Shrikanth Ganapathy.
international symposium on computer architecture | 2013
Naifeng Jing; Yao Shen; Yao Lu; Shrikanth Ganapathy; Zhigang Mao; Minyi Guo; Ramon Canal; Xiaoyao Liang
The heavily-threaded data processing demands of streaming multiprocessors (SM) in a GPGPU require a large register file (RF). The fast-increasing size of the RF makes the area cost and power consumption unaffordable for traditional SRAM designs in future technologies. In this paper, we propose to use embedded DRAM (eDRAM) as an alternative in future GPGPUs. Compared with SRAM, eDRAM provides higher density and lower leakage power. However, the limited data retention time in eDRAM poses new challenges: periodic refresh operations are needed to maintain data integrity, a problem exacerbated by scaling eDRAM density, process variations and temperature. Unlike conventional CPUs, which use multi-ported RFs, the RFs in modern GPGPUs are heavily banked but not multi-ported to reduce hardware cost. This provides a unique opportunity to hide the refresh overhead. We propose two different eDRAM implementations based on 3T1D and 1T1C memory cells. To mitigate the impact of periodic refresh, we propose two novel refresh solutions using bank bubble and bank walk-through. In addition, for the 1T1C RF, we design an interleaved bank organization together with an intelligent warp scheduling strategy to reduce the impact of destructive reads. The analysis shows that our schemes offer better energy efficiency, scalability and variation tolerance than traditional SRAM-based designs.
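To make the bank-bubble idea concrete, the sketch below schedules eDRAM refreshes into idle cycles of a banked register file and only counts a stall when no bubble appears before a bank's retention deadline. The bank count, retention time and access trace are hypothetical parameters, not the paper's actual microarchitecture.

```python
# Hypothetical sketch of hiding eDRAM refresh in idle register-file bank cycles
# ("bank bubbles"). Bank count, retention time and the access trace are made-up
# parameters for illustration only.

RETENTION_CYCLES = 1000   # assumed worst-case eDRAM retention time, in cycles
NUM_BANKS = 16            # assumed number of RF banks

def schedule(access_trace):
    """access_trace[cycle] = set of bank ids accessed in that cycle."""
    last_refresh = [0] * NUM_BANKS
    stalls = 0
    for cycle, busy_banks in enumerate(access_trace):
        for bank in range(NUM_BANKS):
            due = cycle - last_refresh[bank] >= RETENTION_CYCLES
            if due and bank not in busy_banks:
                # refresh opportunistically in a "bank bubble" (idle slot)
                last_refresh[bank] = cycle
            elif due:
                # no bubble available at the deadline: forced refresh stalls the access
                stalls += 1
                last_refresh[bank] = cycle
    return stalls

# toy trace: each cycle touches two banks chosen round-robin
trace = [{c % NUM_BANKS, (c + 1) % NUM_BANKS} for c in range(5000)]
print("forced refresh stalls:", schedule(trace))
```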
design, automation, and test in europe | 2010
Shrikanth Ganapathy; Ramon Canal; Antonio González; Antonio Rubio
With every process generation, the problem of variability in physical parameters and environmental conditions poses a great challenge to the design of fast and reliable circuits. Propagation delays, which determine circuit performance, are likely to suffer the most from this phenomenon. While statistical static timing analysis (SSTA) is used extensively for this purpose, it does not account for dynamic conditions during operation. In this paper, we present a multivariate regression-based technique that computes the propagation delay of circuits subject to manufacturing process variations in the presence of temporal variations such as temperature. It can be used to predict the dynamic behavior of circuits under changing operating conditions. The median error between the proposed model and circuit-level simulations is below 5%. With this model, we ran a study of the effect of temperature on access-time delays for 500 cache samples. The study ran in 0.557 seconds, compared to 20 hours and 4 minutes for the equivalent SPICE simulation, a speedup of over 1×10^5. As a case study, we show that cache access times can vary by as much as 2.03X at high temperatures in future technologies under process variations.
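As a rough illustration of the modeling approach, the sketch below fits a multivariate least-squares model of delay against threshold-voltage deviation and temperature (with an interaction term) on synthetic data, and checks the reported speedup arithmetic. The coefficients and noise levels are invented, not the paper's calibrated model.

```python
# Illustrative multivariate regression of propagation delay against process and
# temperature variables, in the spirit of the paper's model. The data below is
# synthetic; the real model is fit to circuit-level simulations.
import numpy as np

rng = np.random.default_rng(0)
n = 500
dvth = rng.normal(0.0, 0.03, n)            # threshold-voltage deviation (V)
temp = rng.uniform(25.0, 110.0, n)          # operating temperature (C)
delay = (1.0 + 8.0 * dvth + 0.002 * temp + 0.05 * dvth * temp
         + rng.normal(0.0, 0.01, n))        # "measured" delay (ns), made up

# design matrix with an interaction term, fit by ordinary least squares
X = np.column_stack([np.ones(n), dvth, temp, dvth * temp])
coef, *_ = np.linalg.lstsq(X, delay, rcond=None)
pred = X @ coef
print("median |error| (%):", 100 * np.median(np.abs(pred - delay) / delay))

# sanity check on the reported speedup: 20 h 4 min of SPICE vs 0.557 s
print("speedup:", (20 * 3600 + 4 * 60) / 0.557)   # ~1.3e5
```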
design automation conference | 2015
Shrikanth Ganapathy; Georgios Karakonstantis; Adam Teman; Andreas Burg
Inherently error-resilient applications in areas such as signal processing, machine learning and data analytics provide opportunities for relaxing reliability requirements, and thereby reducing the overhead incurred by conventional error-correction schemes. In this paper, we exploit the tolerable imprecision of such applications by designing an energy-efficient fault-mitigation scheme for unreliable data memories to meet a target yield. The proposed approach uses a bit-shuffling mechanism to isolate faults into bit locations with lower significance. This skews the bit-error distribution towards the low-order bits, substantially limiting the output error magnitude. By controlling the granularity of the shuffling, the proposed technique enables trading off quality for power, area, and timing overhead. Compared to error-correction codes, this can reduce the overhead by as much as 83% in read power, 77% in read access time, and 89% in area, when applied to various data mining applications in a 28nm process technology.
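The toy sketch below illustrates the bit-shuffling idea at word granularity: significant bits are mapped onto reliable cells and low-order bits onto faulty cells, so any error has a small magnitude. The fault map, word width and stuck-at-0 fault model are illustrative assumptions, not the paper's circuit-level mechanism.

```python
# Toy sketch of the bit-shuffling idea: map a word's significant bits onto the
# reliable cells of a memory row and its low-order bits onto the faulty cells,
# so any error lands in a low-significance position. Fault map and word width
# are hypothetical.

WIDTH = 16

def make_permutation(faulty_positions):
    """Reliable cells first (hold MSBs), faulty cells last (hold LSBs)."""
    reliable = [p for p in range(WIDTH) if p not in faulty_positions]
    return reliable + sorted(faulty_positions)   # perm[bit_index] = cell index

def store(word, perm, faulty_positions):
    cells = [0] * WIDTH
    for bit in range(WIDTH):                      # bit 0 = MSB of the word
        value = (word >> (WIDTH - 1 - bit)) & 1
        cell = perm[bit]
        cells[cell] = 0 if cell in faulty_positions else value  # stuck-at-0 fault
    return cells

def load(cells, perm):
    word = 0
    for bit in range(WIDTH):
        word = (word << 1) | cells[perm[bit]]
    return word

faults = {3, 9}                    # hypothetical stuck-at-0 cells in this row
perm = make_permutation(faults)
original = 0xBEEF
recovered = load(store(original, perm, faults), perm)
print(f"stored {original:#06x}, read back {recovered:#06x}, "
      f"error magnitude {abs(original - recovered)}")
```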
international conference on computer design | 2012
Shrikanth Ganapathy; Ramon Canal; Dan Alexandrescu; Enrico Costenaro; Antonio González; Antonio Rubio
In view of device scaling issues, embedded DRAM (eDRAM) technology is being considered as a strong alternative to conventional SRAM for use in on-chip memories. Memory cells designed in eDRAM technology, in addition to being logic-compatible, are variation tolerant and immune to noise at low supply voltages. However, there are two major causes for concern: the data retention capability, which is worsened by parameter variations and leads to frequent data refreshes (resulting in a large dynamic power overhead), and the transient reduction of stored charge, which increases soft-error (SE) susceptibility. In this paper, we present a novel variation-tolerant 4T-DRAM cell whose power consumption is 20.4% lower than that of a similarly sized eDRAM cell. Retention time is improved by 2.04X on average, while incurring a 3% overhead on read-access time. Most importantly, using a soft-error rate analysis tool, we have confirmed that the cell's sensitivity to SEs is reduced by 56% on average in a natural working environment.
international new circuits and systems conference | 2015
Shrikanth Ganapathy; Adam Teman; Robert Giterman; Andreas Burg; Georgios Karakonstantis
Embedded memories account for a large fraction of the overall silicon area and power consumption in modern SoCs. While embedded memories are typically realized with SRAM, alternative solutions such as embedded dynamic memories (eDRAM) can provide higher density and/or reduced power consumption. One major challenge impeding the widespread adoption of eDRAM is the need for frequent refreshes, which reduces memory availability in periods of high activity and consumes a significant amount of power. Reducing the refresh rate lowers this power overhead, but if refreshes are not performed in a timely manner, some cells may lose their content, resulting in memory errors. In this paper, we consider extending the refresh period of gain-cell-based dynamic memories beyond the worst-case point of failure, assuming that the resulting errors can be tolerated when the use cases lie in the domain of inherently error-resilient applications. For example, we observe that for various data mining applications, a large number of memory failures can be accepted with tolerable imprecision in output quality. In particular, our results indicate that by allowing as many as 177 errors in a 16 kB memory, the maximum loss in output quality is 11%. We use this failure limit to study the impact of relaxing reliability constraints on memory availability and retention power for different technologies.
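The back-of-the-envelope sketch below puts the reported failure budget in context: it converts 177 tolerable failures in a 16 kB memory into a bit-failure fraction and sweeps a hypothetical lognormal retention-time model to see how far the refresh period could stretch within that budget. The distribution parameters are assumptions, not the paper's measured data.

```python
# Back-of-the-envelope view of the reported failure budget: 177 tolerable cell
# failures in a 16 kB memory. The lognormal retention-time model below is a
# common assumption for gain-cell eDRAM, with invented parameters.
import math

BITS = 16 * 1024 * 8
budget = 177
print("tolerable bit-failure fraction:", budget / BITS)       # ~1.35e-3

def expected_failures(refresh_period_us, median_ret_us=100.0, sigma=0.35):
    """Expected number of cells whose (lognormal) retention time is below the period."""
    z = (math.log(refresh_period_us) - math.log(median_ret_us)) / sigma
    p_fail = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return BITS * p_fail

# sweep the refresh period and see where the failure budget is exceeded
for period in (25, 30, 35, 40, 45):
    print(f"period {period} us -> expected failing cells "
          f"{expected_failures(period):.1f}")
```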
design, automation, and test in europe | 2014
Shrikanth Ganapathy; Ramon Canal; Dan Alexandrescu; Enrico Costenaro; Antonio González; Antonio Rubio
With the growing importance of parametric (process and environmental) variations in advanced technologies, it has become a serious challenge to design reliable, fast and low-power embedded memories. Adopting a variation-aware design paradigm requires a holistic perspective on memory-wide metrics such as yield, power and performance. However, accurate estimation of such metrics depends heavily on circuit implementation styles, technology parameters and architecture-level specifics. In this paper, we propose a fully automated tool, INFORMER, that helps high-level designers estimate memory reliability metrics rapidly and accurately. The tool relies on accurate circuit-level simulations of failure mechanisms such as soft errors and parametric failures. The statistics obtained can then help couple low-level metrics with higher-level design choices. A new technique for rapid estimation of low-probability failure events is also proposed. We present three use cases of our prototype tool to demonstrate its diverse capabilities in autonomously guiding large, robust SRAM-based memory designs.
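For the rare-event estimation aspect, the sketch below shows a generic importance-sampling approach to estimating a low-probability parametric failure (a cell whose threshold-voltage deviation exceeds a margin). This is a standard rare-event technique used here for illustration, not necessarily the estimation method implemented in INFORMER; the threshold and distributions are invented.

```python
# Generic importance-sampling sketch for estimating a rare parametric-failure
# probability. Failure threshold and distributions are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
SIGMA = 0.03            # assumed Vth standard deviation (V)
FAIL_AT = 5.0 * SIGMA   # cell "fails" if |dVth| exceeds 5 sigma

def fails(dvth):
    return np.abs(dvth) > FAIL_AT

# naive Monte Carlo: almost never observes a failure at this sample size
naive = fails(rng.normal(0.0, SIGMA, 100_000)).mean()

# importance sampling: draw from a widened distribution and reweight each
# sample by the likelihood ratio of the true density to the sampling density
scale = 3.0
x = rng.normal(0.0, scale * SIGMA, 100_000)
w = (scale * np.exp(-x**2 / (2 * SIGMA**2))
     / np.exp(-x**2 / (2 * (scale * SIGMA)**2)))
is_est = np.mean(fails(x) * w)

print("naive estimate:", naive)
print("importance-sampled estimate:", is_est)   # ~5.7e-7 for a 5-sigma tail
```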
international symposium on quality electronic design | 2013
Shrikanth Ganapathy; Ramon Canal; Antonio González; Antonio Rubio
Modern microprocessors make effective use of supply voltage scaling for tremendous power reduction. The minimum voltage below which a processor cannot operate reliably is defined as Vddmin. On-chip memories such as caches are the most susceptible to voltage-noise-induced failures because of process variations and reduced noise margins, and therefore typically dictate the whole processor's Vddmin. In this paper, we evaluate the effectiveness of a new class of hybrid techniques in improving cache yield through failure prevention and correction. Proactive read/write assist techniques such as body biasing (BB) and wordline boosting (WLB), when combined with reactive techniques such as ECC and redundancy, are shown to offer better quality-energy-area trade-offs than their standalone configurations. Proactive techniques help lower Vddmin (improving functional margin) for significant power savings, and reactive techniques ensure that the resulting larger number of failures is corrected (improving functional yield). Our results in 22nm technology indicate that at scaled supply voltages, hybrid techniques can improve parametric yield by at least 28% when considering worst-case process variations.
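The following sketch illustrates the reactive side of the argument with simple yield arithmetic: the probability that a cache word remains usable with and without single-error correction as the per-bit failure probability grows at low voltage. The word size, cache size and failure probabilities are hypothetical, not the paper's 22nm data.

```python
# Illustrative yield arithmetic: a cache word is usable if at most one bit has
# failed (with a single-error-correcting code) or if no bit has failed (raw).
# All sizes and probabilities are hypothetical.
import math

WORD_BITS = 64 + 8          # assumed data bits plus SEC-DED check bits

def p_word_ok(p_bit, correctable=0):
    """Word survives if at most `correctable` of its bits have failed."""
    return sum(math.comb(WORD_BITS, k) * p_bit**k * (1 - p_bit)**(WORD_BITS - k)
               for k in range(correctable + 1))

WORDS = 32 * 1024 // 8      # 64-bit data words in a hypothetical 32 kB cache

for p_bit in (1e-7, 1e-6, 1e-5, 1e-4):
    raw = p_word_ok(p_bit) ** WORDS
    ecc = p_word_ok(p_bit, correctable=1) ** WORDS
    print(f"p_bit={p_bit:.0e}  yield no-ECC={raw:.4f}  yield SEC={ecc:.6f}")
```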
international symposium on low power electronics and design | 2010
Shrikanth Ganapathy; Ramon Canal; Antonio González; Antonio Rubio
Estimation of static and dynamic energy of caches is critical for high-performance, low-power designs. Commercial CAD tools that perform energy estimation statically are not aware of changing operating and environmental conditions, which makes the energy estimation problem more dynamic in nature. It is further complicated by process-induced variations of low-level parameters like threshold voltage and channel length. In this paper we present MODEST, a proposal for estimating the static and dynamic energy of caches taking into account spatial variations of physical parameters, temporal changes of supply voltage, and environmental factors like temperature. It can be used to estimate the energy of different blocks of a cache based on a combination of empirical data and analytical equations. The observed maximum and median errors between MODEST and HSPICE energy estimates for 22,500 samples are around 7.8% and 0.5%, respectively. As a case study, using MODEST, we propose a two-step iterative optimization procedure involving Dual-Vth assignment and standby supply-voltage minimization for reclaiming energy-constrained caches. The observed energy reduction is around 50.8% for the most leaky cache. A speedup of 750X over a conventional hard-coded implementation of such optimizations is achieved.
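As a flavor of how analytical equations and empirical data can be combined, the sketch below evaluates a standard subthreshold-leakage expression whose prefactor would be fitted from a handful of circuit simulations. Every coefficient here is invented for illustration and is not MODEST's actual model.

```python
# Rough sketch of mixing an analytical leakage expression with an empirically
# fitted prefactor. The exponential subthreshold form is standard; all numeric
# values are placeholders.
import math

K_BOLTZ_Q = 8.617e-5        # Boltzmann constant over electron charge (V/K)

def leakage_power(vth, temp_c, vdd=0.8, i0=1e-4, n=1.5):
    """Subthreshold leakage of one block: I = I0 * exp(-Vth / (n*kT/q))."""
    vt_thermal = K_BOLTZ_Q * (temp_c + 273.15)
    i_leak = i0 * math.exp(-vth / (n * vt_thermal))
    return vdd * i_leak

# i0 would be fitted per cache block from a few HSPICE samples; here it is a
# placeholder. Sweep temperature and a process-shifted Vth for one block.
for temp in (25, 60, 85, 110):
    for vth in (0.30, 0.28, 0.26):      # nominal and two faster/leakier corners
        print(f"T={temp:3d}C Vth={vth:.2f}V  "
              f"P_leak={leakage_power(vth, temp) * 1e6:.3f} uW")
```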
international parallel and distributed processing symposium | 2008
Nagarajan Venkateswaran; Vinoth Krishnan Elangovan; Karthik Ganesan; T. R. S. Sagar; S. Aananthakrishanan; S. Ramalingam; Shyamsundar Gopalakrishnan; Madhavan Manivannan; Deepak Srinivasan; Viswanath Krishnamurthy; Karthik Chandrasekar; Viswanath Venkatesan; Balaji Subramaniam; V. Sangkar; Aravind Vasudevan; Shrikanth Ganapathy; Sriram Murali; M. Thyagarajan
In this paper we present a novel cluster paradigm and silicon operating system. Our approach to developing the cluster design revolves around an execution model that supports running multiple independent applications simultaneously on the cluster, leading to cost sharing across applications. The execution model envisages simultaneous execution of multiple applications (running traces of multiple independent applications in the same node at the same instant, without time sharing) across all the partitions (nodes) of a single cluster, without sacrificing the performance of the individual applications, unlike current cluster models. Performance scalability is achieved as the number of nodes and the problem sizes of the individual independent applications increase: because the applications do not depend on one another, the number of independent operations grows with problem size, leading to better utilization of otherwise unused resources within each node. This execution model depends strongly on the node architecture for performance scalability. This would be a major initiative towards achieving cost-effective, high-performance supercomputing.
ieee convention of electrical and electronics engineers in israel | 2014
Shrikanth Ganapathy; Georgios Karakonstantis; Ramon Canal; Andreas Burg
With the scaling of process technologies and worsening process variations, embedded memories are susceptible to a large number of failure mechanisms, making it hard to achieve high yield. In this paper, by bringing together architecture- and circuit-level exploration tools, we analyse the impact of process variations on static random access memory (SRAM) cell stability and determine the impact of SRAM failures on memory functional yield. We then detail the importance of repair mechanisms such as error-correcting codes (ECC) and redundancy in improving yield subject to power and area constraints. Finally, we show that a design paradigm orthogonal to traditional repair mechanisms, which redefines the yield criterion by accepting memories with failures, is a promising candidate for improving yield without incurring additional overhead.
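A simple illustration of the redundancy argument: the sketch below computes array yield when spare rows can replace faulty rows, for a hypothetical array geometry and per-cell failure probability (not the paper's data).

```python
# Yield of an SRAM array with spare rows that can replace faulty rows.
# Array geometry, per-cell failure probability and spare counts are invented.
import math

ROWS, COLS = 512, 256        # assumed array geometry
P_BIT = 2e-6                 # assumed per-cell parametric failure probability

p_row_fail = 1.0 - (1.0 - P_BIT) ** COLS          # row has >= 1 faulty cell

def yield_with_spares(spares):
    """Array survives if the number of faulty rows does not exceed the spares."""
    return sum(math.comb(ROWS, k) * p_row_fail**k
               * (1 - p_row_fail) ** (ROWS - k)
               for k in range(spares + 1))

for spares in (0, 1, 2, 4):
    print(f"{spares} spare rows -> array yield {yield_with_spares(spares):.4f}")
```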