Swaroop Ghosh | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Swaroop Ghosh is active.

Explore More

Publication

Featured researches published by Swaroop Ghosh.

Proceedings of the IEEE | 2010

Parameter Variation Tolerance and Error Resiliency: New Design Paradigm for the Nanoscale Era

Swaroop Ghosh; Kaushik Roy

Variations in process parameters affect the operation of integrated circuits (ICs) and pose a significant threat to the continued scaling of transistor dimensions. Such parameter variations, however, tend to affect logic and memory circuits in different ways. In logic, this fluctuation in device geometries might prevent them from meeting timing and power constraints and degrade the parametric yield. Memories, on the other hand, experience stability failures on account of such variations. Process limitations are not exhibited as physical disparities only; transistors experience temporal device degradation as well. Such issues are expected to further worsen with technology scaling. Resolving the problems of traditional Si-based technologies by employing non-Si alternatives may not present a viable solution; the non-Si miniature devices are expected to suffer the ill-effects of process/temporal variations as well. To circumvent these nonidealities, there is a need to design ICs that can adapt themselves to operate correctly under the presence of such inconsistencies. In this paper, we first provide an overview of the process variations and time-dependent degradation mechanisms. Next, we discuss the emerging paradigm of variation-tolerant adaptive design for both logic and memories. Interestingly, these resiliency techniques transcend several design abstraction levels-we present circuit and microarchitectural techniques to perform reliable computations in an unreliable environment.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2007

CRISTA: A New Paradigm for Low-Power, Variation-Tolerant, and Adaptive Circuit Synthesis Using Critical Path Isolation

Swaroop Ghosh; Swarup Bhunia; Kaushik Roy

Design considerations for robustness with respect to variations and low-power operations typically impose contradictory design requirements. Low-power design techniques such as voltage scaling, dual- , etc., can have a large negative impact on parametric yield. In this paper, we propose a novel paradigm for low-power variation-tolerant circuit design called critical path isolation for timing adaptiveness (CRISTA), which allows aggressive voltage scaling. The principal idea includes the following: 1) isolate and predict the set of possible paths that may become critical under process variations; 2) ensure that they are activated rarely; and 3) avoid possible delay failures in the critical paths by dynamically switching to two-cycle operation (assuming all standard operations are single cycle), when they are activated. This allows us to operate the circuit at reduced supply voltage while achieving the required yield. Simulation results on a set of benchmark circuits with Berkeley-predictive-technology-model [BPTM 70 nm: Berkeley predictive technology model] 70-nm devices that show an average of 60% improvement in power with small overhead in performance and 18% overhead in die area compared to conventional design. We also present two applications of the proposed methodology that include the following: 1) pipeline design for low power and 2) temperature-adaptive circuit design.

international on line testing symposium | 2005

A novel on-chip delay measurement hardware for efficient speed-binning

Arijit Raychowdhury; Swaroop Ghosh; Kaushik Roy

With the aggressive scaling of the CMOS technology parametric variation of the transistor threshold voltage causes significant spread in the circuit delay as well as leakage spectrum. Consequently, speed binning of the high performance VLSI chips is essential and it costs significant amount of test application time. Further, the knowledge of the actual delay in the critical path of the circuit enables efficient use of typical low power methodologies e.g., voltage scaling, adaptive body biasing etc. In this paper, the authors have proposed a novel on-chip, low overhead and process tolerant delay measurement circuit which can estimate the critical path delay in a single clock period. This has the advantage of efficient on-chip speed binning.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2006

A Novel Delay Fault Testing Methodology Using Low-Overhead Built-In Delay Sensor

Swaroop Ghosh; Swarup Bhunia; Arijit Raychowdhury; Kaushik Roy

A novel integrated approach for delay-fault testing in external (automatic-test-equipment-based) and test-per-scan built-in self-test (BIST) using on-die delay sensing and test point insertion is proposed. A robust, low-overhead, and process-tolerant on-chip delay-sensing circuit is designed for this purpose. An algorithm is also developed to judiciously insert delay-sensor circuits at the internal nodes of logic blocks for improving delay-fault coverage with little or no impact on the critical-path delay. The proposed delay-fault testing approach is verified for transition- and segment-delay-fault models. Experimental results for external testing (BIST) show up to 31% (30%) improvement in fault coverage and up to 67.5% (85.5%) reduction in test length for transition faults. An increase in the number of robustly detectable critical-path segments of up to 54% and a reduction in test length for the segment-delay-fault model of up to 76% were also observed. The delay and area overhead due to insertion of the delay-sensing hardware have been limited to 2% and 4%, respectively

international electron devices meeting | 2011

Dynamic behavior of SRAM data retention and a novel transient voltage collapse technique for 0.6V 32nm LP SRAM

Yih Wang; Eric Karl; Mesut Meterelliyoz; Fatih Hamzaoglu; Yong-Gee Ng; Swaroop Ghosh; Liqiong Wei; Uddalak Bhattacharya; Kevin Zhang

A novel transient voltage collapse (TVC) technique is presented to enable low-voltage operation in SRAM. By dynamically switching off the PMOS during write operations with a collapsed supply voltage below the data retention voltage, a minimum operating voltage (Vccmin) of 0.6V is demonstrated in a 32nm 12-Mb low-power (LP) SRAM. Data retention failure of unselected cells is mitigated by controlling the duration of voltage collapse. Circuit-process co-optimization is critical to ensure robust circuit design margin of TVC technique.

design automation conference | 2014

Modeling and Analysis of Domain Wall Dynamics for Robust and Low-Power Embedded Memory

Anirudh Iyengar; Swaroop Ghosh

Non-volatile memories are gaining significant attention for embedded cache application due to low standby power and excellent retention. Domain wall memory (DWM) is one possible candidate due to its ability to store multiple bits/cell in order to break the density barrier. Additionally, it provides low standby power, fast access time, good endurance and good retention. In this paper, we provide a physics-based model of domain wall that comprehends process variations (PV) and Joule heating. The proposed model has been used for circuit simulation. We also propose techniques to mitigate the impact of variability and Joule heating while enabling low-power and high frequency operation.

IEEE Transactions on Very Large Scale Integration Systems | 2010

Trifecta: A Nonspeculative Scheme to Exploit Common, Data-Dependent Subcritical Paths

Patrick Ndai; Nauman Rafique; Mithuna Thottethodi; Swaroop Ghosh; Swarup Bhunia; Kaushik Roy

Pipelined processor cores are conventionally designed to accommodate the critical paths in the critical pipeline stage(s) in a single clock cycle, to ensure correctness. Such conservative design is wasteful in many cases since critical paths are rarely exercised. Thus, configuring the pipeline to operate correctly for rarely used critical paths targets the uncommon case instead of optimizing for the common case. In this study, we describe Trifecta-an architectural technique that completes common-case, subcritical path operations in a single cycle but uses two cycles when the critical path is exercised. This increases slack for both single-and two-cycle operations and offers a unique advantage under process variation. In contrast with existing mechanisms that trade power or performance for yield, Trifecta improves the yield while preserving performance and power. We applied this technique to the critical pipeline stages of a superscalar out-of-order (OoO) and a single issue in-order processor, namely instruction issue and execute, respectively. Our experiments show that the rare two-cycle operations result in a small decrease (5% for integer and 2% for floating-point benchmarks of SPEC2000) in instructions per cycle. However, the increased delay slack causes an improvement in yield-adjusted-throughput by 20% (12.7%) for an in-order (InO) processor configuration.

design, automation, and test in europe | 2008

A novel low overhead fault tolerant Kogge-Stone adder using adaptive clocking

Swaroop Ghosh; Patrick Ndai; Kaushik Roy

As the feature size of transistors gets smaller, fabricating them becomes challenging. Manufacturing process follows various corrective design-for-manufacturing (DFM) steps to avoid shorts/opens/bridges. However, it is not possible to completely eliminate the possibility of such defects. If spare units are not present to replace the defective parts, then such failures cause yield loss. In this paper, we present a fault tolerant technique to leverage the redundancy present in high speed regular circuits such as Kogge-Stone adder (KSA). Due to its regularity and speed, KSA is widely used in ALU design. In KSA, the carries are computed fast by computing them in parallel. Our technique is based on the fact that even and odd carries are mutually exclusive. Therefore, defect in even bit can only corrupt the even Sum outputs whereas the odd Sums are computed correctly (and vice versa). To efficiently utilize the above property of KSA in presence of defects, we perform addition in two-clock cycles. In cycle-1, one of the correct set of bits (even or odd) are computed and stored at output registers. In cycle-2, the operands are shifted by one bit and the remaining sets of bits (odd or even) are computed and stored. This allows us to tolerate the defect at the cost of throughput degradation while maintaining high frequency and yield. The proposed technique can tolerate any number of faults as long as they are confined to either even or odd bits (but not in both). Further, this technique is applicable for any type of fault model (stuck-at, bridging, complete opens/shorts). We performed simulations on 64-bit KSA using 180 nm devices. The results indicate that the proposed technique incur less that 1 % area overhead. Note that there is very little throughput degradation (<0.3%) for the fault-free adders. The proposed technique utilizes the existing scan flip-flops for storage and shifting operation to minimize the area/performance overhead. Finally, the proposed technique is used in a superscalar processor, whereby the faulty adder is assigned lower priority than fault-free adders to reduce the overall throughput degradation. Experiments performed using Simplescalar for a superscalar pipeline (with four integer adders) show throughput degradation of 0.5% in the presence of a single defective adder.

international solid-state circuits conference | 2014

13.1 A 1Gb 2GHz embedded DRAM in 22nm tri-gate CMOS technology

Fatih Hamzaoglu; Umut Arslan; Nabhendra Bisnik; Swaroop Ghosh; Manoj B. Lal; Nick Lindert; Mesut Meterelliyoz; Randy B. Osborne; Joodong Park; Shigeki Tomishima; Yih Wang; Kevin Zhang

CMOS technology scaling continues to drive higher levels of integration in VLSI design, which adds more compute engines on a die. To meet the overall performance-scaling needs, high-speed and high-bandwidth memory is becoming increasingly important. Conventional VLSI systems often rely on on-die SRAMs to address the performance gap between CPU and main memory, DRAM. However, with the rapid growth in capacity needs for high-performance memory, SRAM is not always sufficient to meet the demands of bandwidth-intense applications. Embedded DRAM (eDRAM) has been explored as an alternative to satisfy the high-performance and density needs in memory [1-3]. In this paper, a high-performance eDRAM based on a 22nm tri-gate CMOS technology is introduced. This eDRAM technology enables the integration of an eDRAM cell into the logic technology platform [4]. The design features a well-balanced configuration to achieve both optimal array efficiency and bandwidth. By leveraging the high-performance and low-voltage tri-gate transistor at 22nm generation, the eDRAM achieves a wide range in operating voltage, from 1.1V down to 0.7V, which is essential for low-power logic applications.

international conference on computer aided design | 2006

A new paradigm for low-power, variation-tolerant circuit synthesis using critical path isolation

Swaroop Ghosh; Swarup Bhunia; Kaushik Roy

Design considerations for robustness with respect to variations and low power operations typically impose contradictory design requirements. Low power design techniques such as voltage scaling, dual-Vth etc. can have a large negative impact on parametric yield. In this paper, we propose a novel paradigm for low-power variation-tolerant circuit design, which allows aggressive voltage scaling. The principal idea is to (a) isolate and predict the set of possible paths that may become critical under process variations, (b) ensure that they are activated rarely, and (c) avoid possible delay failures in the critical paths by dynamically switching to two-cycle operation (assuming all standard operations are single cycle), when they are activated. This allows us to operate the circuit at reduced supply voltage while achieving the required yield. Simulation results on a set of benchmark circuits at 70nm process technology show average power reduction of 60% with less than 10% performance overhead and 18% overhead in die-area compared to conventional synthesis. Application of the proposed methodology to pipelined design is also investigated

Explore More