
Publications

Featured research published by Abbas BanaiyanMofrad.


International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES) | 2011

FFT-cache: a flexible fault-tolerant cache architecture for ultra low voltage operation

Abbas BanaiyanMofrad; Houman Homayoun; Nikil D. Dutt

Caches are known to consume a large part of total microprocessor power. Traditionally, voltage scaling has been used to reduce both dynamic and leakage power in caches. However, aggressive voltage reduction causes process-variation-induced failures in cache SRAM arrays, which compromise cache reliability. In this paper, we propose the Flexible Fault-Tolerant Cache (FFT-Cache), which uses a flexible defect map to configure its architecture to achieve significant reduction in energy consumption through aggressive voltage scaling, while maintaining high error reliability. FFT-Cache uses a portion of faulty cache blocks as redundancy, using block-level or line-level replication within or between sets, to tolerate other faulty cache lines and blocks. Our configuration algorithm categorizes cache lines based on the degree of conflict of their blocks to reduce the granularity of redundancy replacement. FFT-Cache thereby sacrifices a minimal number of cache lines to avoid impacting performance while tolerating the maximum number of defects. Our experimental results on SPEC2K benchmarks demonstrate that the operational voltage can be reduced down to 375mV, achieving up to 80% reduction in dynamic power and up to 48% reduction in leakage power with small performance impact and area overhead.
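The block-pairing idea can be sketched abstractly. This is a minimal illustration, not the paper's actual configuration algorithm: it assumes a defect map giving each faulty block's faulty bit positions and greedily pairs blocks whose faults do not collide; all names and numbers are hypothetical.

```python
# Illustrative sketch: pair faulty cache blocks whose defect maps are
# disjoint, so each block's good bits can supply the data its partner
# is missing; blocks with no compatible partner are sacrificed.

def can_cover(faults_a, faults_b):
    """Two faulty blocks can act as mutual redundancy only if no bit
    position is faulty in both (their defect maps are disjoint)."""
    return not (faults_a & faults_b)

def pair_faulty_blocks(defect_map):
    """Greedily pair compatible faulty blocks.
    defect_map: {block_id: set of faulty bit positions}.
    Returns (pairs, sacrificed): paired blocks stay usable at low
    voltage; sacrificed blocks must be disabled."""
    unpaired = dict(defect_map)
    pairs, sacrificed = [], []
    while unpaired:
        bid, faults = unpaired.popitem()
        partner = next((p for p, f in unpaired.items()
                        if can_cover(faults, f)), None)
        if partner is not None:
            del unpaired[partner]
            pairs.append((bid, partner))
        else:
            sacrificed.append(bid)
    return pairs, sacrificed

# Blocks 0 and 1 have disjoint faults; block 2 conflicts with both.
dm = {0: {3, 17}, 1: {5, 42}, 2: {3, 5}}
pairs, lost = pair_faulty_blocks(dm)
```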


Design Automation Conference (DAC) | 2014

Power / Capacity Scaling: Energy Savings With Simple Fault-Tolerant Caches

Mark Gottscho; Abbas BanaiyanMofrad; Nikil D. Dutt; Alex Nicolau; Puneet Gupta

Complicated approaches to fault-tolerant voltage-scalable (FTVS) SRAM cache architectures can suffer from high overheads. We propose static (SPCS) and dynamic (DPCS) variants of power/capacity scaling, a simple and low-overhead fault-tolerant cache architecture that utilizes insights gained from our 45nm SOI test chip. Our mechanism combines multi-level voltage scaling with power gating of blocks that become faulty at each voltage level. The SPCS policy sets the runtime cache VDD statically such that almost all of the cache blocks are not faulty. The DPCS policy opportunistically reduces the voltage further to save more power than SPCS while limiting the impact on performance caused by additional faulty blocks. Through an analytical evaluation, we show that our approach can achieve lower static power for all effective cache capacities than a recent complex FTVS work. This is due to significantly lower overheads, despite the failure of our approach to match the min-VDD of the competing work at fixed yield. Through architectural simulations, we find that the average energy saved by SPCS is 55%, while DPCS saves an average of 69% of energy with respect to baseline caches at 1 V. Our approach incurs no more than 4% performance and 5% area penalties in the worst-case cache configuration.
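The SPCS policy can be sketched as a simple voltage-selection loop: walk down the voltage levels and keep the lowest one whose faulty-block fraction still fits a small budget. This is an illustrative sketch, not the paper's implementation; the voltage levels, fault fractions, and threshold below are made up.

```python
# Hypothetical sketch of static power/capacity scaling (SPCS): choose
# the lowest supply voltage at which the fraction of faulty blocks
# stays under a small threshold; faulty blocks are then power-gated.

def choose_static_vdd(faulty_fraction_at, levels, max_faulty=0.01):
    """levels: candidate voltages in descending order.
    faulty_fraction_at: {vdd_mV: fraction of blocks faulty there}.
    Returns the lowest voltage meeting the fault budget."""
    chosen = levels[0]  # highest voltage assumed always safe
    for vdd in levels:  # walk downward
        if faulty_fraction_at[vdd] <= max_faulty:
            chosen = vdd
        else:
            break  # lower voltages only get worse, so stop here
    return chosen

levels = [1000, 900, 800, 700, 600]  # mV, descending (illustrative)
faulty = {1000: 0.0, 900: 0.0, 800: 0.005, 700: 0.02, 600: 0.3}
vdd = choose_static_vdd(faulty, levels)
```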


IEEE Embedded Systems Letters | 2015

Exploiting Partially-Forgetful Memories for Approximate Computing

Majid Shoushtari; Abbas BanaiyanMofrad; Nikil D. Dutt

While the memory subsystem is already a major contributor to energy consumption of embedded systems, the guard-banding required for masking the effects of ever increasing manufacturing variations in memories imposes even more energy overhead. In this letter, we explore how partially-forgetful memories can be used by exploiting the intrinsic tolerance of a vast class of applications to some level of error for relaxing this guard-banding in memories. We discuss the challenges to be addressed and introduce relaxed cache as an exemplar to address these challenges for partially-forgetful SRAM caches. Preliminary results show how adapting guard-bands to application characteristics can help the system save a significant amount of cache leakage energy (up to 74%) while still generating results of acceptable quality.
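One way to picture a partially-forgetful cache is a placement rule that routes error-tolerant data to relaxed (low-voltage, possibly faulty) ways while critical data stays in fully guard-banded ways. A hypothetical sketch, not the letter's actual mechanism; all names are invented for illustration.

```python
# Illustrative placement rule for a partially-forgetful cache: data the
# application tags as approximable may live in relaxed ways; everything
# else is routed to guard-banded ways.

def place(block_tag, relaxed_ways, safe_ways):
    """Route a block to relaxed or guard-banded ways by its tag."""
    return relaxed_ways if block_tag == "approximable" else safe_ways

safe_ways = ["way0", "way1"]      # full guard-band, always correct
relaxed_ways = ["way2", "way3"]   # reduced guard-band, may flip bits
target = place("approximable", relaxed_ways, safe_ways)
```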


Design Automation Conference (DAC) | 2014

Multi-Layer Memory Resiliency

Nikil D. Dutt; Puneet Gupta; Alex Nicolau; Abbas BanaiyanMofrad; Mark Gottscho; Majid Shoushtari

With memories continuing to dominate the area, power, cost and performance of a design, there is a critical need to provision reliable, high-performance memory bandwidth for emerging applications. Memories are susceptible to degradation and failures from a wide range of manufacturing, operational and environmental effects, requiring a multi-layer hardware/software approach that can tolerate, adapt and even opportunistically exploit such effects. The overall memory hierarchy is also highly vulnerable to the adverse effects of variability and operational stress. After reviewing the major memory degradation and failure modes, this paper describes the challenges for dependability across the memory hierarchy, and outlines research efforts to achieve multi-layer memory resilience using a hardware/software approach. Two specific exemplars are used to illustrate multi-layer memory resilience: first we describe static and dynamic policies to achieve energy savings in caches using aggressive voltage scaling combined with disabling faulty blocks; and second we show how software characteristics can be exposed to the architecture in order to mitigate the aging of large register files in GPGPUs. These approaches can further benefit from semantic retention of application intent to enhance memory dependability across multiple abstraction levels, including applications, compilers, run-time systems, and hardware platforms.


International Green Computing Conference (IGCC) | 2013

REMEDIATE: A scalable fault-tolerant architecture for low-power NUCA cache in tiled CMPs

Abbas BanaiyanMofrad; Houman Homayoun; Vasileios Kontorinis; Dean M. Tullsen; Nikil D. Dutt

Technology scaling and process variation severely degrade the reliability of Chip Multiprocessors (CMPs), especially their large cache blocks. To improve cache reliability, we propose REMEDIATE, a scalable fault-tolerant architecture for low-power design of shared Non-Uniform Cache Access (NUCA) cache in Tiled CMPs. REMEDIATE achieves fault-tolerance through redundancy from multiple banks to maximize the amount of fault remapping, and minimize the amount of capacity lost in the cache when the failure rate is high. REMEDIATE leverages a scalable fault protection technique using two different remapping heuristics in a distributed shared cache architecture with non-uniform latencies. We deploy a graph coloring algorithm to optimize REMEDIATE's remapping configuration. We perform an extensive design space exploration of operating voltage, performance, and power that enables designers to select different operating points and evaluate their design efficacy. Experimental results on a 4×4 tiled CMP system voltage scaled to below 400mV show that REMEDIATE saves up to 50% power while recovering more than 80% of the faulty cache area with only modest performance degradation.
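The abstract mentions a graph coloring algorithm for optimizing the remapping configuration. As a generic illustration (not REMEDIATE's exact formulation), a greedy coloring over a conflict graph groups faulty lines that could share a remapping target: nodes with the same color have no conflict edge between them.

```python
# Generic greedy graph coloring sketch: each node gets the smallest
# color not used by any of its neighbors. In a remapping setting, an
# edge joins two faulty lines that cannot share the same spare, so
# same-colored lines may share one remapping resource.

def greedy_coloring(adj):
    """adj: {node: set of conflicting neighbors}. Returns {node: color}."""
    colors = {}
    for node in sorted(adj):          # deterministic visiting order
        used = {colors[n] for n in adj[node] if n in colors}
        c = 0
        while c in used:              # smallest free color
            c += 1
        colors[node] = c
    return colors

# Conflict graph: line 0 conflicts with lines 1 and 2; line 3 is free.
conflicts = {0: {1, 2}, 1: {0}, 2: {0}, 3: set()}
coloring = greedy_coloring(conflicts)
```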


International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) | 2012

A novel NoC-based design for fault-tolerance of last-level caches in CMPs

Abbas BanaiyanMofrad; Gustavo Girão; Nikil D. Dutt

Advances in technology scaling, coupled with aggressive voltage scaling, result in significant reliability challenges for emerging Chip Multiprocessor (CMP) platforms, where error-prone caches continue to dominate the chip area. Network-on-Chip (NoC) fabrics are increasingly used to manage the scalability of these CMPs. We present a novel fault-tolerant scheme for Last Level Cache (LLC) in CMP architectures that leverages the interconnection network to protect the LLC cache banks against permanent faults. During an LLC access to a faulty area, the network detects and corrects the faults, returning the fault-free data to the requesting core. By leveraging the NoC interconnection fabric, we can implement any cache fault-tolerant scheme in an efficient, modular, and scalable manner. We perform extensive design space exploration on NoC benchmarks to demonstrate the utility and efficacy of our approach. The overheads of leveraging the NoC fabric are minimal: on an 8-core, 16-cache-bank CMP we demonstrate reliable access to LLCs with additional overheads of less than 3% in area and less than 7% in power.


ACM Transactions on Architecture and Code Optimization | 2015

DPCS: Dynamic Power/Capacity Scaling for SRAM Caches in the Nanoscale Era

Mark Gottscho; Abbas BanaiyanMofrad; Nikil D. Dutt; Alex Nicolau; Puneet Gupta

Fault-Tolerant Voltage-Scalable (FTVS) SRAM cache architectures are a promising approach to improve energy efficiency of memories in the presence of nanoscale process variation. Complex FTVS schemes are commonly proposed to achieve very low minimum supply voltages, but these can suffer from high overheads and thus do not always offer the best power/capacity trade-offs. We observe on our 45nm test chips that the “fault inclusion property” can enable lightweight fault maps that support multiple runtime supply voltages. Based on this observation, we propose a simple and low-overhead FTVS cache architecture for power/capacity scaling. Our mechanism combines multilevel voltage scaling with optional architectural support for power gating of blocks as they become faulty at low voltages. A Static Power/Capacity Scaling (SPCS) policy sets the runtime cache VDD once such that only a few cache blocks may be faulty, minimizing the impact on performance, while two alternate Dynamic Power/Capacity Scaling (DPCS) policies opportunistically reduce the cache voltage even further for more energy savings. This architecture achieves lower static power for all effective cache capacities than a recent more complex FTVS scheme. This is due to significantly lower overheads, despite the inability of our approach to match the min-VDD of the competing work at a fixed target yield. Over a set of SPEC CPU2006 benchmarks on two system configurations, the average total cache (system) energy saved by SPCS is 62% (22%), while the two DPCS policies achieve roughly similar energy reduction, around 79% (26%). On average, the DPCS approaches incur 2.24% performance and 6% area penalties.
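The fault inclusion property, that a block faulty at one voltage stays faulty at all lower voltages, is what makes a multi-voltage fault map cheap: instead of a fault bitmap per voltage level, one stored value per block (its minimum safe voltage) answers usability queries for every level. A hedged sketch with made-up numbers:

```python
# Sketch of a fault-inclusion-based fault map: each block records its
# minimum safe supply voltage; a block is usable at V iff its recorded
# minimum is <= V. All voltages below are illustrative, not measured.

min_safe_vdd = {0: 600, 1: 600, 2: 800, 3: 1000}  # mV per block

def usable_blocks(vdd):
    """Blocks whose minimum safe voltage is at or below vdd."""
    return [b for b, v in min_safe_vdd.items() if v <= vdd]

# Lowering VDD trades capacity for power: fewer blocks stay usable.
cap_at_700 = len(usable_blocks(700))
cap_at_900 = len(usable_blocks(900))
```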


Design, Automation and Test in Europe (DATE) | 2013

Modeling and analysis of fault-tolerant distributed memories for networks-on-chip

Abbas BanaiyanMofrad; Nikil D. Dutt; Gustavo Girão

Advances in technology scaling increasingly make Network-on-Chips (NoCs) more susceptible to failures that cause various reliability challenges. With increasing area occupied by different on-chip memories, strategies for maintaining fault-tolerance of distributed on-chip memories become a major design challenge. We propose a system-level design methodology for scalable fault-tolerance of distributed on-chip memories in NoCs. We introduce a novel reliability clustering model for fault-tolerance analysis and shared redundancy management of on-chip memory blocks. We perform extensive design space exploration applying the proposed reliability clustering on a block-redundancy fault-tolerant scheme to evaluate the tradeoffs between reliability, performance, and overheads. Evaluations on a 64-core chip multiprocessor (CMP) with an 8×8 mesh NoC show that distinct strategies of our case study may yield up to 20% improvement in performance and 25% improvement in energy savings across different benchmarks, and uncover interesting design configurations.


European Test Symposium (ETS) | 2015

Protecting caches against multi-bit errors using embedded erasure coding

Abbas BanaiyanMofrad; Mojtaba Ebrahimi; Fabian Oboril; Mehdi Baradaran Tahoori; Nikil D. Dutt

Technology scaling advancements coupled with operational and environmental effects make embedded memories more vulnerable to both manufacturing and transient errors including multi-bit upsets. Conventional error correcting codes incur high latency, area, and power overheads to correct multi-bit errors. In this paper, we propose Embedded Erasure Coding (EEC), a low-cost technique that can correct multi-bit errors with low overheads. This technique employs interleaved parity bits to provide fast and low-cost multi-bit error detection. Using the erasure coding concept, the error correction is done by reconstructing the contents of the erroneous cache blocks within each cache set. Our proposed technique trades performance for higher reliability by reserving a part of the cache (e.g., one way) to store the erasure codes. Our simulation results show that EEC provides high reliability (100% error detection and correction) with lower area overhead as compared to other state-of-the-art techniques while imposing negligible performance overhead (3%).
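The erasure-coding idea can be sketched with a single XOR code per cache set: interleaved parity identifies which block failed (turning the error into an erasure, where the failing position is known), and the lost block is rebuilt from the surviving blocks plus the reserved code way. This is an illustrative sketch, not EEC's actual layout or code construction.

```python
# XOR-based erasure correction sketch: the reserved way holds the
# bytewise XOR of all data ways in a set; XORing the survivors with
# the code reconstructs the one block flagged as erroneous.
from functools import reduce

def make_erasure_code(blocks):
    """Bytewise XOR of equal-length data blocks in a set."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def reconstruct(blocks, bad_idx, code):
    """Rebuild the erased block from the survivors and the code."""
    survivors = [b for i, b in enumerate(blocks) if i != bad_idx]
    return make_erasure_code(survivors + [code])

ways = [b"\x12\x34", b"\xab\xcd", b"\x0f\xf0"]  # toy 2-byte blocks
code = make_erasure_code(ways)                  # reserved code way
recovered = reconstruct(ways, 1, code)          # block 1 flagged bad
```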


ACM Transactions on Embedded Computing Systems | 2014

NoC-based fault-tolerant cache design in chip multiprocessors

Abbas BanaiyanMofrad; Gustavo Girão; Nikil D. Dutt

Advances in technology scaling increasingly make emerging Chip MultiProcessor (CMP) platforms more susceptible to failures that cause various reliability challenges. In such platforms, error-prone on-chip memories (caches) continue to dominate the chip area. Also, Network-on-Chip (NoC) fabrics are increasingly used to manage the scalability of these architectures. We present a novel solution for efficient implementation of fault-tolerant design of Last-Level Cache (LLC) in CMP architectures. The proposed approach leverages the interconnection network fabric to protect the LLC cache banks against permanent faults in an efficient and scalable way. During an LLC access to a faulty block, the network detects and corrects the faults, returning the fault-free data to the requesting core. Leveraging the NoC interconnection fabric, designers can implement any cache fault-tolerant scheme in an efficient, modular, and scalable manner for emerging multicore/manycore platforms. We propose four different policies for implementing a remapping-based fault-tolerant scheme leveraging the NoC fabric in different settings. The proposed policies enable design trade-offs between NoC traffic (packets sent through the network) and the intrinsic parallelism of these communication mechanisms, allowing designers to tune the system based on design constraints. We perform an extensive design space exploration on NoC benchmarks to demonstrate the usability and efficacy of our approach. In addition, we perform sensitivity analysis to observe the behavior of various policies in reaction to improvements in the NoC architecture. The overheads of leveraging the NoC fabric are minimal: on an 8-core, 16-cache-bank CMP we demonstrate reliable access to LLCs with additional overheads of less than 3% in area and less than 7% in power.

Collaboration


Dive into Abbas BanaiyanMofrad's collaborations.

Top Co-Authors

Nikil D. Dutt, University of California
Alex Nicolau, University of California
Mark Gottscho, University of California
Puneet Gupta, University of California
Gustavo Girão, Universidade Federal do Rio Grande do Sul
Fabian Oboril, Karlsruhe Institute of Technology
Mehdi Baradaran Tahoori, Karlsruhe Institute of Technology
Mojtaba Ebrahimi, Karlsruhe Institute of Technology