2021 Design, Automation & Test in Europe Conference & Exhibition (DATE) | 2021
OR-ML: Enhancing Reliability for Machine Learning Accelerator with Opportunistic Redundancy
Reliability plays a central role in deep sub-micron and nanometre IC fabrication technology and has recently been reported to be one of the key issues affecting the inference phase of neural networks. State-of-the-art machine learning (ML) accelerators exploit massively computing parallelism observed in neural networks to achieve high energy efficiency. The topology of ML engines computing fabric, which constitutes large arrays of processing elements (PEs), has been increasing dramatically to incorporate the huge size and heterogeneity of the rapid evolving ML algorithm. However, it is commonly observed that activations of zero value lead to reduced PE utilization. In this work, we present a novel and low-cost approach to enhance the reliability of generic ML accelerators by Qpportunistically exploring the chances of runtime Redundancy provided by neighbouring PEs, named as OR-ML. In contrast to conventional redundancy techniques, the proposed technique introduces no additional computing resources, therefore significantly reduces the implementation overhead and achieves obvious level of protection. The design prototype is evaluated using emulated fault injection on FPGA, executing mainstream neural networks for objectionclassification and detection.