Embracing the Unreliability of Memory Devices for Neuromorphic Computing
Marc Bocquet∗, Tifenn Hirtzlin†, Jacques-Olivier Klein†, Etienne Nowak‡, Elisa Vianello‡, Jean-Michel Portal∗ and Damien Querlioz†

∗ IM2NP, Univ. Aix-Marseille et Toulon, CNRS, France
† Université Paris-Saclay, CNRS, C2N, 91120 Palaiseau, France. Email: [email protected]
‡ Université Grenoble-Alpes, CEA, LETI, Grenoble, France

This work was supported by the ERC Grant NANOINFER (715872) and the ANR grant NEURONIC (ANR-18-CE24-0009).
Invited Paper
Abstract—The emergence of resistive non-volatile memories opens the way to highly energy-efficient computation near- or in-memory. However, this type of computation is not compatible with conventional ECC, and has to deal with device unreliability. Inspired by the architecture of animal brains, we present a manufactured differential hybrid CMOS/RRAM memory architecture suitable for neural network implementation that functions without formal ECC. We also show that using low-energy but error-prone programming conditions only slightly reduces network accuracy.
I. INTRODUCTION
Emerging nonvolatile memory technologies such as resistive, phase change, and spin torque magnetoresistive memories offer considerable opportunities to advance microelectronics, as these memories are faster than flash memories, while being compact and compatible with integration in the backend-of-line of modern CMOS processes [1], [2]. However, although these technologies are usually more reliable than flash memories, they remain considerably less reliable than volatile charge-based random access memories. Strategies for reducing errors due to device variation and limited endurance involve costly materials and technology developments [3], energy-consuming special programming strategies [4], and, quite universally, reliance on advanced multiple error correcting codes (ECC) [1], [5], requiring large and energy-hungry decoding circuitry [6].

The existence of errors in emerging memories is also a severe limitation for the development of in- or near-memory computing schemes, which aim at achieving highly energy-efficient computation by eliminating the von Neumann bottleneck [1], [7]. In- or near-memory computing schemes are indeed hardly compatible with ECC, as computation is performed with multiple row selection or in the sensing circuit [8], [9]. These constraints are in sharp contrast with animal brains, which function with vastly unreliable, redundant memory devices (synapses) without using formal error correction [10], [11].

In this work, we show through an example that in computing architectures inspired by brains (neuromorphic architectures), memory device variability can to a large extent be ignored, and even embraced, and that this attitude can provide important benefits.
We first present a differential memory architecture optimized for the ECC-less in-memory implementation of binarized neural networks. We show, based on experimental measurements on a fabricated hybrid CMOS/RRAM chip and on network simulations, that this architecture can mostly ignore device variation, and we investigate the benefits of accepting errors. Based on a modeling study, we show that the same methodology can be transferred to MRAM.

II. AN IN-MEMORY COMPUTING MEMORY BLOCK THAT FUNCTIONS WITH ERROR-PRONE DEVICES

Fig. 1. (a) Simplified schematic of our in-memory computing hybrid CMOS/RRAM test chip. (b) Electron microscopy image of an RRAM cell integrated in the backend-of-line of a 130 nm commercial CMOS technology. (c) Photograph of the die.
In this work, we propose the use of a memory architecture where each bit is stored in a two-transistor/two-resistor (2T2R) cell. We implemented a kilobit version (2,048 devices) of this architecture in a 130 nm CMOS technology, with hafnium-oxide-based RRAM fully embedded in the backend-of-line (Fig. 1). This test chip was initially introduced in [8], [9]. Bits are stored in a differential fashion between the two devices to reduce errors. Doing so, during the read phase, a high resistive state (HRS) is always compared to a low resistive state (LRS), doubling the memory read window with regard to the conventional comparison to a reference value between HRS and LRS, as used in one-transistor/one-resistor (1T1R) architectures [8]. This differential read scheme is operated by on-chip precharge sense amplifiers (PCSA), whose circuit is presented in Fig. 2(a). These sense amplifiers can also be augmented to directly perform logic operations during read operations [12]. An example where a PCSA has been augmented to perform the exclusive NOR (XNOR) operation is shown in Fig. 2(b). Such in-memory computing augmentations, while bringing logic and memory closer together, make our system incompatible with conventional ECC schemes.

Fig. 2. Circuit of the precharge sense amplifier (PCSA) used in the test chip of Fig. 1. (a) Standard version; (b) version augmented with the XNOR operation, initially proposed in [12].

Extensive experimental measurements on our test chip showed that the 2T2R strategy indeed reduces bit errors when compared to the classical 1T1R approach. The RRAM device error rate is directly linked to the current used during the programming operations, offering a knob for tuning the error rate depending on the application requirements. Fig. 3(a) compiles statistical measurements on the fabricated test chip, taken with diverse programming currents, allowing evaluation of the bit error rate (BER) benefits of the 2T2R approach in different conditions. It is apparent in this figure that the 2T2R strategy always reduces the number of bit errors, with the highest benefits seen at lower BERs. The detailed methodology for obtaining Fig. 3(a) is presented in [9].

Fig. 3. (a) Bit error rates (BER) measured using the PCSAs of the test chip, as a function of the 1T1R BER in the same conditions. (b) For comparison, improvements of BER obtained using a standard Single Error Correction Double Error Detection (SECDED) ECC. Figure adapted from [9].

Quite interestingly, the error reduction benefits of the 2T2R approach are similar to those of a Single Error Correcting Double Error Detecting ECC (SECDED, or extended Hamming), but without the high peripheral circuit overhead required by this ECC [6] and the associated read performance degradation (Fig. 3(b)). Moreover, this result is obtained considering the same memory capacity (2T2R without ECC versus 1T1R plus extra bits for correction code storage).
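To build intuition for these two error-reduction mechanisms, the following minimal Monte Carlo sketch compares them. It assumes illustrative lognormal HRS/LRS resistance distributions and a geometric-midpoint read reference — these parameters are our assumptions for illustration only, not fitted to the chip — together with a standard (72,64) SECDED code:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000  # simulated read operations per state

# Illustrative lognormal resistance distributions (NOT fitted to the chip).
lrs = rng.lognormal(mean=np.log(5e3), sigma=0.3, size=N)   # low resistive state
hrs = rng.lognormal(mean=np.log(50e3), sigma=0.5, size=N)  # high resistive state

# 1T1R: each device is compared to a fixed reference placed between the
# nominal LRS and HRS values; either state can cross it and cause an error.
r_ref = np.sqrt(5e3 * 50e3)  # geometric midpoint (an assumed design choice)
ber_1t1r = 0.5 * (np.mean(lrs > r_ref) + np.mean(hrs < r_ref))

# 2T2R: the PCSA compares the two devices of the cell directly, so a read
# fails only if the HRS device is less resistive than its LRS companion.
ber_2t2r = np.mean(hrs < lrs)

# (72,64) SECDED on a 1T1R array: a word becomes uncorrectable when two or
# more of its 72 stored bits are wrong (single errors are corrected).
p = ber_1t1r
p_word_uncorrectable = 1 - (1 - p) ** 72 - 72 * p * (1 - p) ** 71

print(f"1T1R BER:                  {ber_1t1r:.2e}")
print(f"2T2R BER:                  {ber_2t2r:.2e}")
print(f"SECDED uncorrectable word: {p_word_uncorrectable:.2e}")
```

In such a model, the differential read fails only when the two devices of a cell cross each other, a much rarer event than either device crossing a fixed reference; like SECDED, this strongly suppresses errors at low BER, consistent with the trend of Fig. 3.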
III. BENEFITS AT THE NETWORK LEVEL

Fig. 4. Schematic of a full digital system implementing a Binarized Neural Network using the in-memory computing blocks of Fig. 1.

Binarized Neural Networks (BNNs) [14], or the highly similar XNOR-Nets [15], are a recently proposed type of neural network in which synaptic weights and neuron states take only binary values (meaning +1 and −1) during inference, whereas these parameters assume real values in conventional neural networks. Therefore, the equation for the activation $A$ of a neuron in a conventional neural network,

$$A = f\left( \sum_i W_i X_i \right), \qquad (1)$$
(where the $X_i$ are the inputs of the neuron, the $W_i$ its synaptic weights, and $f$ its nonlinear activation function), simplifies into

$$A = \mathrm{sign}\left( \mathrm{POPCOUNT}_i\big( \mathrm{XNOR}(W_i, X_i) \big) - T \right). \qquad (2)$$

POPCOUNT is an integer function that counts the number of ones, sign is the sign function, and $T$ is the threshold of the neuron, obtained during training by the use of the batch-normalization technique [16].

BNNs can achieve surprisingly high accuracy in vision [15], [17] and signal-processing [18] tasks. BNNs have highly reduced memory requirements with regard to real-valued neural networks, and have the added benefit of not requiring any multiplication, as this operation is replaced by the XNOR logic operation. These advantages make BNNs outstanding candidates for in-memory computing [19]–[26].
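As a concrete reading of Eqs. (1)–(2), here is a minimal NumPy sketch of a binarized neuron (names, sizes, and the threshold value are ours, for illustration). With a ±1 encoding, XNOR of a weight and an input is simply their product, and POPCOUNT counts the +1 results:

```python
import numpy as np

def neuron_binarized(X, W, T):
    """Eq. (2): X and W are arrays of +1/-1 values, T the learned threshold.
    With +1/-1 encoding, XNOR(W_i, X_i) equals the product W_i * X_i,
    and POPCOUNT counts how many of these products are +1 (matches)."""
    popcount = np.count_nonzero(W * X == 1)
    return 1 if popcount - T > 0 else -1  # sign(); ties mapped to -1 here

# Toy example: 8 synapses; in a real BNN, T comes from batch normalization
# during training (here an arbitrary value for illustration).
rng = np.random.default_rng(0)
X = rng.choice([-1, 1], size=8)  # binarized inputs from the previous layer
W = rng.choice([-1, 1], size=8)  # binarized synaptic weights
print(neuron_binarized(X, W, T=4))  # neuron state: +1 or -1
```

In the architecture described below, the XNOR is performed by the augmented PCSA of Fig. 2(b) during the read itself, so that only the popcount and the threshold comparison remain to be computed digitally.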
The architecture of Fig. 1 is particularly well adapted to the ECC-less implementation of such neural networks. For example, Fig. 4 shows a full system using the memory circuits of Fig. 1 to implement a BNN. The architecture uses the sense amplifier of Fig. 2(b) [12] to implement XNOR operations directly in each memory circuit during the read phase, whereas the POPCOUNT operation, as well as the neuron activation, are performed at the foot of the array columns using fully digital circuits. Refs. [9], [27] describe this architecture in detail, as well as some of its variations, and show that it features outstanding energy-efficiency properties.

We now evaluate the impact of memory errors in this architecture. Fig. 5 shows simulations of the architecture programmed to perform several tasks: the classic MNIST handwritten digit recognition task [28], the CIFAR-10 image classification task [29], and the challenging ImageNet classification task, which consists in classifying high-resolution images into 1,000 classes [30]. The detailed architecture of the BNNs used on these three tasks is presented in [9]. All these tasks were simulated with various bit error rates on the memory devices. Quite astonishingly, we see that on all three tasks, bit error rates as high as − can be tolerated with little consequence on the accuracy of the implemented neural network. This highlights that when implementing BNNs, memory perfection is far from being required. Dedicated training strategies could enhance this error tolerance even further [31].

Fig. 5. Impact of the BER of the memories on applications of Binarized Neural Networks: handwritten digit recognition (MNIST) and image recognition (CIFAR-10, ImageNet top-1 and top-5). Details about the neural network architectures are provided in [9].

The combination of the fact that the 2T2R approach reduces the number of bit errors, and the fact that the BNN application features an inherent tolerance to bit errors, has important practical consequences. It allows us to use RRAM devices in regimes where they are extremely unreliable. This can provide important energy savings: we can use devices with very weak programming conditions (low currents and voltages, short programming times), where they feature high numbers of bit errors. Figs. 6 and 7 show statistical measurements of our test chip in various conditions and highlight the energy benefits of accepting more errors.

Fig. 6. Number of errors on a one-kilobit array using the 2T2R strategy (with PCSA) for different programming conditions (compliance current I_C, RESET voltage V_appReset, and programming pulse duration t_pulse). Error bars represent the minimum and maximum numbers of errors over five trials of the experiment. Figure adapted from [9].

Fig. 7. Mean programming energy (per bit) of the RRAM cells in the (a) SET and (b) RESET processes, for the programming conditions shown in Fig. 6.

Finally, operating devices in high-BER regimes allows using conditions where they feature outstanding endurance. Fig. 8, for example, shows endurance measurements of two devices programmed with a low RESET voltage. An endurance of more than − cycles is seen, which is particularly high for such a technology. This type of high cyclability opens the way to training neural networks on chip, as seen in the results reported in [32]. A more detailed analysis of the energy benefits (which can reach a factor of ten) of embracing bit errors in RRAM-based BNNs, and of the associated endurance benefits, is presented in [9].

Fig. 8. Endurance measurement for two devices (bit line BL and bit line bar BLb), programmed in weak conditions (V_appReset = 1. V, I_C = 200 µA, t_pulse = 1 µs). Figure adapted from [9].

The strategy reported in this work is not limited to RRAM and can be applied to other types of memories. Fig. 9 shows, based on neural network simulation, the energy that could be saved by varying the programming time of a 28 nm Spin Torque Magnetoresistive RAM (ST-MRAM) using the same approach as the one presented here. We see that high energy savings can be achieved. The methodology and model used to obtain these results are presented in [13].

Fig. 9. Accuracy on the CIFAR-10 image recognition task of a 28 nm-technology MRAM-based Binarized Neural Network, as a function of the MRAM programming energy (varying the programming time). Computed using the model of [13], considering or ignoring MOSFET and magnetic tunnel junction device variation.
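The error-injection methodology behind such simulations can be sketched in a few lines: flip each stored binary weight independently with probability equal to the BER and re-evaluate the network. The toy experiment below uses a random layer standing in for trained weights — an illustrative assumption, not the networks of [9] — and shows why BNNs absorb errors so well: most weight flips do not move the popcount past the neuron threshold.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_neurons = 1024, 256

# A random binarized layer standing in for trained weights (illustrative).
W = rng.choice([-1, 1], size=(n_neurons, n_inputs))
X = rng.choice([-1, 1], size=n_inputs)

def activations(W):
    # Eq. (2) in matrix form: with +1/-1 encoding, the XNOR popcount of a
    # neuron equals (W @ X + n_inputs) / 2; we use a centered threshold.
    popcount = (W @ X + n_inputs) // 2
    return np.sign(popcount - n_inputs // 2)  # np.sign maps ties to 0

ref = activations(W)
for ber in [1e-4, 1e-3, 1e-2, 1e-1]:
    flips = rng.random(W.shape) < ber     # each stored bit fails with prob. BER
    W_err = np.where(flips, -W, W)        # flipped weights model memory errors
    changed = np.mean(activations(W_err) != ref)
    print(f"BER={ber:.0e}: {changed:.1%} of neuron outputs changed")
```

Each flipped weight shifts the pre-activation sum by ±2, so a neuron output changes only when its popcount already sits within a few flips of the threshold — a rare event at low BER, which is consistent with the tolerance observed in Fig. 5.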
IV. CONCLUSION

Digital computing usually assumes and requires perfection of the memory bits, and this accuracy comes at important costs in terms of area and energy consumption. In contrast, neuromorphic circuits, including fundamentally digital ones such as binarized neural networks, can get away with imperfect memory cells. In this work, we used a differential approach to reduce errors while remaining compatible with in- or near-memory computing. This differential coding, in combination with the inherent error tolerance of neural networks, shows that it is possible, on the one hand, to embrace memories as "non-ideal" without noticeable impact on neural network accuracy and, on the other hand, to obtain important benefits in terms of operating conditions (endurance, energy), opening the way to on-chip learning.
REFERENCES

[1] D. Ielmini and H.-S. P. Wong, "In-memory computing with resistive switching devices," Nature Electronics, vol. 1, no. 6, p. 333, 2018.
[2] S. Yu, "Neuro-inspired computing with emerging nonvolatile memorys," Proc. IEEE, vol. 106, no. 2, pp. 260–285, 2018.
[3] F. M. Bayat, M. Prezioso, B. Chakrabarti, H. Nili, I. Kataeva, and D. Strukov, "Implementation of multilayer perceptron network with highly uniform passive memristive crossbar circuits," Nature Communications, vol. 9, no. 1, pp. 1–7, 2018.
[4] M.-F. Chang, J.-J. Wu, T.-F. Chien, Y.-C. Liu, T.-C. Yang, W.-C. Shen, Y.-C. King, C.-J. Lin, K.-F. Lin, Y.-D. Chih et al., "19.4 embedded 1Mb ReRAM in 28nm CMOS with 0.27-to-1V read using swing-sample-and-couple sense amplifier and self-boost-write-termination scheme," in Proc. ISSCC. IEEE, 2014, pp. 332–333.
[5] O. Golonzka, J.-G. Alzate, U. Arslan, M. Bohr, P. Bai, J. Brockman, B. Buford, C. Connor, N. Das, B. Doyle et al., "MRAM as embedded non-volatile memory solution for 22FFL FinFET technology," in IEDM Tech. Dig. IEEE, 2018, pp. 18–1.
[6] S. Gregori, A. Cabrini, O. Khouri, and G. Torelli, "On-chip error correcting techniques for new-generation flash memories," Proc. IEEE, vol. 91, no. 4, pp. 602–616, 2003.
[7] Editorial, "Big data needs a hardware revolution," Nature, vol. 554, no. 7691, p. 145, Feb. 2018.
[8] M. Bocquet, T. Hirtzlin, J.-O. Klein, E. Nowak, E. Vianello, J.-M. Portal, and D. Querlioz, "In-memory and error-immune differential RRAM implementation of binarized deep neural networks," in IEDM Tech. Dig. IEEE, 2018, p. 20.6.1.
[9] T. Hirtzlin, M. Bocquet, B. Penkovsky, J.-O. Klein, E. Nowak, E. Vianello, J.-M. Portal, and D. Querlioz, "Digital biologically plausible implementation of binarized neural networks with differential hafnium oxide resistive memory arrays," Frontiers in Neuroscience, vol. 13, p. 1383, 2020.
[10] A. A. Faisal, L. P. Selen, and D. M. Wolpert, "Noise in the nervous system," Nature Reviews Neuroscience, vol. 9, no. 4, p. 292, 2008.
[11] K. Klemm and S. Bornholdt, "Topology of biological networks and reliability of information processing," Proceedings of the National Academy of Sciences, vol. 102, no. 51, pp. 18 414–18 419, 2005.
[12] W. Zhao et al., "Synchronous non-volatile logic gate design based on resistive switching memories," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 61, no. 2, pp. 443–454, 2014.
[13] T. Hirtzlin, B. Penkovsky, J.-O. Klein, N. Locatelli, A. F. Vincent, M. Bocquet, J.-M. Portal, and D. Querlioz, "Implementing binarized neural networks with magnetoresistive RAM without error correction," arXiv preprint arXiv:1908.04085, 2019.
[14] M. Courbariaux, I. Hubara, D. Soudry, R. El-Yaniv, and Y. Bengio, "Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or −1," arXiv preprint arXiv:1602.02830, 2016.
[15] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, "XNOR-Net: ImageNet classification using binary convolutional neural networks," in Proc. ECCV. Springer, 2016, pp. 525–542.
[16] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.
[17] X. Lin, C. Zhao, and W. Pan, "Towards accurate binary convolutional neural network," in Advances in Neural Information Processing Systems, 2017, pp. 345–353.
[18] B. Penkovsky, M. Bocquet, T. Hirtzlin, J.-O. Klein, E. Nowak, E. Vianello, J.-M. Portal, and D. Querlioz, "In-memory resistive RAM implementation of binarized neural networks for medical applications," in Design, Automation and Test in Europe Conference (DATE), 2020.
[19] S. Yu, Z. Li, P.-Y. Chen, H. Wu, B. Gao, D. Wang, W. Wu, and H. Qian, "Binary neural network with 16 Mb RRAM macro chip for classification and online training," in IEDM Tech. Dig. IEEE, 2016, pp. 16–2.
[20] E. Giacomin, T. Greenberg-Toledo, S. Kvatinsky, and P.-E. Gaillardon, "A robust digital RRAM-based convolutional block for low-power image processing and learning applications," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 66, no. 2, pp. 643–654, 2019.
[21] X. Sun, X. Peng, P.-Y. Chen, R. Liu, J.-s. Seo, and S. Yu, "Fully parallel RRAM synaptic array for implementing binary neural network with (+1, −1) weights and (+1, 0) neurons," in Proc. ASP-DAC. IEEE Press, 2018, pp. 574–579.
[22] X. Sun, S. Yin, X. Peng, R. Liu, J.-s. Seo, and S. Yu, "XNOR-RRAM: A scalable and parallel resistive synaptic architecture for binary neural networks," algorithms, vol. 2, p. 3, 2018.
[23] T. Tang, L. Xia, B. Li, Y. Wang, and H. Yang, "Binary convolutional neural network on RRAM," in Proc. ASP-DAC. IEEE, 2017, pp. 782–787.
[24] Z. Zhou, P. Huang, Y. Xiang, W. Shen, Y. Zhao, Y. Feng, B. Gao, H. Wu, H. Qian, L. Liu et al., "A new hardware implementation approach of BNNs based on nonlinear 2T2R synaptic cell," in IEDM Tech. Dig. IEEE, 2018, pp. 20–7.
[25] D. Bankman, L. Yang, B. Moons, M. Verhelst, and B. Murmann, "An always-on 3.8 µJ/86% CIFAR-10 mixed-signal binary CNN processor with all memory on chip in 28-nm CMOS," IEEE Journal of Solid-State Circuits, vol. 54, no. 1, pp. 158–172, 2018.
[26] C.-C. Chang, M.-H. Wu, J.-W. Lin, C.-H. Li, V. Parmar, H.-Y. Lee, J.-H. Wei, S.-S. Sheu, M. Suri, T.-S. Chang et al., "NV-BNN: An accurate deep convolutional neural network based on binary STT-MRAM for adaptive AI edge," in Proc. DAC. IEEE, 2019, pp. 1–6.
[27] T. Hirtzlin, B. Penkovsky, M. Bocquet, J.-O. Klein, J.-M. Portal, and D. Querlioz, "Stochastic computing for hardware implementation of binarized neural networks," IEEE Access, vol. 7, pp. 76 394–76 403, 2019.
[28] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[29] A. Krizhevsky and G. Hinton, "Learning multiple layers of features from tiny images," Citeseer, Tech. Rep., 2009.
[30] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[31] T. Hirtzlin, M. Bocquet, J.-O. Klein, E. Nowak, E. Vianello, J.-M. Portal, and D. Querlioz, "Outstanding bit error tolerance of resistive RAM-based binarized neural networks," arXiv preprint arXiv:1904.03652, 2019.
[32] T. Hirtzlin, M. Bocquet, M. Ernoult, J.-O. Klein, E. Nowak, E. Vianello, J.-M. Portal, and D. Querlioz, "Hybrid analog-digital learning with differential RRAM synapses," in 2019 IEEE International Electron Devices Meeting (IEDM), 2019.