Implementation of Ternary Weights with Resistive RAM Using a Single Sense Operation per Synapse
Axel Laborieux, Marc Bocquet, Tifenn Hirtzlin, Jacques-Olivier Klein, Etienne Nowak, Elisa Vianello, Jean-Michel Portal, Damien Querlioz
Abstract—The design of systems implementing low precision neural networks with emerging memories such as resistive random access memory (RRAM) is a significant lead for reducing the energy consumption of artificial intelligence. To achieve maximum energy efficiency in such systems, logic and memory should be integrated as tightly as possible. In this work, we focus on the case of ternary neural networks, where synaptic weights assume ternary values. We propose a two-transistor/two-resistor memory architecture employing a precharge sense amplifier, where the weight value can be extracted in a single sense operation. Based on experimental measurements on a hybrid 130 nm CMOS/RRAM chip featuring this sense amplifier, we show that this technique is particularly appropriate at low supply voltage, and that it is resilient to process, voltage, and temperature variations. We characterize the bit error rate of our scheme. We show, based on neural network simulations on the CIFAR-10 image recognition task, that the use of ternary neural networks significantly increases neural network performance with respect to binary ones, which are often preferred for inference hardware. We finally evidence that the neural network is immune to the type of bit errors observed in our scheme, which can therefore be used without error correction.
Index Terms—Neural Networks, Resistive Memory, Quantized Neural Networks, Low Voltage Operation, Sense Amplifier.
I. INTRODUCTION
Artificial Intelligence has made tremendous progress in recent years due to the development of deep neural networks. Its deployment at the edge, however, is currently limited by the high power consumption of the associated algorithms [1]. Low precision neural networks are currently emerging as a solution, as they allow the development of low power consumption hardware specialized in deep learning inference [2]. The most extreme case of low precision neural networks, the Binarized Neural Network (BNN), also called XNOR-NET, is receiving particular attention as it is especially efficient for hardware implementation: both synaptic weights and neuronal activations assume only binary values [3], [4]. Remarkably, this type
of neural network can achieve high accuracy on vision tasks [5]. One particularly investigated lead is to fabricate hardware BNNs with emerging memories such as resistive RAM or memristors [6]–[13]. The low memory requirements of BNNs, as well as their reliance on simple arithmetic operations, make them indeed particularly adapted to "in-memory" or "near-memory" computing approaches, which achieve superior energy efficiency by avoiding the von Neumann bottleneck entirely.

Ternary neural networks [14] (TNNs, also called Gated XNOR-NETs, or GXNOR-NETs [15]), which add the value 0 to synaptic weights and activations, are also considered for hardware implementations [16]–[19]. They are comparatively receiving less attention than binarized neural networks, however. In this work, we highlight that implementing TNNs does not necessarily imply considerable overhead with regards to BNNs. We introduce a two-transistor/two-resistor memory architecture for TNN implementation. The array uses a precharge sense amplifier for reading weights, and the ternary weight value can be extracted in a single sense operation, by exploiting the fact that the latency of the sense amplifier depends on the resistive states of the memory devices. This work extends hardware developed for the energy-efficient implementation of BNNs [6], where the synaptic weights are implemented in a differential fashion. We, therefore, show that it can be extended to TNNs without overhead on the memory array.

The contributions of this work are as follows. After presenting the background of the work (section II):
• We demonstrate experimentally, on a fabricated 130 nm RRAM/CMOS hybrid chip, a strategy for implementing ternary weights using a precharge sense amplifier, which is particularly appropriate when the sense amplifier is operated at low supply voltage (section III).
• We verify the robustness of the approach to process, voltage, and temperature variations (section IV).
• We analyze the bit errors of this scheme experimentally and their dependence on the RRAM programming conditions (section V).
• We carry out simulations that show the superiority of TNNs over BNNs on the canonical CIFAR-10 vision task, and evidence the error resilience of hardware TNNs (section VI).
• We discuss the results, and compare our approach with the idea of storing three resistance levels per device (section VII).

Partial and preliminary results of this work have been presented at a conference [20]. This journal version adds the experimental characterization of bit errors in our architecture, supported by a comprehensive analysis of the impact of process, voltage, and temperature variations and of their impact at the neural network level, together with a detailed analysis of the benefits of ternary networks over binarized ones.

Axel Laborieux, Tifenn Hirtzlin, Jacques-Olivier Klein, and Damien Querlioz are with Université Paris-Saclay, CNRS, Centre de Nanosciences et de Nanotechnologies, 91120 Palaiseau, France (email: [email protected]). Marc Bocquet and Jean-Michel Portal are with Institut Matériaux Microélectronique Nanosciences de Provence, Univ. Aix-Marseille et Toulon, CNRS, France. Etienne Nowak and Elisa Vianello are with Université Grenoble-Alpes, CEA, LETI, Grenoble, France. This work was supported by the ERC Grant NANOINFER (715872) and the ANR grant NEURONIC (ANR-18-CE24-0009).

II. BACKGROUND
The main equation in conventional neural networks is the computation of the neuronal activation $A_j = f\left(\sum_i W_{ji} X_i\right)$, where $A_j$, the synaptic weights $W_{ji}$, and the input neuronal activations $X_i$ assume real values, and $f$ is a non-linear activation function. Binarized neural networks (BNNs) are a considerable simplification of conventional neural networks, in which all neuronal activations ($A_j$, $X_i$) and synaptic weights $W_{ji}$ can only take binary values meaning +1 and −1. Neuronal activation then becomes:

$$A_j = \mathrm{sign}\left(\sum_i \mathrm{XNOR}(W_{ji}, X_i) - T_j\right), \qquad (1)$$

where sign is the sign function, $T_j$ is a threshold associated with the neuron, and the XNOR operation is defined in Table I. Training BNNs is a relatively sophisticated operation, during which each synapse needs to be associated with a real value in addition to its binary value (see Appendix). Once training is finished, these real values can be discarded, and the neural network is entirely binarized. Due to their reduced memory requirements, and reliance on simple arithmetic operations, BNNs are especially appropriate for in- or near-memory implementations. In particular, multiple groups investigate the implementation of BNN inference with resistive memory tightly integrated at the core of CMOS [6]–[13]. Usually, the resistive memory stores the synaptic weights $W_{ji}$. However, this comes with a significant challenge: resistive memory is prone to bit errors, and in digital applications, it is typically used with strong error-correcting codes (ECC). ECC, which requires large decoding circuits [21], goes against the principles of in- or near-memory computing. For this reason, [6] proposes a two-transistor/two-resistor (2T2R) structure, which reduces resistive memory bit errors without the need for an ECC decoding circuit, by storing synaptic weights in a differential fashion. This architecture allows an extremely efficient implementation of BNNs, using the resistive memory devices in very favorable programming conditions (low energy, high endurance). It should be noted that systems using this architecture function with row-by-row read operations, and do not use the in-memory computing technique of using the Kirchhoff current law to perform the sum operation of neural networks while reading all devices at the same time [22], [23]. This choice limits the parallelism of such architectures, while at the same time avoiding the need for analog-to-digital conversion and analog circuits such as operational amplifiers, as discussed in detail in [24].
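For illustration, equation (1) can be written directly as code. The following minimal Python sketch (not from the original work) uses the fact that, for ±1-encoded values, XNOR is simply the element-wise product; the weights, inputs, and threshold below are arbitrary examples.

```python
import numpy as np

def binarized_neuron(w, x, threshold):
    # Eq. (1): A_j = sign(sum_i XNOR(w_ji, x_i) - T_j).
    # With the +1/-1 encoding, XNOR(w, x) is the element-wise product w * x.
    s = np.sum(w * x) - threshold
    return 1 if s >= 0 else -1   # convention: sign(0) = +1

w = np.array([+1, -1, +1, -1])   # arbitrary example weights
x = np.array([+1, -1, -1, -1])   # arbitrary example activations
print(binarized_neuron(w, x, threshold=0))   # XNOR sum is 2, output is +1
```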
TABLE I
TRUTH TABLES OF THE XNOR AND GXNOR GATES

  W_ji   X_i   XNOR   GXNOR
  +1     +1     +1      +1
  +1     −1     −1      −1
  −1     +1     −1      −1
  −1     −1     +1      +1
   0      X      —       0
   X      0      —       0

(X denotes any input value; the XNOR gate is only defined for binary ±1 inputs.)

Fig. 1. (a) Electron microscopy image of a hafnium oxide resistive memory cell (RRAM) integrated in the backend-of-line of a 130 nm CMOS process. (b) Photograph and (c) simplified schematic of the hybrid CMOS/RRAM test chip characterized in this work.
In this work, we show that the same architecture can be used for a generalization of BNNs: ternary neural networks (TNNs), where the neuronal activations and synaptic weights $A_j$, $X_i$, and $W_{ji}$ can now assume three values: +1, −1, and 0. Equation (1) now becomes:

$$A_j = \varphi\left(\sum_i \mathrm{GXNOR}(W_{ji}, X_i) - T_j\right). \qquad (2)$$

GXNOR is the "gated" XNOR operation that realizes the product between numbers with values +1, −1, and 0 (Table I). φ is an activation function that outputs +1 if its input is greater than a threshold ∆, −1 if the input is less than −∆, and 0 otherwise. We show experimentally and by circuit simulation in sec. III how the 2T2R BNN architecture can be extended to TNNs with practically no overhead, in sec. V its bit errors, and in sec. VI the corresponding benefits in terms of neural network accuracy.

(Footnote: In the literature, the name "Ternary Neural Networks" is sometimes also used to refer to neural networks where the synaptic weights are ternarized, but the neuronal activations remain real or integer [25], [26].)
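Equation (2) and the φ activation can be sketched in the same way; the GXNOR operation reduces to an ordinary product over {−1, 0, +1}. The values below are arbitrary examples, not taken from the paper.

```python
import numpy as np

def gxnor(w, x):
    # GXNOR coincides with XNOR on +/-1 inputs and outputs 0
    # whenever either operand is 0 (Table I): it is the plain product.
    return w * x

def ternary_neuron(w, x, threshold, delta):
    # Eq. (2): A_j = phi(sum_i GXNOR(w_ji, x_i) - T_j), where phi outputs
    # +1 above delta, -1 below -delta, and 0 otherwise.
    s = np.sum(gxnor(w, x)) - threshold
    return 1 if s > delta else (-1 if s < -delta else 0)

w = np.array([+1, 0, -1, +1])    # arbitrary example ternary weights
x = np.array([+1, -1, -1, 0])    # arbitrary example ternary activations
print(ternary_neuron(w, x, threshold=0, delta=0.5))   # sum is 2 -> +1
```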
III. THE OPERATION OF A PRECHARGE SENSE AMPLIFIER CAN PROVIDE TERNARY WEIGHTS
Fig. 2. Schematic of the precharge sense amplifier fabricated in the test chip.
Fig. 3. Circuit simulation of the precharge sense amplifier of Fig. 2 at low supply voltage, using thick oxide transistors operated well below their nominal voltage, when the two devices are programmed in an (a) LRS/HRS or (b) HRS/HRS configuration. The panels show the SEN, BL/BLb, Q/Qb, and XOR voltages as a function of time (0–250 ns).

In this work, we use the architecture of [6], where synaptic weights are stored in a differential fashion. Each bit is implemented using two devices programmed either as low resistance state (LRS) / high resistance state (HRS) to mean weight +1, or HRS/LRS to mean weight −1. Fig. 1 presents the test chip used for the experiments. This chip cointegrates 130 nm CMOS and resistive memory in the back-end-of-line, between levels four and five of metal. The resistive memory cells are based on hafnium oxide (Fig. 1(a)). All devices are integrated with a series NMOS transistor. After an initial forming step (consisting in the application of a slow voltage ramp from zero volts, with the current limited to a compliance value), the devices can switch between the high resistance state (HRS) and the low resistance state (LRS), through the dissolution or creation of conductive filaments of oxygen vacancies. Programming into the HRS is obtained by the application of a negative RESET voltage pulse. Programming into the LRS is obtained by the application of a positive SET pulse, with the current limited to a compliance value through the choice of the voltage applied on the transistor gate through the word line. More details on the RRAM technology are provided in [24].

Our experiments are based on a memory array incorporating all sense and periphery circuitry, illustrated in Fig. 1(b-c). The ternary synaptic weights are read using on-chip precharge sense amplifiers (PCSAs), presented in Fig. 2, and initially proposed in [27] for reading spin-transfer magnetoresistive random access memory. Fig. 3(a) shows an electrical simulation of this circuit to explain its working principle, using the Mentor Graphics Eldo simulator. These first simulations are presented in the commercial ultra-low leakage 130 nm technology used in our test chip, with a supply voltage well below the nominal voltage of its thick oxide transistors [28]. Since the technology targets ultra-low leakage applications, the threshold voltages are significantly high, and such a reduced supply voltage significantly reduces the overdrive of the transistors ($V_{GS} - V_{TH}$).

In the first phase (SEN = 0), the outputs Q and Qb are precharged to the supply voltage $V_{DD}$. In the second phase (SEN = $V_{DD}$), each branch starts to discharge to the ground. The branch whose resistive memory (BL or BLb) has the lowest electrical resistance discharges faster and causes its associated inverter to drive the output of the other inverter to the supply voltage. At the end of the process, the two outputs are therefore complementary and can be used to tell which resistive memory has the highest resistance, and therefore the synaptic weight. We observed that the convergence speed of a PCSA depends heavily on the resistance states of the two resistive memories. This effect is particularly magnified when the PCSA is used with a reduced overdrive, as presented here: the operation of the sense amplifier is slowed down with regards to nominal voltage operation, and the convergence speed differences between resistance values become more apparent. Fig. 3(b) shows a simulation where the two devices, BL and BLb, were programmed in the HRS. We see that the two outputs take much longer to converge to complementary values than in Fig. 3(a), where the devices are programmed in complementary LRS/HRS states.

These first simulations suggest a technique for implementing ternary weights using the memory array of our test chip. Similarly to when this array is used to implement a BNN, we propose to program the devices in the LRS/HRS configuration to mean the synaptic weight +1, and HRS/LRS to mean the synaptic weight −1. Additionally, we use the HRS/HRS configuration to mean the synaptic weight 0, while the LRS/LRS configuration is avoided. The sense operation is performed during a fixed duration. If, at the end of this period, the outputs Q and Qb have differentiated, causing the output of the XOR gate to be 1, output Q determines the synaptic weight (+1 or −1). Otherwise, the output of the XOR gate is 0, and the weight is determined to be 0.

Fig. 4. Two devices have been programmed in four distinct programming conditions, presented in (a), and measured using an on-chip sense amplifier. (b) Proportion of read operations that have converged within the sense duration, over 100 trials.
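The read-out rule described above can be summarized by a small behavioral model. The sketch below is not a circuit simulation: the single convergence threshold R_CRIT is a hypothetical placeholder standing in for the convergence behavior summarized in Fig. 5(b), chosen only to illustrate the decision logic.

```python
R_CRIT = 50e3   # hypothetical convergence threshold (ohms), not a chip value

def pcsa_ternary_read(r_bl, r_blb, r_crit=R_CRIT):
    # The amplifier converges within the sense window only if at least one
    # branch holds a low enough resistance (i.e., one device is in the LRS).
    if min(r_bl, r_blb) > r_crit:
        return 0                        # Q and Qb not differentiated: XOR = 0
    return +1 if r_bl < r_blb else -1   # XOR = 1: output Q gives the sign

print(pcsa_ternary_read(8e3, 200e3))    # LRS/HRS  -> +1
print(pcsa_ternary_read(200e3, 8e3))    # HRS/LRS  -> -1
print(pcsa_ternary_read(200e3, 300e3))  # HRS/HRS  ->  0
```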
This type of coding is reminiscent of the one used by the 2T2R ternary content-addressable memory (TCAM) cell of [29], where the LRS/HRS combination is used for coding one bit value, the HRS/LRS combination for coding the other, and the HRS/HRS combination for coding "don't care" (or X).

Experimental measurements on our test chip confirm that the PCSA can be used in this fashion. We first focus on one synapse of the memory array. We program one of the two devices (BLb) to a fixed resistance. We then program its complementary device BL to several resistance values, and for each of them perform 100 read operations of fixed duration, using on-chip PCSAs. These PCSAs are fabricated using thick-oxide transistors and operated at a supply voltage far below their nominal voltage and close to their threshold voltage, to reduce their overdrive and thus to exacerbate the PCSA delay variations. The use of thick oxide transistors in this test chip allows us to investigate the behavior of the devices at high voltages, without the concern of damaging the CMOS periphery circuits. Fig. 4 plots the probability that the sense amplifier has converged during the read time: the read operation only converges if the resistance of the BL device is significantly lower than that of the BLb device.

To evaluate this behavior in a broader range of programming conditions, we repeated the experiment on 109 devices and their complementary devices of the memory array, each programmed 14 times with various resistance values, and performed a read operation with an on-chip PCSA. The memory array of our test chip features one separate PCSA per column; therefore, 32 different PCSAs are used in our results. Fig. 5(a) shows, for each couple of resistance values $R_{BL}$ and $R_{BLb}$, if the read operation converged with Q = $V_{DD}$ (blue), meaning a weight of +1, converged with Q = 0 (red), meaning a weight of −1, or not
OLTAGE , AND T EMPERATURE V ARIATIONS
We now verify the robustness of the proposed scheme toprocess, voltage, and temperature variation. For this purpose,we performed extensive circuit simulations of the operationof the sense amplifier, reproducing the conditions of theexperiments of Fig. 5, using the same resistance values for the
Fig. 6. Three Monte Carlo SPICE-based simulation of the experiments of Fig. 5, in three situations: (a) slow transistors ( ◦ C temperature, . supplyvoltage), (b) experimental conditions ( ◦ C temperature, . supply voltage), (c) fast transistors ( ◦ C temperature, . supply voltage). The simulationsinclude local and global process variations, as well as transistor mismatch, in a way that each point in the Figure is obtained using different transistorparameters. All results are plotted in the same manner and with the same conventions as Fig. 5.TABLE IIE RROR R ATES ON T ERNARY W EIGHTS M EASURED E XPERIMENTALLY
Programming Type 1 Type 2 Type 3Conditions ( ←→ − ) ( ± → ) ( → ± )Fig. 7(a) < − <
1% 6 . Fig. 7(b) < − <
1% 18 . RRAM devices, and including process, voltage, and temper-ature variations. The results of the simulations are processedand plotted using the same format as the experimental resultsof Fig. 5, to ease comparison.These simulations are obtained using the Monte Carlosimulator provided by the Mentor Graphics Eldo tool withparameters validated on silicon, provided by the design kit ofour commercial CMOS process. Each point in the graphesof Fig. 6 therefore features different transistor parameters.We included global and local process variations, as well astransistor mismatch, in order to capture the whole range oftransistor variabilities observed in silicon. In order to assessthe impact of voltage and temperature variations, these simu-lations are presented in three conditions: slow transistors ( ◦ C temperature, and . supply voltage, Fig. 6(a)), experimen-tal conditions ( ◦ C temperature, and . supply voltage,Fig. 6(b)), and fast transistors ( ◦ C temperature, and . supply voltage, Fig. 6(a)).In all three conditions, the simulation results appear verysimilar to the experiments. Three clear regions are observed:non-convergence of the sense amplifier within ns for de-vices in HRS/HRS, and convergence within this time to a +1 or − value for devices in LRS/HRS and HRS/LRS,respectively. However, the frontier between these regimes ismuch sharper in the simulations as in the experiments. As thedifferent data points in Fig. 6 differ by process and mismatchvariations, this suggests that process variation do not cause the stochasticity observed in the experiments of Fig. 5, and thatthey have little impact in our scheme.We also see that the frontier between the different senseregimes in all three operating conditions remains firmly withinthe − k Ω range, suggesting that even high variations ofvoltage ( ± . V ) and temperature ( ± ◦ C ) do not endangerthe functionality of our scheme. Logically, in the case of fasttransistors, the frontier is shifted toward higher resistances,whereas in the case of slow transistors, it is shifted towardlower resistances. Independent simulations allowed verifyingthat this change is mostly due to the voltage variations: thetemperature variations have almost negligible impact on theproposed scheme.We also observed that the impact of voltage variationsincreased importantly when reducing the supply voltage. Forexample, with a supply voltage of . V instead of the . V value considered here, variations of the supply voltage of ± . V can impact the mean switching delay of the PCSA,by a factor two. The thick oxide transistors used in this workhave a nominal voltage of V , and a typical threshold voltageof approximately . V . Therefore, although our scheme isespecially adapted for supply voltages far below the nominalvoltage, it is not necessarily adapted for voltages in thesubthreshold regime, or very close to the threshold voltage.V. P ROGRAMMABILITY OF T ERNARY W EIGHTS
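To illustrate why the supply voltage shifts the convergence frontier, a toy Monte Carlo model can be used: the sense delay is taken to scale inversely with the transistor overdrive and linearly with the resistance of the faster branch, with a lognormal factor standing in for process and mismatch variability. All constants below are illustrative assumptions, not parameters of the Eldo simulations.

```python
import numpy as np

rng = np.random.default_rng(0)

def convergence_prob(r_min, vdd, vth=0.45, t_sense=1.0, n_mc=1000):
    # Toy model: the sense delay grows with the resistance of the faster
    # branch and shrinks with the transistor overdrive (vdd - vth); a
    # lognormal factor stands in for process and mismatch variability.
    k = 2e-5                              # arbitrary delay constant
    overdrive = max(vdd - vth, 1e-3)
    delay = k * r_min / overdrive * rng.lognormal(0.0, 0.1, n_mc)
    return np.mean(delay < t_sense)

# A lower supply reduces the overdrive, slows the sense operation, and
# shifts the "converged / not converged" frontier toward lower resistances.
for vdd in (0.9, 1.2, 1.5):
    probs = [convergence_prob(r, vdd) for r in (20e3, 60e3, 180e3)]
    print(vdd, [round(p, 2) for p in probs])
```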
V. PROGRAMMABILITY OF TERNARY WEIGHTS

To ensure reliable functioning of the ternary sense operation, we have seen that devices in the LRS should be programmed to electrical resistances below a critical value, and devices in the HRS to resistances above a higher critical value (Fig. 5(b)). The electrical resistance of resistive memory devices depends considerably on their programming conditions [24], [30]. Fig. 7 shows the distributions of LRS and HRS resistances using two programming conditions, over all the devices of the array, differentiating devices connected to bit lines and to bit lines bar. We see that in all cases, the LRS features a tight distribution. The SET process is indeed controlled by a compliance current that
naturally stops the filament growth at a targeted resistance value [31]. An appropriate choice of the compliance current can ensure an LRS below the critical resistance in most situations.

On the other hand, the HRS shows a broad statistical distribution. In the RESET process, the filament indeed breaks in a random fashion, making the final state extremely hard to control [31], [32]. The use of stronger programming conditions leads to higher values of the HRS.

This asymmetry between the variability of the LRS and HRS means that, in our scheme, the different ternary weight values naturally feature different error rates. The ternary error rates in the two programming conditions of Fig. 7 are listed in Table II. Errors of Type 1, where weight values of +1 and −1 are inverted, are the least frequent. Errors of Type 2, where a weight value of +1 or −1 is replaced by a weight value of 0, are infrequent as well. On the other hand, due to the large variability of the HRS, weight values of 0 have a significant probability of being measured as +1 or −1 (Type 3 errors): approximately 6% in the conditions of Fig. 7(a), and 18% in the conditions of Fig. 7(b).

Some resistive memory technologies with large memory windows, such as specifically optimized conductive bridge memories [33], would feature lower Type 3 error rates. Similarly, program-and-verify strategies [34]–[36] may reduce this error rate. Nevertheless, the higher error rate for 0 weights than for +1 and −1 weights is an inherent feature of our architecture. Therefore, in the next section, we assess the impact of these errors on the accuracy of neural networks.

Fig. 7. Distribution of the LRS and HRS states programmed with a fixed SET compliance current and RESET voltage, and with programming pulses of two different durations in (a) and (b). Measurements are performed on the RRAM devices of the array, separating bit line (full lines) and bit line bar (dashed lines) devices.
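The asymmetry between Type 2 and Type 3 error rates can be illustrated by sampling hypothetical LRS and HRS distributions: a tight one for the LRS and a broad lognormal one for the HRS. The distribution parameters and the convergence threshold below are assumptions chosen for illustration, not the measured values of Fig. 7.

```python
import numpy as np

rng = np.random.default_rng(0)
R_CRIT = 50e3          # hypothetical convergence threshold (ohms)
N = 100_000

lrs = rng.lognormal(np.log(8e3), 0.15, N)    # tight LRS distribution
hrs = rng.lognormal(np.log(150e3), 0.80, N)  # broad HRS distribution

# Type 3: a 0 weight (HRS/HRS pair) is read as +/-1 when at least one of
# its two devices lands below the convergence threshold.
p_low = np.mean(hrs < R_CRIT)
type3 = 1 - (1 - p_low) ** 2

# Type 2: a +/-1 weight (LRS/HRS pair) is read as 0 when its LRS device
# lands above the threshold, so that the amplifier does not converge.
type2 = np.mean(lrs > R_CRIT)

print(f"Type 2 ~ {type2:.2%}, Type 3 ~ {type3:.2%}")
```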
VI. NETWORK-LEVEL IMPLICATIONS

We first investigate the accuracy gain obtained when using ternarized instead of binarized networks. We trained BNN and TNN versions of networks with Visual Geometry Group (VGG) type architectures [37] on the CIFAR-10 image recognition task, which consists in classifying 1,024-pixel color images among ten classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck) [38]. Simulations are performed using PyTorch 1.1.0 [39] on a cluster of eight Nvidia GeForce RTX 2080 GPUs.

The architecture of our networks consists of six convolutional layers with kernel size three. The number of filters in the first layer is called N and is multiplied by two every two layers. Maximum-value pooling with kernel size two is used every two layers, and batch normalization [40] at every layer. The classifier consists of one hidden layer of 512 units. For the TNN, the activation function uses a fixed threshold ∆ (as defined in section II). The training methods for both the BNN and the TNN are described in the Appendix. The training is performed using the AdamW optimizer [41], [42], with the learning rate following the schedule of [42], [43] (cosine annealing with two restarts). Training data is augmented using random horizontal flips, and a random choice between cropping after padding and small random rotations.

No error is added during the training procedure, as our device is meant to be used for inference: the synaptic weights encoded by device pairs would be set after the model has been trained on a computer. Fig. 8 shows the maximum test accuracy resulting from these training simulations, for different sizes of the model. The error bars represent one standard deviation over the training runs. TNNs always outperform BNNs with the same model size (and, therefore, the same number of synapses). The most substantial difference is seen for smaller model sizes, but a significant gap remains even for large models. Besides, the difference in the number of parameters required to reach a given accuracy between TNNs and BNNs increases at higher accuracies. There is, therefore, a definite advantage to using TNNs instead of BNNs.

Fig. 8. Simulation of the maximum test accuracy reached during one training procedure, averaged over five trials, for BNNs and TNNs with various model sizes on the CIFAR-10 dataset. Error bars are one standard deviation.

Fig. 8 compared fully ternarized networks (weights and activations) with fully binarized ones (weights and activations). Table III lists the impact of weight ternarization for different types of activations (binary, ternary, and real activations). All results are reported on a model of size N = 128, trained on CIFAR-10, and are averaged over five training procedures. We observe that for BNNs and TNNs with quantized activations, the accuracy gains provided by ternary weights over binary weights are 0.84 and 0.86 points, respectively, and are statistically significant given the standard deviations. This accuracy gain is
more important than the gain provided by ternary activations over binary activations. This bigger impact of weight ternarization over activation ternarization may come from ternary kernels having better expressive power than binary kernels, which are often redundant in practical settings [3]. The gain of ternary weights drops to 0.26 points if real activations are allowed (using the rectified linear unit, or ReLU, as activation function, see Appendix), and is not statistically significant considering the standard deviations. Quantized activations are vastly more favorable in the context of hardware implementations, and in this situation, there is thus a statistically significant benefit provided by ternary weights over binary weights.

We finally investigate the impact of bit errors in BNNs and TNNs, to see if the advantage provided by using TNNs in our approach remains when errors are taken into account. Consistently with the results reported in section V, three types of errors are investigated: Type 1 errors are sign switches, e.g., +1 mistaken for −1; Type 2 errors are only defined for TNNs and correspond to ±1 mistaken for 0; and Type 3 errors are 0 mistaken for ±1, as illustrated in the inset schematic of Fig. 9(a).

Fig. 9. Simulation of the impact of bit error rate on the test accuracy at inference time for a model of size N = 128: TNN in (a) and BNN in (b). Type 1 errors are sign switches (e.g., +1 mistaken for −1), Type 2 errors are ±1 mistaken for 0, and Type 3 errors are 0 mistaken for ±1, as described in the inset schematics. Errors are sampled at each minibatch, and the test accuracy is averaged over five passes through the test set. Error bars are one standard deviation.
TABLE III
COMPARISON OF THE GAIN IN TEST ACCURACY FOR A MODEL OF SIZE N = 128 ON CIFAR-10, OBTAINED BY WEIGHT TERNARIZATION INSTEAD OF BINARIZATION, FOR THREE TYPES OF ACTIVATION QUANTIZATION

  Weights \ Activations  | Binary        | Ternary       | Full Precision
  Binary                 | –.– ± 0.08    | 91.– ± 0.09   | 93.– ± 0.–
  Ternary                | –.– ± 0.12    | 92.– ± 0.05   | 94.– ± 0.–
  Gain of ternarization  | 0.84          | 0.86          | 0.26
Fig. 9(a) shows the impact of these errors on the test accuracy for different values of the error rate at inference time. These simulation results are presented on CIFAR-10 with a model of size N = 128. Errors are randomly and artificially introduced in the weights of the neural network. Bit errors are included at the layer level and sampled at each minibatch of the test set. Type 1 errors switch the sign of a synaptic weight with a probability equal to the rate of Type 1 errors. Type 2 errors set a non-zero synaptic weight to 0 with a probability equal to the Type 2 error rate. Type 3 errors set a synaptic weight of 0 to ±1 with a probability equal to the Type 3 error rate; the choice of the sign (+1 or −1) is made with 0.5 probability. Fig. 9 is obtained by averaging the test accuracy over five passes through the test set, for increasing bit error rates.

Type 1 errors have the most impact on neural network accuracy. As seen in Fig. 9(b), the impact of these errors is similar to the impact of weight errors in a BNN. On the other hand, Type 3 errors have the least impact, with even high bit error rates degrading the accuracy surprisingly little. This result is fortunate, as we have seen in section V that Type 3 errors are the most frequent in our architecture.

We also performed simulations considering all three types of errors at the same time, with the error rates reported in Table II, corresponding to the programming conditions of Figs. 7(a) and 7(b). For Type 1 and Type 2 errors, we considered the upper limits listed in Table II. For both the conditions of Fig. 7(a) and 7(b), we observed only a slight degradation of the CIFAR-10 test accuracy, due mostly to the Type 2 errors.

The fact that mistaking a 0 weight for a ±1 weight (Type 3 error) has much less impact than mistaking a ±1 weight for a 0 weight (Type 2 error) can seem surprising. However, it is known, theoretically and practically, that in BNNs, some weights have little importance to the accuracy of the neural network [44]. They typically correspond to synapses that feature a 0 weight in a TNN, whereas synapses with ±1 weights in a TNN correspond to "important" synapses of a BNN. It is thus understandable that errors on such synapses have more impact on the final accuracy of the neural network.
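The error-injection procedure described above can be sketched as follows in PyTorch; the function name and the example error rates are illustrative, not taken from the simulation code of the paper.

```python
import torch

def inject_ternary_errors(w, p1, p2, p3):
    # w: float tensor with values in {-1, 0, +1}.
    w = w.clone()
    nonzero, zero = w != 0, w == 0
    # Type 1: sign switch of a non-zero weight with probability p1.
    flip = nonzero & (torch.rand_like(w) < p1)
    w[flip] = -w[flip]
    # Type 2: non-zero weight read as 0 with probability p2.
    w[nonzero & (torch.rand_like(w) < p2)] = 0.0
    # Type 3: zero weight read as +1 or -1 (equiprobable) with probability p3.
    raised = zero & (torch.rand_like(w) < p3)
    signs = torch.where(torch.rand_like(w) < 0.5,
                        torch.ones_like(w), -torch.ones_like(w))
    w[raised] = signs[raised]
    return w

w = torch.tensor([1., -1., 0., 0., 1., 0.])
print(inject_ternary_errors(w, p1=1e-3, p2=1e-2, p3=0.18))  # illustrative rates
```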
VII. COMPARISON WITH THREE-LEVEL PROGRAMMING

An alternative approach to implementing ternary weights with resistive memory is to program the individual devices into three separate levels. This idea is feasible, as the resistance level of the LRS can, to a large extent, be controlled through the choice of the compliance current during the SET operation in many resistive memory technologies [24], [31].

The obvious advantage of this approach is that it requires a single device per synapse. This idea also brings several challenges. First, the sense operation has to be more complex. The most natural technique is to perform two sense operations, comparing the resistance of the device under test to two different thresholds. Second, this technique is much more prone to bit errors than ours, as the states are not programmed in a differential fashion [24]. Additionally, this approach does not feature the natural resilience to Type 1 and Type 2 errors, and Type 2 and Type 3 errors will typically feature similar rates. Finally, unlike ours, this approach is prone to resistance drift, which is inherent to some resistive memory technologies [45].

These comments suggest that the choice of a technique for storing ternary weights should be dictated by the technology. Our technique is especially appropriate for resistive memories not supporting single-device multilevel storage, or featuring high error rates or resistance drift. The three-levels-per-device approach would be the most appropriate for devices with well controlled analog storage properties.
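A minimal sketch of the two-sense-operation read-out of this alternative scheme follows; the two reference thresholds and the mapping of the three resistance levels to weights are illustrative assumptions, not values from the paper.

```python
def three_level_read(r, r_ref_low=30e3, r_ref_high=120e3):
    # Two sense operations compare the device resistance to two reference
    # thresholds; the level-to-weight mapping below is one possible choice.
    if r < r_ref_low:
        return +1        # lowest resistance level
    if r < r_ref_high:
        return 0         # intermediate level
    return -1            # highest resistance level

print(three_level_read(10e3), three_level_read(60e3), three_level_read(300e3))
```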
VIII. CONCLUSION

In this work, we revisited a differential memory architecture for BNNs. We showed experimentally, on a hybrid CMOS/RRAM chip, that its sense amplifier can differentiate not only the LRS/HRS and HRS/LRS states, but also the HRS/HRS state, in a single sense operation. This feature allows the architecture to store ternary weights, and to provide a building block for ternary neural networks. We showed by neural network simulation on the CIFAR-10 task the benefits of using ternary instead of binary networks, and the high resilience of TNNs to weight errors, as the type of errors observed experimentally in our scheme is also the type of errors to which TNNs are the most immune. This resilience allows the use of our architecture without relying on any formal error correction. Our approach also appears resilient to process, voltage, and temperature variations, provided that the supply voltage remains reasonably higher than the threshold voltage of the transistors.

As this behavior of the sense amplifier is exacerbated at supply voltages below the nominal voltage, our approach especially targets extremely energy-conscious applications, such as uses within wireless sensors or medical devices. This work opens the way for increasing edge intelligence in such contexts, and also highlights that the low voltage operation of circuits may sometimes provide opportunities for new functionalities.
ACKNOWLEDGMENTS

The authors would like to thank M. Ernoult and L. Herrera Diez for fruitful discussions.
APPENDIX: TRAINING ALGORITHM OF BINARIZED AND TERNARY NEURAL NETWORKS
During the training of BNNs and TNNs, each quantized (binary or ternary) weight is associated with a real hidden weight. This approach to training quantized neural networks was introduced in [3] and is presented in Algorithm 1. The quantized weights are used for computing the neuron values (equations (1) and (2)), as well as the gradient values in the backward pass. However, the training steps are achieved by updating the real hidden weights. The quantized weight is then determined by applying the quantizing function Quantize to the real value, which is φ for ternary networks or sign for binary ones, as defined in section II. The quantization of activations is done by applying the same function Quantize, except for real activations, which are obtained by applying a rectified linear unit (ReLU(x) = max(0, x)).
Algorithm 1 Training procedure for binary and ternary neural networks. $W_h$ are the hidden weights, $\theta_{BN} = (\gamma_l, \beta_l)$ are the batch normalization parameters, $U_W$ and $U_\theta$ are the parameter updates prescribed by the Adam algorithm [41], $(X, y)$ is a batch of labelled training data, and $\eta$ is the learning rate. "cache" denotes all the intermediate layer computations that need to be stored for the backward pass. Quantize is either φ or sign, as defined in section II. "·" denotes the element-wise product of two tensors with compatible shapes.

Input: $W_h$, $\theta_{BN} = (\gamma_l, \beta_l)$, $U_W$, $U_\theta$, $(X, y)$, $\eta$.
Output: $W_h$, $\theta_{BN}$, $U_W$, $U_\theta$.
  $W_Q \leftarrow$ Quantize($W_h$)                ▷ Compute quantized weights
  $A \leftarrow X$                                ▷ Input is not quantized
  for $l = 1$ to $L$ do                           ▷ Loop over the layers
    $z_l \leftarrow W_{Q,l} A_l$                  ▷ Matrix multiplication
    $A_l \leftarrow \gamma_l \cdot \frac{z_l - \mathrm{E}(z_l)}{\sqrt{\mathrm{Var}(z_l) + \epsilon}} + \beta_l$   ▷ Batch normalization [40]
    if $l < L$ then                               ▷ If not the last layer
      $A_l \leftarrow$ Quantize($A_l$)            ▷ Activation is quantized
    end if
  end for
  $\hat{y} \leftarrow A_L$
  $C \leftarrow$ Cost($\hat{y}, y$)               ▷ Compute mean loss over the batch
  $(\partial_W C, \partial_\theta C) \leftarrow$ Backward($C, \hat{y}, W_Q, \theta_{BN}$, cache)   ▷ Cost gradients
  $(U_W, U_\theta) \leftarrow$ Adam($\partial_W C, \partial_\theta C, U_W, U_\theta$)
  $W_h \leftarrow W_h - \eta U_W$
  $\theta_{BN} \leftarrow \theta_{BN} - \eta U_\theta$
  return $W_h$, $\theta_{BN}$, $U_W$, $U_\theta$
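A condensed PyTorch version of one step of Algorithm 1, for a single ternary linear layer (batch normalization and the layer loop are omitted for brevity). The detach-based trick below is a standard way of letting gradients reach the hidden weights through the quantization step (discussed further below); the layer size and optimizer settings are arbitrary.

```python
import torch
import torch.nn.functional as F

def ternarize(w_h, delta):
    # phi applied to the hidden weights: +1 above delta, -1 below -delta, 0 otherwise.
    return torch.where(w_h > delta, torch.ones_like(w_h),
                       torch.where(w_h < -delta, -torch.ones_like(w_h),
                                   torch.zeros_like(w_h)))

def train_step(w_h, x, y, opt, delta=0.05):
    w_q = w_h + (ternarize(w_h, delta) - w_h).detach()  # forward: quantized,
                                                        # backward: identity
    loss = F.cross_entropy(x @ w_q.t(), y)
    opt.zero_grad()
    loss.backward()      # gradients reach the real hidden weights
    opt.step()           # only the hidden weights are updated
    return loss.item()

w_h = torch.randn(10, 784, requires_grad=True)   # arbitrary layer size
opt = torch.optim.AdamW([w_h], lr=1e-3)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
print(train_step(w_h, x, y, opt))
```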
Quantized activation functions (φ or sign) have zero derivative almost everywhere, which is an issue for backpropagating the error gradients through the network. A way around this issue is the use of a straight-through estimator [46], which consists in taking the derivative of another function instead of the almost-everywhere-zero derivative. Throughout this work, we take the derivative of Hardtanh, which is 1 between −1 and 1 and 0 elsewhere, both for binary and ternary activations. The simulation code used in this work is publicly available in the GitHub repository: https://github.com/Laborieux-Axel/Quantized_VGG
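The straight-through estimator described above can be implemented as a custom autograd function; this is a minimal sketch consistent with the description, not the code of the public repository.

```python
import torch

class TernaryActivation(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, delta):
        ctx.save_for_backward(x)
        out = torch.zeros_like(x)
        out[x > delta] = 1.0
        out[x < -delta] = -1.0
        return out

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Straight-through estimator: derivative of Hardtanh,
        # i.e., 1 on [-1, 1] and 0 elsewhere.
        return grad_out * (x.abs() <= 1.0).to(grad_out.dtype), None

x = torch.randn(5, requires_grad=True)
y = TernaryActivation.apply(x, 0.05)
y.sum().backward()        # gradient is 1 where |x| <= 1, 0 elsewhere
print(x.grad)
```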
REFERENCES

[1] X. Xu, Y. Ding, S. X. Hu, M. Niemier, J. Cong, Y. Hu, and Y. Shi, "Scaling for edge inference of deep neural networks," Nature Electronics, vol. 1, no. 4, p. 216, 2018.
[2] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio, "Quantized neural networks: Training neural networks with low precision weights and activations," The Journal of Machine Learning Research, vol. 18, no. 1, pp. 6869–6898, 2017.
[3] M. Courbariaux, I. Hubara, D. Soudry, R. El-Yaniv, and Y. Bengio, "Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or −1," arXiv preprint arXiv:1602.02830, 2016.
[4] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, "XNOR-Net: ImageNet classification using binary convolutional neural networks," in Proc. ECCV. Springer, 2016, pp. 525–542.
[5] X. Lin, C. Zhao, and W. Pan, "Towards accurate binary convolutional neural network," in Advances in Neural Information Processing Systems, 2017, pp. 345–353.
[6] M. Bocquet, T. Hirtzlin, J.-O. Klein, E. Nowak, E. Vianello, J.-M. Portal, and D. Querlioz, "In-memory and error-immune differential RRAM implementation of binarized deep neural networks," in IEDM Tech. Dig. IEEE, 2018, p. 20.6.1.
[7] S. Yu, Z. Li, P.-Y. Chen, H. Wu, B. Gao, D. Wang, W. Wu, and H. Qian, "Binary neural network with 16 Mb RRAM macro chip for classification and online training," in IEDM Tech. Dig. IEEE, 2016, p. 16.2.
[8] E. Giacomin, T. Greenberg-Toledo, S. Kvatinsky, and P.-E. Gaillardon, "A robust digital RRAM-based convolutional block for low-power image processing and learning applications," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 66, no. 2, pp. 643–654, 2019.
[9] X. Sun, S. Yin, X. Peng, R. Liu, J.-S. Seo, and S. Yu, "XNOR-RRAM: A scalable and parallel resistive synaptic architecture for binary neural networks," algorithms, vol. 2, p. 3, 2018.
[10] Z. Zhou, P. Huang, Y. Xiang, W. Shen, Y. Zhao, Y. Feng, B. Gao, H. Wu, H. Qian, L. Liu et al., "A new hardware implementation approach of BNNs based on nonlinear 2T2R synaptic cell," in IEDM Tech. Dig. IEEE, 2018, p. 20.7.
[11] M. Natsui, T. Chiba, and T. Hanyu, "Design of MTJ-based nonvolatile logic gates for quantized neural networks," Microelectronics Journal, vol. 82, pp. 13–21, 2018.
[12] T. Tang, L. Xia, B. Li, Y. Wang, and H. Yang, "Binary convolutional neural network on RRAM," in Proc. ASP-DAC. IEEE, 2017, pp. 782–787.
[13] J. Lee, J. K. Eshraghian, K. Cho, and K. Eshraghian, "Adaptive precision CNN accelerator using radix-X parallel connected memristor crossbars," arXiv preprint arXiv:1906.09395, 2019.
[14] H. Alemdar, V. Leroy, A. Prost-Boucle, and F. Pétrot, "Ternary neural networks for resource-efficient AI applications," in Proc. IJCNN. IEEE, 2017, pp. 2547–2554.
[15] L. Deng, P. Jiao, J. Pei, Z. Wu, and G. Li, "GXNOR-Net: Training deep neural networks with ternary weights and activations without full-precision memory under a unified discretization framework," Neural Networks, vol. 100, pp. 49–58, 2018.
[16] K. Ando, K. Ueyoshi, K. Orimo, H. Yonekawa, S. Sato, H. Nakahara, M. Ikebe, T. Asai, S. Takamaeda-Yamazaki, T. Kuroda et al., "BRein memory: A 13-layer 4.2 K neuron/0.8 M synapse binary/ternary reconfigurable in-memory deep neural network accelerator in 65 nm CMOS," in Proc. VLSI Symp. on Circuits. IEEE, 2017, pp. C24–C25.
[17] A. Prost-Boucle, A. Bourge, F. Pétrot, H. Alemdar, N. Caldwell, and V. Leroy, "Scalable high-performance architecture for convolutional ternary neural networks on FPGA," in Proc. FPL. IEEE, 2017, pp. 1–7.
[18] Z. Li, P.-Y. Chen, H. Xu, and S. Yu, "Design of ternary neural network with 3-D vertical RRAM array," IEEE Transactions on Electron Devices, vol. 64, no. 6, pp. 2721–2727, 2017.
[19] B. Pan, D. Zhang, X. Zhang, H. Wang, J. Bai, J. Yang, Y. Zhang, W. Kang, and W. Zhao, "Skyrmion-induced memristive magnetic tunnel junction for ternary neural network," IEEE Journal of the Electron Devices Society, vol. 7, pp. 529–533, 2019.
[20] A. Laborieux, M. Bocquet, T. Hirtzlin, J.-O. Klein, L. H. Diez, E. Nowak, E. Vianello, J.-M. Portal, and D. Querlioz, "Low power in-memory implementation of ternary neural networks with resistive RAM-based synapse," in Proc. AICAS, 2020.
[21] S. Gregori, A. Cabrini, O. Khouri, and G. Torelli, "On-chip error correcting techniques for new-generation flash memories," Proc. IEEE, vol. 91, no. 4, pp. 602–616, 2003.
[22] M. Prezioso et al., "Training and operation of an integrated neuromorphic network based on metal-oxide memristors," Nature, vol. 521, no. 7550, p. 61, 2015.
[23] S. Ambrogio et al., "Equivalent-accuracy accelerated neural-network training using analogue memory," Nature, vol. 558, p. 60, 2018.
[24] T. Hirtzlin, M. Bocquet, B. Penkovsky, J.-O. Klein, E. Nowak, E. Vianello, J.-M. Portal, and D. Querlioz, "Digital biologically plausible implementation of binarized neural networks with differential hafnium oxide resistive memory arrays," Frontiers in Neuroscience, vol. 13, p. 1383, 2020.
[25] N. Mellempudi, A. Kundu, D. Mudigere, D. Das, B. Kaul, and P. Dubey, "Ternary neural networks with fine-grained quantization," arXiv preprint arXiv:1705.01462, 2017.
[26] E. Nurvitadhi, G. Venkatesh, J. Sim, D. Marr, R. Huang, J. Ong Gee Hock, Y. T. Liew, K. Srivatsan, D. Moss, S. Subhaschandra et al., "Can FPGAs beat GPUs in accelerating next-generation deep neural networks?" in Proc. ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays. ACM, 2017, pp. 5–14.
[27] W. Zhao, C. Chappert, V. Javerliac, and J.-P. Noziere, "High speed, high stability and low power sensing amplifier for MTJ/CMOS hybrid logic circuits," IEEE Transactions on Magnetics, vol. 45, no. 10, pp. 3784–3787, 2009.
[28] R. G. Dreslinski, M. Wieckowski, D. Blaauw, D. Sylvester, and T. Mudge, "Near-threshold computing: Reclaiming Moore's law through energy efficient integrated circuits," Proc. IEEE, vol. 98, no. 2, pp. 253–266, Feb. 2010.
[29] R. Yang, H. Li, K. K. Smithe, T. R. Kim, K. Okabe, E. Pop, J. A. Fan, and H.-S. P. Wong, "Ternary content-addressable memory with MoS2 transistors for massively parallel data search," Nature Electronics, vol. 2, no. 3, pp. 108–114, 2019.
[30] A. Grossi, E. Nowak, C. Zambelli, C. Pellissier, S. Bernasconi, G. Cibrario, K. El Hajjam, R. Crochemore, J. Nodin, P. Olivo et al., "Fundamental variability limits of filament-based RRAM," in IEDM Tech. Dig. IEEE, 2016, pp. 4–7.
[31] M. Bocquet, D. Deleruyelle, H. Aziza, C. Muller, J.-M. Portal, T. Cabout, and E. Jalaguier, "Robust compact model for bipolar oxide-based resistive switching memories," IEEE Transactions on Electron Devices, vol. 61, no. 3, pp. 674–681, 2014.
[32] D. R. B. Ly et al., "Role of synaptic variability in resistive memory-based spiking neural networks with unsupervised learning," J. Phys. D: Applied Physics, 2018.
[33] E. Vianello, O. Thomas, G. Molas, O. Turkyilmaz, N. Jovanović, D. Garbin, G. Palma, M. Alayan, C. Nguyen, J. Coignus et al., "Resistive memories for ultra-low-power embedded computing design," in IEDM Tech. Dig. IEEE, 2014, p. 6.3.
[34] S. R. Lee, Y.-B. Kim, M. Chang, K. M. Kim, C. B. Lee, J. H. Hur, G.-S. Park, D. Lee, M.-J. Lee, C. J. Kim et al., "Multi-level switching of triple-layered TaOx RRAM with excellent reliability for storage class memory," in Proc. Symp. VLSI Technology. IEEE, 2012, pp. 71–72.
[35] F. Alibart, L. Gao, B. D. Hoskins, and D. B. Strukov, "High precision tuning of state for memristive devices by adaptable variation-tolerant algorithm," Nanotechnology, vol. 23, no. 7, p. 075201, 2012.
[36] C. Xu, D. Niu, N. Muralimanohar, N. P. Jouppi, and Y. Xie, "Understanding the trade-offs in multi-level cell ReRAM memory design," in Proc. DAC. IEEE, 2013, pp. 1–6.
[37] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[38] A. Krizhevsky and G. Hinton, "Learning multiple layers of features from tiny images," Citeseer, Tech. Rep., 2009.
[39] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, "Automatic differentiation in PyTorch," in NIPS-W, 2017.
[40] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.
[41] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[42] I. Loshchilov and F. Hutter, "Fixing weight decay regularization in Adam," arXiv preprint arXiv:1711.05101, 2017.
[43] ——, "SGDR: Stochastic gradient descent with warm restarts," arXiv preprint arXiv:1608.03983, 2016.
[44] A. Laborieux, M. Ernoult, T. Hirtzlin, and D. Querlioz, "Synaptic metaplasticity in binarized neural networks," arXiv preprint arXiv:2003.03533, 2020.
[45] J. Li, B. Luan, and C. Lam, "Resistance drift in phase change memory," in Proc. IRPS. IEEE, 2012, p. 6C.1.
[46] Y. Bengio, N. Léonard, and A. Courville, "Estimating or propagating gradients through stochastic neurons for conditional computation," arXiv preprint arXiv:1308.3432, 2013.
Axel Laborieux received the M.S. degree in condensed matter physics from the Université Paris-Saclay, France, in 2018, where he is currently pursuing the Ph.D. degree in neuromorphic computing. His research interests include the benefits brought by complex synapse behaviors in binarized neural networks and their physical implementation using spintronic nanodevices.
Marc Bocquet received the M.S. and Ph.D. degrees in electrical engineering from the University of Grenoble, France, in 2006 and 2009, respectively. He is currently an Associate Professor with the Institute of Materials, Microelectronics, and Nanosciences of Provence (IM2NP), Univ. Aix-Marseille et Toulon. His research interests include memory models, memory design, characterization, and reliability.
Tifenn Hirtzlin received the M.S. degree in nanosciences and electronics from the Université Paris-Sud, France, in 2017, where he is currently pursuing the Ph.D. degree in electrical engineering. His research interests include designing intelligent memory chips for low-energy hardware data processing, using bio-inspired concepts such as a probabilistic approach to brain function, as well as more conventional neural network approaches.
Jacques-Olivier Klein (M'90) received the Ph.D. degree from the Université Paris-Sud, France, in 1995, where he is currently a Full Professor. He focuses on the architecture of circuits and systems based on emerging nanodevices in the fields of nanomagnetism and bio-inspired nanoelectronics. He is also a Lecturer with the Institut Universitaire de Technologie (IUT), Cachan. He has authored more than 100 technical papers.
Etienne Nowak received the M.Sc. degree in microelectronics from Grenoble University, Grenoble, France; Politecnico di Torino, Turin, Italy; and the Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, in 2007, and the Ph.D. degree from the Institut National Polytechnique de Grenoble, Grenoble, France, in 2010. From 2010 to 2014, he was a Senior Engineer at the Semiconductor Research and Development Center, Samsung Electronics, Hwaseong, South Korea, where he was involved in the first generations of vertical NAND flash memory. He joined CEA-Leti, Grenoble, France, in 2014, as a Project Manager on emerging nonvolatile memory. He has published over 30 papers and holds two patents on these topics. Since 2017, he has been the Head of the Advanced Memory Device Laboratory, CEA-Leti, Grenoble, France, dedicated to nonvolatile memory backend technologies.
Elisa Vianello received the Ph.D. degree in microelectronics from the University of Udine, Udine, Italy, and the Polytechnic Institute of Grenoble, Grenoble, France, in 2009. She has been a Scientist with the Laboratoire d'Electronique des Technologies de l'Information, Commissariat à l'Energie Atomique et aux Energies Alternatives, Grenoble, since 2011. Her current research interests include resistive switching memory devices and selectors, and the use of nanotechnologies for memory-centric computing and neuromorphic systems.
Jean-Michel Portal graduated in electronic engineering in 1996 and received the Ph.D. degree in computer science in 1999. He is currently a Full Professor in electronics at Aix-Marseille University, where he heads the electronics department of the Institute of Materials, Microelectronics, and Nanosciences of Provence (IM2NP). His research interests include emerging non-volatile memory design and neuromorphic applications. He is author or co-author of more than 200 articles in international refereed journals and conferences, and is a co-inventor of six patents. He has supervised 20 Ph.D. students. He is a recipient of the NanoArch 2012, NEWCAS 2013, and IEEE Transactions on Circuits and Systems Guillemin-Cauer 2017 Best Paper Awards.