A Single-Cycle MLP Classifier Using Analog MRAM-based Neurons and Synapses
Ramtin Zand
Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208. ([email protected])
Abstract—In this paper, spin-orbit torque (SOT) magnetoresistive random-access memory (MRAM) devices are leveraged to realize sigmoidal neurons and binarized synapses for a single-cycle analog in-memory computing (IMC) architecture. First, an analog SOT-MRAM based neuron bitcell is proposed which achieves a significant reduction in power-area product compared to the previous most power- and area-efficient analog sigmoidal neuron design. Next, the proposed neuron and synapse bitcells are used within memory subarrays to form an analog IMC-based multilayer perceptron (MLP) architecture for the MNIST pattern recognition application. The architecture-level results exhibit that our analog IMC architecture achieves at least two and four orders of magnitude performance improvement compared to a mixed-signal analog/digital IMC architecture and a digital GPU implementation, respectively, while realizing a comparable classification accuracy.

Index Terms—Analog computing, in-memory computing, magnetic random access memory (MRAM), multilayer perceptron (MLP), sigmoidal neuron, spin orbit torque (SOT).
I. INTRODUCTION
In-memory computing (IMC) has attracted considerable attention in recent years as a hardware accelerator for artificial neural networks (ANNs) [1], [2]. The main objective of IMC architectures, as alternatives to von Neumann architectures, is avoiding the processor-memory bottleneck to realize energy-efficient and area-sparing computation. To achieve this goal, various techniques have been investigated, from using 3D integration technology [3] to leveraging beyond-CMOS memristive devices [4]. Recently, various resistive technologies have been proposed for use within IMC architectures, such as resistive random access memory (ReRAM) [4], phase-change memory (PCM) [5], and magnetoresistive random-access memory (MRAM) [1].

Most of the previous IMC approaches operate in the digital domain [1], [6], meaning that they leverage resistive memory crossbars to implement Boolean logic operations such as XNOR/XOR within memory subarrays, which can be utilized to implement the multiplication operation in binarized neural networks [7]. While digital IMC approaches provide important energy and area benefits, they do not fully leverage the true potential of resistive memory devices that can be realized in the analog domain. On the other hand, mixed-signal analog/digital IMC architectures [5], [8] leverage resistive crossbars to compute the multiply-and-accumulate (MAC) operation in O(1) time complexity; however, they still require transferring data to digital processors to compute activation functions. Thus, in addition to the energy that is consumed to transfer data between processor and memory, signal conversion blocks are required to convert data from the analog to the digital domain and vice versa, which can lead to considerable energy overheads. In this paper, we use spin-orbit torque (SOT) MRAM technology to implement both synapses and neurons within analog IMC subarrays that can be concatenated to form a multilayer perceptron (MLP) classifier that operates in a single clock cycle.

Fig. 1. (a) SOT-MRAM cell. A positive current along +x induces a spin injection current in the +z direction. The injected spin current produces the spin torque required to align the magnetization direction of the free layer with the +y direction, and vice versa. (b) SOT-MRAM top view.

II. SOT-MRAM BASED NEURONS AND SYNAPSES
Fig. 1 shows a simplified structure of a SOT-MRAM cell including a magnetic tunnel junction (MTJ) with two ferromagnetic (FM) layers, which are separated by a thin oxide layer. The MTJ has two different resistance levels, which are determined according to the angle (θ) between the magnetization orientations of the FM layers. The resistance of the MTJ in parallel (P) and antiparallel (AP) magnetization configurations can be obtained using the following equations [9]:

R(θ) = 2 R_MTJ (1 + TMR) / [2 + TMR (1 + cos θ)] = { R_P = R_MTJ, θ = 0; R_AP = R_MTJ (1 + TMR), θ = π }   (1)

TMR(T, V_b) = TMR_0 / (1 + V_b^2 / V^2)   (2)

where R_MTJ = RA/Area, in which the resistance-area product (RA) value of the MTJ depends on the material composition of its layers. TMR is the tunneling magnetoresistance, which relies on temperature (T) and bias voltage (V_b). V is a fitting parameter, and TMR_0 is a material-dependent constant.

TABLE I
PARAMETERS OF THE SHE-MRAM DEVICE [9].

Parameter   Description                   Value
MTJ area    l_MTJ × w_MTJ × π/4           –
HM volume   l_HM × w_HM × t_HM            –
RA          resistance-area product       10 Ω·µm²
V           fitting parameter             0.65
TMR         tunneling magnetoresistance   100%

In the MTJ structure, the magnetization direction of electrons in one of the FM layers is fixed (pinned layer), while the electrons' direction in the other FM layer (free layer) can be switched. In [10], Liu et al. have shown that passing a charge current through a heavy metal (HM) generates a spin-polarized current via the spin Hall effect (SHE), which can switch the magnetization direction of the free layer, as described in Fig. 1. The ratio of the generated spin current to the applied charge current is normally greater than one, leading to an energy-efficient switching operation [11], [12]. Herein, we use (1) and (2) to develop a Verilog-A model of the SOT-MRAM device using the parameters listed in Table I [9]. The SOT-MRAM model is utilized along with the 14nm HP-FinFET PTM library to implement the neuron and synapse circuits that are described in the following.

Fig. 2. (a) The SOT-MRAM based neuron, (b) the VTC curves showing various operating regions of the PMOS (MP) and NMOS (MN) transistors.

A. SOT-MRAM Based Neuron
Fig. 2(a) shows the bitcell structure of the proposed neuron, which includes two SOT-MRAM devices and a CMOS-based inverter. The magnetization configurations of the SOT-MRAM1 and SOT-MRAM2 devices should be in the P and AP states, respectively. The SOT-MRAMs in the neuron's circuit create a voltage divider, which reduces the slope of the linear operating region in the inverter's voltage transfer characteristic (VTC) curve. The reduction in the slope of the linear region in the CMOS inverter creates a smooth high-to-low output voltage transition, which enables the realization of a sigmoid activation function. Fig. 2(b) shows the SPICE circuit simulation results of the proposed SOT-MRAM based neuron using V_DD = 0.8 V and V_SS = 0 V. The results verify that the neuron can approximate a sigmoid(−x) activation function that is biased around a voltage b between V_SS and V_DD. The non-zero bias voltage can be canceled at both the circuit and algorithm level, as described in the next sections.

Fig. 3. The SOT-MRAM based binary synapse.
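As a quick numerical sanity check on the device model underlying both bitcells, Eqs. (1) and (2) can be evaluated directly. In this sketch, RA, TMR_0, and the fitting parameter V come from Table I, while the MTJ dimensions are assumed values for illustration only (the extracted table does not preserve them):

```python
import math

# Device model of Eqs. (1) and (2) with Table I parameters.
RA = 10.0          # resistance-area product, Ohm * um^2 (Table I)
TMR0 = 1.0         # zero-bias TMR = 100% (Table I)
V_FIT = 0.65       # bias-voltage fitting parameter, V (Table I)
L_MTJ, W_MTJ = 0.06, 0.03   # ASSUMED elliptical MTJ axes, um

AREA = L_MTJ * W_MTJ * math.pi / 4   # elliptical cross-section, um^2
R_MTJ = RA / AREA                    # parallel-state resistance, Ohm

def tmr(v_b: float) -> float:
    """Bias-dependent TMR, Eq. (2)."""
    return TMR0 / (1 + v_b**2 / V_FIT**2)

def r_mtj(theta: float, v_b: float = 0.0) -> float:
    """Angle-dependent MTJ resistance, Eq. (1)."""
    t = tmr(v_b)
    return 2 * R_MTJ * (1 + t) / (2 + t * (1 + math.cos(theta)))

# Eq. (1) limits: parallel (theta = 0) and antiparallel (theta = pi).
r_p = r_mtj(0.0)        # reduces to R_MTJ
r_ap = r_mtj(math.pi)   # reduces to R_MTJ * (1 + TMR0)
```

With TMR_0 = 100%, the model gives R_AP = 2 R_P, the two resistive levels exploited by the neuron and synapse circuits.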
B. SOT-MRAM Based Synapse
SOT-MRAM cells are capable of realizing two resistive levels, i.e., R_P and R_AP. The combination of two SOT-MRAM cells and a differential amplifier can produce the positive and negative weights required for the implementation of a binary synapse. Fig. 3 shows a neuron with Y_i = X_i × W_i as its input, in which X_i is the input signal and W_i is a binarized weight. The corresponding circuit implementation is also shown in the figure, which includes two SOT-MRAM cells and a differential amplifier as the synapse. The output of the differential amplifier (Y_i) is proportional to (I+ − I−), where I+ = X_i G+_i and I− = X_i G−_i. Thus, Y_i ∝ X_i (G+_i − G−_i), in which G+_i and G−_i are the conductances of SOT-MRAM1 and SOT-MRAM2, respectively. The conductances of the SOT-MRAMs can be adjusted to realize negative and positive weights in a binary synapse. For instance, for W_i = −1, SOT-MRAM1 and SOT-MRAM2 should be in the AP and P states, respectively. According to Eq. (1), R_AP > R_P, which means G_AP < G_P since G = 1/R; therefore G+_i < G−_i and Y_i < 0.

III. PROPOSED SOT-MRAM BASED MLP ARCHITECTURE
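The weight-to-conductance mapping of Section II-B, which the array described next is built from, can be captured in a few lines. This is a behavioral sketch, not the circuit itself: the resistance values are assumed round numbers, and the differential amplifier is reduced to an ideal difference of currents:

```python
# Binary synapse of Section II-B: a weight W in {-1, +1} is stored as the
# (P, AP) or (AP, P) states of two SOT-MRAMs, and the amplifier output is
# proportional to X * (G_plus - G_minus).
R_P, R_AP = 10e3, 20e3        # ASSUMED resistances, Ohm (R_AP = R_P * (1 + TMR))
G_P, G_AP = 1 / R_P, 1 / R_AP

def program_synapse(w: int):
    """Return (G_plus, G_minus) realizing the binary weight w in {-1, +1}."""
    if w == +1:
        return (G_P, G_AP)    # SOT-MRAM1 in P, SOT-MRAM2 in AP
    return (G_AP, G_P)        # SOT-MRAM1 in AP, SOT-MRAM2 in P

def synapse_out(x: float, w: int) -> float:
    """Y_i proportional to X_i * (G+_i - G-_i)."""
    g_plus, g_minus = program_synapse(w)
    return x * (g_plus - g_minus)
```

The sign of the output follows the programmed weight: for W_i = −1 the P/AP roles are swapped, so G+_i < G−_i and the output goes negative, exactly as argued above.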
Fig. 4 exhibits the training and inference paths of the proposed n × m SOT-MRAM based single-layer perceptron, which are shown separately for simplicity. The synaptic connections are designed in the form of a crossbar architecture, in which the numbers of columns and rows are defined based on the numbers of nodes in the input and output layers, respectively. During the training phase, the resistances of the SOT-MRAM based synapses are tuned using the bit lines (BLs) and source lines (SLs), which are shared among different rows, as shown in Fig. 4(a). The write word line (WWL) control signals only activate one row in each clock cycle; thus the entire array can be updated in j clock cycles, where j is equal to the number of neurons in the output layer. Moreover, to tune the states of the SOT-MRAMs in the neurons according to the requirements mentioned in Section II-A, the BL and SL control signals for the neuron are set to VDD and VSS, respectively, as shown in Fig. 4(a).

In the inference phase, the BL and SL control signals are in the high-impedance (Hi-Z) state, and the read word line (RWL) and WWL control signals are connected to VDD and GND, respectively. This stops the write operation in the synapses and generates the I+ and I− currents shown in Fig. 4(b). The amplitude of the produced currents depends on the input (IN) signals and the resistances of the SOT-MRAM synapses. Each row includes a shared differential amplifier, which generates an output voltage proportional to Σ_i (I+_{i,n} − I−_{i,n}) for the n-th row, where i runs over the nodes in the input layer. Finally, the outputs of the differential amplifiers are connected to the SOT-MRAM based sigmoidal neurons. The entire inference operation occurs in parallel and in a single clock cycle. The required signaling to control the training and inference operations is listed in Table II. One of the main advantages of the proposed SOT-MRAM based perceptron architecture is that it can be readily concatenated to form an MLP classifier, which can still operate in a single clock cycle, as will be shown in Section V.

Fig. 4. An n × m SOT-MRAM based single-layer perceptron. (a) The training path, and (b) the inference path.

TABLE II
THE REQUIRED SIGNALING TO CONTROL THE PROPOSED SOT-MRAM BASED PERCEPTRON ARRAY.

Operation             WWL   RWL   BL    SL    IN
Training (W_i = +1)   VDD   GND   VDD   GND   Hi-Z
Training (W_i = −1)   VDD   GND   GND   VDD   Hi-Z
Inference             GND   VDD   Hi-Z  Hi-Z  V_IN

IV. HARDWARE-AWARE LEARNING MECHANISM
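To make the single-cycle inference path of Section III concrete, the per-row analog MAC followed by a sigmoidal activation can be modeled numerically; this forward model is also what the hardware-aware training mechanism must reproduce. The conductance and gain values are assumed for illustration, and the Python loop only mimics what the hardware computes in parallel in one cycle:

```python
import math

# Behavioral model of one perceptron layer from Section III: each row sums
# differential synapse currents, and a sigmoidal neuron squashes the result.
G_P, G_AP = 1e-4, 5e-5    # ASSUMED P/AP conductances, Siemens
GAIN = 1e4                # ASSUMED amplifier transimpedance, V/A

def forward(x, weights):
    """Rows of binary weights in {-1, +1}; returns neuron outputs."""
    outputs = []
    for row in weights:                      # one differential amp per row
        i_diff = sum(
            xi * ((G_P - G_AP) if w == +1 else (G_AP - G_P))
            for xi, w in zip(x, row)
        )                                    # sum_i X_i * (G+_i - G-_i)
        y = GAIN * i_diff
        outputs.append(1 / (1 + math.exp(y)))  # sigmoid(-y), as in Section II-A
    return outputs
```

A row programmed with all +1 weights pushes its output below 0.5 for positive inputs, while an all −1 row pushes it above 0.5, mirroring the sign behavior of the differential synapses.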
To train the proposed SOT-MRAM based MLP classifier, a hardware-aware learning mechanism should be developed which incorporates the characteristics and limitations of our SOT-MRAM based neurons and synapses. Herein, we use a two-stage teacher-student approach, in which both the teacher and student networks have identical topologies. Table III provides the notations and descriptions for the teacher and student networks, in which x is the input of the network and y_i and o_i are the input and output of the i-th neuron, respectively.

To incorporate the features of the SOT-MRAM based synapses and neurons within our training mechanism, we have made two modifications to the approaches previously used for training binarized neural networks [7], [13]. First, we have used binarized biases in the student networks instead of real-valued biases. Second, since our SOT-MRAM neuron realizes a real-valued sigmoidal activation function (sigmoid(−x)) without any computation overheads, we could avoid binarizing the activation functions and reduce the possible information loss in the teacher and student networks [7]. Herein, after each weight update in the teacher network we clip the real-valued weights to the [−1, 1] interval, and then use the below deterministic binarization approach to binarize the weights:

W_ij = { +1, w̄_ij ≥ Δ_B; −1, w̄_ij < Δ_B }   (3)

where Δ_B = 0 is the threshold parameter for binarized weights. Finally, once all the binarized weights are trained, we use a mapping mechanism to convert them to resistive states in the SOT-MRAM based synapses according to Section II-B. It is worth noting that the stochastic binarization scheme [13] can also be used to quantize the weights and biases. However, the stochastic rounding approach exhibits its advantages in deeper neural networks, which are not the focus of this paper. In fact, we initially leveraged stochastic rounding in our simulations, and while the training times were approximately 10-fold longer, the obtained accuracy values were comparable to those realized by the deterministic rounding approach.

TABLE III
THE NOTATIONS AND DESCRIPTIONS OF THE PROPOSED LEARNING MECHANISM FOR THE SOT-MRAM BASED MLP.

                      Teacher Network         Student Network
Weights               W_i ∈ ℝ                 W_i ∈ {−1, +1}
Biases                B_i ∈ ℝ                 B_i ∈ {−1, +1}
Transfer Function     y_i = w_i x + b_i       y_i = w_i x + b_i
Activation Function   o_i = sigmoid(−y_i)     o_i = sigmoid(−y_i)

V. SIMULATION RESULTS
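The clip-and-binarize update of Section IV, Eq. (3), which produces the student weights evaluated in the following experiments, reduces to a few lines (a sketch; the example teacher weights are arbitrary):

```python
# Hardware-aware update of Section IV: after each teacher update, clip the
# real-valued weights to [-1, 1], then binarize with Eq. (3), Delta_B = 0.
def clip_weights(w):
    """Clip real-valued teacher weights to the [-1, 1] interval."""
    return [max(-1.0, min(1.0, wi)) for wi in w]

def binarize(w, delta_b=0.0):
    """Eq. (3): +1 if w_bar >= Delta_B, else -1."""
    return [+1 if wi >= delta_b else -1 for wi in w]

teacher_w = [0.3, -1.7, 0.0, 2.4, -0.2]        # arbitrary example values
student_w = binarize(clip_weights(teacher_w))  # -> [1, -1, 1, 1, -1]
```

The resulting ±1 values map one-to-one onto the (P, AP) conductance pairs of the SOT-MRAM synapses, as described in Section II-B.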
A. Circuit-Level Simulation of SOT-MRAM based Neuron
Herein, we used the SPICE circuit simulator to measure the power consumption of our proposed SOT-MRAM based sigmoid neuron. The obtained results show a low average power consumption, on the order of microwatts, for the SOT-MRAM based sigmoid neuron. Moreover, the layout design of the proposed neuron circuit shows a compact area consumption expressed in units of λ², in which λ is a technology-dependent parameter. Herein, we used the 14nm FinFET technology, which leads to an approximate area consumption below 1 µm². Table IV provides a comparison between our SOT-MRAM based sigmoidal neuron and previous power- and area-efficient analog neurons [14], [15].

To provide a fair comparison in terms of area and power dissipation, we have utilized the general scaling method [16] to normalize the power dissipation and area of the designs listed in Table IV. Voltage and dimensions scale at the rates U = 0.8 V / V_DDx and S = 14 nm / (tech node), respectively, where V_DDx and tech node are the nominal voltage and technology node used in the studied neuron designs. It shall be noted that we used a 0.8 V nominal voltage and the 14nm FinFET technology in our design. Moreover, power and area consumption values are scaled with respect to U and S accordingly [16]. The results obtained exhibit that the proposed SOT-MRAM based neuron achieves a significant area reduction while realizing comparable power consumption relative to the existing power- and area-efficient analog neuron implementations. This leads to a 74× reduction in power-area product compared to the design introduced in [14], and a substantial reduction compared to the design in [15].

TABLE IV
PERFORMANCE COMPARISON FOR VARIOUS ANALOG SIGMOIDAL NEURON IMPLEMENTATIONS.

                      [14]   [15]   Proposed Herein
Power Consumption     7.4×   –      –
Area Consumption      10×    –      –
Power-Area Product    74×    –      –

B. Architecture-level Simulation
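For reference, the normalization applied to Table IV above can be reproduced as follows. This is a sketch of our reading of the general scaling method [16], with U = 0.8 V / V_DDx and S = 14 nm / node; the quadratic exponents for power and area are an assumption and should be checked against [16]:

```python
# Normalize a design reported at another node/voltage to this work's
# 14 nm / 0.8 V operating point (general scaling method [16]).
# ASSUMPTION: power scales with U^2 and area with S^2.
def normalize(power, area, v_dd_x, node_nm):
    u = 0.8 / v_dd_x        # voltage scaling factor
    s = 14.0 / node_nm      # dimension scaling factor
    return power * u**2, area * s**2

# e.g., a design reported at 180 nm / 1.8 V:
# p14, a14 = normalize(power, area, v_dd_x=1.8, node_nm=180)
```

A design already at 14 nm / 0.8 V is left unchanged, which is the expected fixed point of the normalization.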
Herein, we developed a Python-based simulation framework based on [17] to realize the SPICE implementation of our SOT-MRAM based MLP classifier. Fig. 5(a) depicts the circuit realization of the SOT-MRAM based MLP classifier. A comparison between the MNIST [18] classification accuracy of the SOT-MRAM MLP classifier and conventional real-valued and binarized MLP architectures is shown in Fig. 5(b). The results show comparable maximum classification accuracies of 86.54% and 85.56% in the first 10 epochs for the binarized and SOT-MRAM based MLP classifiers, respectively.

Moreover, Table V provides a comparison between the analog IMC-based MLP classifier proposed herein and various hardware implementations of a binarized MLP architecture with the same topology. As listed in the table, our analog MLP classifier completes the recognition task in a single clock cycle, while a highly-parallel digital implementation on a GPU and a high-performance mixed-signal IMC architecture require many more clock cycles to complete the same task. It is worth noting that digital CPU or GPU implementations can support higher clock frequencies, as listed in the table. However, the difference in total clock cycles is so large that our analog IMC realization can still achieve at least four and five orders of magnitude performance improvement compared to the GPU and CPU implementations, respectively.

Fig. 5. (a) The SOT-MRAM based MLP circuit, (b) accuracy comparison for the MNIST application using the MLP.

TABLE V
PERFORMANCE COMPARISON AMONG VARIOUS IMPLEMENTATIONS OF THE BINARIZED MLP CLASSIFIER.

Architecture    Domain (MAC)  Domain (Act. Func.)  Frequency (GHz)   Total Clocks
CPU             Digital       Digital              3.7 (1)           – (*)
GPU             Digital       Digital              1.35 (2)          – (*)
IMC [1]         Digital       Digital              0.667 (3)         – (*)
IMC [5]         Analog        Digital              0.2–0.667 (4)     – (*)
Proposed Here   Analog        Analog               0.2–0.667 (4)     1

(1) Implemented on an Intel Core i9-10900X.
(2) Implemented on an NVIDIA GeForce RTX 2080 Ti.
(3) Not reported in [1]. Estimated according to Everspin STT-MRAM.
(4) Not reported in [5]. Clock frequency is expected to be reduced due to parasitic effects in the analog domain.
(*) We have reported a range instead of exact values to compensate for possible variations in users' programming and implementation skills.
VI. CONCLUSION
In this paper, we proposed a power- and area-efficient SOT-MRAM based sigmoidal neuron, which was leveraged along with SOT-MRAM based binary synapses to construct an analog IMC architecture for MLP classifiers. The developed neuron and synapse bitcells can be implemented within the same memory subarray, enabling single-cycle operation for the analog IMC-based MLP architecture while removing the need for signal conversion units. We implemented the SOT-MRAM based MLP using the SPICE circuit simulator and compared its performance with various hardware realizations of an MLP classifier. The results exhibited at least a two orders of magnitude increase in the processing speed of our analog IMC architecture compared to the highest-performance MLP classifier implemented on a mixed-signal analog/digital IMC architecture.
REFERENCES

[1] S. Angizi, Z. He, A. Awad, and D. Fan, "MRIMA: An MRAM-based in-memory accelerator," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, pp. 1–1, 2019.
[2] D. Ielmini and H.-S. P. Wong, "In-memory computing with resistive switching devices," Nature Electronics, vol. 1, no. 6, pp. 333–343, 2018.
[3] J. Ahn, S. Hong, S. Yoo, O. Mutlu, and K. Choi, "A scalable processing-in-memory accelerator for parallel graph processing," in Proceedings of the 42nd Annual International Symposium on Computer Architecture, ser. ISCA '15. New York, NY, USA: Association for Computing Machinery, 2015, pp. 105–117.
[4] P. Chi, S. Li, C. Xu, T. Zhang, J. Zhao, Y. Liu, Y. Wang, and Y. Xie, "PRIME: A novel processing-in-memory architecture for neural network computation in ReRAM-based main memory," in Proceedings of the 43rd International Symposium on Computer Architecture, ser. ISCA '16. IEEE Press, 2016, pp. 27–39.
[5] K. Spoon, S. Ambrogio, P. Narayanan, H. Tsai, C. Mackin, A. Chen, A. Fasoli, A. Friz, and G. W. Burr, "Accelerating deep neural networks with analog memory devices," 2020, pp. 1–4.
[6] S. Resch, S. K. Khatamifard, Z. I. Chowdhury, M. Zabihi, Z. Zhao, J.-P. Wang, S. S. Sapatnekar, and U. R. Karpuzcu, "PIMBALL: Binary neural networks in spintronic memory," ACM Trans. Archit. Code Optim., vol. 16, no. 4, Oct. 2019.
[7] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, "XNOR-Net: ImageNet classification using binary convolutional neural networks," in Computer Vision – ECCV 2016, B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds., 2016, pp. 525–542.
[8] O. Krestinskaya, A. P. James, and L. O. Chua, "Neuromemristive circuits for edge computing: A review," IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 1, pp. 4–23, 2020.
[9] Y. Zhang, W. Zhao, Y. Lakys, J. O. Klein, J. V. Kim, D. Ravelosona, and C. Chappert, "Compact modeling of perpendicular-anisotropy CoFeB/MgO magnetic tunnel junctions," IEEE Transactions on Electron Devices, vol. 59, no. 3, pp. 819–826, March 2012.
[10] L. Liu, C. Pai, Y. Li, H. W. Tseng, D. C. Ralph, and R. A. Buhrman, "Spin-torque switching with the giant spin Hall effect of tantalum," Science, vol. 336, no. 6081, pp. 555–558, 2012.
[11] R. Zand, A. Roohi, and R. F. DeMara, "Energy-efficient and process-variation-resilient write circuit schemes for spin Hall effect MRAM device," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no. 9, pp. 2394–2401, Sep. 2017.
[12] R. Zand, A. Roohi, D. Fan, and R. F. DeMara, "Energy-efficient nonvolatile reconfigurable logic using spin Hall effect-based lookup tables," IEEE Transactions on Nanotechnology, vol. 16, no. 1, pp. 32–43, Jan. 2017.
[13] M. Courbariaux, Y. Bengio, and J.-P. David, "BinaryConnect: Training deep neural networks with binary weights during propagations," in Advances in Neural Information Processing Systems 28, 2015, pp. 3123–3131.
[14] G. Khodabandehloo, M. Mirhassani, and M. Ahmadi, "Analog implementation of a novel resistive-type sigmoidal neuron," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 20, no. 4, pp. 750–754, 2012.
[15] J. Shamsi, A. Amirsoleimani, S. Mirzakuchaki, A. Ahmade, S. Alirezaee, and M. Ahmadi, "Hyperbolic tangent passive resistive-type neuron," 2015, pp. 581–584.
[16] A. Stillmaker and B. Baas, "Scaling equations for the accurate prediction of CMOS device performance from 180 nm to 7 nm," Integration, vol. 58, pp. 74–81, 2017.
[17] R. Zand, K. Y. Camsari, S. Datta, and R. F. DeMara, "Composable probabilistic inference networks using MRAM-based stochastic neurons," J. Emerg. Technol. Comput. Syst., vol. 15, no. 2, Mar. 2019.
[18] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.