GENERIC COLORIZED JOURNAL, VOL. XX, NO. XX, XXXX 2019

Spin-Hall MTJ Cells for Intra-Column Competition in Hierarchical Temporal Memory

Andrew W. Stephan and Steven J. Koester, Fellow, IEEE

Abstract: We propose a dedicated winner-take-all circuit to efficiently implement the intra-column competition between cells in Hierarchical Temporal Memory, which is a crucial part of the algorithm. All inputs and outputs are charge-based for compatibility with standard CMOS. The circuit incorporates memristors for competitive advantage to emulate a column with a cell in a predictive state. The circuit can also detect columns 'bursting' by passive averaging and comparison of the cell outputs. The proposed spintronic devices and circuit are thoroughly described, and a series of simulations is used to predict the performance. The simulations indicate that the circuit can complete a nine-cell, nine-input competition operation in under 15 ns at a cost of about 25 pJ.

Index Terms: Hierarchical Temporal Memory, Neuromorphic Computing, Spintronics, Spin Hall, Magnetic Tunnel Junction.

I. INTRODUCTION

Hierarchical Temporal Memory (HTM) is an emerging neuromorphic algorithm inspired by the structural properties of the neocortex [2]. HTM boasts powerful recognition and prediction abilities [3], [4]. The conceptual architecture is fairly complex, with many different functions required to implement it. A comprehensive processor architecture has been proposed for this purpose [5], [6]. HTM consists of two primary components, the spatial pooler and the temporal memory. The spatial pooler consists of a set of columns with proximal connections to the input space. The input space is a sparse distributed representation (SDR) of data in the form of a binary matrix. Each column activates upon receiving input which exceeds its threshold value. The temporal memory portion of HTM divides each column into multiple cells that share the same proximal connections and compete with one another to represent the column. Axial connections between cells in different columns can give some cells a competitive advantage, but the proximal connections must still be solely responsible for surpassing the threshold.

Implementing the full HTM structure in hardware is energy-expensive due to its complexity. The inclusion of dedicated circuitry capable of efficiently performing specific HTM-related tasks can reduce this load. This provides motivation to design a variable-threshold analog winner-take-all (WTA) circuit with competitive advantage. In this work we propose an efficient spintronic implementation of a WTA circuit meant to emulate the cells within an HTM column. The circuit includes an option for competitive advantage so that certain cells can be biased to win even when all of the competitors receive the same input, which emulates the 'predictive state' of HTM. We also consider how to detect a column 'bursting', which indicates that the threshold was exceeded but no cells were in a predictive state. In the following sections we describe the device and circuit design, analyze its performance and explain our simulation methodology.

In part, the purpose of this work is to study the effect of the competitive advantage term on the operation of the spintronic WTA circuit and to determine which values should be used. This knowledge will guide future efforts, especially the choice of memristive devices needed to induce the advantage. This work does not deal with the overall HTM architecture, but focuses specifically on the proposal for an efficient implementation of individual columns. Although spintronic elements are involved, the inputs and outputs are voltage-based, which allows the columns to be paired with any other charge-based implementation of the other HTM functions to emulate a full HTM architecture. We use beyond-CMOS methods to develop a WTA circuit in this work to explore the potential advantages conferred by the inherently analog and non-linear nature of certain spintronic devices. As will be discussed below, we determine that the transition from CMOS to spintronic WTA implementations results in a tradeoff between delay and energy cost.

Manuscript submitted July 20, 2020. This work was supported by Seagate Technology PLC.

II. DESIGN

A. Spintronic Cell Design

The cell is based on a well-known device, the spin-Hall-effect (SHE) driven magnetic tunnel junction (MTJ) voltage divider [7]–[11]. The particular version of SHE-MTJ we use is derived from [1], as the analog WTA circuit requires non-digital behavior from the MTJs. The MTJ free layer (FL) is in contact with a heavy metal (HM) which, when charge flows through it, produces a spin-polarized current which can be used to drive the FL. The conductance $G_{MTJ}$ of the MTJ varies with the relative angle $\phi$ of the FL magnetization to the pinned layer (PL) as

$$G_{MTJ} = \frac{1}{2}\left(G_P + G_{AP}\right) + \frac{1}{2}\left(G_P - G_{AP}\right)\cos\phi, \tag{1}$$

where $G_P$ and $G_{AP}$ are the conductances when the FL is parallel or antiparallel to the pinned layer, respectively. The output potential of the voltage divider, and ultimately that of the attached inverter [12], thus depends on $\phi$. The equations governing the cell dynamics are covered in more detail in Section IV. The attached inverter gives the cell a stable voltage-based read path that avoids perturbing the voltage divider (see Fig. 1). As in [1], we choose the anisotropic characteristics of the MTJ to generate a smooth linear response loop, in this case by utilizing shape anisotropy. The MTJ FL width dimension is shorter than the length dimension, creating a smooth transition due to the demagnetization field. We assign the PL orientation such that a negative current produces a spin torque in the parallel direction while a positive current drives the FL in the antiparallel direction.

Fig. 1. (a) Basic cell design including MTJ, reference resistor and output inverter. (b) Output potential vs. input current in steady-state. Two different FL geometries are considered, with a step-like transition and a smooth transition.
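As a small illustration of Eq. (1), the sketch below evaluates the MTJ conductance at a few magnetization angles. The parallel conductance value is a hypothetical placeholder, and the relation $G_{AP} = G_P / (1 + \mathrm{TMR})$ assumes the common definition TMR = $(R_{AP} - R_P)/R_P$, which the text does not state explicitly; only the TMR value of 1.5 is taken from Table I.

```python
import math

def mtj_conductance(phi, g_p, g_ap):
    """Eq. (1): MTJ conductance for angle phi (radians) between FL and PL."""
    return 0.5 * (g_p + g_ap) + 0.5 * (g_p - g_ap) * math.cos(phi)

def g_ap_from_tmr(g_p, tmr):
    """Assumed definition TMR = (R_AP - R_P) / R_P, so G_AP = G_P / (1 + TMR)."""
    return g_p / (1.0 + tmr)

# Illustrative numbers only: G_P is a hypothetical placeholder value.
G_P = 100e-6                    # siemens (assumption)
G_AP = g_ap_from_tmr(G_P, 1.5)  # TMR = 1.5 as listed in Table I

for degrees in (0, 90, 180):
    g = mtj_conductance(math.radians(degrees), G_P, G_AP)
    print(f"phi = {degrees:3d} deg -> G_MTJ = {g * 1e6:.1f} uS")
```

The conductance moves smoothly from $G_P$ at $\phi = 0$ to $G_{AP}$ at $\phi = \pi$, passing through their mean at $\phi = \pi/2$, which is the analog behavior the voltage divider relies on.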

B. WTA Circuit Design

An HTM column consists of multiple cells that receive the same proximal input and compete in a WTA fashion to represent the column if the input exceeds some threshold. The threshold may be tuned by an input bias. In this work we draw much of the WTA cell design from [1], where the workings of individual cells are studied in detail. For this application, we simplify the pooler circuit by removing the second stage of each activation pair because the neural activation function is not needed. We assume the input space takes the form of a set of voltage sources. Current is provided to the cells by a memristor crossbar array joining the cells with the input space. An example is shown in Fig. 2. The precise nature of the memristors is not treated here, but many examples exist in the literature including filamentary, MTJ-based and ferroelectric memristors [14]–[17], any of which would be suitable for this purpose. Some architectures also incorporate an intermediary device which reads the input and transmits a corresponding signal to the cells.

Fig. 2. A set of inputs connected to the column via an array of memristors. If the conductances in each row of the crossbar are identical, each cell receives the same net input as is standard in HTM.

Besides the current from the proximal connections, each cell has an additional connection from each of the output inverters of its neighbors. The result is an inhibitory connection between each pair of cells that induces more negative torque in the receiver cell as the magnetization of the source cell becomes more positive. The strength of the inhibitory connections depends on the conductance of the memristor joining each output inverter to the HM input. A higher conductance gives the source cell a competitive advantage, which is measured as the ratio of the conductance to that of the other cells. The complete WTA circuit design is shown in Fig. 3, where the crossbars are represented as simple current sources and the inverters are represented by the standard circuit symbol for brevity. The operating parameters are carefully chosen such that the low sensing potential $V_S$ on the voltage divider and the inverter low rail potential $-V_{DD}$ are matched. The result is that the inhibitory output connections cease to provide current when the source cell magnetization reaches the − state. This allows the cells to achieve an equilibrium by balancing the excitatory proximal connections with the intra-column inhibitory connections.

Example results for four different cases are given in Fig. 4. If all cells compete equally when receiving a negative input current sufficiently large to excite them, they all achieve a similar steady-state output below the value expected according to the proximal input but above the minimum output. Alternatively, if the competition is uneven due to a certain cell being in a predictive state, that cell drives the others to the minimum state while itself achieving a higher state. This result is achieved by giving the predictive cell a higher output conductance on its inhibitory connections. A detailed breakdown of the WTA circuit performance is given in Section III.

III. RESULTS

There are two crucial questions that determine the success of this WTA circuit in its intended function for the HTM architecture. The first is whether it can emulate a predictive state via competitive advantage for one cell. The second is whether it can emulate a column bursting. This situation occurs in an HTM when the proximal input is large but none of the cells is in a predictive state. The discussion below resulted from a series of simulations of the full 9-cell circuit using a custom simulator written in MATLAB, which includes empirical approximations of the inverter behavior based on previous HSPICE simulations [1].
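The qualitative behavior in question (equal competition settling at a shared intermediate level, unequal competition producing a clear winner) can be illustrated with a deliberately simplified rate model. The sketch below is not the MATLAB simulator used in this work and contains no magnetization dynamics: the normalized activity range, the clipped-linear update, and the values of `drive` and `g_inh` are all assumptions chosen only to show the mechanism of mutual inhibition with one stronger inhibitory output.

```python
def simulate_wta(n_cells=9, drive=1.0, g_inh=0.2, advantage=1.0,
                 dt=0.01, steps=5000):
    """Toy rate model of intra-column competition (hypothetical, unitless).

    Cell 0 plays the 'predictive' cell: its outgoing inhibitory weight is
    g_inh * advantage, while every other cell uses g_inh.
    Returns the final activity of each cell, each value in [0, 1].
    """
    weights = [g_inh * advantage] + [g_inh] * (n_cells - 1)
    x = [0.0] * n_cells
    for _ in range(steps):
        total = sum(w * xi for w, xi in zip(weights, x))
        new_x = []
        for w, xi in zip(weights, x):
            inhibition = total - w * xi              # no self-inhibition
            target = min(1.0, max(0.0, drive - inhibition))
            new_x.append(xi + dt * (target - xi))    # forward-Euler relaxation
        x = new_x
    return x

equal = simulate_wta(advantage=1.0)    # no predictive cell
biased = simulate_wta(advantage=2.0)   # cell 0 has a 2x inhibitory output
strong = simulate_wta(advantage=6.0)   # large advantage
print([round(v, 3) for v in equal])
print([round(v, 3) for v in biased])
print([round(v, 3) for v in strong])
```

In this toy model, equal competitors share one intermediate level, a moderate advantage lifts cell 0 while lowering the others, and a large advantage pushes the others to the minimum. The numerical levels carry no physical meaning and should not be compared with the simulated potentials reported in this section.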

STEPHAN et al.: SPIN-HALL MTJ CELLS FOR INTRA-COLUMN COMPETITION IN HIERARCHICAL TEMPORAL MEMORY

Fig. 3. Winner-take-all circuit design. A column consists of multiple cells, each of which receives the same proximal input current. Each cell also receives an inhibitory current from each of the other cells.

Fig. 4. Outcomes for four different basic cases. Each case assumes a 9-cell column competing with identical inputs. When the input is insufficient to excite the cells, all cells quickly reach a -0.5 V output and remain there. When the input is sufficient and the cells compete on equal terms, all cells achieve an equilibrium with one another at an above-minimum output. When the input is sufficient and the cells compete on unequal terms, the cell with advantage quickly drives the others to the minimum output.

A. Predictive State

To determine whether the WTA circuit can succeed in emulating a predictive state, we performed a series of Monte-Carlo simulations and averaged the results. Each 100-round ensemble assumed a specific proximal input value and a competitive advantage, encoded as the ratio between the inhibitory output conductance of the predictive cell and that of the other cells. In Fig. 5(a) we show the designated predictive cell output vs. the input current. The larger the competitive advantage is, the greater the winner output grows as a function of input magnitude. Meanwhile, in Fig. 5(b) we show that as the competitive advantage grows, the other cell outputs shrink. Fig. 6 shows directly the difference between the winner cell and the mean of the others, as the other cells deviate very little from one another. The dashed line indicates a difference of 70 mV, which is the smallest signal difference that a digital inverter can still differentiate. With a reference potential of $V_S$ + 35 mV, the inverter in Fig. 7 can produce outputs that differ by 500 mV using inputs of $V_S$ and $V_S$ + 70 mV.

Here we note that the choice of competitive advantage can be used to determine the effective input threshold for detection of a predictive state event. In order to noticeably differentiate even at weak excitation inputs such as +1 µA (see Fig. 1), a competitive advantage of at least 1.6 is required. This yields sufficient output separation to differentiate the winner cell from the others. Alternatively, a low advantage of 1.1 can be chosen in order to enforce a 0 µA threshold, because at this advantage level only negative currents are shown to produce results which exceed $V_S$ + 70 mV. In general, the current threshold which is enforced depends on the magnitude of the competitive advantage used.

Fig. 5. (a) Predictive cell output vs. input current. (b) Average of other cell outputs vs. input current.

Fig. 6. Difference between the predictive cell output and the average of the other cell outputs for various inputs as a function of competitive advantage. The dashed line indicates the minimum required separation of 70 mV.

B. Bursting

To simulate a column bursting, we assume no cells are in the predictive state, which corresponds to a competitive advantage of 1. As shown in Fig. 4, the cells all behave identically when the column bursts, reaching a steady state slightly above the minimum output. This can be detected by measuring at least two outputs. We assume the mean output of all cells is measured, and the column is determined to have burst if the average exceeds some threshold but no single cell has a large output. To estimate the detection capability, we assume that this average output must exceed $V_S$ by at least 70 mV, sufficient for a digital inverter to differentiate the signals as mentioned above. Fig. 8 shows the average potential difference vs. input. We note that if the input is below 0 µA, which would suffice to excite the cells if not for the inhibitory connections, a passive averaging circuit can detect that the column has burst. We note that while $V_{Avg} - V_S$ can be expected to exceed 70 mV in cases with a single predictive cell, a simple digital comparison circuit can differentiate those cases from the ones in which all cells are partially excited.

Fig. 7. Inverter behavior based on HSPICE simulation with linear approximation included.

Fig. 8. Difference between the column average output and the minimum cell potential vs. input current. The dashed line indicates the minimum required separation of 70 mV.

C. Energy Usage

The average time to complete a WTA function depends on the competitive advantage of the predictive cell. If there is no predictive cell (that is, no competitive advantage), the process is quite fast, requiring about 3 ns. In this case there is no need to wait for one cell to differentiate itself from the rest by suppressing their outputs, as all cells behave nearly as one, barring noise, and quickly reach an equilibrium. If there is a predictive cell, the process finishes more quickly when the advantage is larger, as shown in Fig. 9. This is because the additional advantage causes the predictive cell to drive its competitors with more current, suppressing them more swiftly. At most, the WTA process takes less than 60 ns to finish. The energy consumption is also given in Fig. 9. The cost is at most 120 pJ for a nine-cell column, or about 13 pJ per cell. The relevant power calculations are given in Section IV-C.

Fig. 9. Time and energy required to complete the WTA function vs. competitive advantage. Greater advantage leads to a faster solution.

D. Process Variation

Here we consider the effects of process variation on the outcome of the WTA cell in the predictive state case using Monte-Carlo simulations. Each device in the simulated circuit is randomly assigned a set of parameters drawn from normal distributions at the beginning of every round of simulation. As before, each data point is the mean result of an ensemble of at least 100 rounds. We consider four different independent variables: transistor threshold, MTJ parallel resistance, MTJ antiparallel resistance and MTJ base switching current $I_c$, which is the value of $I$ at which $H_{SHE}$ equals $H_I$; see (3)-(5). As in [1], typical standard deviations of 5% for each MTJ parameter were chosen after consulting [20], [21], and a transistor threshold deviation of 20 mV was selected based on the Pelgrom plots in [22], assuming 200 nm gate width. Fig. 10 shows the output difference between the winner cell and the others with competitive advantage as a parameter, as in Fig. 6. Incorporating process variation makes it somewhat more difficult for the predictive cell to differentiate itself from the others. A competitive advantage of 2.0 is required when a weak proximal excitation of 1 µA is applied, compared to 1.6 when ideal devices are used. Similarly, an advantage of 1.2 is required to enforce the 0 µA threshold, as opposed to 1.1 with ideal devices. While these differences are notable, they indicate that the circuit is not prevented from proper function by process variation so long as the modified advantage requirements are applied. We also note that simulation of a column bursting with process variation incorporated showed no noticeable difference in results.
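The sampling procedure described above can be sketched as follows. This is a minimal illustration rather than the simulator used here: the nominal device values and the `run_round` callback are hypothetical placeholders, and only the spreads (5% relative standard deviation for the MTJ parameters, 20 mV for the transistor threshold) and the 100-round ensemble size come from the text.

```python
import random
import statistics

def sample_device(nominal, rng, mtj_rel_sigma=0.05, vth_sigma=0.020):
    """Draw one device's parameters from independent normal distributions.

    nominal: dict with MTJ keys 'r_p', 'r_ap', 'i_c' and transistor key 'v_th'.
    """
    varied = {key: rng.gauss(nominal[key], mtj_rel_sigma * abs(nominal[key]))
              for key in ("r_p", "r_ap", "i_c")}
    varied["v_th"] = rng.gauss(nominal["v_th"], vth_sigma)  # volts
    return varied

def ensemble_mean(run_round, nominal, n_cells=9, rounds=100, seed=0):
    """Re-sample every device each round and average the per-round result."""
    rng = random.Random(seed)
    results = []
    for _ in range(rounds):
        devices = [sample_device(nominal, rng) for _ in range(n_cells)]
        results.append(run_round(devices))
    return statistics.fmean(results)

# Hypothetical nominal values, for illustration only.
NOMINAL = {"r_p": 10e3, "r_ap": 25e3, "i_c": 10e-6, "v_th": 0.3}

def mean_r_p(devices):
    """Stand-in for a full circuit simulation: average parallel resistance."""
    return statistics.fmean(d["r_p"] for d in devices)

print(ensemble_mean(mean_r_p, NOMINAL))
```

In practice `run_round` would run one full circuit simulation with the sampled devices and return the quantity of interest, such as the winner-to-others output difference plotted in Fig. 10.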

Fig. 10. Difference between the predictive cell output and the average of the other cell outputs for various inputs as a function of competitive advantage. The dashed line indicates the minimum required separation of 70 mV. This simulation accounts for process variation in the transistors and MTJs of the WTA circuit.

TABLE I
SIMULATION PARAMETERS

| Symbol | Quantity | Value |
|---|---|---|
| $K$ | crystalline anisotropy | 10 kJ/m³ |
| $V$ | ferromagnet volume | 1800 nm³ |
| $M_S$ | saturation magnetization | 1 MA/m |
| $\alpha$ | Gilbert damping | 0.01 |
| $t_{HM}$ | heavy metal thickness | 5 nm |
| $R_{HM}$ | heavy metal resistance | ≈ … Ω |
| $\theta$ | spin-Hall angle | 0.3 [18], [19] |
| $V_S$ | low sensing voltage | -0.5 V |
| $V_S$ | high sensing voltage | 0.4 V |
| $R_M$ | MTJ RA product | 8 Ω·µm² [12] |
| TMR | tunnel magnetoresistance ratio | 1.5 [12] |
| $R_R$ | reference resistor | 3.25 - 140 kΩ |
| $\Delta t$ | simulation time step | 0.5 ps |
| $V_{DD}$ | inverter rail voltage | ± … |
| $\tau$ | inverter intrinsic delay | 4.5 ps |
| $C_g$ | inverter gate capacitance | 6.6 fF |
| $R_T$ | inverter on-resistance | 10.8 kΩ |

IV. SIMULATION METHODS

The physical parameters used in the simulation are available in Table I. In choosing the magnetic saturation, weak in-plane crystalline anisotropy, MTJ resistance and TMR, we selected values typical for spintronic devices after consulting [11], [12].

A. MTJs

The cell is simulated using the fourth-order Runge-Kutta method to predict the behavior of the circuit and magnetic FL. The FL is treated using the macrospin approximation and the Landau-Lifshitz-Gilbert (LLG) equation

$$\frac{d\hat{m}}{dt} = -\gamma\mu_0\Big(\big(\hat{m}\times\mathbf{H}_{Eff}\big) - \alpha\big(\hat{m}\times(\hat{m}\times\mathbf{H}_{Eff})\big)\Big), \tag{2}$$

where $\hat{m}$ indicates the unit magnetization of the FL and $\mathbf{H}_{Eff}$ is the effective field on the FL. The symbols $\gamma$, $\mu_0$ and $\alpha$ are the gyromagnetic ratio, vacuum permeability and Gilbert damping, respectively. Bold font indicates vector quantities. The effective field consists of two terms,

$$\mathbf{H}_{Eff} = \mathbf{H}_I + \mathbf{H}_{SHE}, \tag{3}$$

where $\mathbf{H}_{SHE}$ is the effective SHE field and $\mathbf{H}_I$ represents all the intrinsic field terms. $\mathbf{H}_{SHE}$ is proportional to the spin current $I_S$ [23], [24]:

$$\mathbf{H}_{SHE} = \frac{1}{\mu_0}\,\frac{I_S}{q}\,\frac{\hbar}{\alpha V M_S}\,\hat{y}, \tag{4}$$

where $\hbar$ is the reduced Planck constant. The spin current $I_S$ is in turn proportional to the charge current $I$ flowing through the HM layer:

$$I_S = \frac{\theta}{t_{HM}}\,L_{FM}\,I, \tag{5}$$

where $\theta$ is the spin-Hall angle. The intrinsic field $\mathbf{H}_I$ includes the demagnetization and anisotropy fields as well as the thermal noise field, which consists of a multivariate Gaussian random variable with zero mean and variance

$$\sigma_T = \sqrt{\frac{k_B T}{\alpha\gamma M_S V \Delta t}}, \tag{6}$$

where $k_B$, $T$, $M_S$, $V$ and $\Delta t$ are the Boltzmann constant, temperature, magnetic saturation, FL volume and simulation time step, respectively. We account for this term as it can cause noise in the cell voltage readouts.

B. Inverters

The voltage dividers provide the gate potential for the inverters and are treated with standard circuit equations while taking into account the changing RC gate delay due to the varying MTJ resistance. The output of the inverter is treated with a first-order approximation based on HSPICE simulations using the 16-nm node Predictive Technology Model [13] (see Fig. 7). The inverter gate width for the HSPICE simulations was 200 nm.

C. Energy calculations

The power consumed by the WTA circuit comes from three sources: inverter rail-to-rail leakage $P_I$, crossbar input power $P_{CB}$ and voltage divider leakage $P_{VD}$. Assuming nine cells in a column with nine powered connections to the input space each, there are nine voltage divider stacks and 162 inverters, so $N_{VD} = 9$ and $N_I = 162$. Using HSPICE simulations to estimate the rail-to-rail leakage, we find that $P_I = 7.\ldots$ µW. Assuming an average inhibition output resistance of $R = 280$ kΩ, each crossbar connection drains an average of

$$P_{CB} = \frac{E\big[(V_{In} - V_S)^2\big]}{R} = 1.\ldots\ \mu\mathrm{W}.$$

Finally, the voltage divider leakage is estimated as

$$P_{VD} = \frac{(V_{S,\mathrm{high}} - V_{S,\mathrm{low}})^2}{R_R + E[R_{MTJ}]} = 30.\ldots\ \mu\mathrm{W},$$

where $V_{S,\mathrm{high}}$ and $V_{S,\mathrm{low}}$ denote the high and low sensing voltages of Table I. The overall WTA delay $\tau$ is estimated as the time at which the average output comes to within 5 mV of its steady-state value. The total energy cost is

$$E = \tau\cdot\big(N_I\cdot(P_I + P_{CB}) + N_{VD}\cdot P_{VD}\big).$$

The distribution of $\tau$ and $E$ is shown in Fig. 9. We note, of course, that the values in Fig. 9 vary depending on the number of cells per column and the average number of connections each column has to the input space.
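The energy bookkeeping above reduces to a one-line formula, sketched here for convenience. The power values passed in below are rounded placeholders of roughly the magnitudes quoted in this section (assumptions, not the exact figures), and the 15 ns delay is the upper bound quoted in the abstract; the counts $N_I = 162$ and $N_{VD} = 9$ are taken from the text.

```python
def wta_energy(tau, p_i, p_cb, p_vd, n_i=162, n_vd=9):
    """Total energy E = tau * (N_I * (P_I + P_CB) + N_VD * P_VD), in joules.

    tau  : WTA delay in seconds
    p_i  : inverter rail-to-rail leakage per inverter, in watts
    p_cb : crossbar input power per connection, in watts
    p_vd : voltage divider leakage per stack, in watts
    """
    return tau * (n_i * (p_i + p_cb) + n_vd * p_vd)

# Placeholder powers: rounded stand-ins, NOT the exact simulated values.
E = wta_energy(tau=15e-9, p_i=7e-6, p_cb=1e-6, p_vd=30e-6)
print(f"{E * 1e12:.2f} pJ")
```

With these rounded inputs the estimate comes out at roughly 23 pJ, the same order as the figure quoted in the abstract. Because the inverter term is multiplied by $N_I = 162$, it dominates the total, so the result is most sensitive to $P_I$ and $P_{CB}$.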

This work is comparable to [25]–[27], which describe purely CMOS-based WTA implementations. Although the circuit in this work consumes more energy per operation, it requires significantly less time per input set. We ascribe this difference to the several additional layers of computation required by the CMOS circuits, which introduce more delay. However, the spintronic WTA circuit involves more leakage current, which accounts for the increased energy cost despite its reduced runtime.

V. CONCLUSION

The HTM algorithm is a powerful recognition and prediction tool with the potential to revolutionize neuromorphic systems. Each portion of the HTM algorithm that can be implemented using efficient dedicated circuits significantly reduces the overall computational burden. The intra-column dynamics are an important part of HTM, and we have demonstrated a novel spintronic circuit based upon spin-Hall MTJs with a simple design that can emulate these dynamics quickly and efficiently. To the best of the authors' knowledge, there are no proposals for column circuits which more efficiently implement the intra-column competition aspect of HTM.

REFERENCES

[1] MAAP placeholder.
[2] J. Hawkins and S. Blakeslee, On Intelligence. New York, NY, USA: Macmillan, 2007.
[3] J. Xing, T. Wang, Y. Leng and J. Fu, “A Bio-Inspired Olfactory Model Using Hierarchical Temporal Memory,” Proc. 5th Int. Conf. Biomed. Eng. Informat., pp. 923–927, 2012.
[4] D. E. Padilla, R. Brinkworth and M. D. McDonnell, “Performance of a Hierarchical Temporal Memory Network in Noisy Sequence Learning,” Proc. IEEE Int. Conf. Comput. Int. Cybern., pp. 45–51, 2013.
[5] A. M. Zyarah and D. Kudithipudi, “Neuromorphic Architecture for the Hierarchical Temporal Memory,” IEEE Trans. Emerg. Top. in Comp. Int., vol. 3, no. 1, Feb. 2019, DOI:10.1109/TETCI.2018.2850314.
[6] A. M. Zyarah and D. Kudithipudi, “Neuromemristive Architecture of HTM with On-Device Learning and Neurogenesis,” arXiv:1812.10730v1, Dec. 2018, DOI:10.1145/3300971.
[7] A. Sengupta and K. Roy, “Encoding Neural and Synaptic Functionalities in Electron Spin: A Pathway to Efficient Neuromorphic Computing,” Appl. Phys. Rev., vol. 4, no. 4, pp. 041105–1–25, Dec. 2017, DOI:10.1063/1.5012763.
[8] D. Morris, D. Bromberg, J.-G. Zhu and L. Pileggi, “mLogic: Ultra-Low Voltage Non-Volatile Logic Circuits Using STT-MTJ Devices,” Proceedings 49th DAC, pp. 486–491, Jun. 2012.
[9] S. Datta, S. Salahuddin and B. Behin-Aein, “Non-Volatile Spin Switch for Boolean and Non-Boolean Logic,” Appl. Phys. Lett., …
[10] … IEEE J. Expl. Sol.-Stat. Computat. Dev. and Circ., vol. 2, pp. 36–43, Nov. 2016, DOI:10.1109/JXCDC.2016.2633251.
[11] W. Kang, Z. Wang, Y. Zhang, J.-O. Klein, W. Lv and W. Zhao, “Spintronic Logic Design Methodology Based on Spin Hall Effect-Driven Magnetic Tunnel Junctions,” J. Phys. D: Appl. Phys., vol. 49, pp. 065008–1–11, Jan. 2016, DOI:10.1088/0022-3727/49/6/065008.
[12] J.-G. Zhu and C. Park, “Magnetic Tunnel Junctions,” Mater. Today, vol. 9, no. 11, pp. 36–45, Nov. 2006, DOI:10.1016/S1369-7021(06)71693-5.
[13] Predictive Technology Model. Accessed: Jul. 25, 2017. [Online]. Available: http://ptm.asu.edu
[14] D. Fan, Y. Shim, A. Raghunathan and K. Roy, “STT-SNN: A Spin-Transfer-Torque Based Soft-Limiting Non-Linear Neuron for Low-Power Artificial Neural Networks,” IEEE Trans. Nano., vol. 14, no. 6, pp. 1013–1012, Nov. 2015, DOI:10.1109/TNANO.2015.2437902.
[15] S. H. Jo, T. Chang, I. Ebong, B. B. Bhadviya, P. Mazumder and W. Lu, “Nanoscale Memristor Device as Synapse in Neuromorphic Systems,” Nano Lett., vol. 10, no. 4, pp. 1297–1301, DOI:10.1021/nl904092h.
[16] T. Li, S. Duan, J. Liu and L. Wang, “An Improved Design of RBF Neural Network Control Algorithm Based on Spintronic Memristor Crossbar Array,” Neural Comput. & Appl., vol. 30, no. 6, pp. 1939–1946, Sep. 2018, DOI:10.1007/s00521-016-2715-8.
[17] M. Jerry, P.-Y. Chen, J. Zhang, P. Sharma, K. Ni, S. Yu and S. Datta, “Ferroelectric FET Analog Synapse for Acceleration of Deep Neural Network Training,” pp. 6.2.1–6.2.4, Dec. 2017, DOI:10.1109/IEDM.2017.8268338.
[18] C.-F. Pai, L. Liu, Y. Li, H. W. Tseng, D. C. Ralph and R. A. Buhrman, “Spin Transfer Torque Devices Utilizing the Giant Spin Hall Effect of Tungsten,” Appl. Phys. Lett., …
[19] … Science, vol. 336, pp. 555–558 (2012).
[20] W. Kang, L. Zhang, J.-O. Klein, Y. Zhang, D. Ravelosona, W. Zhao, “Reconfigurable Codesign of STT-MRAM Under Process Variations in Deeply Scaled Technology,” IEEE Trans. Elec. Dev., vol. 62, no. 6, pp. 1769–1777, Jun. 2015, DOI:10.1109/TED.2015.2412960.
[21] P. Wang, E. Eken, Z, W. Zhang, R. Joshi, R. Kanj and Y. Chen, “A Thermal and Process Variation Aware MTJ Switching Model and its Applications in Soft Error Analysis,” More than Moore Technologies for Next Generation Computer Design, Chapter 5, pp. 101–125, Springer New York, 2015, DOI:10.1007/978-1-4939-2163-8.
[22] M. D. Giles, N. Arkali Radhakrishna, D. Becher, A. Kornfeld, K. Maurice, S. Mudanai, S. Natarajan, P. Newman, P. Packan and T. Rakshit, “High Sigma Measurement of Random Threshold Voltage Variation in 14nm Logic FinFET Technology,” Proc. of VLSI Technology, pp. T150–T151, Aug. 2015, DOI:10.1109/VLSIT.2015.7223657.
[23] D. C. Ralph and M. D. Stiles, “Spin Transfer Torques,” J. Magn. Magn. Mater., vol. 320, pp. 1190–1216, Apr. 2008, DOI:10.1016/j.jmmm.2007.12.019.
[24] W. H. Butler, T. Mewes, C. K. A. Mewes, P. B. Visscher, W. H. Rippard, S. E. Russek and R. Heindl, “Switching Distributions for Perpendicular Spin-Torque Devices Within the Macrospin Approximation,” IEEE Trans. Mag., vol. 48, no. 12, pp. 4684–4700, Dec. 2012, DOI:10.1109/TMAG.2012.2209122.
[25] S. Ramakrishnan and J. Hasler, “Vector-Matrix Multiply and Winner-Take-All as an Analog Classifier,” IEEE Trans. VLSI Syst., vol. 22, no. 2, pp. 353–361, Feb. 2014.
[26] T. Kulej and F. Khateb, “Sub 0.5-V Bulk-Driven Winner Take All Circuit based on a New Voltage Follower,” Analog Int. Circ. and Sig. Proc., vol. 90, no. 3, pp. 687–691, 2017.
[27] Y.-C. Hung, B.-D. Liu and C.-Y. Tsai, “1-V Bulk-Driven CMOS Analog Programmable Winner-Takes-All Circuit,”