Quantitative Evaluation of Hardware Binary Stochastic Neurons

Abstract

Recently there has been increasing activity to build dedicated Ising Machines to accelerate the solution of combinatorial optimization problems by expressing these problems as a ground-state search of the Ising model. A common theme of such Ising Machines is to tailor the physics of underlying hardware to the mathematics of the Ising model to improve some aspect of performance that is measured in speed to solution, energy consumption per solution or area footprint of the adopted hardware. One such approach to build an Ising spin, or a binary stochastic neuron (BSN), is a compact mixed-signal unit based on a low-barrier nanomagnet based design that uses a single magnetic tunnel junction (MTJ) and three transistors (3T-1MTJ) where the MTJ functions as a stochastic resistor (1SR). Such a compact unit can drastically reduce the area footprint of BSNs while promising massive scalability by leveraging the existing Magnetic RAM (MRAM) technology that has integrated 1T-1MTJ cells in ~Gbit densities. The 3T-1SR design however can be realized using different materials or devices that provide naturally fluctuating resistances. Extending previous work, we evaluate hardware BSNs from this general perspective by classifying necessary and sufficient conditions to design a fast and energy-efficient BSN that can be used in scaled Ising Machine implementations. We connect our device analysis to systems-level metrics by emphasizing hardware-independent figures-of-merit such as flips per second and dissipated energy per random bit that can be used to classify any Ising Machine.



Orchi Hassan, Supriyo Datta, and Kerem Y. Camsari
School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47906 USA
Department of Electrical and Computer Engineering, University of California, Santa Barbara, Santa Barbara, CA 93106, USA


I. INTRODUCTION

In the era of the internet of things (IoT), combinatorial optimization problems are ubiquitous. In fact, most of the real-world problems that quantum computers aim to solve can be formulated as combinatorial optimization problems. Directing traffic flow, routing interconnections in integrated circuit design, making financial decisions and drug discovery all involve solving some form of combinatorial optimization problem. The demand for solving these problems faster and more efficiently is ever-increasing. But such problems typically fall into the NP-hard or NP-complete classes of complexity theory, with no known polynomial-time solution, making them notoriously difficult to solve on digital computers using traditional computing methods. This has made way for a new paradigm in computing: Ising computing. Ising computing maps a combinatorial optimization problem to an Ising model and solves it by searching for the ground state of the system described by:

$$E = -\frac{1}{2}\sum_{i,j=1}^{N} J_{ij}\, m_i m_j - \sum_{i=1}^{N} h_i m_i \qquad (1)$$

where $m$ denotes the binary Ising spin, $J$ is the coupling coefficient and $h$ is the external bias. In the machine learning field, the same underlying principle is used for Boltzmann Machines. The binary stochastic neurons (BSNs) of stochastic neural networks are well suited to function as a 'spin' in such systems, described mathematically by:

$$m_i = \mathrm{sgn}\left[\tanh(I_i) - r_i\right] \qquad (2)$$

where $r_i$ is a random number between $-1$ and $+1$, and $I_i = -\partial E/\partial m_i$ is the input to the neuron.

Given the importance of optimization problems, a lot of research has gone into developing algorithms and identifying appropriate hardware for Ising computing. Various approaches are being explored, including quantum computers based on quantum annealing (QA) or adiabatic quantum optimization (AQC) implemented with superconducting circuits, coherent Ising machines (CIMs) implemented with laser pulses, phase-change oscillators or CMOS oscillators, and digital annealers based on simulated annealing (SA) implemented with digital circuits.

FIG. 1. The 1MTJ-3T compact BSN hardware, which utilizes the natural physics of low-barrier nanomagnets, holds the promise of accelerating simulated annealing processors. [Figure: combinatorial optimization problems are mapped to an Ising problem with coupling coefficients $J$, biases $h$ and binary spins $m$, and then to dedicated hardware, consisting of 3T-1MTJ BSNs ($V_{IN}$, $V_{OUT}$) and a memristive crossbar array, that searches for the ground-state configuration $m$ minimizing $E$.]

In this paper we comprehensively evaluate and characterize a stochastic magnetic tunnel junction (sMTJ) based realization of the Ising spin (eq. 2), where random numbers are generated using the natural physics of low-barrier nanomagnets in a compact design. A network of these BSN units, coupled with a memristive crossbar array that performs the synaptic operation as shown in Fig. 1, can drastically improve the area requirements and accelerate the computation speed of Ising Machines. We evaluate the performance of the BSN device in terms of its energy and delay metrics and connect these to the problem- and substrate-independent metric of flips per second that the probabilistic system makes.

Our evaluation of the 1MTJ-3T BSN design considers different types of low-barrier nanomagnet realizations of MTJs. As the MTJ essentially functions as a two-terminal stochastic resistor (SR), we first take a general 3T-1SR design approach, classifying necessary and sufficient conditions for achieving BSN operation with different types of SRs in Section II. We relate these conditions to the different sMTJ realizations in Section III. We report the timescale of operation, power and energy for each case based on benchmarked SPICE simulations of the BSN hardware, consisting of spintronic elements from a modular circuit framework coupled to 14 nm FinFET PTM models, and provide analytical results for relevant quantities in Section IV. Lastly, we use these device performance metrics to project hardware performance figures of merit such as the flips per second that a probabilistic sampler makes. Our projections indicate orders-of-magnitude improvement potential over current digital implementations.

II. GENERAL APPROACH TO DESIGN OF BSN
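As a point of reference for the hardware discussed in this section, the ideal BSN of eq. (2) can be sketched in a few lines of Python. This is a minimal software illustration, not the authors' simulation framework: the function names are hypothetical, $r_i$ is assumed to be uniformly distributed between $-1$ and $+1$, and the energy uses the $1/2$ prefactor shown in Fig. 1 with a symmetric, zero-diagonal $J$, so that $I_i = -\partial E/\partial m_i = \sum_j J_{ij} m_j + h_i$.

```python
import numpy as np

def ising_energy(J, h, m):
    """Ising energy E = -1/2 * m^T J m - h^T m (symmetric J, zero diagonal)."""
    return -0.5 * m @ J @ m - h @ m

def neuron_input(J, h, m, i):
    """Input to neuron i: I_i = -dE/dm_i = sum_j J_ij m_j + h_i."""
    return J[i] @ m + h[i]

def bsn_samples(I_i, n_samples, rng):
    """Draw m_i = sgn[tanh(I_i) - r_i] with r_i uniform in (-1, 1) (assumed)."""
    r = rng.uniform(-1.0, 1.0, size=n_samples)
    return np.where(np.tanh(I_i) > r, 1, -1)

rng = np.random.default_rng(0)
J = np.array([[0.0, 1.0], [1.0, 0.0]])   # hypothetical 2-spin ferromagnetic coupling
h = np.zeros(2)
m = np.array([1, 1])

energy = ising_energy(J, h, m)                      # energy of the aligned state
I_first = neuron_input(J, h, m, 0)                  # input seen by spin 0
avg_m = bsn_samples(I_first, 50_000, rng).mean()    # should approach tanh(I_first)
```

With uniform $r_i$, the probability of $m_i = +1$ is $(1 + \tanh I_i)/2$, so the sample mean approaches $\tanh(I_i)$, which is the sigmoidal response the hardware is expected to reproduce.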

Binary stochastic neurons (BSNs) are well suited to function as a 'spin' in Ising machines for solving combinatorial optimization problems. A compact and efficient hardware realization of the BSN that leverages the natural physics of stochastic nanomagnets can be made using unstable magnetic tunnel junctions (MTJs), as shown in Fig. 1.

The compact BSN design based on low-barrier magnet (LBM) stochastic MTJs (sMTJs) was first proposed in 2017. Using magnet and circuit physics to analyze the performance, it was reported that using an LBM in a circular disk geometry with an energy barrier below $k_BT$ as the free layer of an MTJ results in sub-ns response times while requiring only a few fJ of energy per random bit. The proposed design and the performance analysis considered a very specific type of sMTJ with circular in-plane magnetic anisotropy (IMA), whose fluctuations are undisturbed by the current in the circuit under typical current drive conditions. However, in 2019, a version of the BSN design that was implemented in hardware to solve an 8-bit factorization problem used an sMTJ with perpendicular magnetic anisotropy (PMA) and a barrier of a few $k_BT$ as its free layer. Unlike the circular in-plane design, the PMA design relied on its resistance being tunable by the spin-transfer-torque effect in order to achieve BSN operation. This calls for an extension of our initial analysis, which we systematically perform in this paper.

As the MTJ in the BSN circuit effectively acts as a fluctuating resistor $R$, and the design principle is independent of its physical realization, we establish the fundamental design rules from a general perspective. We hope these design rules stimulate discussion on the realization of stochastic resistors that use different mechanisms.

A. Types of fluctuating resistances

We categorize the fluctuating $R$ into four types using two criteria. First, based on the nature of its fluctuations, it can be continuous or bipolar (telegraphic). Second, it can be tunable or non-tunable, depending on whether it is affected by the current flowing through it.

FIG. 2. Categorizing resistances: (a) Fluctuation nature: the resistance can be (i) continuous or (ii) bipolar. The time dynamics and the distribution $\rho(R)$ are shown for each category. (b) Current response: the fluctuations can be (i) unaffected by the current $I$ (non-tunable) or (ii) a function of $I$ (tunable), as indicated by the $\langle R \rangle$ vs $I$ transfer characteristics. $I_{50/50}$ is the current at the 50:50 point, where the resistance spends equal time in the $R_P$ and $R_{AP}$ states. $I_0$ is the biasing current, defined from the slope of the $R$ vs $I$ curve at the 50:50 point. The pinning current is typically a few times $I_0$ (the figure marks $\sim 3I_0$).

A continuous resistor can take any value in the range $[R_P, R_{AP}]$, while a bipolar resistor assumes only the two values $R_P$ and $R_{AP}$, as shown in Fig. 2(a). The distribution of a continuous resistance can also take different forms: it can be uniform, or slightly bimodal as in the case of an MTJ, as shown in the figure. Different distributions typically result in different average $R$ values; slightly bimodal or uniform distributions are better suited to BSN realizations than Gaussian distributions.

The current $I$ flowing in the circuit can tune the probability distribution of the resistance fluctuations, and we call such resistors tunable resistors. When designing a BSN with a current-tunable $R$, we need to know the current at which the resistance fluctuates equally between the two extreme states ($I_{50/50}$) and the current required to pin the resistance to one of those states. An important parameter in this case is the bias current $I_0$, which is defined from the slope of the $R$ vs $I$ curve at the 50:50 point. Typically, a current of a few times $I_0$ is required to pin the fluctuating resistance to one of its states. We will later provide analytical expressions for $I_0$ for the four cases of resistors that can be obtained with various MTJs (Fig. 9).

Based on this analysis, we categorize the fluctuating resistance into four types: non-tunable continuous (NTC), non-tunable bipolar (NTB), tunable continuous (TC) and tunable bipolar (TB).

FIG. 3. Transfer characteristics: The BSN circuit is realized by coupling the fluctuating resistor, which is the physical realization of the random variable $r_i$ in the BSN equation, to an NMOS which provides the tunability, and then to an inverter which thresholds the output. The four types of resistances (non-tunable continuous, non-tunable bipolar, tunable continuous and tunable bipolar) are coupled to a 14 nm FinFET ($I_{Dsat} = 15\,\mu$A), and the resistance parameters (based on experimental demonstrations of MTJs) are chosen to match the transistor characteristics. All resistance types except the non-tunable bipolar one achieve BSN operation following eq. 2. To function as a BSN, bipolar resistances need some means of tuning their probability distribution.

B. Performing the BSN function
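Whether a given noise source can perform the BSN function can be checked numerically before looking at circuit-level results. The sketch below is an idealized software comparison, not a model of the 3T-1SR circuit: it evaluates eq. (2) with $r_i$ drawn either from a continuous distribution (assumed uniform between $-1$ and $+1$) or from a fixed bipolar distribution ($r_i = \pm 1$ with equal probability). All names and sample sizes are illustrative.

```python
import numpy as np

def mean_output(I, r):
    """Average of m = sgn[tanh(I) - r] over the noise samples r."""
    return np.mean(np.where(np.tanh(I) > r, 1.0, -1.0))

rng = np.random.default_rng(0)
n = 100_000
r_continuous = rng.uniform(-1.0, 1.0, n)    # continuous noise (assumed uniform)
r_bipolar = rng.choice([-1.0, 1.0], n)      # non-tunable bipolar noise

inputs = np.linspace(-2.0, 2.0, 9)
avg_cont = np.array([mean_output(I, r_continuous) for I in inputs])
avg_bip = np.array([mean_output(I, r_bipolar) for I in inputs])

for I, c, b in zip(inputs, avg_cont, avg_bip):
    print(f"I = {I:+.1f}   continuous <m> = {c:+.3f}   bipolar <m> = {b:+.3f}")
```

The continuous case tracks $\tanh(I)$, whereas the bipolar case stays near zero for every finite input, because $\tanh(I)$ always lies strictly between the two fixed noise values.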

We first look at the transfer characteristics of the device to see whether the four types of resistance can faithfully mimic the BSN operation described by eq. 2. The fluctuating $R$ is a physical realization of the random variable $r_i$, the NMOS acts as a constant current source that provides tunability, and the inverter performs the sgn operation in eq. 2.

Fig. 3 shows that while all other resistance types reproduce the desired sigmoidal average curve $\langle m_i \rangle = \tanh(I_i)$, the non-tunable bipolar resistor gives a staircase-like function instead. This is because of the fixed, delta-function-like resistance distribution at the two extreme states (see Fig. 2(a)(ii)). As there is no continuity in the resistance distribution and no means of tuning the delta distribution itself, the BSN output fluctuates equally between its two states until either of the threshold points is crossed, resulting in the staircase-like function.

Mathematically, a bipolar resistance means that $r_i = \pm 1$. So, for any input $I_i$ where $|\tanh(I_i)| < 1$, the output $\langle m \rangle$ is equal to zero. Looking at a simple invertible AND gate operation in Fig. 4(b), it is evident that devices with a staircase-like function cannot be used as a BSN. This has been demonstrated experimentally with a stable MTJ used as a bipolar resistor whose distribution was tuned by an external field.

FIG. 4. Non-tunable continuous vs bipolar resistance: (a) Transfer characteristics of the p-bit show that while the continuous resistor, $m_i = \mathrm{sgn}[\tanh(I_i) - r_i]$, results in a sigmoidal output, the bipolar resistor, $m_i = \mathrm{sgn}[\tanh(I_i) - r_i]$ with $r_i = \pm 1$, gives a staircase-like function. (b) AND gate p-circuit: the bipolar $R$ is unable to follow the Boltzmann distribution of the invertible AND gate. All states remain equally probable.

C. Parameter Dependence and Design Choices
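The matching conditions developed in this subsection are simple ratios, so they can be sanity-checked with a few lines of arithmetic. The sketch below assumes the relations $n = R_{AP}/R_P \approx I_+/I_-$, $R_P = (V_{DD}/2)/I_+$ and TMR $= (n-1) \times 100\%$. The helper names and the example currents $I_+$ and $I_-$ are hypothetical; only the 604% TMR and the 0.8 V supply are values quoted in the paper.

```python
def ratio_from_tmr(tmr_percent):
    """Resistance ratio n = R_AP / R_P implied by TMR = (n - 1) * 100 %."""
    return 1.0 + tmr_percent / 100.0

def required_ratio(i_plus, i_minus):
    """Ratio n ~ I+ / I- needed for a stochastic region bounded by I+ and I-."""
    return i_plus / i_minus

def minimum_r_p(v_dd, i_plus):
    """Smallest resistance R_P = (V_DD / 2) / I+ (assumes the halved supply)."""
    return (v_dd / 2.0) / i_plus

n_max_reported = ratio_from_tmr(604.0)     # n for the maximum reported TMR
n_needed = required_ratio(20e-6, 4e-6)     # hypothetical NMOS currents I+ and I-
r_p = minimum_r_p(0.8, 20e-6)              # ohms, for V_DD = 0.8 V and the same I+
```

For these illustrative currents the required ratio is 5, below the ratio of about 7 implied by a 604% TMR, and the matching $R_P$ is about 20 kΩ.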

Fig. 3 is created with a fixed set of parameters for the resistor, coupled with a specific transistor technology (14 nm FinFET models). In this section we explore how the transfer characteristics are affected by the parameters of the resistor and the FET, and how to choose the right combination of $R$ and FET.

Stochastic Region: The stochastic region, which we define next, is a function of the resistance ratio $n$ for non-tunable resistors and of the biasing current $I_0$ for tunable resistors, as shown in Fig. 5, and needs to be matched with the transistor characteristics.

FIG. 5. Effect of $n$ and $I_0$ (panels: non-tunable continuous, non-tunable bipolar, tunable continuous and tunable bipolar $R$): The stochastic region of the non-tunable resistances is determined by the resistance ratio $n = R_{AP}/R_P$, while the biasing current $I_0$ controls the stochastic region of the tunable resistances. For large biasing currents, the tunable resistors behave effectively like non-tunable resistances.

Effect of $n$: For non-tunable resistor designs, the resistance ratio $n = R_{AP}/R_P$ is directly related to the stochastic region $\Delta v$ through the NMOS characteristics. The edges of the stochastic region, $v_\pm$, are defined by the condition $V_i = V_{DD}/2 - [I_+ R_P,\ I_- R_{AP}] \approx 0$, where $I_\pm$ is determined by the NMOS as shown in Fig. 6(c). For a desired stochastic region $\Delta v = v_+ - v_-$ and a given NMOS transistor, the required $n = R_{AP}/R_P$ should approximately equal $I_+/I_-$. Ideally, the minimum value of the resistance should be $R_P = (V_{DD}/2)/I_+$, and to get full pinning, $\Delta v$ should be less than $V_{DD}$. For a 14 nm FinFET, $n$ should be around 2 to 50 to obtain the desired stochastic region. The resistance ratio is a measure of the tunneling magnetoresistance, TMR $= (n-1) \times 100\%$. With a maximum reported TMR of 604%, the resistance ratios of MTJs are well within the desired range, and the general requirements we outline should be applicable to other types of stochastic resistors as well.

Effect of $I_0$: In the case of tunable resistances, the stochastic region is independent of the resistance ratio and depends instead on the pinning current, and thus on the bias current ($I_P^{\pm} \propto I_0$), as shown in Fig. 6(d). For large bias currents, the tunable resistances act essentially like non-tunable resistances. To get the full range of $R$, the NMOS needs to be able to supply the pinning current. If the pinning current is a few times $I_0$, as shown in Fig. 2, then the maximum NMOS current $I^+_{max}$ needs to be a few times $I_0$ to get the full range of the resistance. For 14 nm FinFETs, the available $I^+_{max}$ restricts $I_0$ to values less than 7 µA.

FIG. 6. Stochastic region boundaries: The stochastic region boundaries $[v_+, v_-]$ are set by different parameters for tunable and non-tunable resistors. (a) The BSN circuit. (b) The current transfer characteristics of the 14 nm FinFET NMOS. (c) Non-tunable $R$: the stochastic range is determined by the resistance ratio, $n = R_{AP}/R_P \approx I_+/I_-$. (d) Tunable $R$: the stochastic range is determined by the pinning current $I_P$ characteristics of the resistance. The transfer characteristics of each stage in (c) and (d) indicate the stochastic range $v_+$ and $v_-$ and its relation to the NMOS characteristics in (b).

Choice of $I_{50/50}$: Another parameter that is important for the operation of tunable resistors is $I_{50/50}$, which determines the midpoint of the sigmoid. $I_{50/50}$ is the current at which the resistance on average spends equal time in the $R_P$ and $R_{AP}$ states. As the circuit can only support positive current values, it needs to be a positive quantity, preferably matched with the saturation-point ($V_{DS} = V_{GS}$) current $I_{Dsat}$ of the NMOS transistor. Changing $I_{50/50}$ shifts the transfer characteristics laterally, as shown in Fig. 7(a).

R vs I: One last requirement is that, for a current-tunable resistance, the resistance needs to increase from $R_P$ to $R_{AP}$ with increasing current $I$. This can be understood intuitively: increasing $I$ means the NMOS transistor is becoming more conductive. If the MTJ concomitantly becomes more conductive as $I$ increases, the transfer characteristics can show non-monotonic behavior, as shown in Fig. 7(b). This requirement holds irrespective of whether the circuit's $R$ branch consists of a PMOS-1R or a 1R-NMOS topology.

FIG. 7. (a) Choice of $I_{50/50}$: $I_{50/50}$ is ideally a positive quantity matched with the $I_{Dsat}$ of the transistor; changing $I_{50/50}$ results in a lateral shift of the sigmoid. (b) $R$ vs $I$ relationship: The output characteristics also depend on the nature of the resistance tunability with the circuit current $I$. If $R$ decreases with $I$ ($R_{AP} \to R_P$), the opposing characteristics of the transistor current and the resistance change result in a non-monotonic output.

III. REALIZATION OF FLUCTUATING RESISTANCES WITH SMTJS

A magnetic tunnel junction (MTJ) whose free layer is a low-barrier magnet (LBM) can serve as a physical realization of a fluctuating resistor. Depending on the nature and characteristics of the LBM magnetization fluctuations, we can get different types of $R$. Our previous analysis was restricted to one type of LBM, the circular IMA magnet with barrier $< k_BT$; in this section we extend it to include all possible LBMs.

A general description of the energy associated with a magnet is given by:

$$E = \frac{H_{kp} M_s \Omega}{2}\left(1 - m_x^2\right) + \frac{H_{ki} M_s \Omega}{2}\left(1 - m_z^2\right) - M_s \Omega\, \vec{H}_{ext} \cdot \hat{m} \qquad (3)$$

where $H_{kp}$ is the perpendicular anisotropy field along the x-axis, set by the competition between the surface anisotropy (surface anisotropy density $K_s$, entering as $K_s/t$) and the demagnetization term $-4\pi M_s$; $H_{ki}$ is the in-plane anisotropy field along the z-axis; $\vec{H}_{ext}$ is the external field; $M_s$ is the saturation magnetization; and $\Omega = \pi (D/2)^2 t$ is the volume of the magnet. By adjusting the thickness or the shape of the magnet, the magnetic anisotropy can be scaled so that the magnet behaves like a low-barrier magnet. We use the stochastic LLG module from our spintronics library to simulate the LBM dynamics. This model has been carefully benchmarked against general Fokker-Planck based methods.

LBM Magnet Fluctuation Dynamics: By low-barrier magnet we refer to magnets whose barrier is at most a few $k_BT$, and whose magnetization fluctuates randomly in the presence of thermal noise. Interestingly, the magnetization dynamics of low-barrier magnets with barrier $< k_BT$ are different from those with a slightly higher barrier. The simple exponential dependence of the retention time of the magnetization state on the barrier height is not valid around or below $k_BT$.

FIG. 8. Low-barrier magnet fluctuation dynamics: We use the benchmarked stochastic LLG module to simulate LBM dynamics for (a) circular IMA ($\Delta < k_BT$; $H_{kp} \approx -4\pi M_s$, $H_{ki} \approx 0$), (b) isotropic ($\Delta < k_BT$; $H_{kp} \approx 0$), (c) LBM IMA ($\Delta \sim 5k_BT$; $H_{kp} \approx -4\pi M_s$, $H_{ki} \neq 0$) and (d) LBM PMA ($\Delta \sim 5k_BT$; $H_{kp} \neq 0$) magnets. The panels are labeled with auto-correlation times of ∼160 ps, ∼5 ns, ∼160 ns and ∼500 ns, and with simulation settings ($\Delta t$ = 1 ps, $T$ = 5 µs), ($\Delta t$ = 5 ps, $T$ = 5 µs), ($\Delta t$ = 50 ps, $T$ = 500 µs) and ($\Delta t$ = 1 ps, $T$ = 1 µs). Each simulation is carried out with a time step at least 100× smaller than, and for a duration much longer than, the timescale of the fluctuations. Magnets with $\Delta < k_BT$ have more continuous fluctuations, with (b) having a more uniform distribution than (a), while slightly higher barrier magnets have more telegraphic fluctuations. In both cases, the presence of high demagnetization fields causes faster fluctuations in IMA magnets.
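An auto-correlation time like those quoted in Fig. 8 can be extracted from any sampled magnetization trace. The sketch below is illustrative only: it replaces the stochastic LLG simulation with a synthetic random telegraph signal, and it takes the correlation time to be the lag at which the normalized autocorrelation first falls below $1/e$, which is an assumed convention rather than the definition used by the authors. The function names and parameter values are hypothetical.

```python
import numpy as np

def telegraph_signal(n_steps, dt, dwell_time, rng):
    """Synthetic m(t) = +/-1 trace with mean dwell time `dwell_time` per state."""
    p_flip = dt / dwell_time                      # flip probability per step (dt << dwell)
    flips = rng.random(n_steps) < p_flip
    return np.where(np.cumsum(flips) % 2 == 0, 1.0, -1.0)

def autocorrelation_time(m, dt):
    """Lag at which the normalized autocorrelation first drops below 1/e."""
    x = m - m.mean()
    n = len(x)
    spec = np.fft.rfft(x, 2 * n)                  # zero-pad to avoid wrap-around
    acf = np.fft.irfft(spec * np.conj(spec))[:n]  # autocorrelation via FFT
    acf /= acf[0]
    below = np.nonzero(acf < np.exp(-1.0))[0]
    return below[0] * dt if below.size else np.inf

rng = np.random.default_rng(0)
dt = 10e-12                                       # 10 ps time step (hypothetical)
trace = telegraph_signal(1_000_000, dt, dwell_time=1e-9, rng=rng)
tau_est = autocorrelation_time(trace, dt)
print(f"estimated correlation time: {tau_est * 1e9:.2f} ns")
```

For a symmetric telegraph process the autocorrelation decays as $e^{-2t/\tau_{dwell}}$, so the estimate should come out close to half the dwell time. Note that the time step is 100× smaller than the dwell time and the trace is much longer than it, in the spirit of the simulation settings described in the caption.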

Fig. 8 shows the fluctuation dynamics, the magnetization distribution and the auto-correlation time ($\tau_{CORR}$) for low-barrier magnets. Magnetization fluctuations translate into resistance fluctuations in an MTJ. We see that magnets with barrier $< k_BT$ act like continuous resistances, while slightly higher barrier magnets, which have two more clearly defined states, give telegraphic fluctuations. In both cases, IMA magnets fluctuate orders of magnitude faster than their PMA counterparts, due to a new mechanism in which the demagnetization field plays a central role.

Current Response of LBM Magnets: Magnetic fluctuations can be tuned by a spin current. For high-barrier magnets, the minimum current required to switch the magnetization is called the critical current. For low-barrier magnets, we refer instead to a biasing current, defined by the inverse of the derivative taken at $\langle m \rangle = 0$, i.e. at low bias ($I_S$), mathematically expressed as:

$$I_0 = \left( \frac{\partial \langle m \rangle}{\partial I_S} \right)^{-1}$$

The current required to pin the magnetization, similar to the switching current in high-barrier magnets, is assumed to be a few times $I_0$, as indicated in Fig. 2. IMA magnets have a much larger pinning current than PMA magnets because of the large demagnetization field present due to their disk shape, meaning that for tunable resistors, IMA magnet MTJs would require transistors with much larger current ranges than PMA magnet MTJs.

FIG. 9. MTJ free layer and its corresponding $R$ type, along with the corresponding characteristic parameters and their analytical expressions. The numbers in brackets indicate an approximate range of values for each parameter. [Table rows: non-tunable continuous ($\Delta < k_BT$, circular IMA); tunable continuous ($\Delta < k_BT$, isotropic 'PMA'); non-tunable bipolar ($k_BT < \Delta < 10k_BT$, IMA); tunable bipolar ($k_BT < \Delta < 10k_BT$, PMA). Columns: $R$ type, MTJ free layer, $\tau_{CORR}$, and the characteristic currents and fields. For the bipolar rows, $\tau_{CORR} \propto e^{\Delta/k_BT}$. Ranges quoted in the table include correlation times of sub-ns, ∼10 ns, 1 ns to 1 µs and µs, and currents of 0.1 to 1 mA, 0.4 to 4 µA, 0.5 to 25 µA and 0.05 to 25 mA, with $\alpha < 0.1$.]

FIG. 10. LBM response to spin current with and without external fields for (a) a circular IMA magnet ($H_{ki} \sim 0$, $H_{kp} \sim -H_D$) and (b) an isotropic magnet ($H_{kp} \sim 0$). The caption also quotes field values of ∼130 Oe and, for the isotropic magnet, ∼200 Oe.
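The definition of the biasing current as the inverse slope of $\langle m \rangle$ versus $I_S$ at $\langle m \rangle = 0$ can be applied directly to sampled data. The sketch below is a hedged illustration: `bias_current` is a hypothetical helper, and the $\tanh$ test curve with a 2 µA scale merely stands in for simulated or measured averages such as those in Fig. 10.

```python
import numpy as np

def bias_current(i_s, m_avg):
    """Estimate I_0 = (d<m>/dI_S)^-1 at the point where <m> is closest to zero."""
    i_s = np.asarray(i_s, dtype=float)
    m_avg = np.asarray(m_avg, dtype=float)
    slope = np.gradient(m_avg, i_s)          # finite-difference derivative d<m>/dI_S
    k = np.argmin(np.abs(m_avg))             # sample nearest the <m> = 0 crossing
    return 1.0 / slope[k]

# Hypothetical test curve: <m> = tanh(I_S / I_0) with I_0 = 2 uA.
i0_true = 2e-6
i_s = np.linspace(-10e-6, 10e-6, 2001)
m_avg = np.tanh(i_s / i0_true)

i0_est = bias_current(i_s, m_avg)
print(f"estimated I_0 = {i0_est * 1e6:.3f} uA")
```

On noisy data, fitting a straight line to several points around the zero crossing would be more robust than a single finite difference; the single-point version is kept here for brevity.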

An important point to note here is the current tunability in the presence of an external field, which can arise, for example, from the fixed, stable layer that acts as a reference to the free layer in the MTJ. In high-barrier PMA magnets, the spin-current-induced magnetic switching hysteresis loop simply shifts depending on the direction of the field, but for IMA magnets the shape of the hysteresis and the magnet dynamics are changed. The large demagnetizing field perpendicular to the magnetization plane in IMA magnets causes the magnetization to precess around it when a spin current is applied in the direction opposite to the external field. The same is observed in low-barrier magnets, as shown in Fig. 10. The larger the external field, the more pronounced the effect. The uniform precessional motion kicks in at high field, when a current close to or above the biasing current is applied in the direction opposite to the field. Very recently, this has been observed experimentally for low fields. While this is an undesired effect for BSN operation, it can be useful in the context of oscillator-based networks.

This has important implications for acting as a fluctuating resistance in a BSN circuit. IMA magnets with external fields (i.e., uncompensated dipolar fields in the MTJ) greater than their pinning field are not suited to function as either tunable or non-tunable resistors. IMA magnets with continuous magnetization, coupled to a transistor with a small saturation current (tens of µA) compared to the biasing current of the IMA magnet (hundreds of µA), can work as non-tunable resistors and, as experimental observations suggest, can withstand stray fields that are small compared to the pinning field.

PMA magnet MTJs, with their small biasing currents (from a few to a few tens of µA), act as tunable resistors in the BSN circuit when coupled to typical transistors. In this case the external bias field is actually preferred, since it enables a positive $I_{50/50}$ current.

So, if we couple an MTJ with a 14 nm FinFET ($V_{DD} = 0.8$ V, $I_{Dsat} \approx 15\,\mu$A), the table in Fig. 9 summarizes the resistance mapping and the associated parameters.

IV. PERFORMANCE EVALUATION OF MTJ BASED BSN

In the ﬁnal section we compare the physical performanceof these different sMTJs in a BSN.
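The figures of merit compared in this section reduce to two simple relations from the text: the energy per random bit is a timescale multiplied by the average power, $E = \tau \langle P \rangle$, and the flips per second of $N$ spins that each flip in a time $\tau$ is $f = N/\tau$. The sketch below evaluates both. The helper names, the 1 ns correlation time and the 1,000-neuron count are illustrative assumptions, while the 20 µW power is the value quoted in Fig. 12 and the clocked example uses the first column of the table in Fig. 13 (1,000 parallel neurons at 250 MHz).

```python
def energy_per_bit(tau, avg_power):
    """E = tau * <P>: energy (J) spent per random bit."""
    return tau * avg_power

def flips_per_second(n_spins, tau):
    """f = N / tau: flips per second for N spins with flip time tau (s)."""
    return n_spins / tau

P_AVG = 20e-6    # W, average BSN power quoted in Fig. 12
TAU_C = 1e-9     # s, hypothetical correlation time of a fast sMTJ

e_c = energy_per_bit(TAU_C, P_AVG)                     # about 2e-14 J, i.e. 20 fJ
fps_bsn = flips_per_second(1_000, TAU_C)               # about 1e12 flips per second
fps_clocked = flips_per_second(1_000, 1.0 / 250e6)     # about 2.5e11 flips per second
```

Because $f$ scales with $N/\tau$, a faster neuron and a larger number of neurons updating in parallel both raise the metric, which is why clocked designs with sequential updates are at a disadvantage.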

Timescale of Operation: The two relevant timescales of operation for a BSN are the correlation time $\tau_C$, which is the average time it takes to produce a new output at a given input, and the response time $\tau_N$, which is the average time it takes for the circuit to give a random output with the correct statistics after the input is changed. Fig. 11 shows the two timescales for the three types of fluctuating resistances, for MTJs with two different fluctuation rates. For simplicity we assumed the correlation time to be the same for all types of magnets, but in reality they would follow the $\tau_{CORR}$ relations indicated in Fig. 9.

FIG. 11. Timescale of operation for each resistor type (NTC, TC, TB) with two fluctuation rates: (i) step response time $\tau_N$, the time to give the first random number; (ii) correlation time $\tau_C$, the time to give a new random number. The resistances are engineered to have similar characteristic timescales but different fluctuation behavior (tunable or non-tunable, continuous or bipolar) for comparison purposes.

Fig. 11 shows that the response time $\tau_N$ for the non-tunable resistor is independent of the fluctuation time of the resistance; it is instead proportional to the RC delay of the circuit. For the tunable cases, the response time is related to the characteristic timescale of the resistor. The time to give new numbers, $\tau_C$, is however governed by the fluctuation time of the resistance in every case ($\tau_C \approx \tau_{CORR}$). So for the tunable case, the two timescales of operation are likely to be similar, as both are governed by the magnet fluctuation characteristics, while for the non-tunable case the RC-dependent response time has the potential to be very short compared to the magnet-dependent correlation time. For most applications this difference may not be important, but for applications where the network is directed, such as Bayesian inference, having two different timescales seems to be a requisite.

Power: Our SPICE simulations indicate that the average power consumed by the BSN circuit is $\langle P \rangle \approx 2 \times V_{DD} I_{Dsat}$, where the factor of 2 accounts for the two branches, the MTJ branch and the inverter branch. This holds for all types of resistors. For a 14 nm FinFET with $V_{DD} = 0.8$ V and $I_{Dsat} \approx 15\,\mu$A, $\langle P \rangle \sim 20\,\mu$W. The MTJ branch power could be reduced by operating in the subthreshold region, but the reduction in total power comes with a ×10 increase in the RC response time. Given the flexibility, it is preferable to design the MTJ to operate in the saturation region of the transistor. For the tunable case this means matching $I_{50/50} \sim I_{Dsat}$; for the non-tunable case it means having $\langle R \rangle \approx (V_{DD}/2)/I_{Dsat}$.

Energy: As there are two timescales associated with BSN operation, we can define two energies as well: the energy to give the first random number, $E_N \sim \tau_N \langle P \rangle$, and the energy expended in producing a new random number, $E_C = \tau_C \langle P \rangle$. Fig. 12(a) shows an energy-delay plot indicating the ranges for each type of MTJ. When describing the energy-delay performance of a BSN, instead of quoting two numbers we quote the larger ones, the correlation time $\tau_C$ and the energy $E_C$. The individual energy-delay numbers can be used to project performance parameters for processors built with them.

FIG. 12. (a) Energy-delay of each type of MTJ-based BSN (NTC, TC, TB), assuming an average power of 20 µW and the timescales in Fig. 10. (b) Flips per second ($N/\tau$) for different numbers of neurons for each type of MTJ, compared with a CMOS processor. For the projections only BSN performance numbers are used; the synapse would add to the power and thus to the energy per flip.

Hardware Projections:

Typically the performance of Ising hardware is measured in terms of the time and energy it takes to solve a specific problem. Time to solution depends not only on the physical hardware performance but also on the algorithm being implemented. Here, we emphasize measuring hardware performance in terms of a purely hardware metric, flips per second (fps), which refers to the maximum number of spin configurations the hardware can cycle through per second. It depends on the number of spins in the system ($N$) and the time it takes for a spin to flip ($\tau$): $f = N/\tau$.

For digital annealers the spin update time is usually determined by the clock period ($\tau_{clk}$), which is typically in the range of tens of ns. To ensure fidelity, simultaneous updates of connected spins need to be avoided, forcing digital annealers that operate on a clock edge to update spins sequentially. So in a network where all spins are connected, effectively only one spin can update per clock cycle. This need not be the case if some spins are unconnected (e.g., nearest-neighbor or king's-graph connections), or if spin updates are parallelized by implementing special algorithms. Based on the reported total spin numbers and clock speeds, digital annealing hardware today, with about ∼10K neurons that update once per clock period, reaches the flips-per-second figures shown in Fig. 13.

Compared to digital annealers, the Ising spin hardware presented in this work can operate autonomously, i.e., without a synchronizing clock or a sequencer. In this mode, the speed is governed only by the neuron ($\tau_{neu}$) and synapse ($\tau_{syn}$) times, and to ensure fidelity and avoid simultaneous updates of connected BSNs the synapse needs to update faster than the neuron ($\tau_{syn} < \tau_{neu}$). Sutton et al. define a metric $s = \tau_{syn}/\tau_{neu}$ and show that $s$ must be sufficiently small for reliable operation; the exact requirements are problem and architecture dependent. Memristive crossbar arrays paired with a fast summing-amplifier synapse could operate very efficiently at speeds as low as a few tens of ps.

FIG. 13. Flips per second (fps) is a substrate- and algorithm-independent performance metric for simulated annealing processors, much like the flops metric used for general-purpose computers. It is a measure of how many flips, and hence spin configurations, the system can cycle through in a second. fps can be derived from the reported performance metrics of the processors; reported and derived quantities are marked as such. We project that MTJ-based hardware can increase the fps of current CMOS-based annealing processors by a few orders of magnitude.

Affiliates | BIFI | Hitachi | Fujitsu | Tokyo Tech. | UC Berkeley | Purdue
Name | Janus II | annealing machine | Digital Annealer | STATICA | RBM-based | Purdue-P (ApC)
Technology | FPGA | 40 nm CMOS + FPGA | 65 nm CMOS | 65 nm CMOS | FPGA | FPGA
Latest | 2014 | 2019 | 2018 | 2020 | 2020 | 2020
Connectivity | Local (5, N-N) | Local (8, King's Graph) | All-to-All | All-to-All | All-to-All | Local (5, N-N)
Parallel Neurons, N_p | N/2 = 1,000 | N/4 = 7,500 | 1 | N = 512 | N = 150 | N = 8,100
Clock Frequency, f | 250 MHz | 100 MHz | 100 MHz | 320 MHz | 70 MHz | 125 MHz
[Rows for total neurons N, weight precision, neuron time (MC step, τ = 1/f) and flips per second (N_p/τ): values not recoverable.]

Digital annealers mimic the Ising spin using a combination of random number generators (LFSR, Xoshiro, etc.), look-up tables (LUTs) and comparators. The random number generator (RNG) unit is one of the most expensive elements in the design. Even in the most optimized design, the RNG unit takes up ∼11% of the total logic gate area. The 3T-1MTJ design offers a drastic reduction in area footprint, promising massive scalability by leveraging existing 1T-1MTJ Magnetic RAM technology that already has 1 Gbit of integrated cells.

Fig. 12(b) projects the fps considering $\tau \equiv \tau_{neu} \approx \tau_{CORR}$ for different numbers of spins, $N$. An MTJ realization with circular IMA, with ∼ns timescale, can offer almost two orders of magnitude speedup with <

10k neurons. If spins areimplemented in Gbit densities all stochastic implementationsseem to outperform the CMOS implementations. For suchsystems the upper bound for N is ultimately determined ei-ther by area or by power budget of the chip. Note that the fpsnumber does not reﬂect the connectivity of the spins or thealgorithm implemented by the hardware. It also does not indi-cate the solution accuracy obtainable for speciﬁc problems .What we highlight here is that using the natural physics ofthe MTJ we can design a very compact realization of eq. 2compared to current state of the art CMOS implementations,and despite being a magnetic circuit, low barrier magnet im-plementations even offer an overall speed up due to their fastﬂuctuation rates. V. CONCLUSION

In this paper, we presented a comprehensive evaluation of naturally stochastic magnetic building blocks for implementing probabilistic algorithms compactly and efficiently. We generalized the proposed 1MTJ-3T design to a 1SR-3T design and presented necessary design rules for BSN operation that we hope will stimulate further interest in finding stochastic resistors (1SR) with suitable properties. We extended the physical performance analysis of the 1MTJ-3T BSN design to include unstable MTJs with different low-barrier magnets as free layers. They are evaluated as physical realizations of the general stochastic resistor (SR) with respect to 14nm FinFET transistors. IMA magnets with barrier $\le k_BT$ proved to be the best option; low-barrier PMA magnets can function as current-tunable resistors as well. While careful optimization of the fixed layer to cancel the stray fields in IMA MTJs is preferred, PMA can benefit from the presence of stray fields (which can be a source of the I […]). The most challenging set of working conditions applies to telegraphic IMA magnets: even if they are highly optimized and no stray fields are present in the circuit, they need to be coupled with high-current transistors due to their high pinning currents, because pairing them with low-current transistors such as 14nm FinFETs results in a staircase-like functional behavior that does not work as a p-bit, as we discussed.

These BSNs are an integral part of Ising machines, which are often referred to as annealing processors. Using 1MTJ-3T BSNs could speed up the operation of these processors by orders of magnitude. Another important application space for these BSNs is stochastic neural networks. In fact, binary stochastic neurons are desired for deep learning networks, but are typically avoided because it is harder to generate random bits in CMOS hardware. Use of this compact neuron, which relies on the MTJ's natural physics to provide stochastic binarization, could accelerate computation in custom hardware through faster evaluation of the BSN function and also encourage algorithmic advancement using BSNs.

ACKNOWLEDGMENTS

This work was supported by the Center for Probabilistic Spin Logic for Low-Energy Boolean and Non-Boolean Computing (CAPSL), one of the Nanoelectronic Computing Research (nCORE) Centers as task 2759.005, a Semiconductor Research Corporation (SRC) program sponsored by the NSF through CCF 1739635.

Appendix A: Derivation for Pinning Field of LBM
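As a numerical companion to this appendix, the following minimal sketch evaluates the Boltzmann average of Eq. (A1) for the isotropic (PMA) energy $E = -H_{ext} M_S \Omega\, m_x$ by direct quadrature and compares it with the closed form $\coth(x) - 1/x$, where $x = H_{ext} M_S \Omega / k_B T$. The function names, the midpoint rule, and the number of panels `n` are illustrative assumptions and are not taken from the paper or its SPICE models; the sketch checks only the exact average, not the approximations used later to estimate the pinning field.

```python
import math


def boltzmann_average_mx(x, n=20000):
    """Return <m_x> for E = -H_ext*M_S*Omega*m_x, with x = H_ext*M_S*Omega/(k_B*T).

    The phi integral of Eq. (A1) cancels between numerator and denominator,
    leaving a 1-D integral over theta, evaluated here with the midpoint rule.
    """
    d_theta = math.pi / n
    num = 0.0
    den = 0.0
    for i in range(n):
        theta = (i + 0.5) * d_theta
        # Both integrals are rescaled by exp(-x) to avoid overflow at large x;
        # the common factor cancels in the ratio.
        weight = math.sin(theta) * math.exp(x * (math.cos(theta) - 1.0))
        num += math.cos(theta) * weight
        den += weight
    return num / den


def langevin(x):
    """Closed-form average magnetization coth(x) - 1/x (valid for x != 0)."""
    return 1.0 / math.tanh(x) - 1.0 / x


if __name__ == "__main__":
    for x in (0.5, 1.0, 3.0, 10.0):
        print(f"x = {x:5.1f}  quadrature = {boltzmann_average_mx(x):.6f}  "
              f"closed form = {langevin(x):.6f}")
```

Sweeping `x` shows how the average magnetization approaches saturation as the normalized field grows, which is the behavior the pinning-field estimate relies on.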

Magnets are generally used to store information, putting the focus on evaluating and predicting the characteristics of stable high-barrier magnets. It is interesting to note that theoretical predictions and analytical derivations regarding low-barrier magnet ($\Delta \le k_BT$) dynamics typically receive less attention as cases of 'least practical interest'. We document the analytical expressions associated with LBMs in Fig. 9. The expressions for correlation time and biasing current can be found in ref. […]; in this appendix we derive the bias field.

We derive here the expressions for the external magnetic field $H$ required to pin the magnetization of an LBM with $\Delta \le k_BT$. We start from the energy expression for the magnet ($E$) and derive the expressions presented in Fig. 9 from the steady-state average magnetization defined by:

$$\langle m \rangle = \frac{\displaystyle\int_{\theta=0}^{\theta=\pi}\int_{\phi=-\pi}^{\phi=\pi} \sin\theta \, d\phi \, d\theta \; m \exp(-E/k_BT)}{\displaystyle\int_{\theta=0}^{\theta=\pi}\int_{\phi=-\pi}^{\phi=\pi} \sin\theta \, d\phi \, d\theta \; \exp(-E/k_BT)} \tag{A1}$$

where $(m_x, m_y, m_z) \equiv (\cos\theta, \sin\theta\sin\phi, \sin\theta\cos\phi)$.

a. Perpendicular Magnetic Anisotropy (PMA)

In the case of an LBM with perpendicular magnetization, the anisotropy field along the x-axis $H_{kp} \to 0$, so that

$$E = -H_{ext} M_S \Omega \, m_x \tag{A2}$$

Evaluating eq. A1 with respect to this energy gives:

$$\langle m_x \rangle = \coth\!\left(\frac{H_{ext} M_S \Omega}{k_BT}\right) - \left(\frac{H_{ext} M_S \Omega}{k_BT}\right)^{-1} \approx \tanh\!\left(\frac{H_{ext} M_S \Omega}{k_BT}\right).$$

So to pin the magnetization to either of its states, $\langle m_x \rangle = \pm 1$, the required external field for PMA magnets can be approximated by:

$$|H_{ext}(\mathrm{PMA})| = \frac{k_BT}{M_s \Omega} \tag{A3}$$

b. In-plane Magnetic Anisotropy (IMA)

For an LBM with in-plane magnetization, the anisotropy field along the z-axis $H_{ki} \to 0$, but a demagnetizing field $H_D$ exists along the z-axis which keeps the magnetization in-plane. The energy expression from eq. 1 in this case is:

$$E = H_D M_S \Omega \, m_x^2 - H_{ext} M_S \Omega \, m_z. \tag{A4}$$

Once again, evaluating eq. A1 with respect to this energy for a very large demagnetizing field ($H_D \to \infty$), the result can be simplified to $\langle m_z \rangle \approx H_{ext} M_S \Omega / k_BT$. So to pin the magnetization to either of its states, $\langle m_z \rangle = \pm 1$, the required external field for IMA magnets can be approximated by:

$$|H_{ext}(\mathrm{IMA})| = \frac{k_BT}{M_s \Omega} \tag{A5}$$

The expression is independent of the demagnetization field. These empirical expressions match our SPICE simulation results quite well, as shown in Fig. 14.

[Figure 14 plot residue. Panels: (a), (b). Legend: M1: $M_S\Omega = 47\times10^{-18}$ emu; M2: $M_s\Omega = 23\times10^{-17}$ emu; $H_D = 2400\pi$ emu/cc; $H_D' = 4800\pi$ emu/cc.]

FIG. 14.

Pinning field of low-barrier magnets. The numerical evaluations of the equations are compared to SPICE simulations for (a) isotropic magnets and (b) circular IMA magnets, which have $\Delta \le k_BT$. The pinning fields are shown to be a function of $M_S\Omega$ only, where $M_S = 600$ emu/cc and the volume of the magnet $\Omega$ is varied. The pinning field values for IMA magnets indicate that the pinning field is independent of the large demagnetization field, $H_D$. The precise correspondence between the analytical formulas and the numerical simulation also serves as a benchmark for our finite-temperature (stochastic) LLG formulation.

M. Yamaoka, C. Yoshimura, M. Hayashi, T. Okuyama, H. Aoki, and H. Mizuno, “24.3 20k-spin ising chip for combinational optimization problem with cmos annealing,” in (IEEE, 2015) pp. 1–3.
F. Neukart, G. Compostella, C. Seidel, D. Von Dollen, S. Yarkoni, and B. Parney, “Traffic flow optimization using a quantum annealer,” Frontiers in ICT, 29 (2017).
F. Barahona, M. Grötschel, M. Jünger, and G. Reinelt, “An application of combinatorial optimization to statistical physics and circuit layout design,” Operations Research, 493–513 (1988).
C. Cook, H. Zhao, T. Sato, M. Hiromoto, and S. X.-D. Tan, “Gpu based parallel ising computing for combinatorial optimization problems in vlsi physical design,” arXiv preprint arXiv:1807.10750 (2018).
G. Rosenberg, P. Haghnegahdar, P. Goddard, P. Carr, K. Wu, and M. L. De Prado, “Solving the optimal trading trajectory problem using a quantum annealer,” IEEE Journal of Selected Topics in Signal Processing, 1053–1060 (2016).
H. Sakaguchi, K. Ogata, T. Isomura, S. Utsunomiya, Y. Yamamoto, and K. Aihara, “Boltzmann sampling by degenerate optical parametric oscillator network for structure-based virtual screening,” Entropy, 365 (2016).
F. Barahona, “On the computational complexity of ising spin glass models,” Journal of Physics A: Mathematical and General, 3241 (1982).
A. Lucas, “Ising formulations of many np problems,” Frontiers in Physics, 5 (2014).
B. Sutton, K. Y. Camsari, B. Behin-Aein, and S. Datta, “Intrinsic optimization using stochastic nanomagnets,” Scientific Reports, 44370 (2017).
“Binary stochastic neurons in tensorflow (https://r2rt.com/binary-stochastic-neurons-in-tensorflow.html),”.
M. W. Johnson, M. H. Amin, S. Gildert, T. Lanting, F. Hamze, N. Dickson, R. Harris, A. J. Berkley, J. Johansson, P. Bunyk, et al., “Quantum annealing with manufactured spins,” Nature, 194–198 (2011).
P. L. McMahon, A. Marandi, Y. Haribara, R. Hamerly, C. Langrock, S. Tamate, T. Inagaki, H. Takesue, S. Utsunomiya, K. Aihara, et al., “A fully programmable 100-spin coherent ising machine with all-to-all connections,” Science, 614–617 (2016).
S. Dutta, A. Khanna, H. Paik, D. Schlom, A. Raychowdhury, Z. Toroczkai, and S. Datta, “Ising hamiltonian solver using stochastic phase-transition nano-oscillators,” arXiv preprint arXiv:2007.12331 (2020).
H. Goto, K. Tatsumura, and A. R. Dixon, “Combinatorial optimization by simulating adiabatic bifurcations in nonlinear hamiltonian systems,” Science advances, eaav2372 (2019).
T. Wang and J. Roychowdhury, “Oim: Oscillator-based ising machines for solving combinatorial optimisation problems,” in International Conference on Unconventional Computation and Natural Computation (Springer, 2019) pp. 232–256.
I. Ahmed, P.-W. Chiu, and C. H. Kim, “A probabilistic self-annealing compute fabric based on 560 hexagonally coupled ring oscillators for solving combinatorial optimization problems,” in (IEEE, 2020) pp. 1–2.
J. Chou, S. Bramhavar, S. Ghosh, and W. Herzog, “Analog coupled oscillator based weighted ising machine,” Scientific reports, 1–10 (2019).
S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization by simulated annealing,” science, 671–680 (1983).
M. Baity-Jesi, R. A. Baños, A. Cruz, L. A. Fernandez, J. M. Gil-Narvión, A. Gordillo-Guerrero, D. Iñiguez, A. Maiorano, F. Mantovani, E. Marinari, et al., “Janus ii: A new generation application-driven computer for spin-system simulations,” Computer Physics Communications, 550–559 (2014).
T. Takemoto, M. Hayashi, C. Yoshimura, and M. Yamaoka, “2.6 a 2 × (IEEE, 2019) pp. 52–54.
M. Aramon, G. Rosenberg, E. Valiante, T. Miyazawa, H. Tamura, and H. G. Katzgraber, “Physics-inspired optimization for quadratic unconstrained problems using a digital annealer,” Frontiers in Physics, 48 (2019).
K. Yamamoto, K. Ando, N. Mertig, T. Takemoto, M. Yamaoka, H. Teramoto, A. Sakai, S. Takamaeda-Yamazaki, and M. Motomura, “7.3 statica: A 512-spin 0.25 m-weight full-digital annealing processor with a near-memory all-spin-updates-at-once architecture for combinatorial optimization with complete spin-spin interactions,” in (IEEE, 2020) pp. 138–140.
S. Patel, L. Chen, P. Canoza, and S. Salahuddin, “Ising model optimization problems on a fpga accelerated restricted boltzmann machine,” arXiv preprint arXiv:2008.04436 (2020).
S. Patel, P. Canoza, and S. Salahuddin, “Logically synthesized, hardware-accelerated, restricted boltzmann machines for combinatorial optimization and integer factorization,” arXiv preprint arXiv:2007.13489 (2020).
K. Y. Camsari, S. Salahuddin, and S. Datta, “Implementing p-bits with embedded mtj,” IEEE Electron Device Letters, 1767–1770 (2017).
L. Xia, P. Gu, B. Li, T. Tang, X. Yin, W. Huangfu, S. Yu, Y. Cao, Y. Wang, and H. Yang, “Technological exploration of rram crossbar array for matrix-vector multiplication,” Journal of Computer Science and Technology, 3–19 (2016).
F. Cai, J. M. Correll, S. H. Lee, Y. Lim, V. Bothra, Z. Zhang, M. P. Flynn, and W. D. Lu, “A fully integrated reprogrammable memristor-cmos system for efficient multiply–accumulate operations,” Nature Electronics, 290–299 (2019).
F. M. Bayat, M. Prezioso, B. Chakrabarti, H. Nili, I. Kataeva, and D. Strukov, “Implementation of multilayer perceptron network with highly uniform passive memristive crossbar circuits,” Nature communications, 1–7 (2018).
B. Sutton, R. Faria, L. A. Ghantasala, K. Y. Camsari, and S. Datta, “Autonomous probabilistic coprocessing with petaflips per second,” arXiv preprint arXiv:1907.09664 (2019).
M. M. Torunbalci, P. Upadhyaya, S. A. Bhave, and K. Y. Camsari, “Modular compact modeling of mtj devices,” IEEE Transactions on Electron Devices, 4628–4634 (2018).
“Predictive Technology Model (PTM) (http://ptm.asu.edu/),”.
O. Hassan, R. Faria, K. Y. Camsari, J. Z. Sun, and S. Datta, “Low-barrier magnet design for efficient hardware binary stochastic neurons,” IEEE Magnetics Letters, 1–5 (2019).
M. W. Daniels, A. Madhavan, P. Talatchian, A. Mizrahi, and M. D. Stiles, “Energy-efficient stochastic computing with superparamagnetic tunnel junctions,” Physical Review Applied, 034016 (2020).
B. Parks, M. Bapna, J. Igbokwe, H. Almasi, W. Wang, and S. A. Majetich, “Superparamagnetic perpendicular magnetic tunnel junctions for true random number generators,” AIP Advances, 055903 (2018).
J. Grollier, D. Querlioz, K. Camsari, K. Everschor-Sitte, S. Fukami, and M. D. Stiles, “Neuromorphic spintronics,” Nature Electronics, 1–11 (2020).
M. A. Abeed and S. Bandyopadhyay, “Low energy barrier nanomagnet design for binary stochastic neurons: Design challenges for real nanomagnets with fabrication defects,” IEEE Magnetics Letters, 1–5 (2019).
J. L. Drobitch and S. Bandyopadhyay, “Reliability and scalability of p-bits implemented with low energy barrier nanomagnets,” IEEE Magnetics Letters, 1–4 (2019).
W. A. Borders, A. Z. Pervaiz, S. Fukami, K. Y. Camsari, H. Ohno, and S. Datta, “Integer factorization using stochastic magnetic tunnel junctions,” Nature, 390–393 (2019).
B. Parks, A. Abdelgawad, T. Wong, R. F. Evans, and S. A. Majetich, “Magnetoresistance dynamics in superparamagnetic co-fe-b nanodots,” Physical Review Applied, 014063 (2020).
S. Cheemalavagu, P. Korkmaz, K. V. Palem, B. E. Akgul, and L. N. Chakrapani, “A probabilistic cmos switch and its realization by exploiting noise,” in IFIP International Conference on VLSI (2005) pp. 535–541.
N. Shukla, A. Parihar, E. Freeman, H. Paik, G. Stone, V. Narayanan, H. Wen, Z. Cai, V. Gopalan, R. Engel-Herbert, et al., “Synchronized charge oscillations in correlated electron systems,” Scientific reports, 4964 (2014).
S. Kumar, J. P. Strachan, and R. S. Williams, “Chaotic dynamics in nanoscale nbo2 mott memristors for analogue computing,” Nature, 318–321 (2017).
B. Stampfer, F. Zhang, Y. Y. Illarionov, T. Knobloch, P. Wu, M. Waltl, A. Grill, J. Appenzeller, and T. Grasser, “Characterization of single defects in ultrascaled mos2 field-effect transistors,” ACS nano, 5368–5375 (2018).
J. Cai, B. Fang, L. Zhang, W. Lv, B. Zhang, T. Zhou, G. Finocchio, and Z. Zeng, “Voltage-controlled spintronic stochastic neuron based on a magnetic tunnel junction,” Physical Review Applied, 034015 (2019).
K. Y. Camsari, M. M. Torunbalci, W. A. Borders, H. Ohno, and S. Fukami, “Double free-layer magnetic tunnel junctions for probabilistic bits,” arXiv preprint arXiv:2012.06950 (2020).
C. Lin, S. Kang, Y. Wang, K. Lee, X. Zhu, W. Chen, X. Li, W. Hsu, Y. Kao, M. Liu, et al., “45nm low power cmos logic compatible embedded stt mram utilizing a reverse-connection 1t/1mtj cell,” in Electron Devices Meeting (IEDM), 2009 IEEE International (IEEE, 2009) pp. 1–4.
K. Y. Camsari, R. Faria, B. M. Sutton, and S. Datta, “Stochastic p-bits for invertible logic,” Physical Review X, 031014 (2017).
Y. Lv, R. P. Bloom, and J.-P. Wang, “Experimental demonstration of probabilistic spin logic by magnetic tunnel junctions,” IEEE Magnetics Letters, 1–5 (2019).
B. R. Zink, Y. Lv, and J.-P. Wang, “Independent control of antiparallel- and parallel-state thermal stability factors in magnetic tunnel junctions for telegraphic signals with two degrees of tunability,” IEEE Transactions on Electron Devices, 5353–5359 (2019).
S. S. Parkin, C. Kaiser, A. Panchula, P. M. Rice, B. Hughes, M. Samant, and S.-H. Yang, “Giant tunnelling magnetoresistance at room temperature with mgo (100) tunnel barriers,” Nature materials, 862–867 (2004).
S. Ikeda, J. Hayakawa, Y. Ashizawa, Y. Lee, K. Miura, H. Hasegawa, M. Tsunoda, F. Matsukura, and H. Ohno, “Tunnel magnetoresistance of 604% at 300 k by suppression of ta diffusion in cofeb/mgo/cofeb pseudo-spin-valves annealed at high temperature,” Applied Physics Letters, 082508 (2008).
P. Debashis, R. Faria, K. Y. Camsari, J. Appenzeller, S. Datta, and Z. Chen, “Experimental demonstration of nanomagnet networks as hardware for ising computing,” in Electron Devices Meeting (IEDM), 2016 IEEE International (IEEE, 2016) pp. 34–3.
nanohub.org, “Modular approach to spintronics,” https://nanohub.org/groups/spintronics.
J. Kaiser, A. Rustagi, K. Y. Camsari, J. Z. Sun, S. Datta, and P. Upadhyaya, “Subnanosecond fluctuations in low-barrier nanomagnets,” Physical Review Applied, 054056 (2019).
W. T. Coffey and Y. P. Kalmykov, “Thermal fluctuations of magnetic nanoparticles: Fifty years after brown,” Journal of Applied Physics, 121301 (2012).
M. R. Pufall, W. H. Rippard, S. Kaka, S. E. Russek, T. J. Silva, J. Katine, and M. Carey, “Large-angle, gigahertz-rate random telegraph switching induced by spin-momentum transfer,” Physical Review B, 214409 (2004).
C. Safranski, J. Kaiser, P. Trouilloud, P. Hashemi, G. Hu, and J. Z. Sun, “Demonstration of nanosecond operation in stochastic magnetic tunnel junctions,” arXiv preprint arXiv:2010.14393 (2020).
R. Faria, K. Y. Camsari, and S. Datta, “Low-barrier nanomagnets as p-bits for spin logic,” IEEE Magnetics Letters, 1–5 (2017).
J. Z. Sun, “Spin-current interaction with a monodomain magnetic body: A model study,” Physical Review B, 570 (2000).
R. Faria, K. Y. Camsari, and S. Datta, “Implementing bayesian networks with embedded stochastic mram,” AIP Advances, 045101 (2018).
M. Romera, P. Talatchian, S. Tsunegi, F. A. Araujo, V. Cros, P. Bortolotti, J. Trastoy, K. Yakushiji, A. Fukushima, H. Kubota, et al., “Vowel recognition with four coupled spin-torque nano-oscillators,” Nature, 230–234 (2018).
S. Jenkins, A. Meo, L. E. Elliott, S. K. Piotrowski, M. Bapna, R. W. Chantrell, S. A. Majetich, and R. F. Evans, “Magnetic stray fields in nanoscale magnetic tunnel junctions,” Journal of Physics D: Applied Physics, 044001 (2019).
R. Faria, J. Kaiser, K. Y. Camsari, and S. Datta, “Hardware design for autonomous bayesian networks,” arXiv preprint arXiv:2003.01767 (2020).
S. V. Isakov, I. N. Zintchenko, T. F. Rønnow, and M. Troyer, “Optimised simulated annealing for ising spin glasses,” Computer Physics Communications, 265–271 (2015).
E. Aarts, E. H. Aarts, and J. K. Lenstra, Local search in combinatorial optimization (Princeton University Press, 2003).
J. Kaiser, R. Faria, K. Y. Camsari, and S. Datta, “Probabilistic circuits for autonomous learning: A simulation study,” Frontiers in Computational Neuroscience (2020).
F. Cai, S. Kumar, T. Van Vaerenbergh, R. Liu, C. Li, S. Yu, Q. Xia, J. J. Yang, R. Beausoleil, W. Lu, et al., “Harnessing intrinsic noise in memristor hopfield neural networks for combinatorial optimization,” arXiv preprint arXiv:1903.11194 (2019).
H. Huang, J. Heilmeyer, M. Grözing, M. Berroth, J. Leibrich, and W. Rosenkranz, “An 8-bit 100-gs/s distributed dac in 28-nm cmos for optical communications,” IEEE Transactions on Microwave Theory and Techniques, 1211–1218 (2015).
M. Hu, C. E. Graves, C. Li, Y. Li, N. Ge, E. Montgomery, N. Davila, H. Jiang, R. S. Williams, J. J. Yang, et al., “Memristor-based analog computation and neural network classification with a dot product engine,” Advanced Materials, 1705914 (2018).
H. Gyoten, M. Hiromoto, and T. Sato, “Area efficient annealing processor for ising model without random number generator,” IEICE TRANSACTIONS on Information and Systems, 314–323 (2018).
S. Aggarwal, H. Almasi, M. DeHerrera, B. Hughes, S. Ikegawa, J. Janesky, H. Lee, H. Lu, F. Mancoff, K. Nagel, et al., “Demonstration of a reliable 1 gb standalone spin-transfer torque mram for industrial applications,” in (IEEE, 2019) pp. 2–1.
“Everspin enters pilot production phase for the world’s first 28 nm 1 gb stt-mram component,” Everspin Technology (2019).
X. Zhang, R. Bashizade, Y. Wang, C. Lyu, S. Mukherjee, and A. R. Lebeck, “Beyond application end-point results: Quantifying statistical robustness of mcmc accelerators,” arXiv preprint arXiv:2003.04223 (2020).
S. Nasrin, J. L. Drobitch, S. Bandyopadhyay, and A. R. Trivedi, “Low power restricted boltzmann machine using mixed-mode magneto-tunneling junctions,” IEEE Electron Device Letters, 345–348 (2019).
C. D. Schuman, T. E. Potok, R. M. Patton, J. D. Birdwell, M. E. Dean, G. S. Rose, and J. S. Plank, “A survey of neuromorphic computing and neural networks in hardware,” arXiv preprint arXiv:1705.06963 (2017).
G. E. Hinton, “Training products of experts by minimizing contrastive divergence,” Neural computation, 1771–1800 (2002).
M. Courbariaux, I. Hubara, D. Soudry, R. El-Yaniv, and Y. Bengio, “Binarized neural networks: Training neural networks with weights and activations constrained to +1 or -1,” arXiv preprint arXiv:1602.02830 (2016).
C.-H. Tsai, W.-J. Yu, W. H. Wong, and C.-Y. Lee, “A 41.3/26.7 pj per neuron weight rbm processor supporting on-chip learning/inference for iot applications,” IEEE Journal of Solid-State Circuits, 2601–2612 (2017).
S. Park, K. Bong, D. Shin, J. Lee, S. Choi, and H.-J. Yoo, “93tops/w scalable deep learning/inference processor with tetra-parallel mimd architecture for big-data applications,” in (IEEE, 2015) pp. 1–3.
W. F. Brown Jr, “Thermal fluctuations of a single-domain particle,” Physical Review, 1677 (1963).
S. Sayed, K. Y. Camsari, R. Faria, and S. Datta, “Rectification in spin-orbit materials using low-energy-barrier magnets,” Physical Review Applied11