
IEEE COMPUTER ARCHITECTURE LETTERS, VOL. XX, NO. Y, MONTH YYYY

A Framework to Explore Workload-Specific Performance and Lifetime Trade-offs in Neuromorphic Computing

Adarsha Balaji, Shihao Song, Anup Das, Nikil Dutt, Jeff Krichmar, Nagarajan Kandasamy, Francky Catthoor

Abstract: Neuromorphic hardware with non-volatile memory (NVM) can implement machine learning workloads in an energy-efficient manner. Unfortunately, certain NVMs such as phase change memory (PCM) require high voltages for correct operation. These voltages are supplied from an on-chip charge pump. If the charge pump is activated too frequently, its internal CMOS devices do not recover from stress, accelerating their aging and leading to negative bias temperature instability (NBTI) generated defects. Forcefully discharging the stressed charge pump can lower the aging rate of its CMOS devices, but makes the neuromorphic hardware unavailable to perform computations while its charge pump is being discharged. This negatively impacts performance, such as latency and accuracy, of the machine learning workload being executed. In this paper, we propose a novel framework to exploit workload-specific performance and lifetime trade-offs in neuromorphic computing. Our framework first extracts the precise times at which a charge pump in the hardware is activated to support neural computations within a workload. This timing information is then used with a characterized NBTI reliability model to estimate the charge pump's aging during the workload execution. We use our framework to evaluate workload-specific performance and reliability impacts of using 1) different SNN mapping strategies and 2) different charge pump discharge strategies. We show that our framework can be used by system designers to explore performance and reliability trade-offs early in the design of neuromorphic hardware such that appropriate reliability-oriented design margins can be set.

Index Terms: Neuromorphic computing, Non-volatile Memory (NVM), Phase-Change Memory (PCM), wear-out, Negative Bias Temperature Instability (NBTI), Spiking Neural Networks (SNNs), and Inter-Spike Interval (ISI).

1 INTRODUCTION

Neuromorphic hardware consists of artificial neurons and synapses that implement spiking neural networks (SNNs) [1]. Emerging non-volatile memory (NVM) cells organized into crossbars are used to store synaptic strengths. Certain NVMs such as phase-change memory (PCM) require high voltages (∼ V − V) to read and program synaptic strengths. These high voltages create reliability issues not only for NVM cells in a crossbar, but also for the internal CMOS devices of the on-chip charge pump [2], which generates these voltages. In this paper, we study one specific high-voltage-related reliability issue of a charge pump in the context of neuromorphic computing: threshold voltage ($V_{th}$) stress. If the charge pump is activated too frequently, its CMOS devices do not recover from stress, accelerating their aging and eventually leading to failures.

Typically, a charge pump is several orders of magnitude larger than a crossbar [2]. To mitigate this large size, system designers connect many crossbars to each charge pump. Therefore, charge pump failures are a critical bottleneck to the prolonged operation of neuromorphic hardware. Redundant charge pumps can improve reliability but increase hardware area. To improve reliability, stressed charge pumps can also be forcefully discharged, where a discharge operation involves applying a low voltage to all CMOS devices in the charge pump. Once discharged, the charge pump requires several cycles to boost its voltage back before it can safely be used to access NVM cells in a crossbar. During this interval, crossbars are unable to process spikes, introducing a spike propagation delay. This delay negatively impacts performance (such as latency and accuracy) of the SNN workload being executed [3].

Aging of a charge pump depends on how frequently NVM cells in the hardware are activated, which is due to spikes generated by the SNN workload being executed. We propose a novel framework that allows system designers to explore workload-specific trade-offs involving reliability, performance, and design cost early in the design process, such that appropriate reliability-oriented design margins can be set. Our framework incorporates the CARLsim simulator [4] to first extract the precise times of spikes in an SNN workload. We then use a characterized reliability model to estimate aging of charge pumps based on their activation times, which are influenced by the mapping of synapses to crossbars and the connectivity of crossbars to charge pumps in the hardware. We show that this framework can be integrated inside 1) design-time techniques, where neurons and synapses can be efficiently allocated to different crossbars, balancing aging of all charge pumps, 2) run-time techniques, where stressed charge pumps can be forcefully discharged at appropriate intervals, minimizing their aging without significantly hurting performance, and 3) architectural techniques, where the number of charge pumps can be budgeted to achieve a target lifetime.

Fig. 1: An illustration of a typical neuromorphic architecture and how SNNs are mapped to a crossbar in this architecture. (a) A typical neuromorphic architecture with crossbars interconnected using a time-shared interconnect. The charge pumps supply the voltages necessary for the peripheral structures. (b) An example illustrating how two pre-synaptic neurons (N1, N2) connected to a post-synaptic neuron (N3) are mapped to a crossbar with NVM cells P1 and P2.

• A. Balaji, S. Song, A. Das, and N. Kandasamy are with Drexel University, Philadelphia, PA, USA. E-mail: [email protected].
• N. Dutt and J. Krichmar are with the Department of Computer Science, University of California, Irvine, CA, USA.
• F. Catthoor is with Imec, Belgium and KU Leuven, Belgium.

2 BACKGROUND AND MOTIVATION

SNNs are networks of spiking neurons interconnected via synapses. A neuron fires a spike when its membrane voltage exceeds a threshold, after which the membrane voltage is reset. The moment of threshold crossing defines the firing time. SNNs can be used to implement many machine learning techniques. One example is the supervised approach, where an SNN is first trained with examples from the field and then used for inference with in-field data. Performance of supervised machine learning is measured in terms of accuracy, which is assessed from inter-spike intervals (ISIs) [5]. To define ISI, we let $\{t_1, t_2, \cdots, t_K\}$ be a neuron's firing times in the time interval $[0, T]$. The average ISI of this spike train is given by [5]:

$$I = \sum_{i=2}^{K} (t_i - t_{i-1}) / (K - 1). \tag{1}$$

The neuromorphic hardware shown in Figure 1(a) consists of 6 crossbars, three of which are connected to charge pump 1 and the remaining three to charge pump 2. All crossbars are interconnected using a time-shared interconnect. Figure 1(b) illustrates the mapping of an SNN to a crossbar. Synaptic weight $w_1$ is programmed on the NVM cell P1 and $w_2$ on P2. Output spike voltages $x_1$ from N1 and $x_2$ from N2 inject currents into the crossbar, which are obtained by multiplying a pre-synaptic neuron's output spike voltage with the NVM cell's conductance at the cross-point of the pre- and post-synaptic neurons (following Ohm's law). Current summations along columns are performed in parallel using Kirchhoff's current law, and implement the sums $\sum_j w_{ij} x_i$ needed for forward propagation of neuron excitation $x_i$.

Figure 2(a) shows the spike train generated by N1 of Figure 1(b). Each spike injects current to read the conductance of the NVM cell P1. Figure 2(b) illustrates the charge pump's operating voltage to process this spike train. The charge pump is operated at . V for the entire 60ms interval, boosting its voltage to V only to process spikes. Aging of the charge pump is 8.3 units (see Section 3 for aging computation) and the average ISI is 5.9ms (Equation 1).

Figure 2(c) illustrates the charge pump's operating voltage when it is discharged to 1.2V after processing every spike and boosted again to . V before processing the next. Once discharged, the crossbar becomes unavailable to process spikes, introducing latency in processing the spike train. The average ISI increases to 7.4ms, compared to 5.9ms in Figure 2(b). ISI deviation leads to accuracy loss [3]. Frequently discharging the charge pump, however, reduces its aging to 7.1 units, compared to 8.3 units in Figure 2(b). This reduction in aging leads to an improvement of mean-time-to-failure (MTTF) of the charge pump by an average 8.7%. Thus, aging reduction improves a charge pump's lifetime.

Fig. 2: Illustrating the trade-off between charge pump aging and SNN performance, considering PCM crossbars. (a) Example spike train from N1 of Figure 1(b). (b) Charge pump voltage (V) to process the spike train; ISI = 5.9ms and aging = 8.3 units. (c) Charge pump reset to 1.2V after processing every spike; ISI = 7.4ms and aging = 7.1 units.
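As a rough illustration of Equation (1), the following Python sketch computes the average ISI of a spike train and shows how added latency stretches it. The spike times and the `delay_spikes` helper are hypothetical and chosen only for readability; in particular, the accumulating per-spike delay is an assumed toy model of discharge latency, not the timing model used in this paper.

```python
from typing import List, Sequence


def average_isi(firing_times: Sequence[float]) -> float:
    """Average inter-spike interval of a spike train, following Equation (1)."""
    k = len(firing_times)
    if k < 2:
        raise ValueError("need at least two spikes to define an ISI")
    # Sum of consecutive differences divided by (K - 1).
    total = sum(firing_times[i] - firing_times[i - 1] for i in range(1, k))
    return total / (k - 1)


def delay_spikes(firing_times: Sequence[float], delay_per_spike: float) -> List[float]:
    """Hypothetical latency model: the i-th spike is pushed back by i * delay."""
    return [t + i * delay_per_spike for i, t in enumerate(firing_times)]


# Illustrative spike times in milliseconds (not taken from Figure 2).
spikes_ms = [2.0, 8.0, 13.0, 20.0, 26.0]
delayed_ms = delay_spikes(spikes_ms, 1.5)

print(average_isi(spikes_ms))   # 6.0
print(average_isi(delayed_ms))  # 7.5
```

Because the consecutive differences telescope, the average ISI equals $(t_K - t_1)/(K-1)$, so under this toy model only the delay accumulated between the first and last spike changes the result.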

3 PROPOSED WORKLOAD-AWARE FRAMEWORK

We first review NBTI, which is a dominant reliability issue in scaled technology nodes, and then present our proposed framework for PCM-based crossbars. We use the characterized NBTI model of [6]. Our framework can also be extended with minimal effort to consider 1) any NBTI model, 2) other NVMs such as FeRAM and Flash, and 3) other reliability issues such as time-dependent dielectric breakdown (TDDB), which is still the dominant one in older technology nodes.

NBTI aging manifests as 1) a decrease in drain current and transconductance, and 2) an increase in off current and threshold voltage. NBTI aging is accelerated at high temperature and high oxide electric field. Recent works such as [6] suggest that NBTI is the collective response of two independent defects: the as-grown hole traps (AHTs) and generated defects (GDs). AHTs and a small proportion of GDs can be recovered by annealing at high temperatures if the NBTI stress voltage is removed. We focus on GDs, which contribute to permanent degradation of charge pumps. In fact, once introduced, GDs cannot be eliminated. Their effect can, however, be delayed by applying lower voltages (i.e., forcefully discharging stressed charge pumps).

To formulate NBTI aging, we divide the SNN execution time $[0, T]$ into $m$ equal intervals $t_0 < t_1 < \cdots < t_m = T$, with $[t_i, t_{i+1})$ as the $(i+1)^{th}$ interval and $V_i$ the charge pump's voltage in this interval. Reliability at the end of SNN execution can be expressed as $R(T) = e^{-\left(\sum_{i=0}^{m-1} G(V_i)\right)^{\beta}}$, where $G(V_i)$ is the generated defect at voltage $V_i$, expressed as a power law, $G(V_i) = g \cdot (V_i - V_{th})^m \cdot (t_{i+1} - t_i)^n$, and $\beta$, $g$, $m$, $n$ are material-dependent constants [6]. We define NBTI aging in a stressed charge pump as

$$\mathcal{A} = \sum_{i=0}^{m-1} g \cdot (V_i - V_{th})^m \cdot (t_{i+1} - t_i)^n, \quad \text{such that } R(T) = e^{-\mathcal{A}^{\beta}}. \tag{2}$$

Fig. 3: Framework to evaluate aging of charge pumps. Blocks: CARLsim, SNN Mapping, Aging Evaluation, Discharge Management.

Equation (2) assumes all synapses are mapped to the same crossbar, which is connected to a single charge pump. In practice, however, 1) synapses are distributed across different crossbars because a crossbar can accommodate only a limited number of synapses and 2) neuromorphic hardware typically has more than one charge pump to limit the power supply load. We now describe how to extend (2) to incorporate these practical constraints.

We consider the SNN $\mathcal{G}$, with $N$ neurons and $S$ synapses, excited with an input over the time interval $[0, T]$. We arrange the spikes in this interval by the synapses they excite as

$$\mathcal{S} = \{\tau^1_1, \tau^1_2, \cdots, \tau^1_{k_1}\}, \{\tau^2_1, \tau^2_2, \cdots, \tau^2_{k_2}\}, \cdots, \{\tau^S_1, \tau^S_2, \cdots, \tau^S_{k_S}\}, \tag{3}$$

where $\tau^s_j$ is the $j^{th}$ spike on the $s^{th}$ synapse of the SNN. We introduce the following notation.

• $\mathcal{A}_s$: aging to process spike train $\{\tau^s_1, \cdots, \tau^s_{k_s}\}$ on the $s^{th}$ synapse
• $C$: number of crossbars
• $L$: number of charge pumps
• $M \in \mathbb{R}^{S \times C}$: synapse-to-crossbar mapping, such that

$$m_{ij} \in M = \begin{cases} 1 & \text{if synapse } i \text{ is mapped to crossbar } j \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

• $P \in \mathbb{R}^{C \times L}$: crossbar-to-charge pump mapping, such that

$$p_{jk} \in P = \begin{cases} 1 & \text{if crossbar } j \text{ is powered by charge pump } k \\ 0 & \text{otherwise} \end{cases} \tag{5}$$

Combining these two equations, we generate the synapse-to-charge pump mapping as

$$m_{ij} \cdot p_{jk} = \begin{cases} 1 & \text{if synapse } i \text{ is powered by charge pump } k \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

The total aging of charge pump $k$ is therefore

$$\text{aging}_k = \sum_{i=1}^{S} \sum_{j=1}^{C} m_{ij} \cdot p_{jk} \cdot \mathcal{A}_i \tag{7}$$

Proposed Framework:

Figure 3 illustrates our framework to evaluate aging of charge pumps in neuromorphic hardware. We use CARLsim [4] to train SNN models. The outputs of CARLsim are the trained weights and the precise times of spikes on all synapses of the SNN, $\mathcal{S}$. An SNN mapping approach such as [3] uses the CARLsim output to generate a synapse-to-crossbar mapping $M$, optimizing some objective function. In [3], the objective function is to minimize the number of spikes communicated between crossbars, which leads to lower energy and latency on the shared interconnect. Once the SNN is mapped to crossbars of the hardware, its performance is obtained in terms of the inter-spike interval $I$ using (1). Using this synapse-to-crossbar and crossbar-to-charge pump mapping, our novel formulation in (7) evaluates the aging of all charge pumps in the hardware when executing an SNN workload. This design flow is shown using solid arrows.

Figure 3 also illustrates three future directions based on this framework using dashed arrows. First, Aging Evaluation, as developed in (7), can be combined with the SNN Mapping step to generate an optimum mapping of the SNN to the hardware that balances aging of all charge pumps. This is shown by the dashed arrow labeled aging-aware mapping. Second, crossbar-to-charge pump mapping can be optimized to achieve a desired lifetime of charge pumps for executing the SNN. This is shown using the dashed arrow labeled application-specific charge pump placement. Third, strategies can be developed to discharge charge pumps at run-time, improving their lifetime. This is shown in the Discharge Management step.
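The aging formulation of Equations (2) and (7) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the framework's implementation: the function names are hypothetical, the constants are placeholders rather than the characterized values of [6], and the sketch assumes that no stress accrues when the charge pump voltage is at or below $V_{th}$. The power-law exponent is named `m_exp` to keep it distinct from the number of intervals.

```python
import math
from typing import List, Sequence


def nbti_aging(voltages: Sequence[float], dt: float, v_th: float,
               g: float, m_exp: float, n_exp: float) -> float:
    """Aging per Equation (2) for equal intervals of length dt."""
    total = 0.0
    for v in voltages:
        # Assumption: no stress accrues when the voltage is at or below V_th.
        overdrive = max(v - v_th, 0.0)
        total += g * overdrive ** m_exp * dt ** n_exp
    return total


def reliability(aging: float, beta: float) -> float:
    """R(T) = exp(-(aging ** beta)), as in Equation (2)."""
    return math.exp(-(aging ** beta))


def charge_pump_aging(M: List[List[int]], P: List[List[int]],
                      synapse_aging: Sequence[float]) -> List[float]:
    """Equation (7): aging_k = sum_i sum_j m_ij * p_jk * A_i."""
    num_pumps = len(P[0])
    aging = [0.0] * num_pumps
    for i, a_i in enumerate(synapse_aging):
        for j, m_ij in enumerate(M[i]):
            for k in range(num_pumps):
                aging[k] += m_ij * P[j][k] * a_i
    return aging


# Placeholder constants for illustration only; they are not the values of [6].
params = dict(dt=4.0, v_th=1.0, g=1.0, m_exp=2.0, n_exp=0.5)
always_boosted = nbti_aging([2.0, 2.0, 2.0, 2.0], **params)
with_discharge = nbti_aging([2.0, 1.0, 1.0, 2.0], **params)

# Three synapses on two crossbars, each crossbar on its own charge pump.
M = [[1, 0], [1, 0], [0, 1]]   # synapse-to-crossbar mapping, Equation (4)
P = [[1, 0], [0, 1]]           # crossbar-to-charge pump mapping, Equation (5)
per_pump = charge_pump_aging(M, P, [2.0, 3.0, 4.0])

print(always_boosted, with_discharge)  # 8.0 4.0
print(per_pump)                        # [5.0, 4.0]
```

In this toy setting, lowering the voltage in two of the four intervals halves the aging, and the per-pump totals show how a mapping concentrates or spreads aging across charge pumps.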

4 EVALUATION RESULTS

This section presents evaluation results using our framework. We use the neuromorphic hardware of Figure 1(a) to evaluate the following SNNs [3], [7], [8], [9].

| SNN | Synapses | Topology | Spikes |
|---|---|---|---|
| ImgSmooth | 136,314 | FeedForward (4096, 1024) | 17,600 |
| EdgeDet | 272,628 | FeedForward (4096, 1024, 1024, 1024) | 22,780 |
| MLP-MNIST | 79,400 | FeedForward (784, 100, 10) | 2,395,300 |
| HeartEstm | 636,578 | Recurrent | 3,002,223 |
| HeartClass | 2,396,521 | CNN (82x82) - [Conv, Pool]*16 - [Conv, Pool]*16 - FC*256 - FC*6 | |
| CNN-MNIST | | (24x24) - [Conv, Pool]*16 - FC*150 - FC*10 | |
| LeNet-MNIST | | (32x32) - [Conv, Pool]*6 - [Conv, Pool]*16 - Conv*120 - FC*84 | |
| LeNet-CIFAR | | (32x32x3) - [Conv, Pool]*6 - [Conv, Pool]*6 - FC*84 - FC*10 | |

We use our framework to evaluate two state-of-the-art SNN mapping strategies, SCO [10] and SpiNeMap [3], in terms of performance (measured as change in ISI) and reliability (measured as aging). Figure 4 illustrates the result of SCO, normalized to SpiNeMap. SCO, which balances crossbar utilization, has on average 16.4% lower aging (better lifetime) than SpiNeMap for these workloads. This is because SpiNeMap explicitly minimizes spike latency on the shared interconnect. To do so, some crossbars get more utilized than others. Heavily utilized crossbars activate charge pumps more frequently, causing higher aging. Conversely, SpiNeMap has lower ISI change (higher performance). SCO has on average 21% higher change in ISI than SpiNeMap.

From a performance perspective, SpiNeMap is better than SCO, while from a reliability perspective, SCO is better than SpiNeMap.

Fig. 4: Aging and ISI of SCO [10] vs. SpiNeMap [3], normalized to SpiNeMap, for CNN-MNIST, EdgeDet, HeartClass, HeartEstm, ImgSmooth, LeNet-CIFAR, LeNet-MNIST, MLP-MNIST, and their average.

Figure 5 illustrates aging and ISI with discharge intervals of 10ms, 50ms, and 100ms for the evaluated SNN workloads, normalized to when charge pumps are stressed for the entire execution duration. We make the following three key observations. First, aging is the lowest for a discharge interval of 10ms, while ISI variation is the highest. This is because, with smaller discharge intervals, a charge pump's internal CMOS devices recover partially from stress and therefore the rate of aging reduces, improving lifetime. The performance is lower because of the delay introduced by frequent charge pump discharge. Second, when the discharge interval changes from 10ms to 100ms, aging increases, reducing the charge pump's lifetime, and ISI variation reduces, improving application performance. Third, aging of charge pumps varies across different SNN workloads. For MLP-MNIST, aging increases by 10% when the discharge interval increases from 10ms to 100ms, while for LeNet-CIFAR, aging increases by a factor of 2 for the same range. This is because for MLP-MNIST, spikes are generated less frequently due to sparsity of synaptic weights. There is, therefore, no significant variation in aging when charge pumps are discharged differently. The ISI variations are, however, due to delay of spike propagation when charge pumps are being discharged. We see no significant variations across different workloads. Our framework enables exploration of SNN workload-specific lifetime and performance trade-offs.

Fig. 5: Aging and ISI with different discharge intervals (charge pump discharge every 10ms, 50ms, and 100ms). (a) Aging for different discharge intervals normalized to the aging when charge pumps are not discharged. (b) ISI for different discharge intervals normalized to the ISI when charge pumps are not discharged.
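The direction of the discharge-interval trade-off can be reproduced with a small toy simulation. The sketch below is only an illustration under several assumptions that are not taken from this paper: time is split into equal 1ms steps, the charge pump is held at a low voltage for the last `window` steps of every discharge period, a spike arriving in such a window waits until the next period starts, and mean spike delay stands in for ISI change. All names, voltages, constants, and spike times are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def in_discharge_window(step: int, period: int, window: int) -> bool:
    """True if `step` falls in the last `window` steps of a discharge period."""
    return step % period >= period - window


def simulate(spike_steps: Sequence[int], total_steps: int, period: int,
             window: int, v_high: float, v_low: float, v_th: float,
             g: float, m_exp: float, n_exp: float, dt: float) -> Tuple[float, float]:
    """Return (aging, mean spike delay in steps) for periodic discharge."""
    aging = 0.0
    for step in range(total_steps):
        v = v_low if in_discharge_window(step, period, window) else v_high
        # Per-interval power-law term of Equation (2); clamped at zero stress.
        aging += g * max(v - v_th, 0.0) ** m_exp * dt ** n_exp

    total_delay = 0
    for s in spike_steps:
        if in_discharge_window(s, period, window):
            # The spike waits until the start of the next period.
            total_delay += (s // period + 1) * period - s
    return aging, total_delay / len(spike_steps)


# Hypothetical spike times (in 1ms steps) and placeholder model constants.
spikes = [5, 18, 29, 47, 58, 76, 89, 99]
common = dict(total_steps=100, window=2, v_high=2.0, v_low=1.0, v_th=1.0,
              g=1.0, m_exp=2.0, n_exp=0.5, dt=1.0)

results: Dict[int, Tuple[float, float]] = {
    period: simulate(spikes, period=period, **common)
    for period in (10, 50, 100)
}
for period, (aging, delay) in results.items():
    print(f"discharge every {period} steps: aging={aging}, mean delay={delay}")
```

With these placeholder values, the shortest discharge period gives the lowest aging and the largest mean spike delay, which matches the direction of the first two observations above. The magnitudes carry no meaning beyond this toy setting.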

5 DISCUSSION AND FUTURE OUTLOOK

Aging-related defects in charge pumps constitute a critical bottleneck to the prolonged operating lifetime of neuromorphic hardware. These defects are different from an NVM cell's endurance failures, which are due to repeated programming of the cell. In recent prototypes, e.g., [11], PCM endurance is in the order of cycles ( ≈ ≈ such as Brian [12], and 2) other reliability issues such as electromigration [13].

ACKNOWLEDGMENT

This work is supported by the National Science Foundation Award CCF-1937419 (RTML: Small: Design of System Software to Facilitate Real-Time Neuromorphic Computing).

REFERENCES

[1] W. Maass, "Networks of spiking neurons: the third generation of neural network models," Neural networks, vol. 10, no. 9, pp. 1659–1671, 1997.
[2] B. Shen and M. L. Johnston, "Zero reversion loss, high-efficiency charge pump for wide output current load range," in Symposium on circuits and systems, 2018, pp. 1–5.
[3] A. Balaji, A. Das, Y. Wu, K. Huynh, F. Dellanna, G. Indiveri, J. L. Krichmar, N. Dutt, S. Schaafsma, and F. Catthoor, "Mapping spiking neural networks to neuromorphic hardware," in Transactions on very large scale integration (VLSI) systems, 2019.
[4] T. Chou, H. J. Kashyap, J. Xing, S. Listopad, E. L. Rounds, M. Beyeler, N. Dutt, and J. L. Krichmar, "CARLsim 4: An open source library for large scale, biologically detailed spiking neural network simulation using heterogeneous clusters," in International joint conference on neural networks (IJCNN), 2018, pp. 1–8.
[5] S. Grün and S. Rotter, Analysis of parallel spike trains, 2010, vol. 7.
[6] R. Gao, Z. Ji, A. B. Manut, J. F. Zhang, J. Franco, S. W. M. Hatta, W. D. Zhang, B. Kaczer, D. Linten, and G. Groeseneken, "NBTI-generated defects in nanoscaled devices: fast characterization methodology and modeling," Transactions on electron devices, vol. 64, no. 10, pp. 4011–4017, 2017.
[7] MLPerf: Fair and useful benchmarks for measuring training and inference performance of ML hardware, software, and services. https://mlperf.org/training-overview/overview
[8] A. K. Das, F. Catthoor, and S. Schaafsma, "Heartbeat classification in wearables using multi-layer perceptron and time-frequency joint distribution of ECG," in Conference on connected health: Applications, systems and engineering technologies, 2018, pp. 69–74.
[9] A. Das, P. Pradhapan, W. Groenendaal, P. Adiraju, R. Rajan, F. Catthoor, S. Schaafsma, J. Krichmar, N. Dutt, and C. Van Hoof, "Unsupervised heart-rate estimation in wearables with Liquid states and a probabilistic readout," Neural networks, vol. 99, 2018.
[10] M. K. F. Lee, Y. Cui, T. Somu, T. Luo, J. Zhou, W. T. Tang, W.-F. Wong, and R. S. M. Goh, "A system-level simulator for RRAM-based neuromorphic computing chips," Transactions on architecture and code optimization, vol. 15, no. 4, p. 64, 2019.
[11] Z. Song, D. Cai, X. Li, L. Wang, Y. Chen, H. Chen, Q. Wang, Y. Zhan, and M. Ji, "High endurance phase change memory chip implemented based on carbon-doped Ge2Sb2Te5 in 40 nm node for embedded application," in International electron devices meeting, 2018, pp. 27–5.
[12] D. F. Goodman and R. Brette, "The Brian Simulator," Frontiers in Neuroscience, vol. 3, p. 26, 2009.
[13] A. Das, A. Kumar, and B. Veeravalli, "Aging-aware hardware-software task partitioning for reliable reconfigurable multiprocessor systems," in