Accurate Emulation of Memristive Crossbar Arrays for In-Memory Computing
Anastasios Petropoulos, Irem Boybat, Manuel Le Gallo, Evangelos Eleftheriou, Abu Sebastian, Theodore Antonakopoulos
Anastasios Petropoulos∗, Irem Boybat†‡, Manuel Le Gallo†, Evangelos Eleftheriou†, Abu Sebastian†, and Theodore Antonakopoulos∗
∗University of Patras, Dept. of ECE, 26504 Patras, Greece, Email: {a.petropoulos, antonako}@ece.upatras.gr
†IBM Research - Zurich, 8803 Rüschlikon, Switzerland, Email: {ibo, anu, ele, ase}@zurich.ibm.com
‡Ecole Polytechnique Federale de Lausanne (EPFL), 1015 Lausanne, Switzerland
Abstract —In-memory computing is an emerging non-von Neumann computing paradigm where certain computational tasks are performed in memory by exploiting the physical attributes of the memory devices. Memristive devices such as phase-change memory (PCM), where information is stored in terms of their conductance levels, are especially well suited for in-memory computing. In particular, memristive devices, when organized in a crossbar configuration, can be used to perform matrix-vector multiply operations by exploiting Kirchhoff's circuit laws. To explore the feasibility of such in-memory computing cores in applications such as deep learning as well as for system-level architectural exploration, it is highly desirable to develop an accurate hardware emulator that captures the key physical attributes of the memristive devices. Here, we present one such emulator for PCM and experimentally validate it using measurements from a PCM prototype chip. Moreover, we present an application of the emulator for neural network inference where our emulator can capture the conductance evolution of approximately 400,000 PCM devices remarkably well.
Index Terms —In-memory computing, neural networks, phase-change memory, hardware emulator
I. INTRODUCTION
The explosive growth in data-centric artificial intelligence related applications has necessitated the exploration of non-von Neumann computing paradigms such as in-memory computing. In in-memory computing, the physical attributes of memory devices are exploited to perform computational tasks in place without the need to shuttle around data between the memory and the processing units [1], [2], [3], [4], [5], [6]. A new class of emerging memory devices known as resistive memory or memristive devices are particularly well suited for in-memory computing [7]. For example, memristive devices, when organized in a crossbar configuration, can be used to perform matrix-vector multiply operations. Here, the matrix elements are stored in terms of the conductance values of the memristive devices. By exploiting Ohm's law and Kirchhoff's current summation law, the matrix-vector multiply operation can be performed in constant time. This computational capability makes in-memory computing especially interesting for applications such as deep learning training and inference, where cascaded stages of matrix-vector multiplications form the bulk of computation [8], [9], [10]. The forward propagation (inference) stage, as well as the backpropagation, can be realized by merely reading the array.

In spite of the promise of in-memory computing for applications such as deep learning, several open questions need to be addressed. First, it is essential to understand the computational reliability and accuracy of memristive in-memory cores for a range of applications since memristive devices exhibit non-idealities, such as temporal variations of conductance values. Identifying desired device characteristics for target applications can provide useful insight into future device designs. Furthermore, it is critical to develop efficient system architectures that involve cascaded memristive in-memory cores for applications such as deep learning. Note that compared to all-digital implementations, in-memory computing is more amenable to highly pipelined dataflows. Finally, it is of significant importance to develop a versatile software stack that can map the applications to the multi-core in-memory computing hardware. An accurate and fast hardware emulator of memristive devices and computing cores will be an indispensable tool to address all of these goals. In comparison to a software simulator, a custom-designed hardware counterpart can perform the prototyping of an in-memory computing core in a more rapid manner.

Fig. 1: (a) Schematic illustration of a PCM device with a mushroom-type device geometry. (b) Corresponding PCM cell model used for the FPGA-based emulator design.

An FPGA-based hardware emulator for PCM arrays, which can mimic the temporal conductance evolution of PCM, has been previously demonstrated in [11]. The system is shown to perform a matrix-vector multiplication on a 256x256 emulated array in only 136.16 microseconds. However, functional verification with experimental data has not been demonstrated yet. In this paper, we show for the first time an FPGA-based hardware emulator that can reliably capture experimental PCM characteristics. In Section II, we present the emulation of single PCM devices in an FPGA where we capture the key physical attributes such as conductance drift and 1/f noise. We validate the emulator using experimental measurements from 100 devices from a prototype PCM chip programmed to various conductance levels. In Section III, we present how a PCM multi-cell crossbar array emulation can be constructed. Finally, in Section IV, we illustrate the application of the PCM crossbar emulator for neural network inference, and we validate our results with an experiment involving approximately 400,000 PCM devices.

Fig. 2: Experimental measurements on mushroom-type PCM devices fabricated in 90 nm technology node. (a) Mean conductance evolution of 100 devices programmed to different target conductances and the corresponding linear fits according to Eq. (1). The shades denote one standard deviation. (b) Power spectral density (PSD) of the conductance signals for each target conductance level and the corresponding fit according to Eq. (2). (c) Emulation of the mean conductance evolution of 100 devices programmed to three different target conductance levels. (d) Emulation of the conductance evolution of 3 PCM devices.

II. PCM CELL EMULATION
PCM is arguably the most advanced resistive memory technology and has been widely employed for in-memory computing [12], [13], [14], [15], [16]. PCM exploits the behavior of certain phase-change materials such as Ge$_2$Sb$_2$Te$_5$ that can be switched reversibly between amorphous and crystalline phases of different electrical resistivity. A PCM device consists of a certain volume of this phase-change material sandwiched between two electrodes (see Fig. 1a). By applying suitable electrical pulses, referred to as programming pulses, it is possible to alter the phase configuration within the PCM device and achieve different conductance values. By iterative programming schemes comprising multiple program-and-verify steps, it is possible to obtain any desired conductance value within a certain error margin [17]. However, the programmed conductance values exhibit temporal variations such as drift, which is attributed to the structural relaxation of the unstable amorphous phase [18], and 1/f noise. These temporal variations are shown to be detrimental for PCM-based implementations [19] and hence need to be well captured by a PCM cell emulator.

As shown in Fig. 1b, the PCM cell emulator consists of two functional modules, one for the conductance drift and one for the 1/f noise. The drift is modeled according to the following power-law equation, which can be rearranged for ease of hardware implementation:

$$G(t) = G(t_0)\left(\frac{t}{t_0}\right)^{-\nu} = G(t_0)\exp\left(-\nu \ln\left(\frac{t}{t_0}\right)\right) \tag{1}$$

In Eq. (1), $G(t)$ denotes the conductance value at time instance $t$, $G(t_0)$ denotes the conductance at time $t_0$, and $\nu$ is the drift exponent. For 90 nm doped GST devices, $\nu$ is reported to take values between 0.03 and 0.1, depending on the initial amorphous volume created with the programming pulse [20], [21].
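As a minimal software sketch of Eq. (1), and not the FPGA implementation itself, the following Python snippet evaluates the drift model in both its power-law form and its rearranged exp/ln form and prints the two side by side. The function names and the numerical values (10 µS initial conductance, t_0 = 1 s, ν = 0.05) are illustrative assumptions; only the reported range of ν comes from the text.

```python
import math


def drift_power_law(g0: float, t: float, t0: float, nu: float) -> float:
    """Conductance at time t from the power-law form of Eq. (1)."""
    return g0 * (t / t0) ** (-nu)


def drift_exp_log(g0: float, t: float, t0: float, nu: float) -> float:
    """Same model in the rearranged exp/ln form of Eq. (1)."""
    return g0 * math.exp(-nu * math.log(t / t0))


# Illustrative values (assumptions, not taken from the paper).
G0_SIEMENS = 10e-6   # conductance at the reference time t0
T0_SECONDS = 1.0     # reference time
NU = 0.05            # drift exponent, inside the reported 0.03-0.1 range

for t in (1.0, 10.0, 100.0, 1000.0):
    g_pow = drift_power_law(G0_SIEMENS, t, T0_SECONDS, NU)
    g_exp = drift_exp_log(G0_SIEMENS, t, T0_SECONDS, NU)
    print(f"t = {t:7.1f} s  G = {g_pow * 1e6:.3f} uS  (exp/ln form: {g_exp * 1e6:.3f} uS)")
```

With these assumed values, every decade of elapsed time scales the conductance by 10^(-ν), roughly 0.89 for ν = 0.05. The exp/ln form needs only a logarithm, a multiplication and an exponential, which is presumably why it suits a pipelined hardware datapath.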
There is also variability associated with $\nu$ [20], [22], [23]. To capture these observations, we sample the drift exponent of individual devices from a Gaussian distribution with a certain mean and standard deviation. For the 1/f noise module, a hardware block was designed in order to implement the equation:

$$S_{I_{noise}}(f) = I_{read}^{2}\,\frac{Q}{f} \tag{2}$$

$S_{I_{noise}}(f)$ denotes the power spectral density associated with the read noise [24]. $I_{read}$ denotes the mean read current when biased by the read voltage, $V$. As shown in Fig. 1b, we generate two independent and normally distributed random vectors with a dimension of $N_{FFT}/2$, with known variance, $Q$, and zero mean, and we use them as a complex Gaussian random vector in the frequency domain. The amplitude of the complex Gaussian random vector is scaled by the $1/\sqrt{f}$ factor. Then the negative frequency spectral samples are determined so as to satisfy Hermitian symmetry. Finally, the inverse Fourier transform ($N_{FFT}$ points) is applied for generating a real-valued time series with the desired noise characteristics [25].

Fig. 3: The PCM crossbar emulator. (a) Functional diagram [11]. (b) Hardware diagram of the PCIe-based FPGA system.

The underlying functions associated with the drift and noise functional modules were implemented using floating-point cores that integrate DSP slices, which are pipelined to achieve high throughput when a large number of cells has to be emulated. With this model, we can investigate the influence of drift and 1/f noise on scalar multiplication.

For the experimental validation of the PCM cell emulator, we used an experimental platform with 1 million mushroom-type PCM cells, with doped Ge$_2$Sb$_2$Te$_5$ as the phase-change material, fabricated in 90 nm CMOS technology. In order to verify our PCM cell emulator for different conductance levels, we programmed 100 devices to a range of 2 µS to 40 µS using iterative programming. Subsequently, the read current from each device was measured for approximately 9 seconds with a sampling rate of 112 kHz. The device conductances were estimated based on the read voltage of V = 0.[…].

Fig. 2a shows the evolution of PCM conductance states. It can be seen that the mean behavior matches the relationship predicted by Eq. (1). For each targeted conductance level, a line is fitted to the average conductance evolution on a log-log scale. The drift exponent is calculated from the slope of this linear fit. Note that $\nu$ depends on the initially created amorphous volume. We assume a constant standard deviation of 0.02 for $\nu$. To estimate the noise and its power spectral density, we used the read measurements obtained during the last second. The reason for this is to decouple the effect of drift from the 1/f noise measurement, as drift slows down significantly with time.
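The 1/f noise generation procedure described in this section (a complex Gaussian spectrum scaled by 1/√f, Hermitian symmetry, inverse FFT; see [25]) can be sketched in NumPy as follows. This is a software illustration under stated assumptions, not the FPGA design: the function name `one_over_f_noise`, the values chosen for `N_FFT` and `Q`, the zeroed DC bin, and the use of `np.fft.irfft` to supply the Hermitian-symmetric negative frequencies are all choices made here. Scaling to physical units (read current, sampling rate) is omitted.

```python
import numpy as np


def one_over_f_noise(n_fft: int, q: float, rng: np.random.Generator) -> np.ndarray:
    """Return n_fft real-valued samples whose power spectrum falls off as 1/f."""
    half = n_fft // 2
    k = np.arange(1, half + 1)                 # positive frequency bin indices
    # Two independent N(0, q) vectors form one complex Gaussian vector.
    re = rng.normal(0.0, np.sqrt(q), half)
    im = rng.normal(0.0, np.sqrt(q), half)
    spectrum = (re + 1j * im) / np.sqrt(k)     # amplitude ~ 1/sqrt(f), power ~ 1/f
    spectrum[-1] = spectrum[-1].real           # the Nyquist bin must be real
    # Prepend a zero DC bin (assumption: zero-mean noise); irfft then implies
    # the Hermitian-symmetric negative-frequency samples.
    full = np.concatenate(([0.0 + 0.0j], spectrum))
    return np.fft.irfft(full, n=n_fft)


# Sanity check with assumed parameters: average the periodogram over many
# realizations and fit its slope on a log-log scale.
rng = np.random.default_rng(0)
N_FFT, Q, N_RUNS = 1024, 1e-3, 200
psd = np.zeros(N_FFT // 2 - 1)
for _ in range(N_RUNS):
    x = one_over_f_noise(N_FFT, Q, rng)
    psd += np.abs(np.fft.rfft(x)[1:-1]) ** 2   # skip the DC and Nyquist bins
psd /= N_RUNS
bins = np.arange(1, N_FFT // 2)
slope = np.polyfit(np.log(bins), np.log(psd), 1)[0]
print(f"fitted log-log PSD slope: {slope:.2f}")
```

A fitted slope close to -1 indicates the intended 1/f behavior. In the emulator itself, each cell is fed a different instantiation of such pre-generated samples.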
Fig. 2b presents the PSD of the 1/f noise and the corresponding fitting curves with respect to Eq. (2) for different target conductance levels. Based on these measurements, it is observed that $Q$ becomes higher as the target conductance level becomes lower, as also reported in [23], [26]. The observed values of $Q$ were from 5.1 × 10^(−[…]) to 1.1 × 10^(−[…]). Typically, the noise in PCM follows a $1/f^{\gamma}$ relationship, where $\gamma$ is reported to be within the range of 0.9-1.1 [27]. The deviation from the 1/f behavior is also evident in Fig. 2b. However, for modeling simplicity, we assume an ideal 1/f relationship, where we use $N_{FFT}$ = […].

III. PCM CROSSBAR EMULATION
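As a software point of reference for the crossbar operation emulated in this section, the sketch below computes the column currents $I_j = \sum_i G_{ij}(t) V_i$ (Ohm's law per cell, Kirchhoff's current law per column) with per-cell drift following Eq. (1). It accumulates the result in chunks of k rows, standing in for a hardware-limited dot-product width. Everything here is an illustrative assumption rather than the authors' design: the function names, the row/column convention, the array size, the mean drift exponent of 0.05, the voltage range, and the omission of 1/f noise. The 2 µS to 40 µS conductance range and the 0.02 standard deviation of ν are taken from Section II.

```python
import numpy as np


def drifted_conductance(g0: np.ndarray, nu: np.ndarray, t: float, t0: float = 1.0) -> np.ndarray:
    """Per-cell drift, Eq. (1): G(t) = G(t0) * (t / t0) ** (-nu)."""
    return g0 * (t / t0) ** (-nu)


def crossbar_mvm(g: np.ndarray, v: np.ndarray, k: int) -> np.ndarray:
    """Column currents I[j] = sum_i G[i, j] * V[i], accumulated k rows at a time."""
    n_rows, n_cols = g.shape
    currents = np.zeros(n_cols)
    for start in range(0, n_rows, k):
        stop = min(start + k, n_rows)
        # Partial (at most k-element) dot-products for all columns, then accumulate.
        currents += v[start:stop] @ g[start:stop, :]
    return currents


rng = np.random.default_rng(0)
N = 8                                           # assumed array size
g0 = rng.uniform(2e-6, 40e-6, size=(N, N))      # initial conductances in siemens
nu = rng.normal(0.05, 0.02, size=(N, N))        # per-cell drift exponents (mean assumed)
v = rng.uniform(0.0, 0.2, size=N)               # row voltages in volts (assumed range)

g_t = drifted_conductance(g0, nu, t=100.0)      # conductances after 100 s of drift
i_tiled = crossbar_mvm(g_t, v, k=3)             # chunked accumulation
i_ref = v @ g_t                                 # direct matrix-vector product
print("max |tiled - direct| =", np.max(np.abs(i_tiled - i_ref)))
```

The chunked and direct results agree up to floating-point rounding, so the chunk size affects only how the work is scheduled, not the computed currents.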
The PCM cell emulation model was utilized for emulating a multi-cell PCM crossbar architecture, as shown in Fig. 3a. The depicted architecture uses NxN emulated PCM cells, where each one has its own conductance and drift exponent model parameters. Also, all cells are supplied with samples from the same 1/f noise generator, but each emulated cell is fed with a different instantiation of noise samples.

A sequential execution of N dot-products in the emulator can be used to simulate the matrix-vector multiplication of an NxN crossbar in hardware. For our scenario, each element-wise multiplication of the dot-product is performed by one emulated PCM cell. The element-wise multiplication can be used to implement a k-element dot-product in a column of the crossbar, where the k-factor depends on the available hardware resources. Thus, a dot-product with a dimension greater than the k-factor can be achieved by executing its operation several times with the addition of the partial results using a tree structure of adders and an accumulator [11], [28].

The components of the system, which implements the emulated crossbar design, are shown in Fig. 3b. The emulated PCM crossbar consists of two dedicated DRAM memories for storing the crossbar conductances and drift coefficients, while another DRAM is used for storing pre-generated 1/f noise samples. Two dedicated data mover engines (HostDRAM - PCM Crossbar) are used for high speed (8 GBps) transfers of data between the crossbar and the server's memory. A PCM crossbar adaptation layer is used for encoding weights to conductances, transforming vector data to voltage values and also decoding the resulting current values. Also, it contains nonlinearity functional blocks (i.e., ReLU, sigmoid, tanh) for neural network applications. In addition, the system incorporates a soft-CPU for initialization and control. The soft-CPU interacts with a host application using a dedicated device driver and descriptor-structured data transfers. The PCM crossbar emulator has been implemented on a Kintex UltraScale FPGA and has been tested on a high-end Xeon server.

For a matrix-vector multiplication scenario, the host initiates requests for downloading, conductance encoding, and conductance data storing to the FPGA's DRAM memories. Also, it initializes a dedicated DRAM area with 1/f noise samples. In order to start the emulated crossbar for matrix-vector operations, three main procedures are performed: (a) vector data are received from the host, passed through the PCM crossbar adaptation layer, and transformed to voltages, (b) concurrently, the conductance matrix is loaded from the DRAMs to the crossbar along with 1/f noise values, and the computation is started, (c) finally, the resulting currents are processed by the adaptation layer and then uploaded to the host's DRAM. Such a versatile process can operate in a pipelined fashion using several crossbars.

Fig. 4: Neural network inference results. (a) Evolution of accuracy over time from PCM experimental results compared with the mean behavior of emulated results and their shaded region representing one standard deviation. The inset presents the mapping scheme of the network's layers in a single emulated crossbar. (b) Experimental and emulated conductance distribution of the neural network's encoded weights for three different conductance ranges.

IV. NEURAL NETWORK INFERENCE USING THE EMULATED PCM CROSSBAR
For the evaluation of the emulated PCM crossbar, we considered the task of MNIST handwritten digit recognition. For that purpose, we compared emulated and experimental inference results from PCM arrays over time for a fully-connected neural network with two layers. The network dimensions are 784-250-10, and it was trained in software using single-precision floating-point weights. Next, the trained weights were iteratively programmed to conductance values on the PCM prototype chip, utilizing approximately 400,000 PCM devices. These weights are linearly mapped to conductance values. A differential PCM configuration is used for each synapse where one device denotes the positive part of the weight, and the other device denotes the negative part of the weight. According to the sign of the weight, one device of each differential pair is set close to […] µS. We use the programmed conductance values of PCM devices at 23 µsec as the emulator's initial state [10]. The subsequent conductance values of later time steps are determined with model parameters (i.e., drift exponent, 1/f variance Q factor). For simplicity, we adopt the parameters used to emulate the behavior of devices with target conductance of […] µS in Section II ($\bar{\nu}$ = 0.[…], $\sigma_{\nu}$ = 0.[…], $Q$ = 4 × 10^(−[…])). Note that the target conductances representing the network weights are mostly contained within the range of 0 to […] µS.

In Fig. 4a, we present accuracy results of neural network inference for a time period greater than 27 hours. The evolution of the mean accuracy over time from the experiment is well captured by the emulator. Additionally, to further verify our model regarding the conductance drift and noise, we show the evolution of the network's weight distribution encoded to conductances. As depicted in Fig. 4b, the emulated results capture well the temporal evolution of the conductance distributions for different target conductance states.

For this inference application, both weight layers of the neural network were emulated in a single crossbar in a pipelined fashion. This is achieved by using a crossbar size of 1034x520, to fit both layer dimensions, with redundant cells (zero conductance) in appropriate places of the weights' matrix (see Fig. 4a inset). With this mapping approach, our emulator achieves a processing rate of 8.8 kilo-images per second and […] µsec latency.

V. CONCLUSION
In this work, we presented an accurate FPGA-based hardware emulator for phase-change memory that captures the key physical attributes such as temporal drift of conductance values as well as 1/f noise. The PCM cell emulator and its extension to the PCM crossbar emulator were experimentally validated using a prototype PCM array based on a deep learning inference hardware experiment that involves approximately 400,000 PCM devices. The presented hardware emulator can be a powerful tool for the exploration of in-memory computing and its applications. This approach is scalable to larger networks and more complex problems, while the application domain is not restricted to neural network inference as this emulator can benefit other in-memory computing scenarios.

REFERENCES

[1] A. Sebastian, T. Tuma, N. Papandreou, M. Le Gallo, L. Kull, T. Parnell, and E. Eleftheriou, "Temporal correlation detection using computational phase-change memory," Nature Communications, vol. 8, no. 1, p. 1115, 2017.
[2] D. Ielmini and H.-S. P. Wong, "In-memory computing with resistive switching devices," Nature Electronics, vol. 1, no. 6, p. 333, 2018.
[3] N. Verma, H. Jia, H. Valavi, Y. Tang, M. Ozatay, L.-Y. Chen, B. Zhang, and P. Deaville, "In-memory computing: Advances and prospects," IEEE Solid-State Circuits Magazine, vol. 11, no. 3, pp. 43–55, 2019.
[4] A. Serb, J. Bill, A. Khiat, R. Berdan, R. Legenstein, and T. Prodromakis, "Unsupervised learning in probabilistic neural networks with multi-state metal-oxide memristive synapses," Nature Communications, vol. 7, p. 12611, 2016.
[5] S. Yu, "Neuro-inspired computing with emerging nonvolatile memorys," Proceedings of the IEEE, vol. 106, no. 2, pp. 260–285, 2018.
[6] I. Vourkas and G. C. Sirakoulis, "Emerging memristor-based logic circuit design approaches: A review," IEEE Circuits and Systems Magazine, vol. 16, no. 3, pp. 15–30, 2016.
[7] G. W. Burr, R. M. Shelby, A. Sebastian, S. Kim, S. Kim, S. Sidler, K. Virwani, M. Ishii, P. Narayanan, A. Fumarola et al., "Neuromorphic computing using non-volatile memory," Advances in Physics: X, vol. 2, no. 1, pp. 89–124, 2017.
[8] G. W. Burr, R. M. Shelby, S. Sidler, C. Di Nolfo, J. Jang, I. Boybat, R. S. Shenoy, P. Narayanan, K. Virwani, E. U. Giacometti et al., "Experimental demonstration and tolerancing of a large-scale neural network (165 000 synapses) using phase-change memory as the synaptic weight element," IEEE Transactions on Electron Devices, vol. 62, no. 11, pp. 3498–3507, 2015.
[9] Z. Wang, S. Joshi, S. Savelev, W. Song, R. Midya, Y. Li, M. Rao, P. Yan, S. Asapu, Y. Zhuo et al., "Fully memristive neural networks for pattern classification with unsupervised learning," Nature Electronics, vol. 1, no. 2, p. 137, 2018.
[10] A. Sebastian, I. Boybat, M. Dazzi, I. Giannopoulos, V. Jonnalagadda, V. Joshi, G. Karunaratne, B. Kersting, R. Khaddam-Aljameh, S. Nandakumar et al., "Computational memory-based inference and training of deep neural networks," in […]. IEEE, 2019, pp. T168–T169.
[11] A. Petropoulos and T. Antonakopoulos, "Accurate PCM crosspoint emulator and its use on eigenvalues calculation," in […]. IEEE, 2018, pp. 549–552.
[12] G. W. Burr, M. J. Brightsky, A. Sebastian, H.-Y. Cheng, J.-Y. Wu, S. Kim, N. E. Sosa, N. Papandreou, H.-L. Lung, H. Pozidis et al., "Recent progress in phase-change memory technology," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 6, no. 2, pp. 146–162, 2016.
[13] A. Sebastian, M. Le Gallo, and E. Eleftheriou, "Computational phase-change memory: beyond von Neumann computing," Journal of Physics D: Applied Physics, vol. 52, no. 44, p. 443002, 2019.
[14] M. Le Gallo, A. Sebastian, R. Mathis, M. Manica, H. Giefers, T. Tuma, C. Bekas, A. Curioni, and E. Eleftheriou, "Mixed-precision in-memory computing," Nature Electronics, vol. 1, no. 4, p. 246, 2018.
[15] I. Boybat, M. Le Gallo, S. Nandakumar, T. Moraitis, T. Parnell, T. Tuma, B. Rajendran, Y. Leblebici, A. Sebastian, and E. Eleftheriou, "Neuromorphic computing with multi-memristive synapses," Nature Communications, vol. 9, no. 1, p. 2514, 2018.
[16] S. Nandakumar, M. Le Gallo, I. Boybat, B. Rajendran, A. Sebastian, and E. Eleftheriou, "Mixed-precision architecture based on computational memory for training deep neural networks," in […]. IEEE, 2018, pp. 1–5.
[17] N. Papandreou, H. Pozidis, A. Pantazi, A. Sebastian, M. Breitwisch, C. Lam, and E. Eleftheriou, "Programming algorithms for multilevel phase-change memory," in […]. IEEE, 2011, pp. 329–332.
[18] M. Le Gallo, D. Krebs, F. Zipoli, M. Salinga, and A. Sebastian, "Collective structural relaxation in phase-change memory devices," Advanced Electronic Materials, vol. 4, no. 9, p. 1700627, 2018.
[19] V. Joshi, M. L. Gallo, I. Boybat, S. Haefeli, C. Piveteau, M. Dazzi, B. Rajendran, A. Sebastian, and E. Eleftheriou, "Accurate deep neural network inference using computational phase-change memory," arXiv preprint arXiv:1906.03138, 2019.
[20] M. Le Gallo, A. Sebastian, G. Cherubini, H. Giefers, and E. Eleftheriou, "Compressed sensing recovery using computational memory," in […]. IEEE, 2017, pp. 28–3.
[21] I. Boybat, S. Nandakumar, M. Le Gallo, B. Rajendran, Y. Leblebici, A. Sebastian, and E. Eleftheriou, "Impact of conductance drift on multi-PCM synaptic architectures," in […]. IEEE, 2018, pp. 1–4.
[22] M. Boniardi, D. Ielmini, S. Lavizzari, A. L. Lacaita, A. Redaelli, and A. Pirovano, "Statistics of resistance drift due to structural relaxation in phase-change memory arrays," IEEE Transactions on Electron Devices, vol. 57, no. 10, pp. 2690–2696, 2010.
[23] S. Nandakumar, I. Boybat, V. Joshi, C. Piveteau, M. Le Gallo, B. Rajendran, A. Sebastian, and E. Eleftheriou, "Phase-change memory models for deep learning training and inference," in […]. IEEE, 2019, pp. 727–730.
[24] G. Close, U. Frey, M. Breitwisch, H. Lung, C. Lam, C. Hagleitner, and E. Eleftheriou, "Device, circuit and system-level analysis of noise in multi-bit phase-change memory," in […]. IEEE, 2010, pp. 29–5.
[25] J. Timmer and M. König, "On generating power law noise," Astronomy and Astrophysics, vol. 300, p. 707, 1995.
[26] P. Fantini, A. Pirovano, D. Ventrice, and A. Redaelli, "Experimental investigation of transport properties in chalcogenide materials through 1/f noise measurements," Applied Physics Letters, vol. 88, no. 26, p. 263506, 2006.
[27] P. Fantini, G. B. Beneventi, A. Calderoni, L. Larcher, P. Pavan, and F. Pellizzer, "Characterization and modelling of low-frequency noise in PCM devices," in […]. IEEE, 2008, pp. 1–4.
[28] A. Petropoulos and T. Antonakopoulos, "A versatile PCM-based circuits emulator and its use on implementing linear algebra functions," in 2018 21st Euromicro Conference on Digital System Design (DSD)