A neuromorphic systems approach to in-memory computing with non-ideal memristive devices: From mitigation to exploitation
Melika Payvand, Manu V Nair, Lorenz K. Muller, Giacomo Indiveri
Memristive devices represent a promising technology for building neuromorphic electronic systems. In addition to their compactness and non-volatility, they are characterized by computationally relevant physical properties, such as state-dependence, non-linear conductance changes, and intrinsic variability in both their switching threshold and conductance values, that make them ideal devices for emulating the bio-physics of real synapses. In this paper we present a spiking neural network architecture that supports the use of memristive devices as synaptic elements, and propose mixed-signal analog-digital interfacing circuits which mitigate the effect of variability in their conductance values and exploit the variability in their switching threshold to implement stochastic learning. The effect of device variability is mitigated by using pairs of memristive devices configured in a complementary push-pull mechanism and interfaced to a current-mode normalizer circuit. The stochastic learning mechanism is obtained by mapping the desired change in synaptic weight into a corresponding switching probability that is derived from the intrinsic stochastic behavior of memristive devices. We demonstrate the features of the CMOS circuits and apply the proposed architecture to a standard hand-written digit classification benchmark based on the MNIST data-set. We evaluate the performance of the approach on this benchmark using behavioral-level spiking neural network simulations, showing both the reduction in conductance variability produced by the current-mode normalizer circuit and the increase in performance as a function of the number of memristive devices used in each synapse.
Neuromorphic computing systems comprise synapse and neuron circuits arranged in a massively parallel manner to support the emulation of large-scale spiking neural networks. In many of these systems, and in particular in neuromorphic processing devices designed to overcome the von Neumann bottleneck problem, the bulk of the silicon real-estate is taken up by synaptic circuits that integrate in the same area both memory and computational primitives. To save area and maximize density in such devices, one possible approach is to implement very basic synapse circuits arranged in dense cross-bar arrays. However, such an approach is likely to relegate the role of the synapse to a basic multiplier. In biology, synapses are extremely sophisticated structures that exhibit complex and powerful computational properties, including temporal dynamics, state-dependence, and stochastic learning behavior. The challenge is to design neuromorphic circuits that emulate these computational properties and are also compact and low power. Memristive devices have recently emerged as nano-scale devices which provide a promising technology for addressing these problems. These devices offer a compact and efficient solution for modeling synaptic weights since they are non-volatile, have a nano-scale footprint, can be integrated with Complementary Metal-Oxide Semiconductor (CMOS) chips, might require only little energy to change their state, and in addition can emulate many of the synaptic functions observed in biological synapses. However, these devices are also characterized by non-idealities that introduce significant challenges in designing neural network architectures applied to classification and recognition tasks. In particular, one property of memristive devices that introduces significant challenges in the design of large-scale neural network architectures is the large variability of their operational parameters. Memristive device variability exhibits itself in different forms, both from device to device (spatial) and from cycle to cycle within a single device (temporal). This variability therefore manifests itself both in the device conductance values and in their switching voltage. Device-to-device variability originates from process variations, which also exist in current CMOS processes, while cycle-to-cycle variability stems from the underlying switching mechanism of memristors. The cycle-to-cycle variability is observed in different types of memristors, from Phase Change Memories (PCMs) and Conductive Bridge RAMs to ionic redox-based resistive RAMs. In particular, in the latter case, the underlying mechanism for this variability is associated with the formation and rupture of a conducting filament.
Filament formation involves oxidation, ion transport, and reduction, which are all thermodynamic processes and as a result require overcoming an energy barrier. The switching therefore involves thermal activation to surpass the barrier and is thus a probabilistic process: for the same device and the same filament, switching events occur at random, i.e., stochastically. To summarize, the variability in memristive devices results in a distribution of different parameters that can be categorized in four distinct groups (illustrated in the simulation sketch below):

G1 Distribution of the switching voltage of a single device

G2 Distribution of the high and low resistive states of a single device

G3 Distribution of the switching voltages among multiple devices

G4 Distribution of the high and low resistive states among multiple devices

The variability of parameters across multiple devices (e.g., groups G3 and G4) can be mitigated and managed, for example, by considering only binary states, by implementing "compound" synapses that employ multiple memristive devices per synaptic element, or by interfacing the memristive devices to CMOS processing stages that reduce the effect of their variability. Conversely, the cycle-to-cycle variability (e.g., groups G1 and G2) can be managed by using feedback control to set the desired state to a well-defined value, which requires a large overhead control circuit, or it can be exploited as a means to implement stochastic learning in spiking neural networks. Indeed, it has been shown that employing binary synapses with variability and randomness in their switching threshold greatly improves the convergence of spiking neural networks and provides a form of regularization which substantially improves the network's generalization performance. In the case of neural networks with low-resolution synapses, it has been shown that a randomized gradient descent method significantly outperforms naive deterministic rounding methods.

Memristive devices are a promising emerging technology for use in large-scale neural network architectures. Employing such devices in neural processing systems for robust computation in real-world practical applications calls for ways to either mitigate their non-idealities, exploit them, or combine the best of both approaches in the same architecture. In this paper we present a spiking neural network architecture that supports the use of variable and stochastic memristive devices for robust inference and probabilistic learning. We show that by combining such devices with state-of-the-art mixed-signal digital and analog subthreshold circuits, it is possible to build electronic learning systems with biologically plausible functionality which can process and classify sensory data directly on-chip in real-time, and which represent ideal technologies for always-on edge-computing neural network applications. We propose synapse-CMOS interfacing circuits that dramatically reduce the effect of device-to-device variability, as well as spike-based learning circuits that are compatible with, and exploit, the device cycle-to-cycle variability to implement stochastic learning. We validate the functionality of these circuits by applying the neural network architecture to a pattern classification task, using a standard digit recognition benchmark based on the Modified National Institute of Standards and Technology (MNIST) data-set.
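To make the four variability groups above concrete, the following minimal NumPy sketch draws device parameters at two levels: per-device means (spatial variability, G3/G4) and per-cycle samples around those means (temporal variability, G1/G2). All distributions and numerical values are illustrative assumptions, not measured device data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_devices, n_cycles = 50, 100

# G3/G4: device-to-device (spatial) variability -- each device gets its own
# mean switching voltage and mean high/low resistive states. All numbers
# are illustrative placeholders, not measured values.
v_switch_mean = rng.normal(1.0, 0.10, n_devices)    # volts
r_low_mean = rng.normal(3e3, 0.6e3, n_devices)      # ohms, low-resistance state
r_high_mean = rng.normal(30e3, 6e3, n_devices)      # ohms, high-resistance state

# G1/G2: cycle-to-cycle (temporal) variability -- every programming cycle
# re-draws the effective switching voltage and resistive states per device.
v_switch = rng.normal(v_switch_mean, 0.05, (n_cycles, n_devices))
r_low = rng.normal(r_low_mean, 0.3e3, (n_cycles, n_devices))
r_high = rng.normal(r_high_mean, 3e3, (n_cycles, n_devices))

print("G1 (single device, V_set spread): ", v_switch[:, 0].std())
print("G3 (across devices, V_set spread):", v_switch_mean.std())
```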
In the next section we describe the spiking neural network architecture, explain its basic principle of event-based operation, and present its main neuromorphic building blocks; in Section 2 we present the memristive synapse circuits and their related current-mode sense circuits, used to reduce the device-to-device variability and improve the network performance in its inference phase; in Section 3 we present the spike-based stochastic learning circuits that exploit the devices' cycle-to-cycle variability to induce probabilistic state changes in the network synaptic weights; Section 4 presents behavioral simulation results at the system level on the hand-written digit recognition benchmark, to validate the proposed circuits and approach; finally, in Section 6 we present the concluding remarks.

The spiking neural network architecture that supports the use of memristive circuits as synapse elements is shown in Fig. 1. This architecture expects input spikes and produces output spikes that are encoded as Address-Events: each neuron is assigned a unique address, and when it produces an output spike, a corresponding digital pulse is encoded on a common shared time-multiplexed bus with its corresponding address. Potential collisions arising from multiple neurons requesting access to the same bus are handled by asynchronous arbiter circuits that are part of the Address-Event Representation (AER) protocol. In this protocol, the analog information present in the silicon neuron is encoded in the time interval between its address-events. The asynchronous nature of this communication protocol ensures that precise timing information is preserved and that signals are transmitted only when there is neural activity. As neural activity in spiking neural networks is typically sparse in both space and time, this protocol is ideal for minimizing power consumption and maximizing bandwidth. The architecture of Fig. 1 comprises multiple rows of neurons, each composed of multiple Memristive Synapse (MS) elements, Integrate and Fire (I&F) soma circuits, and additional interfacing circuits for managing the input pulse shapes, the synaptic currents, their temporal dynamics, and the spike-based learning mechanism. Upon the arrival of an input Address-Event, this is decoded by the AER input circuits into a one-hot pulse to be transmitted to the target column in the network.
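As an illustration of the AER scheme just described, here is a minimal Python sketch of address-events and of the one-hot decoding performed by the input circuits. The class and function names are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass

@dataclass
class AddressEvent:
    timestamp: float  # seconds; analog info lives in inter-event intervals
    address: int      # unique source-neuron address

def decode_one_hot(event: AddressEvent, n_columns: int) -> list[int]:
    """Decode an address-event into the one-hot column pulse that the
    AER input block broadcasts to a column of synapses."""
    pulse = [0] * n_columns
    pulse[event.address] = 1
    return pulse

# Two spikes from neuron 3: the 5 ms gap between them, not any payload,
# carries the analog information.
bus = [AddressEvent(0.010, 3), AddressEvent(0.015, 3)]
print([decode_one_hot(ev, n_columns=8) for ev in bus])
```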
Fig. 1 Neuromorphic architecture comprising multiple silicon neurons, each receiving inputs from CMOS-memristive synapse elements. MS is short for Memristive Synapse, PS for Pulse Shaper, NC for Normalizer Circuit, DPI for Differential Pair Integrator, I&F for Integrate and Fire Neuron, LB for Learning Block, and PC for Programming Circuitry.
Fig. 2 Pulse shaper (PS) block schematic. With the arrival of an input event from the AER block, two consecutive pulses, Read and Write, are generated by two digital Pulse Extender circuits.

This decoded pulse is then converted by a dedicated Pulse Shaper (PS) circuit, which produces a Read and a Write pulse, used to measure the currents through the memristive synapse elements and, potentially, to change their conductance values correspondingly. A schematic diagram of the PS circuit is shown in Fig. 2. The pulse extender circuit block in the figure is based on a classical starved-inverter circuit and has been characterized in previous work. The output of the PS block is broadcast to all MS synapse blocks of the corresponding column. Each MS synapse comprises one pair of memristive devices arranged in a complementary configuration (see D_pos and D_neg of Fig. 3). The pairs of devices are arranged to produce positive contributing currents (modeling excitatory synapses) and negative contributing ones (modeling inhibitory synapses) during the "read phase", and are updated in a push-pull way during the "write phase" (i.e., if the conductance of one device is increased, the conductance of the complementary device is decreased, and vice-versa). Specifically, during the read phase, the V_drive voltage of Fig. 3 is set to a small value, such that small currents (e.g., of the order of nano-Amperes) flow through the memristive pair onto the separate positive and negative summing lines.
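The read-phase behavior of a row of MS blocks can be summarized in a few lines of Python: complementary conductance pairs are read out at a small V_drive and their currents summed onto the two lines, per Kirchhoff's current law. The values and names below are illustrative assumptions.

```python
import numpy as np

def read_currents(g_pos, g_neg, v_drive=0.1):
    """Sum the per-synapse read currents onto the shared excitatory and
    inhibitory lines (Kirchhoff's current law). g_pos/g_neg are arrays of
    conductances (siemens) of the complementary devices of one neuron row;
    v_drive is the small read voltage, kept low so it cannot disturb the state."""
    i_exc = np.sum(g_pos * v_drive)   # positive (excitatory) summing line
    i_inh = np.sum(g_neg * v_drive)   # negative (inhibitory) summing line
    return i_exc, i_inh

# Push-pull arrangement: whenever one device of a pair is high-conductance,
# its complement is low-conductance.
g_pos = np.array([1/3e3, 1/30e3, 1/3e3])
g_neg = np.array([1/30e3, 1/3e3, 1/30e3])
print(read_currents(g_pos, g_neg))
```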
Fig. 3 A single Memristive Synapse (MS) block of the proposed neuromorphic system. The devices D_pos and D_neg model the excitatory and inhibitory synapses respectively. When the Read pulse signal from the corresponding column is active, the currents sum onto the excitatory ∑I_exc and the inhibitory ∑I_inh lines. Similarly, when the Write pulse is high, the switches connect the devices to the programming lines.

Conversely, during the write phase, digital control signals disable the connection to the current summing lines and enable the connection to the weight-update Programming Circuits (PC), which set the V_drive signal to either V_dd or Gnd depending on the sign of the Error signal produced by the spike-based Learning Block (LB) of the corresponding row.
During the read phase, the output currents produced by all MS blocks along a row of the architecture are summed through Kirchhoff's current law and conveyed to a Normalizer Circuit (NC) block. This is a current-mode circuit based on the Gilbert normalizer, which receives the positive and negative current contributions from the memristive devices and produces two corresponding output currents that are scaled and normalized appropriately. As this circuit plays a fundamental role in reducing the effect of device variability across all memristive devices present in the neuron row, we describe its functionality in detail in Section 2.
Fig. 4 Current-mode normalizer circuit (NC) block. Input currents coming from multiple synapses on the excitatory and inhibitory lines are scaled and normalized.

The positive and negative output currents produced by the NC block are then sent to two separate Differential Pair Integrator (DPI) circuits.
These are current-mode linear integrator filters that integrate the incoming current pulses and produce temporally decaying currents that faithfully model the Excitatory Post-Synaptic Current (EPSC) and Inhibitory Post-Synaptic Current (IPSC) of real biological synapses. The difference between the positive and negative synaptic current contributions is then sent to the I&F soma block, which temporally integrates these currents and produces an output spike as soon as the integrated current reaches the neuron's firing threshold. Both the DPI and I&F blocks have been fully characterized and explained in previous work.

The output spikes of the I&F block are sent to the AER output circuits, as well as to an additional DPI circuit that integrates the neuron's spikes. The output current of this DPI circuit (see iNeuron of Fig. 1) is proportional to the neuron's average firing rate. It is sent as input to the neuron's Learning Block (LB), which compares the neuron's output firing rate to a desired target value and produces an error signal proportional to the difference. This error signal is then used by the corresponding row's Programming Circuit (PC) block to change the probability of synaptic weight update in the synapses that were stimulated by the incoming Address-Event. These circuits implement the probabilistic "Delta" learning rule used in the architecture, and they are fully described in Section 3.
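A behavioral sketch of the DPI filter described above (not a transistor-level model) is a first-order linear low-pass integrator. The following Python snippet, with an assumed time constant and gain, shows how an input pulse produces an exponentially decaying, EPSC-like current.

```python
import numpy as np

def dpi_filter(spike_train, dt=1e-4, tau=20e-3, gain=1.0):
    """Behavioral model of the DPI synapse filter: a first-order low-pass
    integrator, tau * dI/dt = -I + gain * input. Input pulses produce
    exponentially decaying post-synaptic currents (EPSC/IPSC-like)."""
    i_syn = 0.0
    out = []
    for s in spike_train:
        i_syn += dt / tau * (-i_syn + gain * s / dt)  # s/dt: unit pulse area
        out.append(i_syn)
    return np.array(out)

# A single input pulse at t = 10 ms decays with the 20 ms time constant.
spikes = np.zeros(1000)
spikes[100] = 1.0
epsc = dpi_filter(spikes)
print(epsc.max(), epsc[-1])
```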
The memristive current normalizer circuit is shown in Fig. 4. The circuit is operated in the weak-inversion, or subthreshold, domain, where transistors have an exponential transfer function, in order to reproduce the functionality of the Gilbert normalizer element, which was originally designed for use with bipolar transistors. The input signals to this circuit are given by the sum of the currents measured across the memristive devices in the corresponding neuron row (see also Fig. 1). The circuit has a differential input, provided by the positive and negative summing lines of the circuit's row. As these input currents are proportional to the conductances of the memristive devices, they can be affected by a large variation in their values. However, it has been demonstrated that the normalizer output currents I_pos and I_neg of Fig. 4 can be approximately expressed as a function of the input currents ∑I_exc and ∑I_inh, which in turn are proportional to the memristive device conductances:

I_pos = I_b ∑I_exc / (∑I_exc + ∑I_inh),    I_neg = I_b ∑I_inh / (∑I_exc + ∑I_inh)    (1)

Since in each Memristive Synapse block the memristive devices are arranged in a push-pull configuration (see Fig. 3), large ∑I_exc currents will typically result in small ∑I_inh currents and vice-versa. In the extreme case, when all conductances of one type (e.g., excitatory) are in the high state and the conductances of the other type (e.g., inhibitory) are in the low state, one output current of the circuit will be approximately equal to the maximum possible value (e.g., I_pos ≈ I_b) and the other to the minimum value, which is set by the transistor leakage current. It is due to this strongly non-linear behavior that the normalizing function of eq. (1) has the remarkable effect of reducing the effect of device mismatch in the conductance values.

Fig. 5 Histograms highlighting the differential memristive synapse weight storage behavior for an on/off resistance ratio of 2: (Mean, Std Dev) for Ω_Dneg = (2.87 kΩ, 490 Ω), Ω_Dpos = (6.12 kΩ, 1.3 kΩ). Monte Carlo circuit simulations were run to obtain these plots, where 50 values of low and high conductance states were sampled and plotted in 20 bins. Dashed lines show the sampling distributions for device high and low conductance states in (a). (b) shows the distribution of the output currents from the normalizer circuit. The histograms are normalized by dividing the count by the number of observations times the bin width.
Fig. 6 Histograms highlighting the differential memristive synapse weight storage behavior for a high/low resistance ratio of 10: (Mean, Std Dev) for Ω_Dneg = (2.931 kΩ, 582 Ω), Ω_Dpos = (30.35 kΩ, 5.71 kΩ). Monte Carlo circuit simulations were run to obtain these plots, where 50 values of low and high conductance states were sampled and plotted in 20 bins. Dashed lines show the sampling distributions for device high and low conductance states in (a). (b) shows the distribution of the output currents from the normalizer circuit. The insets in (b) show the resulting output current distributions in finer detail, where the range of observed values for I_pos and I_neg is plotted in 10 bins without normalization.

Examples of the variability-reduction features of the circuit are illustrated in Figures 5 and 6. Figure 5a shows the effect of the normalizer circuit on its output currents for a typical distribution of device conductances derived from the literature, for a very conservative on/off ratio of two. While there is a significant overlap between the resistance values of the single memristive devices (see Fig. 5a), it is clear from Fig. 5b that using the output of the normalizer to measure synaptic weight values reduces this overlap significantly, as it squashes the distributions of output currents toward the maximum and minimum possible current outputs. This is even more evident in Fig. 6, where the on/off ratio of the conductance values is ten. In particular, note that in this case the normalizer circuit eliminates the effect of device variability almost completely: the distribution of currents (equivalent to the distribution of synaptic weights) is almost completely binary, despite the fact that the spread in memristive conductance values is still substantial (compare Fig. 6a with Fig. 6b).

As the output currents of the normalizer circuit can be scaled to very small subthreshold values (e.g., in the range of pico-Amperes), the power consumption of the neural processing circuits downstream can be kept very low. Furthermore, this makes the downstream circuits more compact, as they can use smaller capacitors to implement temporal dynamics with biologically plausible time constants (e.g., for allowing real-time interaction with the environment). In addition to mitigating the effect of device variability, the differential operation used in the proposed architecture has the advantage of allowing the use of both positive (excitatory) and negative (inhibitory) weights, effectively doubling the "high-low" dynamic range of the memristive devices.
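The variability squashing of eq. (1) is easy to reproduce at the behavioral level. The sketch below samples high/low resistances with a large spread (illustrative values with a ratio of about ten, similar in spirit to Fig. 6) and passes the resulting read currents through the normalizer transfer function; the normalized outputs pile up near I_b and near zero.

```python
import numpy as np

def normalize(i_exc, i_inh, i_b=1e-9):
    """Gilbert-normalizer transfer function of eq. (1): the two outputs
    share the fixed bias current i_b in proportion to the inputs."""
    total = i_exc + i_inh
    return i_b * i_exc / total, i_b * i_inh / total

rng = np.random.default_rng(1)
v_read = 0.1
# Illustrative high/low resistance samples (ratio ~10) with large spread.
r_on = rng.normal(3e3, 0.6e3, 1000)
r_off = rng.normal(30e3, 6e3, 1000)

i_pos, i_neg = normalize(v_read / r_on, v_read / r_off)
# Despite ~20% device spread, the normalized outputs pile up near i_b and 0,
# i.e. the stored weight becomes nearly binary (compare Fig. 6a with 6b).
print(i_pos.mean() / 1e-9, i_pos.std() / i_pos.mean())
```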
In this section we propose circuits that can be interfaced to memristive devices to exploit the cycle-to-cycle variability in their switching characteristics for implementing stochastic learning. Indeed, the cycle-to-cycle variability in the switching of memristors provides an intrinsic stochastic process that can be used to update the weights of the synapses in a neural network. The probabilistic switching of memristive devices has been observed and studied before, and is believed to stem from the formation and dissolution of a filament between the device electrodes. The filament formation model in memristive devices is strongly bias-dependent and can be explained by the hopping of ions in a thermally activated process. The hopping rate Γ, the inverse of the wait time τ, is therefore exponentially related to the activation energy:

Γ = 1/τ = ν e^(−E_a(V)/k_B T),    (2)

where ν is the attempt frequency for particle hopping, k_B is the Boltzmann constant, and T is the absolute temperature. As a result of the thermodynamic nature of this process, the switching of the memristive devices is stochastic and has been shown to follow a Poisson distribution in silver/amorphous-silicon/p-doped-poly-silicon memristive devices. The authors claim that the results can be generalized to other memristive systems such as OxRAMs. The Poisson distribution implies that the switching events are independent of one another and that the probability of a switching event occurring within ∆t at time t is P(t) = (∆t/τ) e^(−t/τ), where τ is the characteristic wait time, i.e., the mean time after the application of the SET pulse at which the device switches. A thorough study of the effect of the applied SET voltage V on the wait time has shown that as the applied voltage across the device increases linearly, the characteristic wait time decreases exponentially. Therefore, τ(V) = τ₀ e^(−V/V₀), where τ₀ and V₀ are fitting parameters found by experimental measurements. Employing this model, the probability of switching for t ≪ τ can be written as:

P(t) = ∆t/τ = (∆t/τ₀) e^(V/V₀)    (3)

The stochastic learning mechanism we propose exploits this characteristic in an event-based network comprising binary synapses, implemented using memristive devices that are driven to their maximum or minimum conductance states with every weight update. Even though the synapses are treated as binary elements, the probabilistic nature of the weight-update mechanism can be used to preserve the analog nature of the learning rule. The weight-update mechanism we consider in this work is the "Delta rule". This is one of the most common weight-update rules used in the literature for single-layer networks, and it is at the base of the back-propagation algorithm used in the vast majority of current multi-layer neural networks. It has been shown that the Delta rule is a learning algorithm which minimizes the Least Mean Square (LMS) error of a single-layer neural network cost function, defined as the difference between a desired target output signal T and the network output signal y, for a given set of input pattern signals x, weighted by the synaptic weight parameters w. Specifically, this learning rule sets the weight change between the i-th input and the j-th output neuron to be: ∆w_ji = α (T_j − y_j) x_i.

In the stochastic version of the Delta rule, this weight update is translated into a probability of weight change and, in the context of implementing it with memristive devices, into a probability of switching the device's state rather than an incremental change in its conductance. Therefore, to directly map the weight change ∆w_ji into the switching probability P, P has to be a linear function of the error (T_j − y_j). Since from eq. (3) P is an exponential function of the voltage applied across the device, this voltage needs to be:

V/V₀ = log(T_j − y_j),    (4)

such that by plugging eq. (4) into eq. (3) we get:

P(t) = (∆t/τ₀) e^(log(T_j − y_j)) x_i = (∆t/τ₀) (T_j − y_j) x_i,    (5)

which ensures that P is a linear function of the error. In our framework we encode the input signals x as a sequence of pre-synaptic events coming from the AER block, which also trigger the weight update upon their arrival.
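Combining eqs. (3)–(5), a behavioral simulation can model the write phase as a Bernoulli draw per stimulated synapse, with a probability linear in the Delta-rule error. A minimal sketch follows, with ∆t and τ₀ assumed such that ∆t/τ₀ = 1.

```python
import numpy as np

rng = np.random.default_rng(2)

def p_switch(error_mag, dt=1e-6, tau0=1e-6):
    """Eq. (5): with the programming voltage chosen as V/V0 = log(error)
    (eq. 4), the exponential voltage dependence of eq. (3) collapses into a
    switching probability linear in the Delta-rule error."""
    return min(dt / tau0 * error_mag, 1.0)

def stochastic_delta_update(w, x, target, y):
    """One probabilistic Delta-rule step on binary weights w in {0, 1}.
    Updates are attempted only at synapses that received an input event
    (the x_i gating of eq. 5); the error sign selects SET or RESET."""
    err = target - y
    for i, x_i in enumerate(x):
        if x_i and rng.random() < p_switch(abs(err)):
            w[i] = 1 if err > 0 else 0
    return w

w = np.zeros(4, dtype=int)
print(stochastic_delta_update(w, x=[1, 0, 1, 1], target=1.0, y=0.2))
```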
The error signal used for the weight updates depends on the average firing rate of the output neuron (equivalent to the Delta-rule y signal) and on a desired target signal T provided as an external input. The neuron's average firing rate is computed using a current-mode low-pass filter (see the DPI circuit of Fig. 1, which produces the current iNeuron). The desired target signal is represented by the current iTarget.

To compute the error as the difference of these two signals, we use the circuit shown in Fig. 7. It is an analog circuit operated in the subthreshold domain, known as the Bump/anti-Bump circuit. The circuit generates a current in the middle branch that increases as the values of iNeuron and iTarget become more and more similar (bump), whereas it generates increasing currents in the side branches as iNeuron and iTarget become dissimilar (anti-bump). Note that the side-branch currents, labeled I1 and I2 in Fig. 7, have the same transfer function as the current normalizer circuit described in Section 2:

I1 = I_b1 iNeuron / (iNeuron + iTarget);    I2 = I_b1 iTarget / (iNeuron + iTarget)    (6)
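Behaviorally, the Learning Block reduces to eq. (6) plus a comparison of the two side currents. The following sketch (illustrative bias value, hypothetical function name) returns the UP flag used to select the update direction.

```python
def learning_block(i_neuron, i_target, i_b1=1e-9):
    """Behavioral sketch of the Bump/anti-Bump learning block (eq. 6).
    The side branches split i_b1 like the normalizer circuit; their
    difference, thresholded, gives the digital UP/DN update direction."""
    i1 = i_b1 * i_neuron / (i_neuron + i_target)  # anti-bump side branch
    i2 = i_b1 * i_target / (i_neuron + i_target)  # anti-bump side branch
    up = i2 > i1          # target above current rate: potentiate weights
    return i1, i2, up

# Neuron firing below target -> UP is True, weights should be potentiated.
print(learning_block(i_neuron=0.2e-9, i_target=1.0e-9))
```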
Fig. 7 Learning Block circuit, implemented as a Bump/anti-Bump circuit. The neuron's average activity iNeuron is compared against a target current iTarget. The voltages V1 and V2 are a function of the difference between iTarget and iNeuron. The digital signal UP is high when the error is positive and low otherwise.
The difference in the side currents is then thresholded and digitized to produce a digital control signal UP, and its inverse DN (not shown in the figure), which controls the direction of the weight update for the synapse that received the corresponding input event.

The voltage applied to the memristive devices to implement the probabilistic weight change of eq. (5) is determined by eq. (4). The precise value of this voltage is very important, as the switching probability of a memristor depends exponentially on the voltage across it. However, CMOS device mismatch and memristive device variability do not allow the use of a single constant voltage shared across all synapses. Although analogous efforts have been proposed in the literature, implementing calibration circuits to precisely control the voltage biases in each synapse circuit would result in a very bulky design with large overhead circuitry and time-consuming calibration procedures at run time.

Rather than attempting to solve the device mismatch and variability effects with brute-force approaches, we exploit the stochastic nature of the learning algorithm: by generating a time-varying voltage ramp and applying it to the memristive devices in the weight-update phase, we can sweep across all values of the distribution of voltages that can affect the device switching behavior. Specifically, we propose a circuit that generates a ramp voltage with a slope α proportional to the logarithm of the error signal, as defined in eq. (4). By applying this voltage ramp to the memristive devices, their switching probability becomes proportional to iTarget − iNeuron. Since iTarget is the desired output spike rate and iNeuron the effective output spike rate, the expected weight change resulting from a switching event is proportional to the derivative of this difference squared: in expectation, the circuit implements a gradient descent procedure on this squared error. The time-varying ramp signal modulates the probability of resistive switching such that large errors result in more probable switching and vice-versa. This strategy implements a form of "Randomized Rounding" of the Delta rule, which has been shown to be more effective than deterministic rounding in a similar context.

The circuit that produces this voltage ramp is shown in Fig. 8. It is a global circuit shared by all the Memristive Synapse (MS) blocks of a neuron row (see the PC block of Fig. 1). The generation of the voltage ramp is triggered every time an input spike-event produces a Write pulse from the PS block of Fig. 2. During this period the circuit is operational and receives as inputs the analog signals V1 and V2, and the digital signal UP.
Fig. 8 The Programming Circuit (PC) block used to generate the ramp that programs the memristors as a function of the error. The voltage signals V1 and V2 are obtained from the circuit of Fig. 7. Depending on the sign of the UP signal, a rising or falling ramp is generated.

Given the subthreshold mode of operation, the output voltage signals V3 and V4 of this circuit can be expressed as:

V3 = (U_T/κ) log(∆I/I₀)   if ∆I > 0;    V4 = (U_T/κ) log(−∆I/I₀)   if ∆I < 0,    (7)

where ∆I is defined as iTarget − iNeuron, κ and I₀ are the process-dependent subthreshold slope factor and reverse-biased leakage current respectively, and U_T is the thermal voltage.

Now, to generate the desired ramp voltage, we need to convert the (V3 − V4) voltage difference into a current that charges/discharges a capacitor linearly. This is achieved by using a transconductance amplifier to produce the current I_out:

I_out = I_b2 tanh((κ/U_T)(V3 − V4))    (8)

It is safe to assume that the tanh function of eq. (8) operates in its linear region, since V3 and V4 are generated from V1 and V2 by the circuit of Fig. 7, which operates in the subthreshold region. The ramp voltage V_pr thus becomes:
V_pr = V_dd/2 ± (I_out/C₁) ∆t_Write = V_dd/2 ± (I_b2/C₁) log(±∆I/I₀) ∆t_Write,    (9)

where ∆t_Write is the duration of the write phase during which the memristors are programmed, and V_dd/2 is the value to which the capacitor is pre-charged before and after the write phase. This voltage is applied to the memristive synapse that was stimulated by the input spike-event, with the polarity defined by the UP and DN signals produced by the Learning Block of the corresponding row. As the ramp generator circuit is shared among all the synapses of a row, any other incoming spike-event received during the write phase is ignored. It has been shown that this assumption holds as long as the average rate of input spikes is slower than the write-phase ramp duration.

As the on-line learning proceeds and the neuron's mean activity approaches the target value, the magnitude of the current I_out of the PC circuit (see Fig. 8) decreases and, as a consequence, the slope of the ramp decreases. Since the switching probability of the memristive devices is practically zero for voltages much lower than the "nominal threshold voltage", this implementation induces a "stop-learning" zone in which no change is applied to the state of the devices. It has been shown that this strategy of having a region of operation in which the weight updates are disabled, when the learning error decreases below a set threshold, improves the stability of the learning process and the convergence properties of the network. Furthermore, this strategy has the important feature of enabling continuous-time, "always-on" learning, without having to artificially separate the training phase from the test phase.

To validate the analysis presented above, we carried out circuit simulations of both the Learning Block and the Programming Circuit of Fig. 7 and Fig. 8 for a standard CMOS process. Figure 9 shows the circuit simulation results, for the error signal ∆I both greater and less than zero. The plots also show the fit of eq. (9) to the data for the chosen values of I_b1, I_b2, C₁, and ∆t_Write. As depicted in the figures, the circuit outputs closely match the fits.
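The following Python sketch emulates eq. (9) and the resulting stop-learning behavior: the ramp slope is proportional to log(|∆I|/I₀), so it collapses as the error approaches the leakage floor. All component values are assumptions chosen for illustration, not the circuit's actual parameters.

```python
import numpy as np

U_T, KAPPA = 0.025, 0.7           # thermal voltage (V), subthreshold slope
I0, IB2, C1 = 1e-12, 1e-9, 1e-13  # leakage, OTA bias, ramp capacitor (assumed)

def ramp_voltage(delta_i, t, v_mid=0.9):
    """Eq. (9): the PC block charges C1 with a current proportional to
    log(|delta_i| / I0), producing a ramp whose slope shrinks with the
    error -- the origin of the 'stop-learning' zone. v_mid is an assumed
    Vdd/2 pre-charge value."""
    if abs(delta_i) < I0:         # error below leakage: no ramp, no update
        return v_mid
    slope = (IB2 / C1) * np.log(abs(delta_i) / I0)
    return v_mid + np.sign(delta_i) * slope * t

for err in (1e-9, 1e-11, 0.0):    # shrinking iTarget - iNeuron
    print(err, ramp_voltage(err, t=1e-6))
```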
Fig. 9 Circuit simulation results. The voltage across the memristor is shown as a function of iTarget − iNeuron when the control signal UP is high (a) and when it is low (b). The circuit data are fitted with eq. (9). R² indicates the coefficient of determination, a statistical measure of how close the data are to the fitted line.

To evaluate the effects of the various sources of variability on the performance of the proposed network and circuits, we carried out system-level behavioral simulations of the network, applied to a linear classification task using the MNIST hand-written digit data-set, comprising a training set for the learning phase and a test set for the validation phase. We compared the network performance on the test set after training in four cases:

1. A rate-based neural network with floating-point synaptic precision trained by a standard gradient-descent method, as a baseline for the accuracy of the network.

2. A spiking neural network with ideal binary devices trained by probabilistic gradient descent (as explained in Section 3).

3. A spiking neural network with non-ideal binary devices having high variability in their resistance values (20% standard deviation), trained by probabilistic gradient descent.

4. A spiking neural network with the non-ideal binary devices of item 3, whose variations are suppressed using the variability-reduction circuit presented in Section 2, trained by probabilistic gradient descent.

To compare the network to previously published results, we used a configuration analogous to the setup presented in the work of Bill and Legenstein, who used a model of memristive elements in an unsupervised winner-take-all network to learn digit prototypes for digits zero to four. A downscaled network of this kind has recently been partially verified in hardware. This setup is also comparable to previous simulations done by our group.

We carried out spiking neural network simulations using the Brian2 simulator, with neuron model equations that match the transfer functions of the silicon neuron circuits and DPI filters used in the architecture. In these simulations we combine, for the first time, a stochastic learning algorithm with a variability compensation method. Both are based on different variability characteristics of memristors: the stochastic learning algorithm uses the cycle-to-cycle variability in the switching probability of a memristor for a given voltage ramp, while the variability compensation addresses the device-to-device (and cycle-to-cycle) variability in the conductance levels of a memristor.

The gray-level MNIST input images were re-scaled to smaller image sizes and their pixel values were converted to Poisson spike trains with a mean firing rate proportional to the pixel intensity. To obtain higher-resolution effective connections from each input pixel while using binary synaptic elements, we encoded the pixel values with multiple instances of spiking neurons. Specifically, each pixel was associated with a number n_c of spiking neurons in the input layer, which stimulated a corresponding number of synaptic elements of a target "compound synapse" (comprising n_c devices instead of two) in the network's output recognition layer. In this way, the synaptic connection strengths have 2·n_c effective levels, instead of two. The total number of neurons in the input layer is therefore n_c times the number of pixels. The output recognition layer is composed of five read-out neurons (one for each digit from zero to four), each of which comprises a row of compound synapses, one per input pixel, with each compound synapse containing n_c memristive devices.

The neuromorphic architecture used in these system-level behavioral simulations is the one described in Section 1. The parameters used to encode the synaptic weights are either two precise discrete values (with no variability), in the case of idealized synaptic elements, or random numbers that follow a bi-modal distribution based on measured memristive device properties, as given in Figs. 5a and 6a. To implement the learning strategy described in Section 3, we model the effect of the ramp generator on the synaptic conductance as a stochastic binary update, using the switching probabilities defined in Section 3.
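The input encoding can be sketched as follows: each pixel drives n_c independent Poisson spike trains, one per device of its compound synapse. The maximum rate, time step, and duration below are illustrative assumptions, not the values used in the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def pixel_to_spike_trains(intensity, n_c, t_sim=0.1, rate_max=100.0, dt=1e-3):
    """Encode one gray-level pixel (intensity in [0, 1]) as n_c independent
    Poisson spike trains, one per input neuron of the compound synapse."""
    rate = rate_max * intensity          # Hz, proportional to intensity
    n_steps = int(t_sim / dt)
    return rng.random((n_c, n_steps)) < rate * dt

trains = pixel_to_spike_trains(intensity=0.8, n_c=4)
# With n_c binary devices per compound synapse, the effective connection
# strength takes multiple discrete levels instead of two.
print(trains.sum(axis=1))   # spike count of each of the n_c input neurons
```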
The learning block of each output neuron receives inputs from two sources: from the filter that measures the average firing rate of the neuron itself (iNeuron), and from external teacher neurons that provide a desired average current (iTarget). In the protocol used, large iTarget values indicate that the neuron should learn to be active for the given input pattern (see also Fig. 7), while low iTarget values indicate that the neuron should learn to ignore the input pattern.

The network is initialized by sampling synaptic weights from the appropriate distributions given in Fig. 5a and Fig. 6a. Note that we assume that the memristive devices have already been formed and are ready for read and write operations. Training the network is achieved by presenting 10000 randomly chosen digits from the training set along with the appropriate teacher signals. Each image is presented for 100 ms while the learning circuits tune the synaptic weights. After this, the performance of the network is evaluated on 5000 further digits, randomly drawn from the test set.

To evaluate the performance of the network, namely its classification accuracy, we chose as network output the index of the output neuron that spiked with the highest firing rate during the input presentation, and compared its identity to the label of the input pattern. If more than one output neuron spiked, the neuron that spiked the most was chosen as the one encoding the learned label.
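This evaluation protocol reduces to an argmax over output spike counts; a minimal sketch (with made-up spike counts) follows.

```python
import numpy as np

def classify(output_spike_counts):
    """Readout used for evaluation: the predicted label of a presentation is
    the index of the output neuron with the highest firing rate."""
    return int(np.argmax(output_spike_counts))

def accuracy(spike_counts_per_trial, labels):
    preds = [classify(c) for c in spike_counts_per_trial]
    return np.mean([p == l for p, l in zip(preds, labels)])

# Three test presentations of digits 0-4 (spike counts are made up).
counts = [[12, 3, 1, 0, 2], [1, 2, 15, 3, 0], [0, 9, 1, 8, 2]]
print(accuracy(counts, labels=[0, 2, 1]))
```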
Fig. 10 Test-set error on MNIST digits 0-4 vs. number of synapses per input pixel (n_c). The on/off ratio in the memristor model is 2 (see Fig. 5b).

Fig. 11 Test-set error on MNIST digits 0-4 vs. number of synapses per input pixel (n_c). The on/off ratio in the memristor model is 10 (see Fig. 6b).

Figures 10 and 11 show the performance of the proposed architecture. As a baseline comparison (which we expect to upper-bound the performance of this setup) we also trained a standard linear classifier with 32-bit floating-point synaptic elements and 32-bit rate-based neurons using stochastic gradient descent. The discrepancy between this baseline and the test-set error of our best simulation can be explained by the low resolution of the synaptic memory, the single-bit communication channels of spiking neurons, and the lossy input encoding in Poisson spike trains. An intermediate idealized setup, controlling only for memristive conductance variability but incorporating the other non-idealities, is given by the 'ideal binary' simulations (see the green bars in Figs. 10 and 11). The network simulations with the different types of synapse models (i.e., the basic un-normalized linear conversion case and the current-normalizer conversion case) show how the normalization circuit decreases the classification error overall. By comparing the error bars on the un-normalized (red bars) and normalized (blue bars) simulation results in Figs. 10 and 11, it is evident that the normalization circuit also decreases the variance of the error. We speculate that the reason for this is the more stable update size of the normalized setup.

Figure 12 shows examples of the synaptic weight matrices of the five different neurons that were trained to recognize the five different digits. These synaptic weight matrices can be interpreted as "receptive fields" of the trained neurons, which correspond to the best discriminatory features (e.g., positive weights for prototypes of the digit the neuron is supposed to classify, intermixed with negative weights for the digits it is supposed to ignore).

Overall, these simulations show how the behavior of a small neural network is influenced by the low-level characteristics of the building blocks of the neurons that comprise it. Specifically, we have shown that the probabilistic switching behavior of memristors can be used as a powerful computational primitive in a learning setting, and that variability in the conductance levels of memristors can be effectively mitigated (in the sense of high-level performance) by appropriate normalization with a compact circuit.

Although in this paper we focus on the use of memristive devices as binary elements, the proposed architecture can potentially support the full spectrum of memristive behaviors that have been reported in the literature:

1. Stochastic binary
2. Multiple binary devices in parallel (compound synapse)
3. Stochastic multiple discrete levels
4. Almost analog
In the case of binary synapses, we showed how the proposed stochastic learning circuits enable the architecture to achieve acceptable performance on the MNIST test bench. The system-level behavioral simulations demonstrated that the use of compound synapses improves the classification performance, and quantified the improvement factors.

It has been shown in the literature how gradual conductance modulation of memristive devices can be obtained when pulses are applied for a short amount of time. Under these conditions, controlling the number of pulses applied to the device can be used as a way to tune the desired conductance values. The proposed architecture can support this regime of operation by appropriately setting the pulse height and/or duration via the LB and PC blocks of Section 3. The same circuits can be extended to produce a tunable number of short pulse sequences by enabling a ring oscillator for the desired duration. This latter strategy would allow us to implement learning with gradual changes, rather than binary probabilistic ones, by encoding the desired change in weight ∆w in the number of pulses generated by the ring oscillator. It is worth noting that the same memristive device can be tuned to behave as a binary or a multi-level element by adopting different biasing and operating conditions. For example, even for a fixed set-voltage, it is possible to operate the same device in the binary or analog region by changing the length of the Write pulse in the PS block of Fig. 2: longer pulses will drive the device into the binary mode, while shorter ones will exhibit more of an analog behavior.

Fig. 12 Example receptive fields (arbitrary units) learned by the network. Note that these are discriminatory features, not digit prototypes.

In this paper we have presented analog CMOS circuits that can be interfaced to memristive devices to mitigate the effect of their device variability. A remarkable feature of using analog CMOS circuits to also implement the synapse and neuron dynamics is that their device-mismatch non-idealities can be exploited to improve the network classification performance. Indeed, device mismatch across multiple memristive synapses and silicon neurons, the very phenomenon that decreases the classification performance of a single binary classifier (e.g., one Perceptron, or one neuron row of Fig. 1) and that engineers tend to minimize with brute-force approaches, can be embraced to build highly accurate classifiers composed of ensembles of single ones. This can be demonstrated by the theory of ensemble learning. There are two broad classes of algorithms that fall into the category of ensemble learning: Bagging and Boosting.
Bagging, or bootstrap aggregating, is an averaging technique proposed by Breiman, where a collection of M classifiers is trained on M equally-sized subsets of the full training set created by sampling with replacement. The predictions made by the ensemble of M classifiers are then averaged to make the final prediction.

Boosting is a technique that uses a collection of un-correlated weak classifiers (whose accuracy is only slightly better than chance) to build a strong classifier (whose prediction error can be made arbitrarily small). One of the most popular variants of the approach is the AdaBoost algorithm. Unlike the bagging approach, every weak classifier in the ensemble is exposed to the full training data, where each sample is associated with an observation weight during training. For training the first classifier, the weights are kept equal for every training sample. When training the second classifier, the sample weights are adjusted such that the samples misclassified by the first classifier have a higher weight. A weight is also assigned to each classifier based on its prediction accuracy. This process is continued until the desired number of weak classifiers has been generated. The final prediction of the ensemble is a weighted sum of the weak-classifier predictions.

These ensemble learning principles can be applied to the neuromorphic architecture proposed in Section 1 to asymptotically improve the accuracy of the system. In particular, the Bagging approach is immediately applicable, by simply sending the same input patterns to multiple neuron rows and training ensembles of neurons to recognize the same class (a sketch of this scheme is given below). The variability in the synapse and neuron circuits is already sufficient to ensure that each neuron, acting as a "weak" binary classifier, behaves differently from the other ones belonging to the same ensemble. However, to truly ensure that the weak classifiers are fully independent, it would be sufficient to train each neuron of the same ensemble with input patterns that represent different sub data-sets of the original training data set. This has indeed already been demonstrated with pure CMOS-based architectures of the type proposed in this paper, by using different random connectivity patterns for each weak classifier of the ensemble. The boosting approach promises to yield even better results. However, the constraints on choosing which weights to change might require extra control modules per neuron, with overhead circuits too large or complex for realistic compact chip designs.
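As an illustration of how bagging would map onto the architecture, the toy Python sketch below trains several noisy "weak" classifiers (stand-ins for mismatched neuron rows) on bootstrap resamples and majority-votes their predictions. The data, the classifier model, and all parameters are synthetic assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(4)

def train_weak(X, y):
    """Stand-in for one neuron row trained as a weak binary classifier:
    a noisy prototype matcher, just to make the ensemble effect visible."""
    proto = X[y == 1].mean(axis=0) + rng.normal(0, 0.5, X.shape[1])
    return lambda X_: (X_ @ proto > 0).astype(int)

def bagging_predict(X, y, X_test, m=15):
    """Bagging: train m weak classifiers on bootstrap resamples of the
    training set and majority-vote their predictions."""
    votes = np.zeros(len(X_test))
    for _ in range(m):
        idx = rng.integers(0, len(X), len(X))  # resample with replacement
        votes += train_weak(X[idx], y[idx])(X_test)
    return (votes / m > 0.5).astype(int)

X = rng.normal(0, 1, (200, 10))
y = (X[:, 0] > 0).astype(int)
X[:, 0] += y                       # make class 1 roughly separable
print(bagging_predict(X, y, X[:20], m=15))
print(y[:20])
```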
These ensemble-learning principles can indeed be applied to the neuromorphic architecture proposed in Section 1 to asymptotically improve the accuracy of the system. In particular, the bagging approach is immediately applicable: it suffices to send the same input patterns to multiple neuron rows and to train ensembles of neurons to recognize the same class. The variability in the synapse and neuron circuits is already sufficient to ensure that each neuron, acting as a "weak" binary classifier, behaves differently from the other members of the same ensemble. However, to truly ensure that the weak classifiers are fully independent, it would be sufficient to train each neuron of the same ensemble with input patterns that represent different subsets of the original training data set. This has indeed already been demonstrated with pure CMOS-based architectures of the type proposed in this paper, by using different random connectivity patterns for each weak classifier of the ensemble. The boosting approach promises to yield even better results; however, the constraints on choosing which weights to change might require extra control modules per neuron, whose area and complexity overhead could prove prohibitive for realistic compact chip designs.

The nano-scale footprint of memristors is an important feature which can enable ultra-dense memory capacity. To exploit this extremely small footprint to its full extent, dense cross-bar arrays have been implemented and proposed as in-memory computing neural network engines. However, although the development of dense cross-bars is extremely important for the scaling of the technology, there are many challenges associated with their use in neuromorphic architectures, from both a fabrication and a circuits point of view. For example, it is not clear how far passive cross-bar arrays can be scaled up to larger sizes, due to sneak-path and cross-talk issues. Even in the case of cross-bar arrays with active elements, such as 1T-1R (one-transistor, one-memristor) cells or memristive devices with embedded "selectors" used to avoid the sneak-path problem, issues such as line resistance, reproducibility, and the size overhead of the external encoder and decoder CMOS circuits are yet to be satisfactorily addressed.

Alternatively, one can forgo the cross-bar approach of very high-density arrangements of basic 1R or 1T-1R elements, and design addressable arrays of more complex synapses that comprise multiple transistors and multiple memristive devices per synapse, to capitalize on the many other useful features of memristive devices (in addition to their compact size), such as non-volatility, state-dependence, complex physics that can be exploited to emulate the complex molecular properties of biological synapses, complex dynamics, and stochastic switching behavior. The architecture we propose represents an intermediate approach, comprising two memristive devices and two select switches per synapse. This design was chosen to allow maximum flexibility in exploring the properties of different types of memristive memory devices, but it could be made even denser by replacing the transistors currently used to switch between read-mode and write-mode with embedded selectors, and by modulating the amplitude of the Vdrive line of Fig. 3 to operate the device either only in read-mode or in both read- and write-mode, exploiting the fact that the voltage set at the terminals of the memristive devices is a ramp that can cover both ranges of operation. However, while large-scale in-memory computing cross-bar arrays of this type may solve the memory-bottleneck problem, they would still be crippled by an Input/Output (I/O) bottleneck, due to the constraint that while one synapse is operated in its write-mode (which could last micro-seconds), no other synapse of the same row can be stimulated. By incorporating the PS and NC blocks of Fig. 1 in the MS blocks, this addressable array architecture would certainly lose the benefit of high-density synapses, but would dramatically increase the bandwidth of its input Address-Events (e.g., with each I/O operation lasting nano-seconds), as each synapse element would become independent of the others, and multiple synapses would be able to safely operate in read- or write-mode in parallel (see the rough estimate after this paragraph). Once the choice is made to forgo the density benefit, adding further transistors, for example to implement local non-linear dynamics such as short-term plasticity or homeostatic synaptic scaling, or more complex learning mechanisms to improve the performance of the overall neuromorphic computing system, would become easily realizable.
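As a back-of-the-envelope illustration of this bandwidth argument, using only the order-of-magnitude timescales quoted above (micro-second writes, nano-second address-events; both figures are illustrative, not measured):

```python
t_write = 1e-6   # s: a synapse in write-mode blocks its whole row
t_event = 1e-9   # s: one address-event I/O with per-synapse PS/NC blocks

rate_shared   = 1.0 / t_write   # ~1e6 events/s: row stalls on every write
rate_parallel = 1.0 / t_event   # ~1e9 events/s: synapses act independently

print(f"input-bandwidth gain per row: ~{rate_parallel / rate_shared:.0f}x")
```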
We presented an effort to design and combine a suite of computational techniques for constructing a trainable neuromorphic platform that supports the use of a wide variety of memristive devices. We showed that the variability of memristive devices and the mismatch of CMOS circuits can, on the one hand, be reduced by circuit techniques and, on the other hand, be exploited as a feature for training and computation. We described the architecture of a neuromorphic platform that can implement stochastic training by exploiting the switching properties of memristive devices, and we validated the approach with system-level behavioral simulations of a linear classification task using the MNIST data-set.

The proposed neuromorphic computing architecture supports continuous-time, always-on, on-chip learning, and continuously streams output spikes to the AER output block. By routing output address-events via either off-chip or on-chip asynchronous AER routing schemes and circuits, these architectures support scaling by tiling, either across multiple chips or across multiple cores within a multi-core device. Examples of multi-core neuromorphic computing systems based on the AER protocol have recently been proposed; however, none have so far been implemented using memristive devices and exploiting their intrinsic properties to implement probabilistic learning.
Acknowledgements
This work is supported by SNSF grant number CRSII2_160756. We also acknowledge funding from the "Internationalization Fund of the FZ-Juelich" for the project "NeuroCode".
References
1 Proceedings of the IEEE, 2014, 1367–1388.
2 J. Park, S. Ha, T. Yu, E. Neftci and G. Cauwenberghs, Biomedical Circuits and Systems Conference (BioCAS), 2014 IEEE, 2014, pp. 675–678.
3 S. Furber, F. Galluppi, S. Temple and L. Plana, Proceedings of the IEEE, 2014, 652–665.
4 B. V. Benjamin, P. Gao, E. McQuinn, S. Choudhary, A. R. Chandrasekaran, J. Bussat, R. Alvarez-Icaza, J. Arthur, P. Merolla and K. Boahen, Proceedings of the IEEE, 2014, 699–716.
5 P. Merolla, J. Arthur, R. Alvarez, J.-M. Bussat and K. Boahen, IEEE Transactions on Circuits and Systems I: Regular Papers, 2014, 820–833.
6 S. Mitra, S. Fusi and G. Indiveri, IEEE Transactions on Biomedical Circuits and Systems, 2009, 32–42.
7 N. Qiao, H. Mostafa, F. Corradi, M. Osswald, F. Stefanini, D. Sumislawska and G. Indiveri, Frontiers in Neuroscience, 2015, 1–17.
8 S. Moradi, N. Qiao, F. Stefanini and G. Indiveri, IEEE Transactions on Biomedical Circuits and Systems, 2017, 1–17.
9 M. Davies, N. Srinivasa, T. H. Lin, G. Chinya, Y. Cao, S. H. Choday, G. Dimou, P. Joshi, N. Imam, S. Jain, Y. Liao, C. K. Lin, A. Lines, R. Liu, D. Mathaikutty, S. McCoy, A. Paul, J. Tse, G. Venkataramanan, Y. H. Weng, A. Wild, Y. Yang and H. Wang, IEEE Micro, 2018, 82–99.
10 J. Backus, Communications of the ACM, 1978, 613–641.
11 G. Indiveri and S.-C. Liu, Proceedings of the IEEE, 2015, 1379–1397.
12 I. Boybat, M. L. Gallo, T. Moraitis, T. Parnell, T. Tuma, B. Rajendran, Y. Leblebici, A. Sebastian, E. Eleftheriou et al., Nature Communications, 2018, 2514.
13 C. Li, D. Belkin, Y. Li, P. Yan, M. Hu, N. Ge, H. Jiang, E. Montgomery, P. Lin, Z. Wang, W. Song, J. P. Strachan, M. Barnell, Q. Wu, R. S. Williams, J. J. Yang and Q. Xia, Nature Communications, 2018, 1–8.
14 S. Ambrogio, P. Narayanan, H. Tsai, R. M. Shelby, I. Boybat, C. di Nolfo, S. Sidler, M. Giordano, M. Bodini, N. C. P. Farinha, B. Killeen, C. Cheng, Y. Jaoudi and G. W. Burr, Nature, 2018, 60–67.
15 K. Likharev, A. Mayr, I. Muckra and Ö. Türel, Annals of the New York Academy of Sciences, 2003, 146–163.
16 E. Linn, R. Rosezin, C. Kügeler and R. Waser, Nature Materials, 2010, 403–406.
17 K. Kim, S. Gaba, D. Wheeler, J. Cruz-Albrecht, T. Hussain, N. Srinivasa and W. Lu, Nano Letters, 2012, 389–395.
18 M. Prezioso, F. Merrikh-Bayat, B. Hoskins, G. Adam, K. K. Likharev and D. B. Strukov, Nature, 2015, 61–64.
19 J. Sandrini, M. Barlas, M. Thammasack, T. Demirci, M. De Marchi, D. Sacchetto, P.-E. Gaillardon, G. De Micheli and Y. Leblebici, IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2016, 339–351.
20 P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura, B. Brezzo, I. Vo, S. K. Esser, R. Appuswamy, B. Taba, A. Amir, M. D. Flickner, W. P. Risk, R. Manohar and D. S. Modha, Science, 2014, 668–673.
21 M. Payvand, A. Madhavan, M. A. Lastras-Montaño, A. Ghofrani, J. Rofeh, K.-T. Cheng, D. Strukov and L. Theogarajan, Circuits and Systems (ISCAS), 2015 IEEE International Symposium on, 2015, pp. 1378–1381.
22 B. Chakrabarti, M. A. Lastras-Montaño, G. Adam, M. Prezioso, B. Hoskins, M. Payvand, A. Madhavan, A. Ghofrani, L. Theogarajan, K.-T. Cheng et al., Scientific Reports, 2017, 42429.
23 S. H. Jo, T. Chang, I. Ebong, B. B. Bhadviya, P. Mazumder and W. Lu, Nano Letters, 2010, 1297–1301.
24 D. Ielmini and R. Waser, Resistive Switching: From Fundamentals of Nanoionic Redox Processes to Memristive Device Applications, John Wiley & Sons, 2015.
25 T. Tuma, A. Pantazi, M. Le Gallo, A. Sebastian and E. Eleftheriou, Nature Nanotechnology, 2016, 693–699.
26 M. Suri, D. Querlioz, O. Bichler, G. Palma, E. Vianello, D. Vuillaume, C. Gamrat and B. DeSalvo, IEEE Transactions on Electron Devices, 2013, 2402–2409.
27 M. Suri, O. Bichler, D. Querlioz, G. Palma, E. Vianello, D. Vuillaume, C. Gamrat and B. DeSalvo, Electron Devices Meeting (IEDM), 2012 IEEE International, 2012, pp. 3–10.
28 S. Gaba, P. Sheridan, J. Zhou, S. Choi and W. Lu, Nanoscale, 2013, 5872–5878.
29 S. H. Jo, K.-H. Kim and W. Lu, Nano Letters, 2008, 496–500.
30 S. Ambrogio, S. Balatti, V. Milo, R. Carboni, Z.-Q. Wang, A. Calderoni, N. Ramaswamy and D. Ielmini, IEEE Transactions on Electron Devices, 2016, 1508–1515.
31 J. J. Yang, D. B. Strukov and D. R. Stewart, Nature Nanotechnology, 2013, 13–24.
32 S. N. Truong, S.-J. Ham and K.-S. Min, Nanoscale Research Letters, 2014, 629.
33 A. Serb, J. Bill, A. Khiat, R. Berdan, R. Legenstein and T. Prodromakis, Nature Communications, 2016, 12611.
34 M. V. Nair and G. Indiveri, A differential memristive current-mode circuit, European patent application EP 17183461.7, 2017, filed 27.07.2017.
35 A. Serb, W. Redman-White, C. Papavassiliou and T. Prodromakis, IEEE Transactions on Circuits and Systems I: Regular Papers, 2016, 827–835.
36 A. Vincent, J. Larroque, W. Zhao, N. B. Romdhane, O. Bichler, C. Gamrat, J.-O. Klein, S. Galdin-Retailleau and D. Querlioz, International Symposium on Circuits and Systems (ISCAS), 2014, pp. 1074–1077.
37 M. Al-Shedivat, R. Naous, G. Cauwenberghs and K. N. Salama, IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2015, 242–253.
38 E. O. Neftci, B. U. Pedroni, S. Joshi, M. Al-Shedivat and G. Cauwenberghs, Frontiers in Neuroscience, 2016, 241.
39 M. Payvand, L. K. Muller and G. Indiveri, Circuits and Systems (ISCAS), 2018 IEEE International Symposium on, 2018, pp. 1–5.
40 J. Bill and R. Legenstein, Frontiers in Neuroscience, 2014, 1–18.
41 M. Courbariaux, Y. Bengio and J.-P. David, Advances in Neural Information Processing Systems, 2015, pp. 3123–3131.
42 L. K. Muller and G. Indiveri, arXiv preprint arXiv:1504.05767, 2015, 1–11.
43 S. Wozniak, A. Pantazi, S. Sidler, N. Papandreou, Y. Leblebici and E. Eleftheriou, IEEE Transactions on Circuits and Systems II: Express Briefs, 2017, 1342–1346.
44 E. Covi, S. Brivio, A. Serb, T. Prodromakis, M. Fanciulli and S. Spiga, Frontiers in Neuroscience, 2016, 1–13.
45 T. Serrano-Gotarredona and B. Linares-Barranco, Memristors and Memristive Systems, Springer, 2014, pp. 353–377.
46 G. Indiveri, B. Linares-Barranco, R. Legenstein, G. Deligeorgis and T. Prodromakis, Nanotechnology, 2013, 384010.
47 S.-C. Liu, J. Kramer, G. Indiveri, T. Delbruck and R. Douglas, Analog VLSI: Circuits and Principles, MIT Press, 2002.
48 The MNIST database of handwritten digits, Yann LeCun's website, 2012, http://yann.lecun.com/exdb/mnist/.
49 S. Deiss, R. Douglas and A. Whatley, Pulsed Neural Networks, MIT Press, 1998, ch. 6, pp. 157–178.
50 J. Lazzaro and J. Wawrzynek, Sixteenth Conference on Advanced Research in VLSI, 1995, pp. 158–169.
51 K. Boahen, Neuromorphic Systems Engineering, Kluwer Academic, Norwell, MA, 1998, pp. 229–259.
52 M. V. Nair, L. K. Mueller and G. Indiveri, Nano Futures, 2017, 1–12.
53 B. Gilbert, Analog Integrated Circuits and Signal Processing, 1996, 95–118.
54 C. Bartolozzi and G. Indiveri, Neural Computation, 2007, 2581–2603.
56 Analogue IC Design: The Current-Mode Approach, Peregrinus, Stevenage, Herts., UK, 1990, ch. 2, pp. 11–91.
57 S. Brivio, E. Covi, A. Serb, T. Prodromakis, M. Fanciulli and S. Spiga, Applied Physics Letters, 2016, 133504.
58 R. Naous, M. Al-Shedivat and K. N. Salama, IEEE Transactions on Nanotechnology, 2016, 15–28.
59 J. Hertz, A. Krogh and R. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, Reading, MA, 1991.
60 Y. LeCun, Y. Bengio and G. Hinton, Nature, 2015, 436–444.
61 J. Schmidhuber, Neural Networks, 2015, 85–117.
62 J. Schemmel, D. Bruderle, A. Grubl, M. Hock, K. Meier and S. Millner, Circuits and Systems (ISCAS), Proceedings of 2010 IEEE International Symposium on, 2010, pp. 1947–1950.
63 P. Raghavan and C. D. Tompson, Combinatorica, 1987, 365–374.
64 J. M. Brader, W. Senn and S. Fusi, Neural Computation, 2007, 2881–2912.
65 S. Sheik, S. Paul, C. Augustine and G. Cauwenberghs, arXiv preprint arXiv:1701.01495, 2017.
66 C. Baldassi, F. Gerace, C. Lucibello, L. Saglietti and R. Zecchina, Physical Review E, 2016, 052313.
67 D. Goodman and R. Brette, Frontiers in Neuroscience, 2009, 192–197.
68 C. Bishop, Pattern Recognition and Machine Learning, Springer, New York, 2006.
69 S. Stathopoulos, A. Khiat, M. Trapatseli, S. Cortese, A. Serb, I. Valov and T. Prodromakis, Scientific Reports, 2017, 17532.
70 J. Frascaroli, S. Brivio, E. Covi and S. Spiga, Scientific Reports, 2018, 71–78.
71 T. Chang, S.-H. Jo, K.-H. Kim, P. Sheridan, S. Gaba and W. Lu, Applied Physics A, 2011, 857–863.
72 M. Prezioso, I. Kataeva, F. Merrikh-Bayat, B. Hoskins, G. Adam, T. Sota, K. Likharev and D. Strukov, IEEE International Electron Devices Meeting (IEDM), 2015, pp. 209–223.
73 L. Breiman, Machine Learning, 1996, 123–140.
74 R. E. Schapire, Machine Learning, 1990, 197–227.
75 Y. Freund and R. E. Schapire, Journal of Computer and System Sciences, 1997, 119–139.
76 S. Pi, P. Lin and Q. Xia, Journal of Vacuum Science & Technology B, Nanotechnology and Microelectronics: Materials, Processing, Measurement, and Phenomena, 2013, 06FA02.
77 B. Govoreanu, A. Redolfi, L. Zhang, C. Adelmann, M. Popovici, S. Clima, H. Hody, V. Paraschiv, I. Radu, A. Franquet et al., Electron Devices Meeting (IEDM), 2013 IEEE International, 2013, pp. 10–2.
78 C. Li, M. Hu, Y. Li, H. Jiang, N. Ge, E. Montgomery, J. Zhang, W. Song, N. Dávila, C. E. Graves et al., Nature Electronics, 2018, 52–59.
79 S. Moradi, G. Indiveri, N. Qiao and F. Stefanini, Networks and hierarchical routing fabrics with heterogeneous memory structures for scalable event-driven computing systems, European patent application EP 15/165272, 2015, filed 27.04.2015.
80 D. Fasnacht and G. Indiveri, Conference on Information Sciences and Systems (CISS), Johns Hopkins University, 2011, pp. 1–6.
81 R. Serrano-Gotarredona, M. Oster, P. Lichtsteiner, A. Linares-Barranco, R. Paz-Vicente, F. Gómez-Rodriguez, L. Camunas-Mesa, R. Berner, M. Rivas-Perez, T. Delbruck, S.-C. Liu, R. Douglas, P. Häfliger, G. Jimenez-Moreno, A. Civit-Ballcels, T. Serrano-Gotarredona, A. Acosta-Jiménez and B. Linares-Barranco, IEEE Transactions on Neural Networks, 2009, 1417–1438.
82 J. Park, T. Yu, S. Joshi, C. Maier and G. Cauwenberghs, IEEE Transactions on Neural Networks and Learning Systems, 2016, 1–15.