AIDX: Adaptive Inference Scheme to Mitigate State-Drift in Memristive VMM Accelerators

Abstract

An adaptive inference method for crossbars (AIDX) is presented, based on an optimization scheme that adjusts the duration and amplitude of input voltage pulses. AIDX minimizes the long-term effects of memristance drift on artificial neural network accuracy. The sub-threshold behavior of the memristor has been modeled and verified against fabricated device data. The proposed method has been evaluated on different network structures and applications, e.g., image reconstruction and classification tasks. The results show an average 60% improvement in convolutional neural network (CNN) performance on the CIFAR10 dataset after 10000 inference operations, as well as a 78.6% error reduction in image reconstruction.


SUBMITTED TO IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II: EXPRESS BRIEFS


Tony Liu, Amirali Amirsoleimani, Fabien Alibart, Serge Ecoffey, Dominique Drouin, and Roman Genov

Index Terms: Memristor, Crossbar, Vector-Matrix Multiplication, Inference, State-Drift, Neural Network.

I. INTRODUCTION

Resistive switching memory crossbars have emerged as potentially high-speed and low-power accelerators for vector-matrix multiplication (VMM) [1], [2]. However, non-idealities and defects in these platforms dramatically impact neural network (NN) performance and accuracy. One significant and not extensively studied non-ideal phenomenon is memristance drift [3], which occurs in different ways across resistive switching memory technologies. For instance, phase change memories (PCMs) experience increasing resistance due to drift even when no voltage is applied across the cell [4]. For memristors, on the other hand, state-drift away from the programmed state results from many repeated VMM operations and leads to computational accuracy degradation (Fig. 1). Previous studies [5]-[7] on memristance drift in memristor technology have mainly focused on high-density memory, where memristors are used solely for storage rather than computation. More recent reports on drift [8] for computational memristor crossbars include an inline calibration approach [9] that optimizes the calibration time of the memristor crossbar; by performing polynomial fitting on the computational error data, it achieves a […] calibration efficiency. A closed-loop weight compensation solution is presented in [10]; it mitigates state-drift by increasing the computational service lifetime by […]× and results in approximately […] computational accuracy degradation within […] read operations. In this brief, we present an adaptive inference scheme (AIDX): a flexible optimization procedure that automatically adapts to existing crossbar non-idealities and circuit parasitics and can be applied to any VMM-based task. According to experimentally verified simulations, AIDX cuts the accuracy loss due to device state-drift in modern convolutional neural networks (CNNs) by more than […] and reduces image reconstruction error by 78.6%.

Tony Liu, Amirali Amirsoleimani, and Roman Genov are with the Department of Electrical and Computer Engineering, University of Toronto, 10 Kings College Road, Toronto, Ontario, Canada. Fabien Alibart, Serge Ecoffey, and Dominique Drouin are with the Interdisciplinary Institute for Technological Innovation (3IT), University of Sherbrooke, QC, Canada. A. Amirsoleimani is the corresponding author.

II. PRELIMINARY

A. Impact of Memristance Drift on Crossbar MAC Operations

Memristance drift is defined as the unintended small change in memristor conductance caused by a low-voltage read/inference operation. For the ideal weight distribution $G$, the output current of the $j$-th column is $I_j = \sum_i G_{ij} V_i$ (Fig. 1(a)). We define the memristance drift caused by the $k$-th inference operation as $\delta G_k$, so that the conductance of the memristor at the $(k+1)$-th iteration is

$$G_{k+1} = G_k + \delta G_k \tag{1}$$

The total memristance drift after $k$ operations is $\Delta G = \sum_{i=1}^{k} \delta G_i$. As such, the real output current of the $j$-th column at the $k$-th operation is $I'_j = \sum_i (G_{ij} + \Delta G_{ij}) V_i$. The current error $I - I'$ due to memristance drift can be quite problematic in larger crossbars because current scales with crossbar size. However, a differential mapping scheme can prevent the build-up of memristance drift error in very large arrays, because the error in the positive column scales at the same rate as the error in the negative column. Fig. 1(b) illustrates how small changes in NN weights accumulate into much larger errors in the output layer, and Fig. 1(c) shows the 3D structure of the network in Fig. 1(b). Fig. 1(d) shows sample heatmaps of the conductance changes due to memristance drift in a simulated […] × […] memristor array. The bottom row represents the bias weights of the NN; these are initially mapped to a high-conductance state, which is why it is the only row with a reduction in overall conductance. Fig. 1(e) examines the impact of state-drift on an MNIST classification task for multilayer perceptron networks with various numbers of hidden layers, and Fig. 1(f) illustrates the difference between above- and sub-threshold memristor switching.

B. Memristance drift modeling and analysis

Behaviour-based memristor models are typically used in memristor crossbar simulations due to their simplicity and light computational load. However, most behaviour-based models do not consider memristance drift and approximate the internal state change due to an applied sub-threshold voltage as zero. To address this issue, we propose an extension to the popular VTEAM model [11] that accounts for the minute changes in internal state due to sub-threshold voltages.

Fig. 1. (a) Neural network (NN) forward-pass VMM performed on a memristor crossbar. (b) Degradation of neural network weights and output accuracy over time. (c) 3D memristor crossbar array structure. (d) Percentage change in memristor conductance for a randomly chosen subset of NN weights on a simulated memristor crossbar. (e) Impact of simulated state-drift on MNIST classification accuracy with varying numbers of hidden layers. (f) Sub- and over-threshold behaviour of the device's transition oxide filament.

For model consistency, we adopt a similar mathematical structure in the sub- and above-threshold regions:

$$\frac{dw(t)}{dt} = \begin{cases} k_{s,\mathrm{off}} \left(\dfrac{v(t)}{v_{\mathrm{off}}}\right)^{\alpha_{s,\mathrm{off}}} f_{s,\mathrm{off}}(w), & 0 \le v < v_{\mathrm{off}} \\[2ex] k_{s,\mathrm{on}} \left(\dfrac{v(t)}{v_{\mathrm{on}}}\right)^{\alpha_{s,\mathrm{on}}} f_{s,\mathrm{on}}(w), & v_{\mathrm{on}} < v < 0 \end{cases} \tag{2}$$

Here, $v_{\mathrm{off}}$ and $v_{\mathrm{on}}$ represent the RESET and SET voltage thresholds, respectively. $w(t)$ is the internal state variable and is related to the resistance $R$ of the memristor as $R(t) = R_{\mathrm{off}}\, w(t) + R_{\mathrm{on}} (1 - w(t))$. $k_{s,\mathrm{off}}$ and $k_{s,\mathrm{on}}$ are fitting parameters that represent the rate of ion migration at a given applied sub-threshold voltage. Similarly, $\alpha_{s,\mathrm{off}}$ and $\alpha_{s,\mathrm{on}}$ are exponents that characterize how the speed of ion migration scales with the applied voltage. $f_{s,\mathrm{on}}(w)$ and $f_{s,\mathrm{off}}(w)$ are window functions that bound the state between 0 and 1. The time derivative of the resistance can be expressed as:

$$\frac{dR(t)}{dt} = R_{\mathrm{off}} \frac{dw(t)}{dt} - R_{\mathrm{on}} \frac{dw(t)}{dt} \tag{3}$$

$R_{\mathrm{on}}$ and $R_{\mathrm{off}}$ are the low and high resistance states of the device, respectively. Cycle-to-cycle and device-to-device variations in sub-threshold drift speed are modelled by adding 15% random Gaussian noise to the $k$ and $\alpha$ parameters. The probability density function (PDF) of $k_{\mathrm{on}}$ is shown in Eq. (4), where $k_{\mathrm{on}}$ is the ideal, fitted parameter and $x$ represents $k_{\mathrm{on}}$ with added Gaussian noise. The PDFs of the other $k$ and $\alpha$ parameters follow the same structure as Eq. (4).

$$f_{k_{\mathrm{on}}}(x) = \frac{1}{\sqrt{0.045\,\pi\,k_{\mathrm{on}}^2}} \exp\!\left(-\frac{(x - k_{\mathrm{on}})^2}{0.045\,k_{\mathrm{on}}^2}\right) \tag{4}$$

To validate our proposed model, the VTEAM extension is applied to data from a TiOx-based memristor device (Fig. 2(a)). The extended VTEAM $k$ and $\alpha$ parameters were fit using simulated annealing and gradient descent, with SET and RESET voltage thresholds of −[…] V and […] V. Fig. 2(b) and (c) illustrate that the extended VTEAM models sub-threshold memristor behaviour much more accurately than the original VTEAM. Fig. 2(d) shows a 3D plot of how memristor switching behaviour and conductance change with the internal state $w$ and the applied voltage in the sub-threshold region.
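A minimal Python sketch of the sub-threshold extension in Eqs. (2) to (4) may help make the model concrete. All numeric parameters below are hypothetical placeholders (the fitted values of Fig. 2 are not reproduced), the window functions are replaced by clipping $w$ to [0, 1], and the forward-Euler update per read pulse is our own simplification rather than the authors' simulator.

```python
import random

# Hypothetical parameters for illustration only; they are NOT the fitted values of Fig. 2.
PARAMS = {
    "k_off": 10.0, "alpha_off": 5.0, "v_off": 0.8,   # RESET side: k_off > 0 pushes w toward 1 (R_off)
    "k_on": -10.0, "alpha_on": 6.0, "v_on": -0.8,    # SET side: k_on < 0 pushes w toward 0 (R_on)
    "r_on": 1.0e4, "r_off": 1.0e5,                   # ohms
}


def dw_dt(v, p):
    """Sub-threshold branch of Eq. (2) with the window functions taken as 1 (an assumption)."""
    if 0.0 <= v < p["v_off"]:
        return p["k_off"] * (v / p["v_off"]) ** p["alpha_off"]
    if p["v_on"] < v < 0.0:
        return p["k_on"] * (v / p["v_on"]) ** p["alpha_on"]
    raise ValueError("v is not sub-threshold; the above-threshold VTEAM branch is not modelled here")


def resistance(w, p):
    """R = R_off * w + R_on * (1 - w)."""
    return p["r_off"] * w + p["r_on"] * (1.0 - w)


def dr_dt(v, p):
    """Eq. (3): dR/dt = (R_off - R_on) * dw/dt."""
    return (p["r_off"] - p["r_on"]) * dw_dt(v, p)


def sample_device(p, rng, rel_sigma=0.15):
    """Eq. (4): draw each k and alpha from N(mu, (0.15 * mu)^2), so 2 * sigma^2 = 0.045 * mu^2."""
    q = dict(p)
    for key in ("k_off", "k_on", "alpha_off", "alpha_on"):
        q[key] = rng.gauss(p[key], rel_sigma * abs(p[key]))
    return q


def apply_reads(w0, v_read, pulse_s, n_reads, p):
    """Forward-Euler update of w over n_reads identical read pulses; clipping stands in for the window."""
    w = w0
    for _ in range(n_reads):
        w = min(1.0, max(0.0, w + dw_dt(v_read, p) * pulse_s))
    return w


if __name__ == "__main__":
    rng = random.Random(0)
    device = sample_device(PARAMS, rng)            # one device with 15% parameter noise
    w_pos = apply_reads(0.5, +0.2, 1e-6, 10_000, device)
    w_neg = apply_reads(0.5, -0.2, 1e-6, 10_000, device)
    print(w_pos, resistance(w_pos, device))        # w creeps up: RESET-direction drift
    print(w_neg, resistance(w_neg, device))        # w creeps down: SET-direction drift
```

Sampling $k$ and $\alpha$ once per device models device-to-device variation; redrawing them on every pulse would instead approximate cycle-to-cycle variation.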


Fig. 2. (a) Memristor device structure used for fitting (a TiOx-based stack on a highly doped Si substrate). (b) Comparison of the I-V fitting of the extended VTEAM with experimental data and the original model in the sub-threshold region for the SET operation ($k_{s,\mathrm{on}} = -[…] \times 10^{-[…]}$, $\alpha_{s,\mathrm{on}} = 6$). For clarity, we extracted a best-fit curve to represent the experimental data from above the threshold and extrapolated it into the sub-threshold region, preserving the gradual drift trend. (c) Fitting comparison with experimental data for the RESET operation ($k_{s,\mathrm{off}} = 1.[…] \times 10^{-[…]}$, $\alpha_{s,\mathrm{off}} = 5$). (d) 3D characterization of the extended VTEAM for the same device with respect to the internal state variable $w$ and voltage.

III. METHODOLOGY

A. Problem and Assumptions

By formulating memristance drift as an optimization problem, we can develop a scheme that minimizes accuracy degradation. With no memristance drift, the ideal mean squared error (MSE) is $E_0 = \sum_j \big(y_j - \sum_i G_{ij} V_i\big)^2$, and the real MSE at the $k$-th operation is $E_k = \sum_j \big(y_j - \sum_i (G_{ij} + \Delta G_{ij}) V_i\big)^2$, where $V_i$ is the voltage applied to the $i$-th row and $\Delta G_{ij}$ is the total memristance drift of the $ij$-th memristor from its originally programmed value. We define the error due to memristance drift, $E_{\mathrm{Drift}}$, as the difference in MSE between the initially programmed state ($E_0$) and the $k$-th inference operation ($E_k$). As an optimization problem, the goal is to minimize the growth of $E_{\mathrm{Drift}}$ over time. The change in conductance due to memristance drift, $\Delta G$, can mainly be controlled through the input-to-voltage-amplitude mapping and the relative inference voltage pulse width. Factors that cannot easily be changed, such as specific memristor characteristics and the overall crossbar structure, are ignored in the optimization procedure.

B. Optimization Methodology

We frame the minimization of $E_{\mathrm{Drift}}$ as an unconstrained optimization problem, where $\mathbf{A}$ is the input-to-voltage amplitude mapping and $\mathbf{D}$ is the relative voltage pulse width:

$$\min_{\mathbf{A}, \mathbf{D}} E_{\mathrm{Drift}}(\mathbf{A}, \mathbf{D}) \tag{5}$$

$\mathbf{D}$ is a vector that represents the length of a positive input read pulse relative to a negative read pulse. For instance, a given row can have a positive read pulse of […] ns while its negative read pulse is only […] ns long. Similarly, $\mathbf{A}$ is a vector whose elements represent the ratio of the inference voltage pulse amplitude for positive to negative inputs (Fig. 3(a)). In summary, AIDX modifies the amplitude and duration of inference voltage pulses to minimize memristance drift for a given task. Even the minimum allowable voltage pulse amplitudes and widths still result in noticeable memristance drift after many inference operations. As such, AIDX must minimize the aggregate memristance drift by balancing the total drift in the SET and RESET directions. We use the popular Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm [12] (Algorithm 1) for this optimization problem. BFGS is a quasi-Newton method that relies on the gradient of the objective function to find the optimal solution. However, $E_{\mathrm{Drift}}$ is an unknown function that can only be evaluated, so $\nabla E_{\mathrm{Drift}}$ has to be approximated with a finite-difference approach. Gradient-free optimization methods such as the Nelder-Mead simplex method [13] were also explored, but quasi-Newton algorithms were the most effective for this problem.
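Because $E_{\mathrm{Drift}}$ can only be evaluated, not differentiated, the procedure above amounts to BFGS driven by finite-difference gradients. The sketch below follows the steps of Algorithm 1 (a direct update of the Hessian estimate $B_k$), but it substitutes a backtracking (Armijo) line search for the exact argmin of step 2 and skips the update when the curvature condition $s_k^T y_k > 0$ fails; both are common practical choices, not details stated in the paper. The objective is a placeholder: in AIDX it would be a crossbar simulation returning $E_{\mathrm{Drift}}$ for candidate $\mathbf{A}$ and $\mathbf{D}$.

```python
import math


def fd_gradient(f, x, h=1e-6):
    """Central finite-difference estimate of grad f(x); f is treated as a black box."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2.0 * h))
    return grad


def solve(B, b):
    """Solve B p = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(B)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    p = [0.0] * n
    for r in range(n - 1, -1, -1):
        p[r] = (M[r][n] - sum(M[r][k] * p[k] for k in range(r + 1, n))) / M[r][r]
    return p


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def bfgs(f, x0, tol=1e-8, max_iter=200):
    n = len(x0)
    x = [float(v) for v in x0]
    B = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # B_0 = I
    g = fd_gradient(f, x)
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) < tol:
            break
        p = solve(B, [-gi for gi in g])                  # step 1: B_k p_k = -grad f(x_k)
        alpha, fx, slope = 1.0, f(x), dot(g, p)          # step 2: backtracking line search
        while f([xi + alpha * pi for xi, pi in zip(x, p)]) > fx + 1e-4 * alpha * slope and alpha > 1e-12:
            alpha *= 0.5
        s = [alpha * pi for pi in p]                     # step 3: s_k = alpha_k p_k
        x_new = [xi + si for xi, si in zip(x, s)]        # step 4: x_{k+1} = x_k + s_k
        g_new = fd_gradient(f, x_new)
        y = [a - b for a, b in zip(g_new, g)]            # step 5: y_k
        sy = dot(s, y)
        Bs = [dot(row, s) for row in B]
        sBs = dot(s, Bs)
        if sy > 1e-12 and sBs > 1e-12:                   # step 6, only if curvature holds
            for i in range(n):
                for j in range(n):
                    B[i][j] += y[i] * y[j] / sy - Bs[i] * Bs[j] / sBs
        x, g = x_new, g_new
    return x


if __name__ == "__main__":
    # Placeholder objective standing in for E_Drift(A, D); its minimum is at (1, -2).
    quad = lambda v: (v[0] - 1.0) ** 2 + 10.0 * (v[1] + 2.0) ** 2
    print(bfgs(quad, [0.0, 0.0]))
```

In practice a library routine would usually replace this hand-written loop; for example, SciPy's `scipy.optimize.minimize` with `method="BFGS"` falls back to finite-difference gradients when no `jac` is supplied.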

C. Constraint Violations

Normally, the optimized voltage amplitude ratio $\mathbf{A}$ and width ratio $\mathbf{D}$ can be used directly. However, in certain cases, typically when the device characteristics and input distribution are heavily skewed, elements of the optimal $\mathbf{A}$ and $\mathbf{D}$ are far too large or too small to be implemented practically. To address this issue, we first frame the optimization problem through a different lens. Consider a simplified scenario of a single memristor with input data modelled by the discrete random variable $X$ with probability mass function $f(x)$. We define the time derivative of the internal state $w$ for a given input $x$ as

$$\frac{dw(x)}{dt} = g(x) \tag{6}$$

In our sub-threshold model, $g(x)$ is Eq. (2) with $v(t)$ replaced by $v(x)$, the function that maps input $x$ to voltage amplitude. The average rate of memristance drift given the input distribution $X$ is then

$$\mathbb{E}\left[\frac{dw(x)}{dt}\right] = \sum_x g(x) f(x) \tag{7}$$

The optimization problem over $E_{\mathrm{Drift}}$ can also be reframed as minimizing $\left|\mathbb{E}\left[\frac{dw(x)}{dt}\right]\right|$ over all memristors, where the $g(x)$ parameters of each memristor are sampled according to Eq. (4) to account for device-to-device variations. The AIDX scheme defined so far only affects $g(x)$ and makes no adjustment to $f(x)$. If we allow AIDX to first optimize over $f(x)$, the issue of impractical $\mathbf{A}$ or $\mathbf{D}$ can be circumvented. One of the few useful recoverable transformations of the input vector $x$ is inversion, i.e., multiplication by $-1$. By inverting a random proportion $a$ of the input data, the input distribution is transformed into $f'(x)$:

$$f'(x) = (1 - a) f(x) + a f(-x), \quad 0 < a < 1 \tag{8}$$

Impractical values of $\mathbf{A}$ or $\mathbf{D}$ occur only when $\mathbb{E}\left[\frac{dw(x)}{dt}\right] \gg 0$ or $\mathbb{E}\left[\frac{dw(x)}{dt}\right] \ll 0$ before applying AIDX, i.e., when the drift is strongly skewed in one direction. As such, if we optimize $\left|\mathbb{E}\left[\frac{dw'(x,a)}{dt}\right]\right|$ over $a$, the mean drift rate is brought close to 0 before $\mathbf{A}$ and $\mathbf{D}$ are optimized, which prevents constraint violations. Fig. 3(b) illustrates our approach to constraint violations.

Algorithm 1: BFGS Algorithm
1. Obtain a direction $p_k$ by solving $B_k p_k = -\nabla f(x_k)$.
2. Perform a line search to find the step size $\alpha_k = \arg\min_{\alpha} f(x_k + \alpha p_k)$.
3. $s_k = \alpha_k p_k$
4. $x_{k+1} = x_k + s_k$
5. $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$
6. $B_{k+1} = B_k + \dfrac{y_k y_k^T}{y_k^T s_k} - \dfrac{B_k s_k s_k^T B_k^T}{s_k^T B_k s_k}$
7. Repeat steps 1-6 until $x$ converges.

D. General Solution Flow
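The input-inversion step of Section III-C, on which the solution flow below relies, can be illustrated for a single memristor. Under the assumptions of this sketch (a hypothetical linear input-to-voltage map, hypothetical drift parameters, and window functions taken as 1), Eq. (7) is a weighted sum, and with the inverted distribution $(1 - a) f(x) + a f(-x)$ (our reading of Eq. (8)) the mean drift is linear in $a$, so the fraction that drives it to zero has a closed form instead of requiring numerical optimization. The last function shows the only change at inference time: an inverted input is applied as $-x$ and the sensed output is multiplied by $-1$.

```python
import random

# Hypothetical single-device parameters (illustration only, not fitted values).
K_OFF, K_ON = 1.0e-3, -1.5e-3        # sub-threshold rates; the sign sets the drift direction
ALPHA_OFF, ALPHA_ON = 5.0, 6.0
V_OFF, V_ON = 1.0, -1.0              # RESET / SET thresholds (V)
GAIN = 0.5                           # hypothetical linear map v(x) = GAIN * x


def g(x):
    """Drift rate dw/dt for input x: Eq. (2) with v = v(x) and windows taken as 1."""
    v = GAIN * x
    if 0.0 <= v < V_OFF:
        return K_OFF * (v / V_OFF) ** ALPHA_OFF
    if V_ON < v < 0.0:
        return K_ON * (v / V_ON) ** ALPHA_ON
    raise ValueError("input maps above threshold; only the sub-threshold region is modelled")


def expected_drift(pmf):
    """Eq. (7): E[dw/dt] = sum_x g(x) f(x), with pmf a dict {input value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())


def invert(pmf, a):
    """Distribution after inverting a fraction a of the inputs: (1 - a) f(x) + a f(-x)."""
    out = {}
    for x, p in pmf.items():
        out[x] = out.get(x, 0.0) + (1.0 - a) * p
        out[-x] = out.get(-x, 0.0) + a * p
    return out


def inversion_fraction(pmf):
    """a in [0, 1] minimizing |E'(a)|, where E'(a) = (1 - a) * E0 + a * E1 is linear in a."""
    e0 = expected_drift(pmf)               # no input inverted
    e1 = expected_drift(invert(pmf, 1.0))  # every input inverted
    if e0 == e1:
        return 0.0                         # inversion cannot change the mean drift
    return min(1.0, max(0.0, e0 / (e0 - e1)))


def aidx_vmm(G, x, inverted):
    """Ideal crossbar VMM I_j = sum_i G[i][j] * V_i; inverted inputs are applied as -x, output negated back."""
    v = [-xi for xi in x] if inverted else list(x)
    i_out = [sum(G[i][j] * v[i] for i in range(len(v))) for j in range(len(G[0]))]
    return [-c for c in i_out] if inverted else i_out


if __name__ == "__main__":
    pmf = {0.2: 0.3, 0.5: 0.3, 0.9: 0.3, -0.4: 0.1}   # mostly positive, image-like inputs
    a = inversion_fraction(pmf)
    print(a, expected_drift(pmf), expected_drift(invert(pmf, a)))
    rng = random.Random(0)
    G = [[1.0, 2.0], [3.0, 4.0]]
    inverted = rng.random() < a                        # invert this input vector with probability a
    print(aidx_vmm(G, [0.2, 0.5], inverted))           # same result whether or not it was inverted
```

On a real crossbar, drift and non-idealities break the exact sign symmetry assumed in `aidx_vmm`, which is why the sketch is only a model of the bookkeeping, not of the hardware.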

As shown in Fig. 3(c), optimization is performed during pre-processing in three separate scenarios (input pulse amplitude only, input pulse width only, and both), and the best-performing parameters are chosen as $V_{\mathrm{opt}}$. Once hardware constraint violations are resolved with input data inversion, the input circuits, e.g., digital-to-analog converters (DACs), are adjusted to fit the optimized input-to-voltage mapping parameters. The majority of AIDX takes place during pre-processing, which only needs to be done once for any given task. The only difference between an AIDX inference operation and a normal inference operation is that the intended output current is recovered from an inverted output by multiplying it by −1. Fig. 3(d) summarizes the general pipeline of AIDX. Fig. 3(e) shows the evolution of memristor state due to memristance drift for AIDX and the baseline model, and Fig. 3(f) is a heatmap of a portion of the memristor crossbar at 1000 and 10000 inference steps, where the bottom row represents the bias. While memristance drift can cause memristors to switch in both the SET and RESET directions, as seen in Fig. 3(e), almost all memristors within a crossbar typically drift in only one direction for image-based applications. These inputs are almost entirely positive, which causes an aggregate drift in the RESET direction. Two further reasons for a unidirectional aggregate drift are that the device switching speed differs between the SET and RESET directions, and that most non-bias memristors are initialized close to the high resistive state, where the drift speed is strongly skewed in the SET direction (Fig. 2(d)).

Fig. 3. (a) AIDX's optimized parameters $\mathbf{A}$ and $\mathbf{D}$ mapped onto input voltage pulses. (b) Proportional input inversion to balance memristance drift error and prevent constraint violations. (c) AIDX design flowchart. (d) AIDX-optimized parameters for duration ($\mathbf{D}_{\mathrm{opt}}$) and amplitude ($\mathbf{A}_{\mathrm{opt}}$) are applied to each CNN layer's input separately. (e) Simulated memristance drift with and without AIDX over time with 15% device-to-device variations. To better observe the total state-drift, the initial conductance of all devices is set to […], and a positively skewed, pre-generated random sequence of voltage pulses was applied to half of the memristors and a negatively skewed sequence to the other half. (f) Percentage change in conductance of the same simulated memristors shown in Fig. 1(d) when utilizing AIDX.

IV. RESULTS AND DISCUSSION

In this paper, all simulations are performed using our extended VTEAM memristor model [14], which includes the effects of sub-threshold state-drift and whose parameters are fit to the experimental data shown in Fig. 2. We integrate this memristor model into our existing 1T1R memristor crossbar simulation to capture both memristance drift and crossbar non-idealities such as sneak paths and line resistance. A differential weight mapping scheme is used, in which each element is mapped onto a pair of memristors: one represents positive values and the other represents negative values. In Fig. 4(a), AIDX is tested across 10 baseline tasks from the Proben1 benchmark datasets [14]. We trained a shallow NN with one hidden layer for each of these tasks. Although baseline performance varies widely across tasks, all baseline tasks ended at around the accuracy of random guessing; the variation arises because some tasks have more classification categories than others. To verify our solution's effectiveness for more practical applications, we applied AIDX to a selection of CNN architectures on the CIFAR10 dataset. The CNN-to-crossbar mapping scheme is similar to the one in [15]. Fig. 4(b) compares the performance of 10 different CNN architectures between AIDX and the baseline model. Compared to the shallow NNs used for the Proben1 datasets, the CNNs showed an overall faster accuracy degradation. This is to be expected, because error propagates from one layer to the next, amplifying the effect of memristance drift. The error in column $j$ of the $(l+1)$-th layer in a fully connected NN is:

$$E_{j,l+1} = \sum_{i=1}^{n} V_{i,l+1} \left(\sigma(E_{i,l}) + \Delta G_{ij,l+1}\right) \tag{9}$$

Here, $n$ is the total length of the input vector and $\sigma$ is the activation function of the $l$-th layer. Due to the large number of parameters in modern CNNs, BFGS optimization in AIDX is performed sequentially, layer by layer, to reduce optimization time. Applying AIDX to the selected CNNs provided consistent improvements in classification accuracy on CIFAR10. This consistency across CNNs of varying sizes and designs demonstrates the flexibility of the proposed method across different crossbar sizes and structures. In addition to classification tasks, we demonstrate AIDX's effectiveness in a different type of memristor crossbar application. Fig. 4(c) shows the results of image reconstruction on the MNIST dataset. For this task, a one-hidden-layer auto-encoder with […] hidden units, corresponding to a […]× compression factor, was trained off-chip. With AIDX, the average mean squared error improved by 78.6% over the baseline after […] inference operations.

V. OVERHEAD ANALYSIS AND COMPARISON

Table I compares different state-drift mitigation techniques with two AIDX configurations, optimized for accuracy (AIDX-A) and for power efficiency (AIDX-P), each implementing an MLP network. AIDX-A is the baseline AIDX method discussed in the previous sections, while AIDX-P adds L2 regularization terms for $\mathbf{A}$ and $\mathbf{D}$:

$$\min_{\mathbf{A}, \mathbf{D}} \Big( E_{\mathrm{Drift}}(\mathbf{A}, \mathbf{D}) + \lambda_1 \sum_i A_i^2 + \lambda_2 \sum_i D_i^2 \Big)$$

where $\lambda_1$ and $\lambda_2$ are regularization constants. Regularizing the voltage amplitude and width ratios allows AIDX-P to reduce the passive crossbar power consumption. For consistency, we use the same estimates of peripheral power consumption as [10]. Crossbar power consumption in AIDX is computed as the average power consumed across the memristors in one inference operation. Area overhead is defined as the percentage increase in on-chip area required for the memristance drift solution due to peripherals, external circuits, and other items. Accuracy improvement is the increase in classification accuracy provided by a solution over the baseline model in a one-hidden-layer MLP at the end of the baseline model's defined lifetime. Performance lifetime is defined as the time required for a system to degrade to […] classification accuracy on the MNIST dataset. We chose this metric as an axis of comparison primarily because it is used in [10] and is easily adaptable to the Interrupt and Benchmark method used in [9]. Scalability measures how well a memristance drift solution's performance and overhead scale with crossbar size and with additional layers in NN applications. Time overhead is not shown in Table I because all presented solutions introduce negligible time overhead compared to their respective baseline models.

Fig. 4. (a) Classification accuracy comparison between baseline and AIDX across baseline tasks from the Proben1 datasets. (b) Classification accuracy comparison across different CNN architectures on the CIFAR-10 image classification dataset. (c) Sample reconstructed MNIST images and average image reconstruction error from baseline and AIDX-enhanced auto-encoders.

TABLE I
COMPARATIVE ANALYSIS OF AIDX

Methods                      [5]        [6]        AIDX-A     AIDX-P
Power overhead (%)           -          +1.61      +3.27      +[…]
Area overhead (%)            […].34     […]        0          0
Performance lifetime         […]×       […]×       […]×       […]×
Scalability vs. baseline     Worse      Worse      Better     Better
Accuracy improvement (%)     […]        […]        […]        […]
Includes non-idealities      No         No         Yes        Yes

VI. CONCLUSION

In this paper, we propose a new inference scheme based on voltage signal optimization, called AIDX, to reduce the impact of memristance drift on memristor crossbar MAC operations. By optimizing the voltage pulse width and amplitude input mapping, AIDX minimizes the computational error due to memristance drift and is flexible and effective across a range of tasks, including classification and image reconstruction. AIDX provides up to a […] increase in classification accuracy on the CIFAR-10 dataset and a […] improvement in image reconstruction on the MNIST dataset. In addition, we have proposed an extension to the popular VTEAM model that simulates memristor behaviour below the switching voltage thresholds more precisely.

ACKNOWLEDGMENT

This work is supported by the NSERC HIDATA project and ERC-CoG IONOS (no. 773228).

REFERENCES

[1] P. Yao et al., "Fully hardware-implemented memristor convolutional neural network," Nature, vol. 577, no. 7792, pp. 641-646, 2020.
[2] C. Li et al., "Efficient and self-adaptive in-situ learning in multilayer memristor neural networks," Nature Comm., vol. 9, no. 1, pp. 1-8, 2018.
[3] T. Chang et al., "Short-term memory to long-term memory transition in a nanoscale memristor," ACS Nano, vol. 5, no. 9, pp. 7669-7676, 2011.
[4] S. Oh et al., "The impact of resistance drift of phase change memory (PCM) synaptic devices on artificial neural network performance," IEEE Electron Device Letters, vol. 40, no. 8, pp. 1325-1328, 2019.
[5] S.-S. Sheu et al., "A 4Mb embedded SLC resistive-RAM macro with 7.2ns read-write random-access time and 160ns MLC-access capability," 2011 IEEE Int. Solid-State Circuits Conf. (ISSCC), pp. 200-202, 2011.
[6] Y. Chen et al., "A nondestructive self-reference scheme for spin-transfer torque random access memory (STT-RAM)," 2010 Design Autom. & Test in Europe Conf. & Exh. (DATE), pp. 148-153, 2010.
[7] D. Niu et al., "Low power memristor-based RERAM design with error correcting code," 2012 17th Asia and South Pacific Design Autom. Conf. (ASP-DAC), pp. 79-84, 2012.
[8] V. Joshi et al., "Accurate deep neural network inference using computational phase-change memory," Nature Comm., vol. 11, no. 1, pp. 1-13, 2020.
[9] B. Li et al., "Memristor-based approximated computation," 2013 Int. Symp. on Low Power Elec. and Design (ISLPED), pp. 242-247, 2013.
[10] B. Yan et al., "A closed-loop design to enhance weight stability of memristor based neural network chips," 2017 IEEE/ACM Int. Conf. on Computer-Aided Design (ICCAD), pp. 541-548, 2017.
[11] S. Kvatinsky et al., "VTEAM: A general model for voltage-controlled memristors," IEEE Trans. on Circuits and Sys. II: Exp. Briefs, vol. 62, no. 8, pp. 786-790, 2015.
[12] R. Fletcher, Practical Methods of Optimization. J. Wiley & […]