AIDX: Adaptive Inference Scheme to Mitigate State-Drift in Memristive VMM Accelerators
Tony Liu, Amirali Amirsoleimani, Fabien Alibart, Serge Ecoffey, Dominique Drouin, Roman Genov
SUBMITTED TO IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II: EXPRESS BRIEFS
Abstract: An adaptive inference method for crossbars (AIDX) is presented, based on an optimization scheme that adjusts the duration and amplitude of input voltage pulses. AIDX minimizes the long-term effects of memristance drift on artificial neural network accuracy. The sub-threshold behavior of the memristor has been modeled and verified against fabricated device data. The proposed method has been evaluated on different network structures and applications, e.g., image reconstruction and classification tasks. The results showed an average of … improvement in convolutional neural network (CNN) performance on CIFAR10 after … inference operations, as well as a … error reduction in image reconstruction.
Index Terms: Memristor, Crossbar, Vector-Matrix Multiplication, Inference, State-Drift, Neural Network.
I. INTRODUCTION
Resistive switching memory crossbars have emerged as potentially high-speed and low-power accelerators for vector-matrix multiplication (VMM) [1], [2]. However, non-idealities and defects in these platforms dramatically impact neural network (NN) performance and accuracy. One significant but not extensively studied non-ideal phenomenon is memristance drift [3], which occurs in different ways across resistive switching memory technologies. For instance, the resistance of phase change memory (PCM) cells increases due to drift even when no voltage is applied across the cell [4]. In memristors, on the other hand, state-drift away from the programmed state results from many repeated VMM operations and degrades computational accuracy (Fig. 1). Previous studies [5]–[7] on memristance drift in memristor technology have mainly focused on high-density memory, where memristors are used solely for storage rather than computation. More recent reports on drift [8] for computational memristor crossbars include an inline calibration approach [9] that optimizes the calibration time of the memristor crossbar; by performing polynomial fitting on the computational error data, it achieves a … calibration efficiency. A closed-loop weight-compensation solution is presented in [10]; it mitigates state-drift by increasing the computational service lifetime by …× and results in approximately … computational accuracy degradation within … read operations. In this brief, we present an adaptive inference scheme (AIDX), a flexible optimization procedure that automatically adapts to existing crossbar non-idealities and circuit parasitics and can be applied to any VMM-based task.
According to experimentally verified simulations, AIDX cuts the accuracy loss due to device state-drift in modern convolutional neural networks (CNNs) by more than … and achieves a … error reduction in image reconstruction.

Tony Liu, Amirali Amirsoleimani, and Roman Genov are with the Department of Electrical and Computer Engineering, University of Toronto, 10 Kings College Road, Toronto, Ontario, Canada. Fabien Alibart, Serge Ecoffey, and Dominique Drouin are with the Interdisciplinary Institute for Technological Innovation (3IT), University of Sherbrooke, QC, Canada. (Corresponding author: A. Amirsoleimani.)

II. PRELIMINARY
A. Impact of Memristance Drift on Crossbar MAC Operations
Memristance drift is defined as the unintended small change in memristor conductance caused by a low-voltage read/inference operation. For the ideal weight distribution $G$, the output current of the $j$-th column is $I_j = \sum_i G_{ij} V_i$ (Fig. 1(a)). We define the memristance drift caused by the $k$-th inference operation as $\delta G_k$, so that the conductance of the memristor at the $(k+1)$-th iteration is:
$$G_{k+1} = G_k + \delta G_k \qquad (1)$$
The total memristance drift accumulated up to the $k$-th operation is $\Delta G = \sum_{i=1}^{k} \delta G_i$. As such, the real output current of the $j$-th column at the $k$-th operation is $I'_j = \sum_i (G_{ij} + \Delta G_{ij}) V_i$. The current error $I - I'$ due to memristance drift can be quite problematic in larger crossbars because the current scales with crossbar size. However, a differential mapping scheme can prevent the build-up of memristance drift error in very large arrays, because the error in the positive column scales at the same rate as the error in the negative column, so the two largely cancel in the differential output. Fig. 1(b) illustrates how small changes in NN weights accumulate into much larger errors in the output layer. Fig. 1(c) illustrates the 3D structure of the network in Fig. 1(b). Fig. 1(d) shows sample heatmaps of the conductance changes due to memristance drift in a simulated … × … memristor array. The bottom row represents the bias weights of the NN; these are initially mapped to a high-conductance state, which is why it is the only row with an overall reduction in conductance. Fig. 1(e) examines the impact of state-drift on an MNIST classification task for multilayer perceptron networks with various numbers of hidden layers, and Fig. 1(f) illustrates the difference between above- and sub-threshold memristor switching.

B. Memristance Drift Modeling and Analysis
Behaviour-based memristor models are typically used in memristor crossbar simulations due to their simplicity and light computational load. However, most behaviour-based models do not consider memristance drift and approximate the internal state change due to an applied sub-threshold voltage as zero. To address this issue, we propose an extension to the popular VTEAM model [11] that accounts for the minute changes in internal state caused by sub-threshold voltages. For model consistency, we adopt a mathematical structure in the sub-threshold region similar to that of the above-threshold region:
$$\frac{dw(t)}{dt} = \begin{cases} k_{s,\mathrm{off}} \left(\dfrac{v(t)}{v_{\mathrm{off}}}\right)^{\alpha_{s,\mathrm{off}}} f_{s,\mathrm{off}}(w), & 0 \le v < v_{\mathrm{off}} \\[6pt] k_{s,\mathrm{on}} \left(\dfrac{v(t)}{v_{\mathrm{on}}}\right)^{\alpha_{s,\mathrm{on}}} f_{s,\mathrm{on}}(w), & v_{\mathrm{on}} < v < 0 \end{cases} \qquad (2)$$
Here, $v_{\mathrm{off}}$ and $v_{\mathrm{on}}$ represent the RESET and SET voltage thresholds, respectively. $w(t)$ is the internal state variable and is related to the memristor resistance $R$ by $R(t) = R_{\mathrm{off}}\, w(t) + R_{\mathrm{on}}\,(1 - w(t))$. $k_{s,\mathrm{off}}$ and $k_{s,\mathrm{on}}$ are fitting parameters that represent the rate of ion migration at any given applied sub-threshold voltage. Similarly, $\alpha_{s,\mathrm{off}}$ and $\alpha_{s,\mathrm{on}}$ are parameters that characterize the nonlinear (power-law) dependence of the ion-migration speed on the applied voltage. $f_{s,\mathrm{on}}(w)$ and $f_{s,\mathrm{off}}(w)$ are window functions that bound the state between 0 and 1. The time derivative of the resistance can be expressed as:
$$\frac{dR(t)}{dt} = R_{\mathrm{off}} \frac{dw(t)}{dt} - R_{\mathrm{on}} \frac{dw(t)}{dt} \qquad (3)$$
where $R_{\mathrm{on}}$ and $R_{\mathrm{off}}$ are the low and high resistance states of the device, respectively. Cycle-to-cycle and device-to-device variations in sub-threshold drift speed are modelled by adding 15% random Gaussian noise to the $k$ and $\alpha$ parameters. The probability density function (PDF) of $k_{\mathrm{on}}$ is given in Eq. (4), where $k_{\mathrm{on}}$ is the ideal, fitted parameter and $x$ represents $k_{\mathrm{on}}$ with added Gaussian noise. The PDFs of the other $k$ and $\alpha$ parameters follow the same structure as Eq. (4).
$$f_{k_{\mathrm{on}}}(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x - k_{\mathrm{on}})^2}{2\sigma^2}\right), \qquad \sigma = 0.15\,|k_{\mathrm{on}}| \qquad (4)$$

Fig. 1. (a) Neural network (NN) forward-pass VMM performed on a memristor crossbar. (b) Degradation of neural network weights and output accuracy over time. (c) 3D memristor crossbar array structure. (d) Percentage change in memristor conductance for a randomly chosen subset of NN weights on a simulated memristor crossbar. (e) Impact of simulated state-drift on MNIST classification accuracy with a varying number of hidden layers. (f) Sub- and over-threshold behaviour of the device's transition oxide filament.

To validate the proposed model, the VTEAM extension is applied to data from a TiOx-based memristor device (Fig. 2(a)). The extended VTEAM $k$ and $\alpha$ parameters were fit using simulated annealing and gradient descent, with SET and RESET voltage thresholds of −… V and … V, respectively. Fig. 2(b)-(c) shows that the extended VTEAM models sub-threshold memristor behaviour much more accurately than the original VTEAM. Fig. 2(d) is a 3D plot of how memristor switching behaviour and conductance change with the internal state $w$ and the applied sub-threshold voltage.
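As a rough illustration of Eqs. (2) to (4), the sub-threshold branches, the resistance mapping and the 15% parameter variation can be expressed in a few lines. This is a minimal sketch under stated assumptions, not the authors' simulator: every numeric value (the `NOMINAL` k and α values, the thresholds, `R_ON`/`R_OFF` and the pulse width) is a hypothetical placeholder rather than a fitted Fig. 2 parameter, the window function $f(w) = 1 - (2w - 1)^2$ (Joglekar-style) is our assumption because the window is not specified here, and the helper names `dw_dt`, `sample_device` and `apply_reads` are hypothetical. The state is advanced with a simple forward-Euler step.

```python
import random

# Hypothetical placeholder values only; NOT the fitted parameters from Fig. 2.
NOMINAL = {
    "k_off": 1.0e3,    # k_{s,off} > 0 (1/s): sub-threshold RESET-direction rate
    "k_on": -1.0e3,    # k_{s,on} < 0 (1/s): sub-threshold SET-direction rate
    "alpha_off": 5.0,  # alpha_{s,off}
    "alpha_on": 6.0,   # alpha_{s,on}
}
V_OFF, V_ON = 1.0, -1.0     # hypothetical RESET / SET thresholds (V)
R_ON, R_OFF = 10e3, 100e3   # hypothetical low / high resistance states (ohm)


def window(w):
    """Assumed Joglekar-style window: 1 at w = 0.5, 0 at w = 0 and w = 1."""
    return 1.0 - (2.0 * w - 1.0) ** 2


def dw_dt(v, w, p):
    """Sub-threshold branches of Eq. (2); p holds the k and alpha values."""
    if 0.0 <= v < V_OFF:
        return p["k_off"] * (v / V_OFF) ** p["alpha_off"] * window(w)
    if V_ON < v < 0.0:
        return p["k_on"] * (v / V_ON) ** p["alpha_on"] * window(w)
    raise ValueError("above-threshold voltage: use the original VTEAM branches")


def resistance(w):
    """R(w) = R_off * w + R_on * (1 - w)."""
    return R_OFF * w + R_ON * (1.0 - w)


def sample_device(rng, rel_sigma=0.15):
    """Eq. (4): draw each k and alpha from N(nominal, (0.15 * |nominal|)^2)."""
    return {name: rng.gauss(mu, rel_sigma * abs(mu)) for name, mu in NOMINAL.items()}


def apply_reads(w, pulses, p, width=1e-6):
    """Forward-Euler update of w over a sequence of read-pulse amplitudes (V)."""
    for v in pulses:
        w = min(1.0, max(0.0, w + dw_dt(v, w, p) * width))
    return w


if __name__ == "__main__":
    device = sample_device(random.Random(0))  # one device-to-device sample
    w0 = 0.5
    w_pos = apply_reads(w0, [0.3] * 10_000, device)   # positive reads: toward R_off
    w_neg = apply_reads(w0, [-0.3] * 10_000, device)  # negative reads: toward R_on
    print(f"R0 = {resistance(w0):.1f}, after +reads = {resistance(w_pos):.1f}, "
          f"after -reads = {resistance(w_neg):.1f}")
```

Here the parameters are sampled once per device (device-to-device variation); resampling them for every pulse would instead mimic the cycle-to-cycle component.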
Fig. 2. (a) Memristor device structure used for fitting. (b) Comparison of the I-V fitting of the extended VTEAM and the original model against experimental data in the sub-threshold region for the SET operation ($k_{s,\mathrm{on}} = -\ldots \times 10^{-\ldots}$, $\alpha_{s,\mathrm{on}} = 6$). For clarity, a best-fit curve was extracted from the above-threshold experimental data and extrapolated into the sub-threshold region, preserving the gradual drift trend. (c) Fitting comparison with experimental data for the RESET operation ($k_{s,\mathrm{off}} = 1.\ldots \times 10^{-\ldots}$, $\alpha_{s,\mathrm{off}} = 5$). (d) 3D characterization of the extended VTEAM for the same device with respect to the internal state variable $w$ and voltage.

III. METHODOLOGY
A. Problem and Assumptions
By formulating memristance drift as an optimization problem, we can develop an optimization scheme that minimizes accuracy degradation. With no memristance drift, the ideal mean squared error (MSE) is $E_0 = \sum_j \left(y_j - \sum_i G_{ij} V_i\right)^2$, and the real MSE at the $k$-th operation is $E_k = \sum_j \left(y_j - \sum_i (G_{ij} + \Delta G_{ij}) V_i\right)^2$, where $V_i$ is the voltage applied to the $i$-th row and $\Delta G_{ij}$ is the total memristance drift of the $ij$-th memristor from its originally programmed value. We define the error due to memristance drift, $E_{\mathrm{Drift}} = E_k - E_0$, as the difference in MSE between the $k$-th inference operation and the initially programmed state. As an optimization problem, the goal is to minimize the growth of $E_{\mathrm{Drift}}$ over time. The change in conductance due to memristance drift, $\Delta G$, can mainly be controlled through the input-to-voltage amplitude mapping and the relative inference voltage pulse width. Factors that cannot easily be changed, such as specific memristor characteristics and the overall crossbar structure, are excluded from the optimization procedure.

B. Optimization Methodology
We frame the minimization of $E_{\mathrm{Drift}}$ as an unconstrained optimization problem, where $\mathbf{A}$ is the input-to-voltage amplitude mapping and $\mathbf{D}$ is the relative voltage pulse width:
$$\min_{\mathbf{A}, \mathbf{D}} E_{\mathrm{Drift}}(\mathbf{A}, \mathbf{D}) \qquad (5)$$
$\mathbf{D}$ is a vector whose elements give the length of a positive input read pulse relative to a negative read pulse. For instance, a given row can have a positive read pulse of … ns while its negative read pulse is only … ns long. Similarly, $\mathbf{A}$ is a vector whose elements give the ratio of inference voltage pulse amplitudes for positive and negative inputs (Fig. 3(a)). In summary, AIDX modifies the amplitude and duration of inference voltage pulses to minimize memristance drift for a given task. Even the minimum allowable voltage pulse amplitudes and widths still result in noticeable memristance drift after many inference operations, so drift cannot be avoided simply by shrinking the pulses; instead, AIDX minimizes the aggregate memristance drift by balancing the total drift in the SET and RESET directions. We use the popular Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm [12] (Algorithm 1) for this optimization problem. BFGS is a quasi-Newton method that relies on the gradient of the objective function to find the optimal solution. However, $E_{\mathrm{Drift}}$ is a black-box function that can only be evaluated, so $\nabla E_{\mathrm{Drift}}$ is approximated with finite differences. Gradient-free optimization methods such as the Nelder-Mead simplex method [13] were also explored, but quasi-Newton algorithms were the most effective for this problem.
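The BFGS procedure with a finite-difference gradient can be sketched as follows. This is a minimal pure-Python sketch under stated assumptions, not the AIDX implementation: it uses the equivalent inverse-Hessian form ($H = B^{-1}$) of the Algorithm 1 update so that no linear solve is needed, it replaces the exact line search with Armijo backtracking, and it is demonstrated on a hypothetical `toy_e_drift` objective for a single row whose input probabilities and drift rates are invented for illustration. The function names `fd_gradient`, `bfgs` and `toy_e_drift` are likewise hypothetical.

```python
import math


def fd_gradient(f, x, h=1e-6):
    """Central finite-difference estimate of the gradient of a black-box f at x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2.0 * h))
    return grad


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bfgs(f, x0, tol=1e-6, max_iter=200):
    """Quasi-Newton BFGS that updates the inverse Hessian approximation H = B^-1."""
    n = len(x0)
    x = list(x0)
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g = fd_gradient(f, x)
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) < tol:
            break
        p = [-dot(row, g) for row in H]  # search direction p = -H g
        # Armijo backtracking line search (stand-in for the exact argmin step)
        alpha, fx, slope = 1.0, f(x), dot(g, p)
        while (f([xi + alpha * pi for xi, pi in zip(x, p)]) > fx + 1e-4 * alpha * slope
               and alpha > 1e-10):
            alpha *= 0.5
        s = [alpha * pi for pi in p]
        x_new = [xi + si for xi, si in zip(x, s)]
        g_new = fd_gradient(f, x_new)
        y = [a - b for a, b in zip(g_new, g)]
        sy = dot(s, y)
        if sy > 1e-12:  # update only when the curvature condition holds
            Hy = [dot(row, y) for row in H]
            yHy = dot(y, Hy)
            for i in range(n):
                for j in range(n):
                    H[i][j] += ((sy + yHy) * s[i] * s[j] / sy ** 2
                                - (Hy[i] * s[j] + s[i] * Hy[j]) / sy)
        x, g = x_new, g_new
    return x


# Hypothetical toy objective for one crossbar row (all numbers invented):
# positive inputs (probability 0.7) drift the state toward RESET at a rate that
# grows as A**6 * D; negative inputs (probability 0.3) drift it toward SET at a
# fixed reference rate. A weak penalty keeps A and D near the default ratio 1.
def toy_e_drift(params):
    a, d = params
    net_drift = 0.7 * abs(a) ** 6 * d - 0.3 * 1.5
    return net_drift ** 2 + 1e-3 * ((a - 1.0) ** 2 + (d - 1.0) ** 2)


if __name__ == "__main__":
    a_opt, d_opt = bfgs(toy_e_drift, [1.0, 1.0])
    print(f"A = {a_opt:.3f}, D = {d_opt:.3f}, E = {toy_e_drift([a_opt, d_opt]):.2e}")
```

The toy objective drives the net SET/RESET drift toward zero, which mirrors the balancing idea above; in a real flow `toy_e_drift` would be replaced by an evaluation of $E_{\mathrm{Drift}}$ on the crossbar model.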
C. Constraint Violations
Normally, the optimized voltage amplitude ratio $\mathbf{A}$ and width ratio $\mathbf{D}$ can be used directly. However, in certain cases elements of the optimal $\mathbf{A}$ and $\mathbf{D}$ are far too large or too small to be implemented practically, typically when the device characteristics and the input distribution are heavily skewed. To address this issue, we first reframe the optimization problem. Consider a simplified scenario of a single memristor whose input data are modelled by the discrete random variable $X$ with probability distribution $f(x)$. We define the time derivative of the internal state $w$ for a given input $x$ as
$$\frac{dw(x)}{dt} = g(x). \qquad (6)$$
In our sub-threshold model, $g(x)$ is given by Eq. (2) with $v(t)$ replaced by $v(x)$, the mapping of input $x$ to voltage amplitude. The average rate of memristance drift given the input distribution $X$ is:
$$\mathbb{E}\left[\frac{dw(x)}{dt}\right] = \sum_x g(x) f(x). \qquad (7)$$
The optimization problem over $E_{\mathrm{Drift}}$ can then be reframed as minimizing $\left|\mathbb{E}\left[dw(x)/dt\right]\right|$ over all memristors, where the $g(x)$ parameters of each memristor are sampled according to Eq. (4) to account for device-to-device variations. The AIDX scheme defined so far only affects $g(x)$ and makes no adjustment to $f(x)$. If AIDX is allowed to first optimize over $f(x)$, the issue of impractical $\mathbf{A}$ or $\mathbf{D}$ can be circumvented. One of the few useful recoverable transformations of the input vector $x$ is inversion through multiplication by $-1$. By inverting a random proportion $a$ of the input data, the input distribution is transformed into $f'(x)$:
$$f'(x) = (1 - a) f(x) + a f(-x), \qquad 0 < a < 1 \qquad (8)$$
Impractical $\mathbf{A}$ or $\mathbf{D}$ only occur when $\mathbb{E}[dw(x)/dt] \gg 0$ or $\mathbb{E}[dw(x)/dt] \ll 0$ before applying AIDX, i.e., when the drift is strongly skewed toward one direction. Therefore, by first minimizing $\left|\mathbb{E}[dw'(x,a)/dt]\right|$ over $a$, the expected drift is brought close to 0 before $\mathbf{A}$ and $\mathbf{D}$ are optimized, which avoids these constraint violations. Fig. 3(b) illustrates our approach to constraint violations.

Algorithm 1: BFGS Algorithm
1. Obtain a direction $p_k$ by solving $B_k p_k = -\nabla f(x_k)$.
2. Perform a line search to find the step size $\alpha_k = \arg\min_{\alpha} f(x_k + \alpha p_k)$.
3. $s_k = \alpha_k p_k$
4. $x_{k+1} = x_k + s_k$
5. $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$
6. $B_{k+1} = B_k + \dfrac{y_k y_k^T}{y_k^T s_k} - \dfrac{B_k s_k s_k^T B_k^T}{s_k^T B_k s_k}$
7. Repeat steps 1-6 until $x$ converges.

D. General Solution Flow
As shown in Fig. 3(c), during pre-processing the optimization is run in three separate scenarios (input pulse amplitude only, input pulse width only, and both), and the best-performing parameters are kept. Once hardware constraint violations are resolved with input data inversion, the input circuits, e.g., digital-to-analog converters (DACs), are adjusted to fit the optimized input-to-voltage mapping parameters. Most of AIDX takes place during pre-processing, which only needs to be done once for any given task. The only difference between an AIDX inference operation and a normal inference operation is that, for inverted inputs, the intended output current is recovered by multiplying the output by −1. Fig. 3(d) summarizes the general pipeline of AIDX. Fig. 3(e) shows the evolution of memristor state due to memristance drift for AIDX and the baseline model, and Fig. 3(f) is a heatmap of a portion of the memristor crossbar at 1000 and 10000 inference steps, where the bottom row represents the bias. Although memristance drift can cause memristors to switch in both the SET and RESET directions, as seen in Fig. 3(e), almost all memristors within a crossbar typically drift in only one direction for image-based applications. Image inputs are almost entirely positive, which causes an aggregate drift in the RESET direction. Other reasons for a unidirectional aggregate drift include the different device switching speeds in the SET and RESET directions, and the fact that most non-bias memristors are initialized close to the high-resistance state, where the drift speed is strongly skewed in the SET direction (Fig. 2(d)).

[Fig. 3(c) flowchart. AIDX preprocessing: map matrix G onto the crossbar; run BFGS optimization on PA, on PW, and on PW&PA; choose the best-performing parameters as V_opt; on a constraint violation, apply proportional input data inversion and redo the optimization on the transformed data, otherwise adjust the DACs to fit V_opt. AIDX inference: get input vector X; map X to voltage pulses V_in based on V_opt; apply V_in and sense I_out; if the input was inverted, output I*_out = −I_out; final output Y. PW: input pulse width; PA: input pulse amplitude.]

Fig. 3. (a) AIDX's optimized parameters A and D mapped onto input voltage pulses. (b) Proportional input inversion to balance memristance drift error and prevent constraint violations. (c) AIDX design flowchart. (d) AIDX-optimized parameters for duration (D_opt) and amplitude (A_opt) are applied to each CNN layer's input separately. (e) Simulated memristance drift with and without AIDX over time with 15% device-to-device variations. To better observe the total state-drift, the initial conductance of all devices is set to …, and a positively skewed pre-generated random sequence of voltage pulses is applied to half of the memristors and a negatively skewed sequence to the other half. (f) Percentage change in conductance of the same simulated memristors shown in Fig. 1(d) when AIDX is used.

IV. RESULTS AND DISCUSSION
In this paper, all simulations are performed using our extended VTEAM memristor model [14], which includes the effects of sub-threshold state-drift and whose parameters are fit to the experimental data shown in Fig. 2. We integrate this memristor model into our existing 1T1R memristor crossbar simulation to simulate both memristance drift and crossbar non-idealities such as sneak paths and line resistance. A differential weight mapping scheme is used, in which each element is mapped onto a pair of memristors: one represents positive values and the other represents negative values. In Fig. 4(a), AIDX is tested across 10 baseline tasks from the Proben1 benchmark datasets [14]. We trained a shallow NN with one hidden layer for all of these tasks. Although the final baseline accuracy varies widely across tasks, all baseline tasks ended at roughly the accuracy of random guessing; the variation arises because some tasks have more classification categories than others. To verify our solution's effectiveness for more practical applications, we applied AIDX to a selection of CNN architectures on the CIFAR10 dataset. The CNN memristor crossbar mapping scheme used is similar to the one in [15]. Fig. 4(b) compares the performance of 10 different CNN architectures between AIDX and the baseline model. Compared to the shallow NNs used for the Proben1 datasets, the CNNs showed faster accuracy degradation overall. This is expected, because error propagation from one layer to the next amplifies the effect of memristance drift. The error in column $j$ of the $(l+1)$-th layer in a fully connected NN is:
$$E_{j,l+1} = \sum_{i=1}^{n} V_{i,l+1} \left(\sigma(E_{i,l}) + \Delta G_{ij,l+1}\right) \qquad (9)$$
Here, $n$ is the total length of the input vector and $\sigma$ is the activation function of the $l$-th layer. Due to the large number of parameters in modern CNNs, BFGS optimization in AIDX is performed sequentially, layer by layer, to reduce optimization time. Applying AIDX to the selected CNNs provided consistent improvements in classification accuracy on CIFAR10. This consistent improvement across CNNs of varying sizes and designs demonstrates the flexibility of the proposed method across different crossbar sizes and structures. In addition to classification tasks, we demonstrate AIDX's effectiveness in a different type of memristor crossbar application. Fig. 4(c) shows the results of image reconstruction on the MNIST dataset. For this task, a 1-hidden-layer autoencoder with … hidden units was trained off-chip, which corresponds to a …× compression factor. With AIDX, the average mean squared error improved by … over the baseline after … inference operations.

V. OVERHEAD ANALYSIS AND COMPARISON
Different state-drift mitigation techniques are compared in Table I with two AIDX configurations, optimized for accuracy (AIDX-A) and for power efficiency (AIDX-P), to implement an MLP network. AIDX-A is the baseline AIDX method discussed in previous sections, while AIDX-P adds L2 regularization terms for $\mathbf{A}$ and $\mathbf{D}$:
$$\min_{\mathbf{A},\mathbf{D}} \left( E_{\mathrm{Drift}}(\mathbf{A},\mathbf{D}) + \lambda_1 \|\mathbf{A}\|_2^2 + \lambda_2 \|\mathbf{D}\|_2^2 \right)$$
where $\lambda_1$ and $\lambda_2$ are regularization constants. Regularizing the voltage amplitude and width ratios allows AIDX-P to reduce the passive crossbar power consumption. For consistency, we use the same estimates of peripheral power consumption as [10]. Crossbar power consumption in AIDX is computed as the average power consumed across the memristors in one inference operation. Area overhead is defined as the percentage increase in on-chip area required for the memristance drift solution due to peripherals, external circuits, and other items. Accuracy improvement is the increase in classification accuracy provided by a solution over the baseline model in a 1-hidden-layer MLP at the end of the baseline model's defined lifetime. Performance lifetime is defined as the time required for a system to degrade to … classification accuracy on the MNIST dataset. We chose this metric as an axis of comparison primarily because it is used in [10] and is easily adaptable to the Interrupt and Benchmark method used in [9]. Scalability measures how well a memristance drift solution's performance and overhead scale with crossbar size and with additional layers in NN applications. Time overhead is not shown in Table I because all the solutions presented introduce negligible time overhead compared with their respective baseline models.

Fig. 4. (a) Classification accuracy comparison between baseline and AIDX across baseline tasks from the Proben1 datasets. (b) Classification accuracy comparison across different CNN architectures on the CIFAR-10 image classification dataset. (c) Sample reconstructed MNIST images and average image reconstruction error from baseline and AIDX-enhanced autoencoders.

TABLE I
COMPARATIVE ANALYSIS OF AIDX
Methods                   | [5]   | [6]   | AIDX-A | AIDX-P
Power overhead (%)        | -     | +1.61 | +3.27  | +…
Area overhead (%)         | …     | 34    | 0      | 0
Performance lifetime      | …×    | …×    | …×     | …×
Scalability vs. baseline  | Worse | Worse | Better | Better
Accuracy improvement (%)  | …     | …     | …      | …
Includes non-idealities   | No    | No    | Yes    | Yes

VI. CONCLUSION
In this paper, we propose AIDX, a new inference scheme based on voltage signal optimization, to reduce the impact of memristance drift on memristor crossbar MAC operations. By optimizing the input mapping of voltage pulse width and amplitude, AIDX is flexible and effective across a range of tasks, including classification and image reconstruction, and minimizes the computational error due to memristance drift. AIDX provides up to a … increase in classification accuracy on the CIFAR-10 dataset and a … improvement in image reconstruction on the MNIST dataset. In addition, we have proposed an extension to the popular VTEAM model to more precisely simulate memristor behaviour below the switching voltage thresholds.

ACKNOWLEDGMENT
This work is supported by the NSERC HIDATA project and ERC-CoG IONOS no. 773228.

REFERENCES
[1] P. Yao et al., "Fully hardware-implemented memristor convolutional neural network," Nature, vol. 577, no. 7792, pp. 641-646, 2020.
[2] C. Li et al., "Efficient and self-adaptive in-situ learning in multilayer memristor neural networks," Nature Comm., vol. 9, no. 1, pp. 1-8, 2018.
[3] T. Chang et al., "Short-Term Memory to Long-Term Memory Transition in a Nanoscale Memristor," ACS Nano, vol. 5, no. 9, pp. 7669-7676, 2011.
[4] S. Oh et al., "The Impact of Resistance Drift of Phase Change Memory (PCM) Synaptic Devices on Artificial Neural Network Performance," IEEE Electron Device Letters, vol. 40, no. 8, pp. 1325-1328, 2019.
[5] S.-S. Sheu et al., "A 4Mb embedded SLC resistive-RAM macro with 7.2 ns read-write random-access time and 160 ns MLC-access capability," 2011 IEEE Int. Solid-State Circuits Conf. (ISSCC), pp. 200-202, 2011.
[6] Y. Chen et al., "A nondestructive self-reference scheme for spin-transfer torque random access memory (STT-RAM)," 2010 Design Autom. & Test in Europe Conf. & Exh. (DATE), pp. 148-153, 2010.
[7] D. Niu et al., "Low power memristor-based ReRAM design with error correcting code," 2012 17th Asia and South Pacific Design Autom. Conf. (ASP-DAC), pp. 79-84, 2012.
[8] V. Joshi et al., "Accurate deep neural network inference using computational phase-change memory," Nature Comm., vol. 11, no. 1, pp. 1-13, 2020.
[9] B. Li et al., "Memristor-based approximated computation," 2013 Int. Symp. on Low Power Elec. and Design (ISLPED), pp. 242-247, 2013.
[10] B. Yan et al., "A closed-loop design to enhance weight stability of memristor based neural network chips," 2017 IEEE/ACM Int. Conf. on Computer-Aided Design (ICCAD), pp. 541-548, 2017.
[11] S. Kvatinsky et al., "VTEAM: A General Model for Voltage-Controlled Memristors," IEEE Trans. on Circuits and Sys. II: Exp. Briefs, vol. 62, no. 8, pp. 786-790, 2015.
[12] R. Fletcher, Practical Methods of Optimization. J. Wiley & Sons.