Rethinking Non-idealities in Memristive Crossbars for Adversarial Robustness in Neural Networks
Abhiroop Bhattacharjee and Priyadarshini Panda, Department of Electrical Engineering, Yale University, USA
Abstract — Deep Neural Networks (DNNs) have been shown to be prone to adversarial attacks. With a growing need to enable intelligence in embedded devices in this Internet of Things (IoT) era, secure hardware implementation of DNNs has become imperative. Memristive crossbars, being able to perform Matrix-Vector-Multiplications (MVMs) efficiently, are used to realize DNNs on hardware. However, crossbar non-idealities have always been devalued since they cause errors in performing MVMs, leading to degradation in the accuracy of the DNNs. Several software-based adversarial defenses have been proposed in the past to make DNNs adversarially robust. However, no previous work has demonstrated the advantage conferred by the non-idealities present in analog crossbars in terms of adversarial robustness. In this work, we show that the intrinsic hardware variations manifested through crossbar non-idealities yield adversarial robustness to the mapped DNNs without any additional optimization. We evaluate the resilience of state-of-the-art DNNs (VGG8 and VGG16 networks) using benchmark datasets (CIFAR-10 and CIFAR-100) across various crossbar sizes towards both hardware and software adversarial attacks. We find that crossbar non-idealities unleash greater adversarial robustness in DNNs than baseline software DNNs. We further assess the performance of our approach against other state-of-the-art efficiency-driven adversarial defenses and find that our approach performs significantly well in terms of reducing adversarial losses.

Index Terms — Deep Neural Networks, Memristive crossbars, Non-idealities, Adversarial robustness
1 INTRODUCTION
In recent years, resistive crossbar systems have received significant focus for their ability to realize Deep Neural Networks (DNNs) by efficiently computing analog dot-products [1], [2], [3]. These systems have been realized using a wide range of emerging technologies such as Resistive RAM (ReRAM), Phase Change Memory (PCM) and Spintronic devices [4], [5], [6]. These devices exhibit high on-chip storage density, non-volatility, low leakage and low-voltage operation, and thus enable compact and energy-efficient implementation of DNNs [7], [8].

Despite these advantages, the analog nature of dot-product computation in crossbars poses certain challenges owing to device-level and circuit-level non-idealities such as interconnect parasitics, process variations in the synaptic devices, driver and sensing resistances, etc. [8], [9]. Such non-idealities lead to errors in the analog dot-product computations in the crossbars, thereby adversely affecting DNN implementation in the form of accuracy degradation [10]. Numerous frameworks have been developed in the past to model the impact of non-idealities present in crossbar systems and accordingly retrain the weights (stored in synaptic devices) of the DNNs to mitigate accuracy degradation [9], [10], [11], [12].

Crossbar-based non-idealities have thus far been devalued because they lead to accuracy degradation in DNNs. However, an interesting aspect of these non-idealities, namely providing resilience to DNNs against adversarial attacks, has been unexplored. DNNs have been shown to be adversarially vulnerable [13]. A DNN can easily be fooled by applying structured, yet small, perturbations on the input, leading to high-confidence misclassification of the input. This vulnerability severely limits the deployment and potential safe use of DNNs for real-world applications such as self-driving cars, malware detection, healthcare monitoring systems, etc. [14], [15]. Thus, it is imperative to ensure that the DNN models used for such applications are robust against adversarial attacks. Recent works such as [16], [17] show that quantization methods, which primarily reduce the compute resource requirements of DNNs, act as a straightforward way of improving the robustness of DNNs against adversarial attacks. A recent work has led to the development of a framework called QUANOS that provides a structured method for hybrid quantization of different layers of a DNN to produce energy-efficient, accurate and adversarially robust models [15]. In [15], [16], the authors show that efficiency-driven hardware optimization techniques can be leveraged to mitigate software vulnerabilities, such as adversarial attacks, while yielding energy-efficiency. In this work, we present a comprehensive analysis of how device-level and circuit-level non-idealities intrinsic to analog crossbars can be leveraged for adversarial robustness in neural networks. To the best of our knowledge, we are the first to show that the intrinsic hardware variations manifested through non-idealities in crossbars intrinsically improve adversarial security without any additional optimization. Our main finding is that a DNN model mapped on hardware, while suffering accuracy degradation, is also more adversarially resilient than the baseline software DNN.

Contributions: In summary, the key contributions of this work are as follows:
• We employ a systematic framework in PyTorch [18] to map DNNs onto resistive crossbar arrays and investigate the cumulative impact of various circuit-level and device-level non-idealities in conferring adversarial robustness.
• We analyse the robustness of state-of-the-art DNNs, viz. VGG8 and VGG16 [19], using benchmark datasets CIFAR-10 and CIFAR-100 [20], respectively, across various crossbar dimensions.
• We show that crossbar-based non-idealities impart robustness to neural networks against both hardware- and software-based adversarial attacks.
• We find that non-idealities lead to higher adversarial robustness (for both FGSM- and PGD-based adversarial attacks on hardware) in DNNs mapped onto resistive crossbars than in DNNs evaluated on software.
• We investigate the role of various crossbar parameters (such as R_MIN) in unleashing adversarial robustness in DNNs mapped onto crossbars. We also study the impact of the input Pixel Discretization proposed in [16] together with crossbar non-idealities on adversarial robustness.
• A comparison of our proposed method with other state-of-the-art quantization techniques is also presented to emphasise the importance of hardware non-idealities in imparting resilience to DNNs against adversarial inputs.
2 RELATED WORKS
Prior research works have focused on modeling crossbar non-idealities to mitigate the accuracy degradation incurred when DNNs are mapped onto them. Several frameworks have been proposed, such as CxDNN [10], which employs matrix-inversion techniques combined with Kirchhoff's circuit laws to model the effect of interconnect parasitics and peripheral non-idealities in resistive crossbar arrays. The authors in [21] have presented an approximation technique based on sample input/output behavior. However, these analytical models take into account only linear, data-dependent non-idealities while modeling the crossbar instances. Recent frameworks such as GENIEx [9] use a neural network-based approach to accurately encapsulate the effects of both data-dependent and non-data-dependent non-idealities and assess their impact on accuracy degradation. PUMA is the first Instruction Set Architecture (ISA)-programmable inference accelerator based on hybrid CMOS-memristor crossbar technology, designed to maintain crossbar area and energy efficiency as well as storage density. PUMA has been shown to outperform other state-of-the-art CPUs, GPUs and ASICs for ML acceleration [7], [22], [23], [24]. Nevertheless, none of the aforementioned techniques or architectures have helped understand the advantages that the intrinsic non-idealities of crossbar structures may confer in terms of adversarial robustness to DNNs.

In recent years, several heuristic adversarial defense strategies have been developed, including adversarial training [13], [25], [26], [27], [28], randomization-based techniques [29], [30], [31] and denoising methods [32], [33], [34], [35]. However, these defenses might be broken by a new attack in the future, since they lack a theoretical error-rate guarantee [36]. Hence, researchers have strived to develop certified defense methods [37], [38], [39], [40], which always maintain a certain accuracy under a well-defined class of attacks [36]. Even though certified defense methods indicate a way to reach theoretically guaranteed security, their accuracy and efficiency are far from meeting practical requirements [36]. Apart from these, several quantization-based methods on software have been proposed of late, including works like [15], [16], [17], to improve the resilience of neural networks against adversarial perturbations. The work in [16] deals with discretization of the input space (reducing the allowed pixel levels from 256 values, or 8 bits, to 4 bits or 2 bits). It shows that input discretization improves the adversarial robustness of DNNs for a substantial range of perturbations, besides improving computational efficiency with minimal loss in test accuracy. Likewise, QUANOS [15] is a framework that performs layer-specific hybrid quantization of DNNs based on a metric termed Adversarial Noise Sensitivity (ANS) to make DNNs robust against adversarial perturbations. In contrast to prior works, we present a first-of-its-kind work that comprehensively studies the inherent advantage of hardware non-idealities in imparting adversarial robustness to DNNs, without relying upon other software-based optimization methodologies. Note that we also show that combining previously proposed optimization strategies, such as pixel discretization, with analog crossbars further improves robustness.
3 BACKGROUND

3.1 Adversarial Attacks
DNNs are vulnerable to adversarial attacks in which the model gets fooled by applying precisely calculated small perturbations on the input, leading to high-confidence misclassification [15]. The authors in [25] have proposed a method called the Fast Gradient Sign Method (FGSM) to generate the adversarial input by linearization of the loss function (L) of the trained models with respect to the input (X), as shown in equation (1):

$X_{adv} = X + \epsilon \times \mathrm{sign}(\nabla_X \mathcal{L}(\theta, X, y_{true}))$   (1)

Here, y_true is the true class label for the input X; θ denotes the model parameters (weights, biases, etc.) and ε quantifies the degree of distortion. The quantity Δ = ε × sign(∇_X L(θ, X, y_true)) is the net perturbation added to the input X, which is controlled by ε. It is noteworthy that gradient propagation is, thus, a crucial step in unleashing an adversarial attack. Furthermore, the contribution of the gradient to Δ varies across the layers of the network depending upon the activations [15]. In addition to FGSM-based attacks, multi-step variants of FGSM, such as Projected Gradient Descent (PGD) [13], have also been proposed, which cast stronger attacks.
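For concreteness, a minimal PyTorch sketch of FGSM generation per equation (1) follows; the model, loss and tensor names are generic placeholders, not the exact code of our framework:

import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y_true, epsilon):
    """Generate an FGSM adversarial example per equation (1)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y_true)   # L(theta, X, y_true)
    loss.backward()
    delta = epsilon * x_adv.grad.sign()            # eps * sign(grad_x L)
    return (x_adv + delta).detach()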
To build resilience against small adversarial perturbations, defense mechanisms such as gradient masking or obfuscation [41] have been proposed. Such methods construct a model devoid of useful gradients, thereby making it difficult to craft an adversarial attack.
Types of Attacks: Broadly, attacks to evaluate adversarial robustness are classified as Black-Box (BB) and White-Box (WB). WB attacks are launched when the attacker has complete knowledge of the target model parameters and training information. BB attacks, on the other hand, are launched when the attacker has no knowledge of the target model parameters. Resilience against WB adversaries also guarantees resilience against BB ones for a similar perturbation (ε) range [15]. Thus, all our subsequent experiments are based on WB adversaries for the assessment of adversarial robustness.

In this work, Clean Accuracy (CA) refers to the accuracy of a DNN when presented with the test dataset in the absence of an adversarial attack. We define Adversarial Accuracy (AA) as the accuracy of a DNN on the adversarial dataset created from the test data for a given task. Adversarial Loss (AL) is defined as the difference between CA and AA, i.e., AL = CA − AA. The higher the value of AA, the smaller the value of AL, which implies increased robustness against adversarial attacks.

3.2 Memristive Crossbars and Non-idealities

Resistive crossbar arrays can be harnessed to implement Matrix-Vector-Multiplications (MVMs) in an analog manner. Crossbars (Fig. 1(a)) consist of 2D arrays of synaptic devices (programmable resistors realized using emerging nanotechnologies), Digital-to-Analog (DAC) and Analog-to-Digital (ADC) converters, and a write circuit. The synaptic device at the intersection of each row and column is configured to a particular value of conductance (ranging from G_MIN to G_MAX) by enabling the corresponding write circuits along the Write Wordline (WWL) and the Bitline (BL). Thereafter, the MVMs are performed by converting the digital inputs to analog voltages on the Read Wordlines (RWLs) using the DACs, and sensing the output current flowing through the bitlines (BLs) using the ADCs [8].

Equation (2) shows the ideal MVM operation for an MxN crossbar, in which V_in is a 1xM vector comprising the input analog voltages, G_ideal is the MxN conductance matrix (formed by mapping the weights of a DNN onto the crossbar instances), and Iout_ideal is a 1xN vector comprising the output currents:
$Iout_{ideal} = V_{in} * G_{ideal}$   (2)
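As a toy illustration of equation (2) (the conductance bounds are derived from the R_MIN/R_MAX values of TABLE 2; this is a sketch, not the framework's actual mapping code):

import torch

# Ideal MVM on an MxN crossbar (equation (2)): DNN weights are encoded
# as conductances G_ideal in [G_MIN, G_MAX], inputs as analog voltages.
M, N = 32, 32
g_min, g_max = 1 / 200e3, 1 / 20e3           # siemens, R in [20k, 200k] ohms
G_ideal = g_min + (g_max - g_min) * torch.rand(M, N)
V_in = torch.rand(1, M)                       # 1xM input voltage vector
I_out_ideal = V_in @ G_ideal                  # 1xN output currents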
Non-idealities: The analog nature of the computation leads to various non-idealities that result in errors in the MVMs. These include device-level and circuit-level non-idealities in the resistive crossbars. Fig. 1(b) shows the equivalent circuit for the crossbar array and its peripherals, accounting for the non-idealities listed in TABLE 1. The circuit-level non-idealities are modelled as parasitic resistances. The cumulative effect of all the non-idealities is a deviation of the output current from its ideal value, yielding an Iout_non-ideal vector.

TABLE 1: Various circuit-level and device-level non-idealities in a resistive crossbar array

Type of non-idealities    Parameters
Circuit non-idealities    Rdriver, Rwire_row, Rwire_col, Rsense
Device non-idealities     Gaussian variation profile

The relative deviation of Iout_non-ideal from its ideal value is denoted by the non-ideality factor (NF) [9], such that:

$NF = (Iout_{ideal} - Iout_{non-ideal}) / Iout_{ideal}$   (3)

Thus, increased non-idealities in crossbars induce a greater value of NF. This can have a significant impact on the computational accuracy of crossbars and, therefore, cause degradation in the accuracy of the DNNs implemented on hardware [8], [9], [10].
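Continuing the sketch above, the device-level non-ideality of TABLE 1 can be emulated as a Gaussian spread on the conductances and NF computed per equation (3); circuit-level parasitics such as Rdriver and Rsense require the full circuit solve of [8] and are omitted here:

# Reuses G_ideal, V_in and I_out_ideal from the previous sketch.
# Gaussian device variation with sigma/mu = 10%, the value used in the
# experiments of Section 5.
G_nonideal = G_ideal * (1 + 0.10 * torch.randn_like(G_ideal))
I_out_nonideal = V_in @ G_nonideal
NF = (I_out_ideal - I_out_nonideal) / I_out_ideal   # equation (3)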
Crossbar Mapping: In this work, we use a procedure similar to that of [8], [10] for mapping DNNs onto crossbars of various dimensions, as shown in Fig. 3(b). First, the weights of each layer of the DNN are partitioned based on the size of the crossbar array used and mapped onto the crossbar instances. Thereafter, the corresponding conductance for each DNN weight in a crossbar instance is computed by taking into account the synaptic device parameters, viz. G_MIN, G_MAX and bit-precision. This gives us the ideal conductance matrix (G_ideal). Finally, we consider the circuit-level and device-level non-idealities present in a crossbar instance, specified in TABLE 1, and convert G_ideal into G_non-ideal using circuit laws (Kirchhoff's laws and Ohm's law) and linear algebraic operations [8]. This completes the mapping of the weights of the DNN onto the crossbar instances.
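A simplified sketch of the partitioning and weight-to-conductance steps (the helper names and the linear mapping rule are illustrative assumptions; sign handling via differential conductance pairs and bit-precision quantization are omitted):

import torch

def weights_to_conductances(W, g_min=1/200e3, g_max=1/20e3):
    """Map a weight tile linearly into the device conductance range
    [G_MIN, G_MAX] (illustrative mapping, not the exact framework code)."""
    w_min, w_max = W.min(), W.max()
    return g_min + (W - w_min) * (g_max - g_min) / (w_max - w_min + 1e-12)

def partition_into_crossbars(W, xbar_size=32):
    """Partition a layer's weight matrix into xbar_size x xbar_size tiles
    (zero-padding the borders) and map each tile to a G_ideal instance."""
    rows = -(-W.shape[0] // xbar_size) * xbar_size   # ceil to tile multiple
    cols = -(-W.shape[1] // xbar_size) * xbar_size
    W_pad = torch.zeros(rows, cols)
    W_pad[:W.shape[0], :W.shape[1]] = W
    return [weights_to_conductances(W_pad[i:i+xbar_size, j:j+xbar_size])
            for i in range(0, rows, xbar_size)
            for j in range(0, cols, xbar_size)]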
Non-idealities inherent in crossbars have so far been projected in a negative light, since they lead to degradation in clean accuracy when DNNs are mapped onto them. However, in this work, we show how the non-idealities (or an increased value of NF for a crossbar) lead to robustness of DNNs against adversarial attacks. Thus, we observe lower adversarial loss (AL) with respect to the corresponding software implementation of the DNNs. We argue that non-idealities intrinsically lead to defense via gradient obfuscation against adversarial perturbations, since gradient propagation, as discussed in Section 3.1, is crucial to initiate an adversarial attack.

Fig. 2 pictorially demonstrates the intuition behind the creation of an adversary in DNNs and how hardware non-idealities can cause gradient obfuscation. DNNs, being discriminative models, partition a very high-dimensional input space into different classes by learning appropriate decision boundaries. The class-specific decision boundaries simply divide the space into hypervolumes. These hypervolumes consist of the training data examples as well as large areas of unpopulated space that are arbitrary and untrained. The decision boundary learned during model training extrapolates to vast regions of unpopulated high-dimensional subspace because of linearity/generalization in the model behavior.
Fig. 1. (a) An ideal crossbar array; (b) a typical non-ideal crossbar array structure with resistive circuit-level non-idealities (Rdriver, Rwire_row, Rwire_col, Rsense), in which the output current is governed by I_j = f(V_i, G_ij(V_i), R_driver, R_sense, R_wire_row, R_wire_col). WWL: Write Wordline; RWL: Read Wordline; BL: Bitline.
Fig. 2. Pictorial depiction of the creation of adversaries for software- and hardware-based DNNs. (a) The data points (shown as 'dots') encompass the data manifold in the high-dimensional subspace. The classifier is trained to separate the data into different classes or hypervolumes, based on which the decision boundary is formed. Adversaries are created by perturbing the data points into empty regions or hypervolumes and are thus misclassified. (b) The decision boundaries get shifted owing to the crossbar-based non-idealities in hardware, resulting in the placement of certain data points into a different hypervolume, leading to accuracy degradation. However, due to gradient obfuscation owing to crossbar non-idealities, many data points remain restricted to their original hypervolumes upon perturbation. This results in better adversarial robustness in hardware-based DNNs.
But this exposes the model to adversarial attacks [16]. Adversarial perturbations, essentially, can shift a data point from its typical hypervolume region to another, leading to high-confidence misclassification. This has been shown in Fig. 2(a) with black arrows for a DNN evaluated on software. However, when a DNN is mapped onto crossbar arrays, the decision boundaries are shifted owing to the crossbar-based non-idealities, resulting in the placement of certain data points into a different hypervolume (Fig. 2(b)). This leads to misclassifications and hence degradation in the clean accuracy of the DNN. Also, on unleashing adversarial attacks on a DNN mapped on crossbars, the displacement of a data point in the high-dimensional subspace is altered in a different direction. This has been marked in Fig. 2(b) using violet arrows, which demarcate a different direction w.r.t. the one demarcated using black arrows (for DNNs evaluated on software). Thus, instead of moving into a different hypervolume, many of the perturbed data points remain restricted to their original hypervolumes, thereby resulting in lower adversarial losses (ALs) for the DNN and greater adversarial robustness.
Quantifying the intuition in Fig. 2: To support our gradient obfuscation argument, let us consider a DNN mapped onto crossbars as f. The net perturbation added to the input (X) in case of an adversarial attack is given by Δ = ε × sign(∇_X L(θ, X, y_true)) (refer to Section 3.1). Without loss of generality, we assume the loss function (L) of the hardware-mapped DNN to be a function of the output current emerging from a crossbar array (I_out), i.e., L = f(I_out). Since DNNs are sufficiently linear owing to the ReLU activation functions being used, we can assume that L ≈ I_out. In the ideal scenario of crossbars with no non-idealities, L ≈ Iout_ideal, which implies:

$\Delta_{ideal} = \epsilon \times \mathrm{sign}(\nabla_X(Iout_{ideal}))$   (4)

However, in the case of non-idealities being present in crossbar structures, Iout_non-ideal = Iout_ideal − γ, where γ denotes the deviation of the output current of the crossbar from its ideal value due to the inherent non-idealities. Hence, in the non-ideal scenario, we have:

$\Delta_{non-ideal} = \epsilon \times \mathrm{sign}(\nabla_X(Iout_{ideal} - \gamma))$   (5)

From equation (5), we find that there is a deviation in the adversarial perturbation from its ideal value owing to crossbar non-idealities. This explains the altered displacement of data points in the high-dimensional subspace w.r.t. the direction of displacement in case of a DNN evaluated on software (Fig. 2(b)).
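A toy numerical sketch of this effect: with a linear loss surrogate, the FGSM direction is the sign of the row sums of the effective weights, and a γ-like deviation flips that sign in some input dimensions (the noise model and its 10% scale are illustrative assumptions):

import torch

torch.manual_seed(0)
d_in, d_out = 64, 10
W = torch.randn(d_in, d_out)          # signed effective weights of a layer

# gamma emulated as a 10% random perturbation of the effective weights,
# an illustrative stand-in for the crossbar deviation of equation (5).
W_nonideal = W * (1 + 0.10 * torch.randn_like(W))

# With L ~ sum of output currents, grad_x L is just the row sums of W.
grad_ideal = W.sum(dim=1)
grad_nonideal = W_nonideal.sum(dim=1)

flip = (grad_ideal.sign() != grad_nonideal.sign()).float().mean().item()
print(f"FGSM sign disagreement across input dimensions: {flip:.1%}")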
In this work, we employ a framework in PyTorch, similar to RxNN [8], to map DNNs onto resistive crossbar arrays and investigate the cumulative impact of the circuit-level and device-level non-idealities (listed in TABLE 1) on the robustness of neural networks against adversarial inputs.
4 METHODOLOGY
The methodology described in Fig. 3(a) is adopted to assess the robustness of DNNs against adversarial inputs when implemented on hardware. The entire process is divided into two parts.

Part 1: We employ the benchmark datasets CIFAR-10 and CIFAR-100 to evaluate VGG8 and VGG16 networks, respectively. These networks are first trained in PyTorch with the appropriate training datasets. Subsequently, we obtain two kinds of trained models:

1) Model-1: A standard model trained without adding any random noise to its activations.

2) Model-2: A model trained with random noise added to all neuronal activation values. Such noise-enabled training has been used in past works [42] to mitigate the accuracy degradation observed when mapping DNNs onto crossbars. Essentially, adding random noise to the neuronal activations is a crude and approximate way of modeling non-idealities during the training process; a minimal sketch of such noise injection follows.
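The noise placement and scale below are illustrative assumptions; the paper does not specify them:

import torch
import torch.nn as nn

class NoisyReLU(nn.Module):
    """ReLU followed by additive Gaussian noise on the activations,
    applied only during training (illustrative sigma)."""
    def __init__(self, sigma=0.1):
        super().__init__()
        self.sigma = sigma

    def forward(self, x):
        x = torch.relu(x)
        if self.training:
            x = x + self.sigma * torch.randn_like(x)
        return x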
Attack-SW: We launch FGSM and PGD attacks on the software models by adding adversarial perturbations to the clean test inputs. We record the adversarial accuracies (AAs) and adversarial losses (ALs) for each attack.
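PGD can be sketched as iterated FGSM with projection onto the ε-ball around the clean input (the step size and iteration count are illustrative; fgsm_attack is the helper sketched in Section 3.1, and image-range clamping is omitted for brevity):

def pgd_attack(model, x, y_true, epsilon, alpha=None, steps=7):
    """Multi-step FGSM (PGD): take small signed-gradient steps and
    project back into the epsilon-ball around the clean input."""
    alpha = alpha or epsilon / 4           # illustrative step size
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv = fgsm_attack(model, x_adv, y_true, alpha)
        x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)   # projection
    return x_adv.detach()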
Part 2: Using a PyTorch-based framework, we layer-wise map the software DNN weights, separately for both Model-1 and Model-2, onto resistive crossbar instances of sizes 16x16, 32x32 and 64x64, following the mapping procedure described in Section 3.2. The crossbar parameters used are listed in TABLE 2.

TABLE 2: Parameters and their values associated with a resistive crossbar array

Parameter    Value
Rdriver      1 kΩ
Rwire_row    5 Ω
Rwire_col    10 Ω
Rsense       1 kΩ
R_MIN        20 kΩ
R_MAX        200 kΩ

We calculate the CAs of the crossbar-mapped DNNs for both Model-1 and Model-2, which are expected to be lower than the values obtained for the software DNNs. Thereafter, we launch FGSM and PGD attacks on the mapped crossbar-based models in two modes:

1) Attack-1: The adversarial perturbations for each attack, FGSM and PGD, are created using the software-based DNN model's loss function and then added to the clean input to yield the adversarial input. The generated adversaries are then fed to the crossbar-mapped DNN to monitor AL.

2) Attack-2: The adversarial inputs are generated for each attack, FGSM and PGD, using the loss from the crossbar-based hardware models. As a result, we can expect the adversaries in this case not to be as strong as the Attack-1 adversaries, owing to the presence of non-idealities that can interfere with the attack generation process.

We finally record the adversarial accuracies (AAs) and adversarial losses (ALs) for all modes: Attack-SW, Attack-1 and Attack-2; a condensed sketch of this evaluation loop follows.
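In the sketch below, model_sw and model_xbar denote the software and crossbar-mapped models, and the attack helpers are the ones sketched earlier; the function is a simplified stand-in for the framework's evaluation code:

def adversarial_loss(model_eval, model_attack, attack_fn, loader, eps):
    """AL = CA - AA: accuracy drop of model_eval on adversaries
    generated from model_attack's gradients."""
    correct_clean, correct_adv, total = 0, 0, 0
    for x, y in loader:
        x_adv = attack_fn(model_attack, x, y, eps)
        correct_clean += (model_eval(x).argmax(1) == y).sum().item()
        correct_adv += (model_eval(x_adv).argmax(1) == y).sum().item()
        total += y.numel()
    return 100.0 * (correct_clean - correct_adv) / total

# Attack-SW: adversarial_loss(model_sw,   model_sw,   fgsm_attack, loader, eps)
# Attack-1:  adversarial_loss(model_xbar, model_sw,   fgsm_attack, loader, eps)
# Attack-2:  adversarial_loss(model_xbar, model_xbar, fgsm_attack, loader, eps)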
5 RESULTS AND DISCUSSION
The parameters pertaining to the non-ideal resistive crossbars used for mapping the DNNs are listed in TABLE 2 and are employed for all the simulations unless stated otherwise. Device-level process variation in the experiments below has been modelled as a Gaussian variation in the resistances of the synaptic devices with σ/µ = 10%.

Fig. 4 presents a comparison of the clean accuracies of the trained VGG8 and VGG16 networks (of the Model-1 type) when evaluated on software and after mapping onto non-ideal crossbars of various dimensions (excluding the device-level variations). It can be observed that the clean accuracies drop post-mapping on the crossbars, which is a direct implication of the inherent non-idealities in a crossbar causing errors in MVMs, as discussed in Section 3.2. We also see that accuracies drop more for larger crossbars. In the subsequent subsections, we discuss the implications on adversarial robustness of inducing Attack-1 and Attack-2 on the crossbar-mapped models.
In Fig. 5, it can be observed that the ALs in case of an FGSM attack on the DNNs mapped onto crossbars of various dimensions (16x16, 32x32, 64x64) are lower than that of a DNN evaluated on software. For different attack strengths quantified by ε values, the value of AL in case of Attack-SW is significantly greater than for Attack-1 or Attack-2. In other words, the hardware-based non-idealities that come into play when DNNs are mapped onto crossbars provide robustness against adversarial inputs. The PGD attack, being a multi-step variant of the FGSM attack, is much stronger and yields significantly higher adversarial losses in DNNs than FGSM attacks. Similar to the case of FGSM attacks, non-idealities in crossbars provide robustness to the mapped DNNs against adversarial inputs, as shown in Fig. 6.
Fig. 3. (a) Flow diagram of the methodology: a fixed-precision VGG8/VGG16 network is trained on software (CIFAR-10 with VGG8; CIFAR-100 with VGG16) to generate Model-1 and Model-2 and determine CA; FGSM and PGD attacks are launched on the software models (Attack-SW) to determine AA and AL for different values of ε; the weights of the trained DNNs are mapped onto 16x16, 32x32 and 64x64 crossbars using a PyTorch-based framework with circuit-level non-idealities and a device-level variation profile; CA is evaluated after crossbar mapping; FGSM/PGD attacks are launched on the mapped DNN models (Attack-1 and Attack-2) for different values of ε, and AA and AL are determined in each case. (b) Pictorial depiction of the steps in mapping an 8x8 weight matrix onto nine 3x3 crossbar instances: the weight matrix is partitioned into G_ideal tiles, circuit-level and device-level non-idealities are added to obtain G_non-ideal, and all non-ideal conductance matrices are merged.

TABLE 3: AL (%) for different values of ε in case of Attack-2 (PGD) on crossbar sizes of 16x16, 32x32 and 64x64, on Model-1 and Model-2 of the VGG8 network with the CIFAR-10 dataset.
Fig. 4. Bar diagram showing the CA of the VGG8 and VGG16 networks on software and on crossbars of sizes 16x16, 32x32 and 64x64.
For both FGSM and PGD attacks, we find that the ALs in case of Attack-2 are lower than those in case of Attack-1, indicating that the mapped DNNs are more resilient to adversarial perturbations created using the crossbar-based hardware models than to the software-based perturbations. Interestingly, we also find that larger crossbar sizes provide greater robustness against adversarial attacks (characterized by lower values of AL for the same value of ε) than smaller ones. This is because larger crossbars involve a greater number of parasitic components (non-idealities), thereby imparting more robustness. This is shown in TABLE 3, where the 64x64 crossbar provides the best robustness among the crossbar sizes considered.

Fig. 7 shows the variation in the CAs of Model-1 and Model-2 for both the baseline software DNN (VGG8 network) and when mapped onto crossbars. We find that on training a DNN with random noise added to its activations (Model-2), the CAs are significantly lower than those for a normal DNN (Model-1). As already discussed in the case of Model-1, we observe similar results for both FGSM and PGD attacks on Model-2, shown in TABLE 4 and TABLE 5, all of which affirm that hardware-based non-idealities lead to a reduction in adversarial losses and an improvement in adversarial robustness. Furthermore, in case of Model-2, we also find that larger crossbar sizes provide greater robustness against adversarial attacks than smaller ones, as indicated by TABLE 3, where the AL for a particular value of ε is highest for a 16x16 crossbar, followed by a 32x32 crossbar, and lowest for a 64x64 crossbar. From the values of AL presented in TABLE 3, the reader might be misled into thinking that Model-2 yields greater adversarial robustness than Model-1 when mapped on crossbars.

Fig. 5. Plots of AL vs. ε for Attack-SW, Attack-1 and Attack-2 (FGSM) on Model-1 (VGG8 network with the CIFAR-10 dataset) for crossbar sizes (a) 16x16; (b) 32x32; (c) 64x64.

Fig. 6. Plots of AL vs. ε for Attack-SW, Attack-1 and Attack-2 (PGD) on Model-1 (VGG8 network with the CIFAR-10 dataset) for crossbar sizes (a) 16x16; (b) 32x32; (c) 64x64.

TABLE 4: AL (%) for different values of ε in case of Attack-SW, Attack-1 and Attack-2 (FGSM) on Model-2 (VGG8 network with the CIFAR-10 dataset) for crossbar sizes of 16x16 and 32x32.

TABLE 5: AL (%) for different values of ε in case of Attack-SW, Attack-1 and Attack-2 (PGD) on Model-2 (VGG8 network with the CIFAR-10 dataset) for crossbar sizes of 16x16 and 32x32.
However, the general trend of lower values of AL for Model-2 relative to Model-1 for a given ε is due to the significantly smaller CAs of Model-2 compared with Model-1 (Fig. 7), and not due to higher values of AA.
Effect of R_MIN on adversarial robustness: The effective resistance of a crossbar structure is the parallel combination of the resistances along its rows and columns. A smaller value of R_MIN reduces the effective resistance of the crossbar and increases the value of NF for the crossbar [9]. Since we have already argued that an increased value of NF improves the adversarial robustness of crossbars, on decreasing R_MIN to 10 kΩ (maintaining a constant R_MAX/R_MIN ratio of 10) we find that the ALs (for a PGD attack) with the smaller R_MIN are lower than the corresponding ALs for the larger R_MIN, as shown in Fig. 8.

However, we also observe that with the smaller R_MIN, the DNN achieves greater robustness against Attack-1 than against Attack-2, contrary to what has been observed in Fig. 5 and Fig. 6. This is because of the larger adversarial perturbations created on hardware during Attack-2 with the lower R_MIN: lowering R_MIN causes larger output currents in the crossbar arrays (due to the smaller effective resistance of the crossbars). To verify this, we employ a metric called the Distortion Coefficient (d), which quantifies the degree of distortion of the test images over a batch during an adversarial attack. Mathematically, it is given as:

$d = \frac{\sum_i |N_C - N_A|}{N}$   (6)

where N_C is the normalized pixel value of the clean image, N_A is the normalized pixel value of the adversarially perturbed image, i denotes the index of a pixel in an image, and N is the total number of pixels in an image.
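A short sketch of equation (6) over a batch of image tensors (assuming pixel values normalized to [0, 1]):

def distortion_coefficient(x_clean, x_adv):
    """Equation (6): per-image mean absolute pixel distortion,
    averaged over the batch."""
    per_image = (x_clean - x_adv).abs().flatten(1).mean(dim=1)
    return per_image.mean().item()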
Fig. 7. Comparison of the CAs of the VGG8 network (software-based as well as mapped onto 16x16, 32x32 and 64x64 crossbars) for Model-1 and Model-2 using the CIFAR-10 dataset.

Fig. 8. Bar diagram showing the ALs in case of Attack-1 and Attack-2 (PGD, ε = 2/255, 8/255 and 32/255) for a VGG8 network mapped on 32x32 crossbars using the CIFAR-10 dataset, for two values of R_MIN (10 kΩ and 20 kΩ) at a constant R_MAX/R_MIN ratio of 10.

From TABLE 6, we observe that the distortion coefficient d over a batch of images is greater for Attack-2 than for Attack-1. This verifies that Attack-2 is stronger than Attack-1 and hence greater adversarial robustness is observed for Attack-1 w.r.t. Attack-2 with the lower R_MIN.

Effect of R_MAX at constant R_MIN on adversarial robustness:
Fig. 9 shows results for a PGD attack on a VGG8 network mapped onto crossbars with R_MIN held constant and the R_MAX/R_MIN ratio increased by raising the value of R_MAX. We find that even increasing the R_MAX/R_MIN ratio to 200 results in no added advantage in terms of adversarial robustness for Attack-1 or Attack-2. Hence, R_MIN has a greater impact on adversarial robustness than R_MAX.

TABLE 6: Distortion coefficient over a batch (calculated using equation (6)) and AL for a PGD attack (ε = 8/255) on the VGG8 network mapped onto a 32x32 crossbar with the CIFAR-10 dataset; R_MIN = 10 kΩ and R_MAX/R_MIN = 10 for the crossbar.

Effect of process variation on adversarial robustness: Fig. 10 shows results for a PGD attack on a VGG8 network mapped onto crossbars while varying the σ/µ ratio pertaining to synaptic device variation. Similar to the case of increasing the value of R_MAX, we find no added advantage in terms of adversarial robustness for Attack-1 or Attack-2 on increasing the Gaussian variation in the devices of the crossbars.

Fig. 9. Bar diagram showing CAs and ALs (for PGD-based Attack-1 and Attack-2) for a VGG8 network mapped on 32x32 crossbars using the CIFAR-10 dataset, for different values of R_MAX (R_MAX/R_MIN = 10, 20 and 200), with (a) ε = 2/255; (b) ε = 8/255; (c) ε = 32/255.

Fig. 10. Bar diagram showing CAs and ALs (for PGD-based Attack-1 and Attack-2) for a VGG8 network mapped on 32x32 crossbars using the CIFAR-10 dataset, for different values of σ/µ (synaptic device variation), with (a) ε = 2/255; (b) ε = 8/255; (c) ε = 32/255.
Studying the combined effect of input pixel discretization and crossbar non-idealities: In [16], the authors show that discretizing the input pixels from the 256-level (8-bit) space to 4-bit or 2-bit levels improves the adversarial resilience of software DNNs. Here, we unleash an FGSM attack on the VGG8 network mapped onto 32x32 crossbars, with the input image pixels of the CIFAR-10 test dataset discretized to 4 bits (16 levels) and 2 bits (4 levels); a sketch of this discretization is shown below. The results are shown in Fig. 11. Interestingly, we find that with pixel discretization, the ALs on the crossbar-mapped DNN for both Attack-1 and Attack-2 attain a fixed value and do not vary on increasing ε from 0.1 to 0.3. This implies that input pixel discretization does not necessarily help resiliency when attacking hardware-mapped DNNs. For lower values of ε, greater adversarial robustness is observed without pixel discretization. At the higher value of ε (ε = 0.3), the combined effect of 4-bit pixel discretization and crossbar non-idealities outperforms the rest in terms of adversarial robustness. Furthermore, 2-bit pixel discretization not only reduces the clean accuracy but also imparts marginally less adversarial robustness than 4-bit pixel discretization for both Attack-1 and Attack-2.

Fig. 11. Bar diagram showing CAs and ALs (for FGSM-based Attack-1 and Attack-2) for a VGG8 network mapped on 32x32 crossbars using the CIFAR-10 dataset, for different bit-discretizations of the input pixels (4-bit and 2-bit), with (a) ε = 0.1; (b) ε = 0.2; (c) ε = 0.3.
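A minimal sketch of the input pixel discretization used here (assuming inputs normalized to [0, 1]):

import torch

def discretize_pixels(x, bits=4):
    """Quantize pixels in [0, 1] to 2**bits levels (16 levels for 4-bit,
    4 levels for 2-bit), as in the input discretization defense of [16]."""
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels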
The results shown in Fig. 12 are similar to those for the VGG8 network evaluated with the CIFAR-10 dataset. Crossbar-based non-idealities impart adversarial robustness to the mapped VGG16 network against both FGSM- and PGD-based attacks. However, with the CIFAR-100 dataset, we clearly observe that the DNN shows greater adversarial robustness against the PGD attack in case of Attack-2 w.r.t. Attack-1 than what is observed with the CIFAR-10 dataset. Quantitatively, the robustness gain of Attack-2 w.r.t. Attack-1 is larger with the CIFAR-100 dataset than with the CIFAR-10 dataset, albeit the drop in clean accuracy for the DNN mapped onto crossbars is higher in case of the CIFAR-100 dataset (Fig. 4).

Fig. 12. (a)-(b) Plots of AL vs. ε for Attack-SW, Attack-1 and Attack-2 (FGSM) on Model-1 (VGG16 with the CIFAR-100 dataset) for crossbar sizes 16x16 and 32x32, respectively; (c)-(d) plots of AL vs. ε for Attack-SW, Attack-1 and Attack-2 (PGD) on Model-1 (VGG16 with the CIFAR-100 dataset) for crossbar sizes 16x16 and 32x32, respectively.
Comparison with Related Works: We compare the performance of non-ideality-driven adversarial robustness in crossbars against the state-of-the-art software-based adversarial techniques described in [15], [16]. Note that [15], [16] use efficiency-driven transformations (that implicitly translate to hardware benefits), such as quantization, to improve resilience. In contrast, our work utilizes explicit hardware variations to improve robustness. We aim to compare the robustness obtained from implicit and explicit hardware techniques. We observe that for the single-step FGSM attack on a VGG16 network mapped on 32x32 crossbars (a Model-1 type DNN), the adversarial robustness due to crossbar non-idealities (the Attack-1 results) outperforms all other techniques (Fig. 13(a)). For the multi-step PGD attack, Attack-1 ranks second (Fig. 13(b)). With respect to the 4-bit (4b) pixel discretization of the input data [16], non-idealities in crossbars impart greater adversarial robustness in case of both the FGSM attack and the PGD attack. On the other hand, in case of the FGSM attack, crossbar-based non-idealities impart greater adversarial robustness than QUANOS [15], while for the PGD attack, QUANOS outperforms our approach.

Fig. 13. (a) Comparison of our proposed method with other state-of-the-art adversarial defenses during an FGSM attack using the VGG16 network and the CIFAR-100 dataset; (b) comparison of our proposed method with other state-of-the-art adversarial defenses during a PGD attack using the VGG16 network and the CIFAR-100 dataset.

6 CONCLUSION
In this work, we perform a comprehensive analysis to show how crossbar-based non-idealities can be harnessed for adversarial robustness. This work brings in a new standpoint that does not devalue the importance of non-idealities or parasitics present in crossbar systems. We develop a framework based on PyTorch that maps state-of-the-art DNNs (VGG8 and VGG16 networks) onto resistive crossbar arrays and evaluates them with benchmark datasets (CIFAR-10 and CIFAR-100). We show that circuit-level non-idealities (e.g., interconnect parasitics) and synaptic device-level non-idealities intrinsically provide robustness to the mapped DNNs against adversarial attacks, such as FGSM and PGD attacks. This is reflected by lower accuracy degradations during adversarial attacks for DNNs mapped on crossbars than for software-based DNNs. We also find that larger crossbar sizes extend greater resilience to the DNNs, even against stronger PGD attacks.

We investigate the influence of various crossbar parameters on the adversarial robustness of the mapped DNNs. While large values of R_MAX do not produce any appreciable effect on adversarial robustness, a smaller value of R_MIN makes the network more adversarially robust. Furthermore, increasing the σ/µ ratio of the synaptic devices pertaining to process variation does not yield any significant benefit in terms of adversarial robustness. We further compare the performance of our non-ideality-driven approach to adversarial robustness in a 32x32 crossbar with other state-of-the-art software-based adversarial defense techniques on the CIFAR-100 dataset. We find that our approach performs significantly well in terms of reducing adversarial losses during FGSM and PGD attacks.

In our present work, in order to substantiate our claim, we have considered a crossbar system that does not include selector devices (such as MOSFETs) connected in series with the resistive synaptic devices. In other words, we have not considered the impact of the non-idealities pertaining to a 1T-1R crossbar system, which are non-linear in nature. Thus, in our future work we shall extend our analysis to 1T-1R memristive crossbar arrays by employing an architecture similar to GENIEx [9], which accounts for both data-dependent and data-independent non-idealities while modeling the crossbar instances. Finally, our comprehensive analysis and encouraging results establish the idea of rethinking analog crossbar computing for adversarial security in addition to energy efficiency.

ACKNOWLEDGEMENT
This work was supported in part by the National Science Foundation.

REFERENCES

[1] Catherine D. Schuman et al. A Survey of Neuromorphic Computing and Neural Networks in Hardware. 2017. arXiv:1705.06963 [cs.NE].
[2] H.-S. P. Wong et al. Metal-Oxide RRAM. In: Proceedings of the IEEE (2012).
[4] Wei-Hao Chen et al. Circuit design for beyond von Neumann applications using emerging memory: From nonvolatile logics to neuromorphic computing. 2017, pp. 23-28.
[5] A. Sengupta, Y. Shim, and K. Roy. Proposal for an All-Spin Artificial Neural Network: Emulating Neural and Synaptic Functionalities Through Domain Wall Motion in Ferromagnets. In: IEEE Transactions on Biomedical Circuits and Systems.
[6] In: IEEE Transactions on Nanotechnology.
[7] Aayush Ankit et al. PUMA: A Programmable Ultra-efficient Memristor-based Accelerator for Machine Learning Inference. 2019. arXiv:1901.10351 [cs.ET].
[8] Shubham Jain et al. RxNN: A Framework for Evaluating Deep Neural Networks on Resistive Crossbars. 2018. arXiv:1809.00072 [cs.ET].
[9] Indranil Chakraborty et al. GENIEx: A Generalized Approach to Emulating Non-Ideality in Memristive Xbars using Neural Networks. 2020. arXiv:2003.06902 [cs.ET].
[10] Shubham Jain and Anand Raghunathan. CxDNN: Hardware-Software Compensation Methods for Deep Neural Networks on Resistive Crossbar Systems. In: ACM Trans. Embed. Comput. Syst.
[11] X-CHANGR: Changing Memristive Crossbar Mapping for Mitigating Line-Resistance Induced Accuracy Degradation in Deep Neural Networks. 2019. arXiv:1907.00285 [cs.ET].
[12] I. Chakraborty, D. Roy, and K. Roy. Technology Aware Training in Memristive Neuromorphic Systems for Nonideal Synaptic Crossbars. In: IEEE Transactions on Emerging Topics in Computational Intelligence.
[13] Aleksander Madry et al. Towards Deep Learning Models Resistant to Adversarial Attacks. 2017. arXiv:1706.06083 [stat.ML].
[14] Nicholas Carlini et al. On Evaluating Adversarial Robustness. 2019. arXiv:1902.06705 [cs.LG].
[15] Priyadarshini Panda. QUANOS: Adversarial Noise Sensitivity Driven Hybrid Quantization of Neural Networks. 2020. arXiv:2004.11233 [cs.LG].
[16] Priyadarshini Panda, Indranil Chakraborty, and Kaushik Roy. Discretization Based Solutions for Secure Machine Learning Against Adversarial Attacks. In: IEEE Access.
[17] Ji Lin, Chuang Gan, and Song Han. Defensive Quantization: When Efficiency Meets Robustness. 2019. arXiv:1904.08444 [cs.LG].
[18] Adam Paszke et al. Automatic differentiation in PyTorch. In: NIPS-W. 2017.
[19] Karen Simonyan and Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. 2014. arXiv:1409.1556 [cs.CV].
[20] Alex Krizhevsky. Learning multiple layers of features from tiny images. Tech. rep. 2009.
[21] Beiye Liu et al. Reduction and IR-drop compensations techniques for reliable neuromorphic computing systems. In: IEEE/ACM International Conference on Computer-Aided Design (ICCAD) (2014), pp. 63-70.
[22] Aayush Ankit et al. RESPARC: A reconfigurable and energy-efficient architecture with memristive crossbars for deep spiking neural networks. In: Proceedings of the 54th Annual Design Automation Conference. 2017, pp. 1-6.
[23] Ali Shafiee et al. ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars. In: ACM SIGARCH Computer Architecture News.
[24] In: ACM SIGARCH Computer Architecture News.
[25] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and Harnessing Adversarial Examples. 2014. arXiv:1412.6572 [stat.ML].
[26] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial Machine Learning at Scale. 2016. arXiv:1611.01236 [cs.CV].
[27] Harini Kannan, Alexey Kurakin, and Ian Goodfellow. Adversarial Logit Pairing. 2018. arXiv:1803.06373 [cs.LG].
[28] Hyeungill Lee, Sungyeob Han, and Jungwoo Lee. Generative Adversarial Trainer: Defense to Adversarial Perturbations with GAN. 2017. arXiv:1705.03387 [cs.LG].
[29] Cihang Xie et al. Mitigating Adversarial Effects Through Randomization. 2017. arXiv:1711.01991 [cs.CV].
[30] Xuanqing Liu et al. Towards Robust Neural Networks via Random Self-ensemble. 2017. arXiv:1712.00673 [cs.LG].
[31] Guneet S. Dhillon et al. Stochastic Activation Pruning for Robust Adversarial Defense. 2018. arXiv:1803.01442 [cs.LG].
[32] Weilin Xu, David Evans, and Yanjun Qi. Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks. In: Proceedings 2018 Network and Distributed System Security Symposium (2018).
[33] Pouya Samangouei, Maya Kabkab, and Rama Chellappa. Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models. 2018. arXiv:1805.06605 [cs.CV].
[34] Dongyu Meng and Hao Chen. MagNet: A Two-Pronged Defense against Adversarial Examples. 2017. arXiv:1705.09064 [cs.CR].
[35] Fangzhou Liao et al. Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser. 2017. arXiv:1712.02976 [cs.CV].
[36] Kui Ren et al. Adversarial Attacks and Defenses in Deep Learning. In: Engineering.
[37] Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified Defenses against Adversarial Examples. 2018. arXiv:1801.09344 [cs.LG].
[38] Aman Sinha et al. Certifying Some Distributional Robustness with Principled Adversarial Training. 2017. arXiv:1710.10571 [stat.ML].
[39] Yiwen Guo et al. Sparse DNNs with Improved Adversarial Robustness. 2018. arXiv:1810.09619 [cs.LG].
[40] Xuanqing Liu et al. Adv-BNN: Improved Adversarial Defense through Robust Bayesian Neural Network. 2018. arXiv:1810.01279 [cs.LG].
[41] Nicolas Papernot et al. Practical Black-Box Attacks against Machine Learning. 2016. arXiv:1602.02697 [cs.CR].