Fault Sneaking Attack: a Stealthy Framework for Misleading Deep Neural Networks
Pu Zhao, Siyue Wang, Cheng Gongye, Yanzhi Wang, Yunsi Fei, Xue Lin
Northeastern University, Boston, Massachusetts
{zhao.pu, wang.siy, gongye.c}@husky.neu.edu, [email protected], [email protected], [email protected]

ABSTRACT
Despite the great achievements of deep neural networks (DNNs), the vulnerability of state-of-the-art DNNs raises security concerns in many application domains requiring high reliability. We propose the fault sneaking attack on DNNs, where the adversary aims to misclassify certain input images into any target labels by modifying the DNN parameters. We apply ADMM (alternating direction method of multipliers) to solve the optimization problem of the fault sneaking attack with two constraints: 1) the classification of the other images should be unchanged and 2) the parameter modifications should be minimized. Specifically, the first constraint requires us not only to inject designated faults (misclassifications), but also to hide the faults for stealthy or sneaking considerations by maintaining model accuracy. The second constraint requires us to minimize the parameter modifications (using the $\ell_0$ norm to measure the number of modifications and the $\ell_2$ norm to measure the magnitude of modifications). Comprehensive experimental evaluation demonstrates that the proposed framework can inject multiple sneaking faults without losing the overall test accuracy performance.

CCS CONCEPTS
• Security and privacy → Domain-specific security and privacy architectures; Network security; • Networks → Network performance analysis; • Theory of computation → Theory and algorithms for application domains;
KEYWORDS
Deep neural networks, Fault injection, ADMM
ACM Reference format:
Pu Zhao, Siyue Wang, Cheng Gongye, Yanzhi Wang, Yunsi Fei, Xue Lin. 2019. Fault Sneaking Attack: a Stealthy Framework for Misleading Deep Neural Networks. In Proceedings of The 56th Annual Design Automation Conference 2019, Las Vegas, NV, USA, June 2–6, 2019 (DAC '19).
Modern technologies based on pattern recognition, machine learning, and specifically deep learning, have achieved significant breakthroughs [1] in a variety of application domains. The deep neural network (DNN) has become a fundamental element and a core enabler of ubiquitous artificial intelligence techniques. However, despite the impressive performance, many recent studies demonstrate
that state-of-the-art DNNs are vulnerable to adversarial attacks [2, 3]. This raises concerns about DNN robustness in many applications with high reliability and dependability requirements, such as face recognition, autonomous driving, and malware detection [4, 5].

Since the exploration of adversarial attacks in image classification and object detection began around 2014, the vulnerability and robustness of DNNs have attracted ever-increasing attention in the research field known as adversarial machine learning. A large amount of effort has been devoted to: 1) the design of adversarial attacks against machine learning tasks [6–8]; 2) security evaluation methodologies to systematically estimate DNN robustness [9, 10]; and 3) defense mechanisms against the attacks [11–13]. This paper falls into the first category.

Adversarial attacks can be classified into: 1) evasion attacks [6–8], which perturb input images at test time to fool DNN classifications; 2) poisoning attacks [14, 15], which manipulate training data sets to obtain poorly trained DNN models; and 3) fault injection attacks [16, 17], which change the classifications of certain input images to target labels by modifying DNN parameters. The general purpose of an adversarial attack, no matter its category, is to cause misclassifications of certain images while maintaining high model accuracy for the other images. This work proposes the fault sneaking attack, a new method of fault injection attack.

The fault injection attack perturbs the DNN parameter space. As DNNs are usually implemented and deployed on various hardware platforms, including CPUs/GPUs and dedicated accelerators, it is possible to perturb the DNN parameters stored in memory, enabled by the development of memory fault injection techniques such as laser beam [18] and row hammer [19]. To be practical, we propose the fault sneaking attack to perturb the DNN parameters with considerations of attack implementation in the hardware.

It is more challenging to perturb the parameters (as in the fault injection attack) than to perturb the input images (as in the evasion attack) for two reasons: 1) global effect: perturbing one input does not influence the classifications of other unperturbed inputs, while perturbing the parameters has a global effect on all inputs; 2) numerous parameters: DNNs usually have a much greater number of parameters than the pixel count of an input image. The fault injection attack should be stealthy, in that misclassifications occur only for certain images while high model accuracy is maintained for the other images, so the attack cannot be easily detected. It should also be efficient, in that the parameter modifications should be as small as possible, so the attack can be implemented easily in hardware. This work tackles these challenges by proposing the fault sneaking attack based on ADMM (alternating direction method of multipliers).

The theoretical contributions of this work are:
• Stealthy injection of multiple faults: The proposed fault sneaking attack based on ADMM achieves multiple designated faults (misclassifications), with the flexibility to specify any target labels and the stealthiness to hide the faults. The fault injection attack [16] can only inject one fault.

• A systematic application of ADMM with analytical solutions: Compared with the heuristic approach in [16], the proposed fault sneaking attack is an optimization-based framework leveraging ADMM with analytical solutions. Compared with evasion attacks [8, 20], the proposed fault sneaking attack deals with a more challenging problem of higher dimensionality, but surprisingly finds much less expensive analytical solutions.

• A general ADMM framework for both $\ell_0$ and $\ell_2$ norm minimizations: The proposed ADMM based framework for solving the optimization problem of the fault sneaking attack can adopt either the $\ell_0$ norm (the number of parameter modifications) or the $\ell_2$ norm (the magnitude of modifications) to measure the difference between the original and modified DNN models, with only minor changes to the solution method. In contrast, [16] cannot deal with the non-differentiable $\ell_0$ norm.

The experimental contributions of this work are:

• Less model accuracy loss: Under the same experimental settings and misclassification requirements, the proposed fault sneaking attack degrades the DNN model accuracy by only 0.8 percent for MNIST and 1.0 percent for CIFAR, while [16] degrades the DNN model accuracy by 3.86 percent and 2.35 percent, respectively.

• Comprehensive analysis of DNN fault tolerance: We extensively test the capability of DNNs to tolerate fault injection attacks. We find that there is an upper limit on the number $S$ of images with successful misclassifications, depending on the DNN model itself. For the DNN models used in this work, $S$ is around 10, demonstrating a tolerance of about 10 sneaking faults.

The adversarial attacks are reviewed below from the aspects of perturbing the inputs and perturbing the DNN parameters.
Evasion attacks generate adversarial examples to fool DNNs by perturbing legitimate inputs. Basically, an adversarial example is produced by adding human-imperceptible distortions onto a legitimate image, such that the adversarial example will be classified by the DNN as a target (wrong) label. The norm-ball constrained evasion attacks have been well studied, including the FGM [21] and IFGSM [22] attacks with an $\ell_\infty$ norm restriction, the L-BFGS [3] and C&W [6] attacks minimizing the $\ell_2$ distortion, and the JSMA [23] and ADMM [24] attacks trying to perturb the minimum number of pixels, namely, minimizing the $\ell_0$ distortion. Many defense works have been proposed, including defensive distillation [25], defensive dropout [26, 27], and robust adversarial training [13]. The robust adversarial training method ensures strong defense performance at a high computation cost.

Poisoning attacks, which train DNNs by adding poisoned images into the training data sets, and fault injection attacks, which modify the DNN parameters directly, are attacks that perturb the DNN parameters. The poisoning attack [14] is computation-intensive as it requires iterative retraining, and it is not the focus of our paper. The fault injection attack was first proposed by Liu et al. [16]; it uses a heuristic approach to profile the sink class for its single bias attack scheme, and compresses the modification by iteratively forcing the smallest element to zero and checking feasibility for its gradient descent attack scheme. Different from [16], the fault sneaking attack uses a systematic optimization-based approach, achieving flexible designation of the target labels and of the portion of DNN parameters to modify, and enabling both the $\ell_2$ norm and the non-differentiable $\ell_0$ norm in the objective function.

The common techniques for flipping logic values in memory include the laser beam and row hammer. A laser beam [28] can precisely change any single bit in SRAM by carefully tuning the beam parameters such as diameter and energy level [18]. Row hammer [19] can inject faults into DRAM by rapidly and repeatedly accessing a given physical memory location to flip the corresponding bits [29]. Some works demonstrate the feasibility of using row hammer on mobile platforms [30] and of launching row hammer to trigger processor lockdown [31]. However, fine-tuning the laser beam or locating the bits in memory can be time consuming [30]. Therefore, it is essential that our fault sneaking attack minimizes the number of modified parameters. Recently, [17] implemented the DNN fault injection attack [16] physically on embedded systems using a laser beam. In particular, [17] injects faults into the widely used activation functions in DNNs and demonstrates the possibility of achieving misclassifications by injecting faults into a DNN hidden layer.
Threat Model:
We consider an adversary tampering with the DNN classification results of certain input images, changing them into designated target labels by modifying the DNN model parameters. In this paper, we assume a white-box attack, i.e., the adversary has complete knowledge of the DNN model (including both structure and parameters) and of low-level implementation details (how and where DNN parameters are located in memory), as the highest and most stringent security standard to assess the robustness of DNN systems under the fault sneaking attack. Given that existing fault injection techniques can precisely flip any bit of data in memory, we assume the adversary can modify any parameter in the DNN to any value within the valid range of the used arithmetic format. Note that we do not assume the adversary knows the training and testing data sets, which are usually not available to system users.

The adversary faces two constraints when launching the fault sneaking attack: (i) stealthy, in that the classification results of the other images should be kept as unchanged as possible; (ii) efficient, in that the modifications of DNN parameters, in terms of the number of modified parameters or the magnitude of parameter modifications, should be as small as possible. The first constraint is important because, even though the attack is specified for certain input images, modifying the DNN parameters is very likely to change the classification results of the other images as well, resulting in obviously low DNN model accuracy and easy detection of the attack. The second constraint, minimizing the parameter modifications, reduces both the influence of the attack and the difficulty of implementing it.
Attack Model:
Given $R$ images $X = \{x_i \mid i = 1, \cdots, R\}$ with their correct labels $L = \{l_i \mid i = 1, \cdots, R\}$, we would like to change the classification results of the first $S$ ($S \le R$) images to their target labels $T = \{t_i \mid i = 1, \cdots, S\}$, while keeping the classifications of the remaining $R-S$ images unchanged, by modifying parameters in the DNN model. Note that keeping the labels of the other $R-S$ images unchanged is what makes the attack stealthy and hard to detect.

The original DNN model parameters are denoted by $\theta$, and $\delta$ represents the parameter modifications, so the parameters after modification are $\theta+\delta$. Note that $\theta$ has the flexibility of specifying either all the DNN parameters or only a portion of them, e.g., the weight parameters of specific layer(s). The fault sneaking attack can be formulated as an optimization problem:

$$\min_{\delta} \; D(\delta) + G(\theta+\delta, X, T, L), \qquad (1)$$

where $D(\delta)$ measures the DNN parameter modifications, and $G(\theta+\delta, X, T, L)$ represents the misclassification requirements, i.e., with the modified DNN model parameters $\theta+\delta$, the first $S$ images in set $X$ will be classified as the target labels $T$, while the classifications of the remaining $R-S$ images are kept unchanged. The details of the $D$ and $G$ functions are explained in the following sections.

$D(\delta)$ measures the parameter modifications, which should be minimized for attack implementation efficiency. In this paper, the $\ell_0$ and $\ell_2$ norms are used as $D(\delta)$:

$$D(\delta) = \|\delta\|_0 \quad \text{or} \quad D(\delta) = \|\delta\|_2. \qquad (2)$$

The $\ell_0$ norm of $\delta$ measures the number of nonzero elements in $\delta$ and therefore the number of parameters modified by the attack. Minimizing the $\ell_0$ norm makes it easier to implement the attack in DNN systems, considering that the difficulty of parameter modifications in real systems relates to the number of modified parameters [19]. The $\ell_2$ norm of $\delta$ denotes the standard Euclidean distance between the modified and original parameters, and therefore measures the magnitude of parameter modifications. Minimizing the $\ell_2$ norm leads to minimal influence of the attack. Minimizing the $\ell_0$ norm in the objective function is much harder than minimizing the $\ell_2$ norm, because the $\ell_0$ norm is non-differentiable. The proposed ADMM framework enables both the $\ell_0$ and $\ell_2$ norms in the objective function, with only minor differences in the solution methods as specified in Sec. 4.

In (1), $G(\theta+\delta, X, T, L)$ denotes the misclassification requirements: 1) the first $S$ images $X_1 = \{x_i \mid i = 1, \cdots, S\}$ should be classified as the target labels $T$ instead of their correct labels, and 2) the classifications of the remaining $R-S$ images $X_2 = \{x_i \mid i = S+1, \cdots, R\}$ should remain their correct labels. In the area of adversarial machine learning, the most effective objective function to specify that an input $x$ should be labeled as $t$ is the following $g$ function [6]:

$$g(\theta+\delta, t) = \max\Big( \max_{j \neq t}\big( Z(\theta+\delta, x)_j \big) - Z(\theta+\delta, x)_t, \; 0 \Big), \qquad (3)$$

where $Z(\theta+\delta, x)_j$ denotes the $j$-th element of the logits, i.e., the input to the softmax layer. The softmax layer is the last layer in the DNN model; it takes the logits as input and generates the final probability distribution over the outputs. The final outputs of the softmax layer are not used in the above $g$ function, because in a well trained model they are usually dominated by the most significant class and are thus less effective during computation. The DNN chooses the label with the largest logit, that is, $j^* = \arg\max_j Z(\theta+\delta, x)_j$. To enforce that the input $x$ is classified as label $t$, the logit of label $t$, $Z(\theta+\delta, x)_t$, must be larger than all of the other logits, $\max_{j \neq t}(Z(\theta+\delta, x)_j)$. Thus, $g(\theta+\delta, t)$ achieves its minimal value if $x$ is classified as label $t$.
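To make the margin loss in (3) concrete, the following is a minimal NumPy sketch written for this text (not the authors' released code); it assumes the logits vector $Z(\theta+\delta, x)$ has already been computed for one input under the modified parameters.

```python
import numpy as np

def margin_loss(logits, target):
    """Per-image loss g of Eq. (3): becomes 0 once `target` holds the largest logit."""
    others = np.delete(logits, target)          # logits of all classes except the target
    return max(np.max(others) - logits[target], 0.0)

# Toy check: positive while another class dominates, zero once the target class wins.
print(margin_loss(np.array([1.5, 0.3, 0.9, -0.2]), target=2))  # 0.6
print(margin_loss(np.array([0.1, 0.3, 0.9, -0.2]), target=2))  # 0.0
```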
From the above analysis, we propose the detailed form of $G$ as

$$G(\theta+\delta, X, T, L) = G_1(\theta+\delta, X_1, T) + G_2(\theta+\delta, X_2, L), \qquad (4)$$

where $G_1$ stands for the targeted misclassifications of $X_1$ and $G_2$ denotes keeping the classifications of $X_2$ unchanged. $G_1$ and $G_2$ are:

$$G_1(\theta+\delta, X_1, T) = \sum_{i=1}^{S} c_i \cdot \max\Big( \max_{j \neq t_i}\big( Z(\theta+\delta, x_i)_j \big) - Z(\theta+\delta, x_i)_{t_i}, \; 0 \Big), \qquad (5)$$

$$G_2(\theta+\delta, X_2, L) = \sum_{i=S+1}^{R} c_i \cdot \max\Big( \max_{j \neq l_i}\big( Z(\theta+\delta, x_i)_j \big) - Z(\theta+\delta, x_i)_{l_i}, \; 0 \Big). \qquad (6)$$

The constants $c_i$ weight the relative importance of each term against the modification measurement $D(\delta)$, and $t_i$ is the target label of the $i$-th image among the $S$ images. $G_1$ achieves its minimum value when the labels of the first $S$ images are changed to their target labels $T$. Similarly, $G_2$ obtains its minimum value when the classifications of the remaining $R-S$ images are kept unchanged.

We propose a solution framework based on ADMM to solve (1) for the fault sneaking attack. The framework is general in that it can handle both the $\ell_0$ and $\ell_2$ norms as $D(\delta)$. ADMM was first introduced in the mid-1970s with roots in the 1950s, and it has recently become popular for large scale statistics and machine learning problems [32]. ADMM solves problems through a decomposition-alternating procedure: the global problem is first split into local subproblems, and the solutions to the small local subproblems are then coordinated to find a solution to the large global problem. It has been proved in [33] that ADMM has at least a linear convergence rate, and it empirically converges within a few tens of iterations.

As ADMM requires multiple variables for reducing the objective function in alternating directions, we introduce an auxiliary variable $z$, and (1) can be reformulated as

$$\min_{\delta, z} \; D(z) + G(\theta+\delta, X, T, L), \quad \text{s.t.} \;\; z = \delta. \qquad (7)$$

The augmented Lagrangian function of the above problem is

$$L_\rho(\delta, z, u) = D(z) + G(\theta+\delta, X, T, L) + u^T(z-\delta) + \tfrac{\rho}{2}\|z-\delta\|_2^2. \qquad (8)$$

Applying the scaled form of ADMM by defining $u = \rho s$, we obtain

$$L_\rho(\delta, z, s) = D(z) + G(\theta+\delta, X, T, L) + \tfrac{\rho}{2}\|z-\delta+s\|_2^2 - \tfrac{\rho}{2}\|s\|_2^2. \qquad (9)$$

ADMM optimizes problem (9) in iterations. Specifically, in the $k$-th iteration, the following steps are performed:

$$z^{k+1} = \arg\min_{z} \; L_\rho(\delta^k, z, s^k), \qquad (10)$$
$$\delta^{k+1} = \arg\min_{\delta} \; L_\rho(\delta, z^{k+1}, s^k), \qquad (11)$$
$$s^{k+1} = s^k + z^{k+1} - \delta^{k+1}. \qquad (12)$$

As shown above, problem (9) is split into two subproblems, (10) and (11), through ADMM. In (10), the optimal solution $z^{k+1}$ is obtained by minimizing the augmented Lagrangian function $L_\rho(\delta^k, z, s^k)$ with $\delta^k$ and $s^k$ fixed. Similarly, (11) finds the optimal $\delta^{k+1}$ minimizing $L_\rho(\delta, z^{k+1}, s^k)$ with $z^{k+1}$ and $s^k$ fixed. In (12), we update $s^{k+1}$ using $z^{k+1}$ and $\delta^{k+1}$.
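The alternating structure of (10)–(12) maps onto a short loop. The sketch below is a schematic Python rendering under the scaled-form notation above; `prox_D` (the $z$ step of (13)) and `solve_delta` (the $\delta$ step of (14)) are placeholder callables detailed next, not functions from any released implementation.

```python
import numpy as np

def admm_attack(theta_shape, prox_D, solve_delta, rho=1.0, iters=50):
    """Scaled-form ADMM loop for problem (7), iterating Eqs. (10)-(12).

    prox_D(v, rho)         : z step, proximal operator of D evaluated at v.
    solve_delta(z, s, rho) : delta step, (approximate) minimizer of Eq. (14).
    """
    delta = np.zeros(theta_shape)   # parameter modification delta
    z = np.zeros(theta_shape)       # auxiliary copy of delta
    s = np.zeros(theta_shape)       # scaled dual variable s = u / rho
    for _ in range(iters):
        z = prox_D(delta - s, rho)       # Eq. (10): minimize over z with delta, s fixed
        delta = solve_delta(z, s, rho)   # Eq. (11): minimize over delta with z, s fixed
        s = s + z - delta                # Eq. (12): dual (running residual) update
    return delta
```

In the actual attack, `prox_D` would be one of the two thresholding operators derived below, and `solve_delta` the linearized closed-form update of (22).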
ADMM thus updates the two variables in an alternating fashion, which is where the term alternating direction comes from. In the ADMM iterations, problems (10) and (11) take the detailed forms

$$\min_{z} \; D(z) + \tfrac{\rho}{2}\big\|z - \delta^k + s^k\big\|_2^2, \qquad (13)$$
$$\min_{\delta} \; G(\theta+\delta, X, T, L) + \tfrac{\rho}{2}\big\|z^{k+1} - \delta + s^k\big\|_2^2. \qquad (14)$$

The solutions to the two problems are specified as follows.

z step: In this step, we solve (13). The specific closed-form solution depends on the $D$ function ($\ell_0$ or $\ell_2$ norm).

$\ell_0$ norm. If the $D$ function takes the $\ell_0$ norm, (13) has the form

$$\min_{z} \; \|z\|_0 + \tfrac{\rho}{2}\big\|z - \delta^k + s^k\big\|_2^2. \qquad (15)$$

The solution can be obtained elementwise by hard thresholding [34]:

$$z^{k+1}_i = \begin{cases} (\delta^k - s^k)_i, & \text{if } (\delta^k - s^k)_i^2 > 2/\rho, \\ 0, & \text{otherwise}. \end{cases} \qquad (16)$$

$\ell_2$ norm. If the $D$ function takes the $\ell_2$ norm, (13) has the form

$$\min_{z} \; \|z\|_2 + \tfrac{\rho}{2}\big\|z - \delta^k + s^k\big\|_2^2. \qquad (17)$$

By the block soft thresholding operator [34], the solution is given by

$$z^{k+1} = \begin{cases} \left(1 - \dfrac{1}{\rho\,\|\delta^k - s^k\|_2}\right)(\delta^k - s^k), & \text{if } \|\delta^k - s^k\|_2 \ge 1/\rho, \\ 0, & \text{if } \|\delta^k - s^k\|_2 < 1/\rho. \end{cases} \qquad (18)$$
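Both $z$-step solutions are simple elementwise or blockwise thresholdings. The NumPy sketch below mirrors (16) and (18), with `v` standing for $\delta^k - s^k$; it is an illustrative rendering of these standard proximal operators rather than code from the paper.

```python
import numpy as np

def prox_l0(v, rho):
    """Eq. (16): hard thresholding, the z step for D = ||.||_0.
    An element is kept only when its quadratic distance penalty outweighs the +1 count."""
    return np.where(v ** 2 > 2.0 / rho, v, 0.0)

def prox_l2(v, rho):
    """Eq. (18): block soft thresholding, the z step for D = ||.||_2 (not squared)."""
    norm = np.linalg.norm(v)
    if norm < 1.0 / rho:
        return np.zeros_like(v)
    return (1.0 - 1.0 / (rho * norm)) * v
```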
δ step: In this step, we solve (14). It can be rewritten as

$$\min_{\delta} \; \sum_{i=1}^{R} g_i(\theta+\delta, x_i) + \tfrac{\rho}{2}\big\|z^{k+1} - \delta + s^k\big\|_2^2, \qquad (19)$$

where

$$g_i(\theta+\delta, x_i) = \begin{cases} c_i \cdot \max\Big( \max_{j \neq t_i}\big(Z(\theta+\delta, x_i)_j\big) - Z(\theta+\delta, x_i)_{t_i}, \; 0 \Big), & \text{if } i \in [1, S], \\ c_i \cdot \max\Big( \max_{j \neq l_i}\big(Z(\theta+\delta, x_i)_j\big) - Z(\theta+\delta, x_i)_{l_i}, \; 0 \Big), & \text{if } i \in [S+1, R]. \end{cases} \qquad (20)$$

The $g_i$ function takes different forms depending on $i$. If $i \in [1, S]$, $g_i$ obtains its minimum value when the classification of $x_i$ is changed to the target label $t_i$. If $i \in [S+1, R]$, $g_i$ achieves its minimum when the classification is kept as the original label $l_i$. Motivated by linearized ADMM [35, 36, Sec. 2.2], we replace each function $g_i$ with its first-order Taylor expansion plus a regularization term (known as a Bregman divergence), $\nabla g_i(\theta+\delta^k, x_i)(\delta - \delta^k) + \tfrac{1}{2}\|\delta - \delta^k\|_H^2$, where $H$ is a pre-defined positive definite matrix and $\|x\|_H^2 = x^T H x$. Then (14) can be reformulated as

$$\min_{\delta} \; \left(\sum_{i=1}^{R} \nabla g_i(\theta+\delta^k, x_i)\right)(\delta - \delta^k) + \tfrac{R}{2}\big\|\delta - \delta^k\big\|_H^2 + \tfrac{\rho}{2}\big\|z^{k+1} - \delta + s^k\big\|_2^2. \qquad (21)$$

Letting $H = \alpha I$, the solution can be obtained in closed form:

$$\delta^{k+1} = \frac{1}{\alpha R + \rho}\left( \rho\big(z^{k+1} + s^k\big) + \alpha R\, \delta^k - \sum_{i=1}^{R} \nabla g_i(\theta+\delta^k, x_i) \right). \qquad (22)$$
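With $H = \alpha I$ the linearized $\delta$ step reduces to the closed-form average in (22). A minimal sketch, assuming the per-image gradients $\nabla g_i$ at $\delta^k$ are computed elsewhere (e.g., by backpropagation through the logits) and summed into `grad_sum`:

```python
import numpy as np

def delta_step(z_new, s, delta_k, grad_sum, rho, alpha, R):
    """Closed-form linearized delta update of Eq. (22) with H = alpha * I.

    grad_sum: sum over the R images of the gradients of g_i with respect to the
              parameter modification, evaluated at delta_k.
    """
    return (rho * (z_new + s) + alpha * R * delta_k - grad_sum) / (alpha * R + rho)
```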
We demonstrate the experimental results of the proposed fault sneaking attack on two image classification datasets, MNIST [37] and CIFAR-10 [38]. We train two networks for the MNIST and CIFAR-10 datasets, respectively, sharing the same network architecture with four convolutional layers, two max pooling layers, two fully connected layers, and one softmax layer. They achieve 99.5% accuracy on MNIST and 79.5% accuracy on CIFAR-10, respectively, which is comparable to the state of the art. The experiments are conducted on machines with NVIDIA GTX 1080 TI GPUs.

The DNN model used has three fully connected (FC) layers. We modify the parameters in different FC layers, and Table 1 reports the $\ell_0$ norm (i.e., the number of parameter modifications) achieved by the fault sneaking attack when each FC layer is modified.

Table 1: $\ell_0$ norm of DNN parameter modifications (i.e., the number of modified parameters) in different fully connected layers for MNIST.

                        Total parameters   S=1,R=1   S=4,R=4   S=16,R=16
  The first FC layer    205000             14016     40649     120597
  The second FC layer   40200              5390      14086     34069
  The last FC layer     2010               222       682       1755

We observe that more parameters need to be modified as $S$ and $R$ increase. Besides, changing the last FC layer requires fewer parameter modifications than changing the first or second FC layer. The reason is that the last FC layer has a more direct influence on the output, leading to a smaller number of modifications by the fault sneaking attack. Therefore, in the following experiments, we focus on modifying only the last FC layer parameters.

Next we determine which type of parameters is more effective to modify when implementing the fault sneaking attack. In an FC layer, the output depends on the weights $W$ and the biases $b$, that is, $FC(x') = Wx' + b$, where $x'$ is the input of the layer. As we can see, the bias parameters are more directly related to the output than the weight parameters. Table 2 shows the $\ell_0$ norm and the attack success rate when we modify only the weight parameters or only the bias parameters in the last FC layer.

Table 2: $\ell_0$ norm and attack success rate when modifying different types of parameters in the last fully connected layer for MNIST.

                                     S=1,R=1   S=2,R=2   S=4,R=4   S=8,R=8
  $\ell_0$ norm for weight params.   236       458       715       1644
  Success rate for weight params.    100%      100%      100%      100%
  $\ell_0$ norm for bias params.     2         4         -*        -*
  Success rate for bias params.      100%      100%      0%        0%
  * There is no need to show the $\ell_0$ norm if the attack cannot succeed.

As the bias parameters are more directly related to the output, fewer bias parameters usually need to be changed to achieve the same attack objective. However, changing only bias parameters has very limited capability: it can only produce the misclassification of 1 or 2 images. As observed from Table 2, changing the classification of 4 or more images is beyond the capability of modifying bias parameters only. This demonstrates the limitation of the single bias attack (SBA) scheme in [16], which only modifies the bias to misclassify a single image. We also find that SBA cannot be extended to the case of multiple images with multiple target labels. Considering the limitation of modifying only bias parameters, we perturb both the weight and bias parameters in the following experiments.
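The contrast between bias and weight modifications follows from $FC(x') = Wx' + b$ itself: a bias change shifts one logit by the same constant for every input, while a weight change contributes an input-dependent amount. A toy NumPy illustration (hypothetical 3-class layer, not taken from the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))                      # toy last-layer weights: 3 classes, 5 features
b = np.zeros(3)
x1, x2 = rng.normal(size=5), rng.normal(size=5)  # two different inputs to the layer

# Bias modification: logit 2 rises by the same +10 for *every* input, so it can
# force one sink class but cannot treat different images differently.
b_mod = b.copy(); b_mod[2] += 10.0
print(W @ x1 + b_mod, W @ x2 + b_mod)

# Weight modification: the induced logit shift depends on each input's features,
# which is what allows image-specific (sneaking) misclassifications.
W_mod = W.copy(); W_mod[2, 0] += 1.0
print(W_mod @ x1 + b, W_mod @ x2 + b)
```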
$\ell_0$ Norm of Parameter Modifications: We now examine the number of parameter modifications, i.e., the $\ell_0$ norm, required by the fault sneaking attack.

Figure 1: $\ell_0$ norm of DNN parameter modifications in the last fully connected layer for MNIST.

Figure 2: $\ell_0$ norm of DNN parameter modifications in the last fully connected layer for CIFAR-10.

As observed from Fig. 1 and 2, for the same $R$, the $\ell_0$ norm of parameter modifications keeps increasing as $S$ increases, since more parameters need to be modified to change the classifications of more images into their target labels. We have an interesting finding that, when $S$ is small, the $\ell_0$ norm tends to become smaller as $R$ increases from 200 to 1000 for MNIST. The reason is that a larger $R$ means the labels of more images ($R-S$) need to be kept unchanged; the modified model must then be more similar to the original model, and therefore fewer modifications are required. We also notice that this phenomenon disappears when $S$ is larger than 8 for MNIST, and for CIFAR-10 in general. Considering the 99.5% and 79.5% accuracy on MNIST and CIFAR-10, we believe the disappearance is related to the capability of the DNN model. When $S$ is small on MNIST, the DNN model is able to hide a small number of misclassifications by modifying only a few parameters of the last FC layer. However, when $S$ is relatively large, it is not as easy to hide so many misclassifications, and the fault sneaking attack has to perturb almost all parameters in the last FC layer, with no extra capability to spare. The reasoning for CIFAR-10 is similar, since the capability of the CIFAR-10 model is limited, with only 79.5% accuracy.

$\ell_0$ and $\ell_2$ based Attacks: In problem (10), either the $\ell_0$ or the $\ell_2$ norm can be minimized, leading to the corresponding $\ell_0$ or $\ell_2$ based fault sneaking attack. Table 3 compares the $\ell_0$ and $\ell_2$ norms of the $\ell_0$ and $\ell_2$ based attacks for various configurations. As seen from Table 3, the $\ell_0$ based attack achieves a smaller $\ell_0$ norm than the $\ell_2$ based attack, at the cost of a larger $\ell_2$ norm, because the $\ell_2$ based attack tries to minimize the Euclidean distance between the perturbed and original models without considering the number of parameter modifications.

Table 3: $\ell_0$ and $\ell_2$ norms of DNN parameter modifications in the last fully connected layer for the $\ell_0$ and $\ell_2$ based attacks for MNIST.

                        S=1,R=10                S=5,R=10                S=5,R=20
                    $\ell_0$   $\ell_2$    $\ell_0$   $\ell_2$    $\ell_0$   $\ell_2$
  $\ell_0$ attack    1026       863         1208       804         1606       498
  $\ell_2$ attack    1431       393         1432       344         1964       226

As the fault sneaking attack perturbs the DNN parameters to satisfy specific attack requirements, it is important to measure the influence of the attack beyond the required objective. In the problem formulation, we reduce the influence of the fault sneaking attack by enforcing that the remaining $R-S$ images keep their classifications. Table 4 shows the test accuracy on the whole testing datasets of MNIST and CIFAR-10 after perturbing the model.

Table 4: Test accuracy after DNN parameter modifications for MNIST and CIFAR.

  Dataset   Test Acc.   S=1     S=2     S=4     S=8     S=16
  MNIST     R=50        85.2%   73.1%   64.7%   37.4%   29.7%
            R=100       96.9%   86.6%   81.3%   76.1%   65.2%
            R=200       96.7%   96.1%   95.4%   93.2%   92.6%
            R=500       98.6%   98.5%   97.8%   96.9%   95.9%
            R=1000      98.7%   97.9%   98.1%   96.8%   96.9%
  CIFAR     R=50        57.7%   52.9%   44.9%   26.2%   18.3%
            R=100       67.5%   68.7%   55.8%   42.5%   31.5%
            R=200       72.3%   67.6%   69.6%   57.2%   35.4%
            R=500       78.5%   77.4%   76.2%   74.5%   73.2%
            R=1000      78.5%   78.2%   77.5%   77.9%   76.4%

The test accuracy of the original model is 99.5% for MNIST and 79.5% for CIFAR. As observed from Table 4, with fixed $R$, the test accuracy of the modified model decreases as $S$ increases. This demonstrates that, as a natural outcome, changing parameters to misclassify certain images may degrade the overall accuracy of the model. In the case of $S = 16$ and $R = 50$, the test accuracy drops from 99.5% to 29.7% for MNIST and from 79.5% to 18.3% for CIFAR. However, we observe that as $R$ increases, the test accuracy keeps increasing for fixed $S$. This demonstrates that keeping the labels of the $R-S$ images unchanged helps to stabilize the model and reduce the influence of changing the labels of the $S$ images. In the case of $S = 16$, if $R$ is increased from 50 to 1000, the test accuracy on the 10,000 test images increases from 29.7% to 96.9% for MNIST and from 18.3% to 76.4% for CIFAR. The fault sneaking attack can achieve classification accuracy as high as 98.7% for MNIST and 78.5% for CIFAR (the $S=1$, $R=1000$ configuration in Table 4).

One objective of the fault sneaking attack is to hide faults when perturbing the DNN parameters. In the experiments, we found that in the case of large $S$, not all of the $S$ images are changed to their target labels successfully. We define the success rate of the $S$ images as the percentage of images within the $S$ images whose labels are successfully changed to the target labels. Fig. 3 shows the success rate of the $S$ images for various $S$ and $R$ configurations.

Figure 3: Fault sneaking attack success rate of the S images after DNN parameter modifications for MNIST and CIFAR.

We observe that the success rate stays at almost 100% if $S$ is smaller than 10. When $S$ is larger than 10, the success rate drops as $S$ increases. Besides, the number of successfully injected faults is usually around 10 for different configurations of $S$. This demonstrates a limitation of changing the classifications of certain images by modifying DNN parameters: the DNN model has a tolerance for the sneaking faults of about 10 successful misclassifications when modifying the last FC layer.

In this paper, we propose the fault sneaking attack to mislead DNNs by modifying model parameters. The $\ell_0$ and $\ell_2$ norms are minimized by the general framework, with constraints to keep the classifications of other images unchanged. The experimental evaluations demonstrate that the ADMM based framework can implement the attacks stealthily and efficiently with negligible test accuracy loss.

ACKNOWLEDGMENTS
This work is supported by Air Force Research Laboratory FA8750-18-2-0058 and the U.S. Office of Naval Research.
REFERENCES
[1] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436–444, May 2015.
[2] I. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and harnessing adversarial examples," arXiv preprint arXiv:1412.6572, 2015.
[3] C. Szegedy, W. Zaremba, et al., "Intriguing properties of neural networks," arXiv preprint arXiv:1312.6199, 2013.
[4] M. Sharif, S. Bhagavatula, et al., "Adversarial generative nets: Neural network attacks on state-of-the-art face recognition," CoRR, vol. abs/1801.00349, 2018.
[5] I. Evtimov, K. Eykholt, et al., "Robust physical-world attacks on machine learning models," arXiv preprint arXiv:1707.08945, 2017.
[6] N. Carlini and D. Wagner, "Towards evaluating the robustness of neural networks," in IEEE Symposium on Security and Privacy (SP), 2017.
[7] P.-Y. Chen, Y. Sharma, et al., "EAD: Elastic-net attacks to deep neural networks via adversarial examples," arXiv preprint arXiv:1709.04114, 2017.
[8] P. Zhao, S. Liu, Y. Wang, and X. Lin, "An ADMM-based universal framework for adversarial attacks on deep neural networks," in ACM Multimedia, 2018.
[9] B. Biggio, G. Fumera, and F. Roli, "Security evaluation of pattern classifiers under attack," IEEE TKDE, vol. 26, no. 4, pp. 984–996, 2014.
[10] H. Zhang, T.-W. Weng, et al., "Efficient neural network robustness certification with general activation functions," in NIPS, 2018.
[11] S. R. Bulò, B. Biggio, et al., "Randomized prediction games for adversarial machine learning," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, pp. 2466–2478, Nov 2017.
[12] A. Demontis, M. Melis, et al., "Yes, machine learning can be more secure! A case study on Android malware detection," IEEE TDSC, pp. 1–1, 2018.
[13] A. Madry, A. Makelov, et al., "Towards deep learning models resistant to adversarial attacks," arXiv preprint arXiv:1706.06083, 2017.
[14] H. Xiao, B. Biggio, et al., "Is feature selection secure against training data poisoning?," in ICML, pp. 1689–1698, 2015.
[15] B. Biggio, B. Nelson, and P. Laskov, "Poisoning attacks against support vector machines," in Proceedings of ICML 2012, pp. 1467–1474, 2012.
[16] Y. Liu, L. Wei, B. Luo, and Q. Xu, "Fault injection attack on deep neural network," in IEEE/ACM ICCAD, pp. 131–138, Nov 2017.
[17] J. Breier, X. Hou, et al., "Practical fault attack on deep neural networks," in Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS '18), pp. 2204–2206, ACM, 2018.
[18] B. Selmke, S. Brummer, et al., "Precise laser fault injections into 90 nm and 45 nm SRAM-cells," in International Conference on Smart Card Research and Advanced Applications, pp. 193–205, Springer, 2015.
[19] Y. Kim, R. Daly, et al., "Flipping bits in memory without accessing them: An experimental study of DRAM disturbance errors," in ACM/IEEE ISCA, 2014.
[20] N. Carlini and D. Wagner, "Adversarial examples are not easily detected: Bypassing ten detection methods," in Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 3–14, ACM, 2017.
[21] I. J. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and harnessing adversarial examples," arXiv preprint arXiv:1412.6572, 2014.
[22] A. Kurakin, I. J. Goodfellow, and S. Bengio, "Adversarial machine learning at scale," arXiv preprint arXiv:1611.01236, 2017.
[23] N. Papernot, P. McDaniel, et al., "The limitations of deep learning in adversarial settings," in EuroS&P 2016, pp. 372–387, IEEE, 2016.
[24] P. Zhao, K. Xu, T. Zhang, M. Fardad, Y. Wang, and X. Lin, "Reinforced adversarial attacks on deep neural networks using ADMM," pp. 1169–1173, Nov 2018.
[25] N. Papernot, P. McDaniel, et al., "Distillation as a defense to adversarial perturbations against deep neural networks," in SP 2016, pp. 582–597, IEEE, 2016.
[26] S. Wang, X. Wang, S. Ye, P. Zhao, and X. Lin, "Defending DNN adversarial attacks with pruning and logits augmentation," in GlobalSIP '18, pp. 1144–1148, Nov 2018.
[27] S. Wang, X. Wang, P. Zhao, W. Wen, D. Kaeli, P. Chin, and X. Lin, "Defensive dropout for hardening deep neural networks under adversarial attacks," in ICCAD '18, pp. 71:1–71:8, ACM, 2018.
[28] A. Barenghi, L. Breveglieri, et al., "Fault injection attacks on cryptographic devices: Theory, practice, and countermeasures," Proceedings of the IEEE, 2012.
[29] Y. Xiao, X. Zhang, et al., "One bit flips, one cloud flops: Cross-VM row hammer attacks and privilege escalation," in USENIX Security Symposium, pp. 19–35, 2016.
[30] V. Van Der Veen, Y. Fratantonio, et al., "Drammer: Deterministic rowhammer attacks on mobile platforms," in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 1675–1689, ACM, 2016.
[31] Y. Jang, J. Lee, et al., "SGX-Bomb: Locking down the processor via rowhammer attack," in Proceedings of the 2nd Workshop on SysTEX, p. 5, ACM, 2017.
[32] S. Boyd, N. Parikh, et al., "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.
[33] M. Hong and Z.-Q. Luo, "On the linear convergence of the alternating direction method of multipliers," Mathematical Programming, Mar 2017.
[34] N. Parikh, S. Boyd, et al., "Proximal algorithms," Foundations and Trends in Optimization, vol. 1, no. 3, pp. 127–239, 2014.
[35] X. Gao and S.-Z. Zhang, "First-order algorithms for convex optimization with nonseparable objective and coupled constraints," Journal of the Operations Research Society of China, 2017.
[36] Q. Liu, X. Shen, and Y. Gu, "Linearized ADMM for non-convex non-smooth optimization with convergence analysis," arXiv preprint arXiv:1705.02502, 2017.
[37] Y. LeCun, L. Bottou, et al., "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, pp. 2278–2324, Nov 1998.
[38] A. Krizhevsky and G. Hinton, "Learning multiple layers of features from tiny images," Tech. Rep., University of Toronto, 2009.