Cost-Sensitive Robustness against Adversarial Examples
Xiao Zhang
Department of Computer Science, University of Virginia
[email protected]

David Evans
Department of Computer Science, University of Virginia
[email protected]

Abstract
Several recent works have developed methods for training classifiers that are certifiably robust against norm-bounded adversarial perturbations. These methods assume that all adversarial transformations are equally important, which is seldom the case in real-world applications. We advocate for cost-sensitive robustness as the criterion for measuring the classifier's performance for tasks where some adversarial transformations are more important than others. We encode the potential harm of each adversarial transformation in a cost matrix, and propose a general objective function to adapt the robust training method of Wong & Kolter (2018) to optimize for cost-sensitive robustness. Our experiments on simple MNIST and CIFAR10 models with a variety of cost matrices show that the proposed approach can produce models with substantially reduced cost-sensitive robust error, while maintaining classification accuracy.
1 Introduction
Despite the exceptional performance of deep neural networks (DNNs) on various machine learning tasks such as malware detection (Saxe & Berlin, 2015), face recognition (Parkhi et al., 2015), and autonomous driving (Bojarski et al., 2016), recent studies (Szegedy et al., 2014; Goodfellow et al., 2015) have shown that deep learning models are vulnerable to misclassifying inputs, known as adversarial examples, that are crafted with targeted but visually-imperceptible perturbations. While several defense mechanisms have been proposed and empirically demonstrated to be successful against particular existing attacks (Papernot et al., 2016; Goodfellow et al., 2015), new attacks (Carlini & Wagner, 2017; Tramèr et al., 2018; Athalye et al., 2018) are repeatedly found that circumvent such defenses. To end this arms race, recent works (Wong & Kolter, 2018; Raghunathan et al., 2018; Wong et al., 2018; Wang et al., 2018) propose methods to certify examples to be robust against some specific norm-bounded adversarial perturbations for given inputs and to train models to optimize for certifiable robustness.

However, all of the aforementioned methods aim at improving the overall robustness of the classifier. This means that the methods to improve robustness are designed to prevent seed examples in any class from being misclassified as any other class. Achieving such a goal (at least for some definitions of adversarial robustness) requires producing a perfect classifier, and has, unsurprisingly, remained elusive. Indeed, Mahloujifar et al. (2019) proved that if the metric probability space is concentrated, overall adversarial robustness is unattainable for any classifier with initial constant error.

We argue that overall robustness may not be the appropriate criterion for measuring system performance in security-sensitive applications, since only certain kinds of adversarial misclassifications pose meaningful threats that provide value for potential adversaries. Whereas overall robustness places equal emphasis on every adversarial transformation, from a security perspective, only certain transformations matter. As a simple example, misclassifying a malicious program as benign results in more severe consequences than the reverse.

In this paper, we propose a general method for adapting provable defenses against norm-bounded perturbations to take into account the potential harm of different adversarial class transformations. Inspired by cost-sensitive learning (Domingos, 1999; Elkan, 2001) for non-adversarial contexts, we capture the impact of different adversarial class transformations using a cost matrix $C$, where each entry represents the cost of an adversary being able to take a natural example from the first class and perturb it so as to be misclassified by the model as the second class. Instead of reducing the overall robust error, our goal is to minimize the cost-weighted robust error (which we define for both binary and real-valued costs in $C$). The proposed method incorporates the specified cost matrix into the training objective function, which encourages stronger robustness guarantees on cost-sensitive class transformations, while maintaining the overall classification accuracy on the original inputs.

Contributions.
By encoding the consequences of different adversarial transformations into a cost matrix, we introduce the notion of cost-sensitive robustness (Section 3.1) as a metric to assess the expected performance of a classifier when facing adversarial examples. We propose an objective function for training a cost-sensitive robust classifier (Section 3.2). The proposed method is general in that it can incorporate any type of cost matrix, including both binary and real-valued. We demonstrate the effectiveness of the proposed cost-sensitive defense model for a variety of cost scenarios on two benchmark image classification datasets: MNIST (Section 4.1) and CIFAR10 (Section 4.2). Compared with the state-of-the-art overall robust defense model (Wong & Kolter, 2018), our model achieves significant improvements in cost-sensitive robustness for different tasks, while maintaining approximately the same classification accuracy on both datasets.
Notation.
We use lower-case boldface letters such as $x$ for vectors and capital boldface letters such as $A$ to represent matrices. Let $[m]$ be the index set $\{1, 2, \ldots, m\}$ and $A_{ij}$ be the $(i,j)$-th entry of matrix $A$. Denote the $i$-th natural basis vector, the all-ones vector, and the identity matrix by $e_i$, $\mathbf{1}$, and $I$, respectively. For any vector $x \in \mathbb{R}^d$, the $\ell_\infty$-norm of $x$ is defined as $\|x\|_\infty = \max_{i \in [d]} |x_i|$.

2 Background
In this section, we provide a brief introduction to related topics, including neural network classifiers, adversarial examples, defenses with certified robustness, and cost-sensitive learning.

2.1 Neural Network Classifiers
A $K$-layer neural network classifier can be represented by a function $f: \mathcal{X} \rightarrow \mathcal{Y}$ such that $f(x) = f_{K-1}(f_{K-2}(\cdots(f_1(x))))$ for any $x \in \mathcal{X}$. For $k \in \{1, 2, \ldots, K-2\}$, the mapping function $f_k(\cdot)$ typically consists of two operations: an affine transformation (either matrix multiplication or convolution) and a nonlinear activation. In this paper, we consider the rectified linear unit (ReLU) as the activation function. If we denote the feature vector of the $k$-th layer as $z_k$, then $f_k(\cdot)$ is defined as

$$ z_{k+1} = f_k(z_k) = \max\{W_k z_k + b_k, 0\}, \quad \forall k \in \{1, 2, \ldots, K-2\}, $$

where $W_k$ denotes the weight parameter matrix and $b_k$ the bias vector. The output function $f_{K-1}(\cdot)$ maps the feature vector in the last hidden layer to the output space $\mathcal{Y}$ solely through matrix multiplication: $z_K = f_{K-1}(z_{K-1}) = W_{K-1} z_{K-1} + b_{K-1}$, where $z_K$ can be regarded as the estimated score vector of input $x$ for the different possible output classes. In the following discussions, we use $f_\theta$ to represent the neural network classifier, where $\theta = \{W_1, \ldots, W_{K-1}, b_1, \ldots, b_{K-1}\}$ denotes the model parameters.

To train the neural network, a loss function $\sum_{i=1}^N L(f_\theta(x_i), y_i)$ is defined for a set of training examples $\{x_i, y_i\}_{i=1}^N$, where $x_i$ is the $i$-th input vector and $y_i$ denotes its class label. Cross-entropy loss is typically used for multiclass image classification. With proper initialization, all model parameters are then updated iteratively using backpropagation. For any input example $\tilde{x}$, the predicted label $\hat{y}$ is given by the index of the largest predicted score among all classes, $\arg\max_j [f_\theta(\tilde{x})]_j$.
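For concreteness, the following is a minimal PyTorch sketch of such a $K$-layer ReLU classifier. The layer sizes are illustrative placeholders only, not the architectures used in our experiments.

```python
import torch
import torch.nn as nn

class ReLUClassifier(nn.Module):
    """f(x) = f_{K-1}(...(f_1(x))): affine maps with ReLU, linear last layer."""
    def __init__(self, sizes=(784, 100, 100, 10)):  # illustrative sizes
        super().__init__()
        self.hidden = nn.ModuleList(
            nn.Linear(sizes[k], sizes[k + 1]) for k in range(len(sizes) - 2)
        )
        self.out = nn.Linear(sizes[-2], sizes[-1])  # f_{K-1}: affine only

    def forward(self, x):
        z = x.flatten(1)
        for layer in self.hidden:          # z_{k+1} = max{W_k z_k + b_k, 0}
            z = torch.relu(layer(z))
        return self.out(z)                 # z_K: score vector

# Predicted label: index of the largest score.
# y_hat = ReLUClassifier()(x).argmax(dim=1)
```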
2.2 Adversarial Examples

An adversarial example is an input, generated by some adversary, which is visually indistinguishable from an example drawn from the natural distribution, but is able to mislead the target classifier. Since "visually indistinguishable" depends on human perception, which is hard to define rigorously, we consider the most popular alternative: input examples with perturbations bounded in $\ell_\infty$-norm (Goodfellow et al., 2015). More formally, the set of adversarial examples with respect to a seed example $\{x_0, y\}$ and classifier $f_\theta(\cdot)$ is defined as

$$ \mathcal{A}_\epsilon(x_0, y; \theta) = \big\{ x \in \mathcal{X} : \|x - x_0\|_\infty \le \epsilon \ \text{and}\ \arg\max_j [f_\theta(x)]_j \ne y \big\}, \qquad (2.1) $$

where $\epsilon > 0$ denotes the maximum perturbation distance. Although $\ell_p$ distances are commonly used in adversarial examples research, they are not an adequate measure of perceptual similarity (Sharif et al., 2018), and other minimal geometric transformations can be used to find adversarial examples (Engstrom et al., 2017; Kanbak et al., 2018; Xiao et al., 2018). Nevertheless, there is considerable interest in improving robustness in this simple domain, and hope that as this research area matures we will find ways to apply results from studying simplified problems to more realistic ones.
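For intuition about the threat model in (2.1), the sketch below implements projected gradient ascent, a standard first-order heuristic for searching $\mathcal{A}_\epsilon(x_0, y; \theta)$. This is not part of the certification machinery discussed next: when it succeeds it exhibits an adversarial example, but when it fails it proves nothing, which is what motivates the certified defenses of Section 2.3.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x0, y, eps, alpha=0.01, steps=40):
    """Heuristic search for a point in A_eps(x0, y): repeatedly take a
    gradient-sign ascent step on the loss and project back onto the
    l_inf ball of radius eps around x0 (started at x0 for simplicity)."""
    x = x0.clone().detach().requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x), y)
        grad, = torch.autograd.grad(loss, x)
        with torch.no_grad():
            x += alpha * grad.sign()                 # ascent step
            x.copy_(x0 + (x - x0).clamp(-eps, eps))  # project onto the ball
            x.clamp_(0.0, 1.0)                       # stay in valid pixel range
    return x.detach()
```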
2.3 Defenses with Certified Robustness

A line of recent work has proposed defenses that are guaranteed to be robust against norm-bounded adversarial perturbations. Hein & Andriushchenko (2017) proved formal robustness guarantees against $\ell_2$-norm bounded perturbations for two-layer neural networks, and provided a training method based on a surrogate robust bound. Raghunathan et al. (2018) developed an approach based on semidefinite relaxation for training certified robust classifiers, but it was limited to two-layer fully-connected networks. Our work builds most directly on Wong & Kolter (2018), which can be applied to deep ReLU-based networks and achieves the state-of-the-art certified robustness on the MNIST dataset.

Following the definitions in Wong & Kolter (2018), an adversarial polytope $\mathcal{Z}_\epsilon(x)$ with respect to a given example $x$ is defined as

$$ \mathcal{Z}_\epsilon(x) = \big\{ f_\theta(x + \Delta) : \|\Delta\|_\infty \le \epsilon \big\}, \qquad (2.2) $$

which contains all the possible output vectors for the given classifier $f_\theta$ obtained by perturbing $x$ within an $\ell_\infty$-norm ball with radius $\epsilon$. A seed example $\{x_0, y\}$ is said to be certified robust with respect to maximum perturbation distance $\epsilon$ if the corresponding adversarial example set $\mathcal{A}_\epsilon(x_0, y; \theta)$ is empty. Equivalently, if we solve, for any output class $y_{\text{targ}} \ne y$, the optimization problem

$$ \min_{z_K}\ [z_K]_y - [z_K]_{y_{\text{targ}}}, \quad \text{subject to}\ z_K \in \mathcal{Z}_\epsilon(x_0), \qquad (2.3) $$

then according to the definition of $\mathcal{A}_\epsilon(x_0, y; \theta)$ in (2.1), $\{x_0, y\}$ is guaranteed to be robust provided that the optimal objective value of (2.3) is positive for every output class. To train a robust model on a given dataset $\{x_i, y_i\}_{i=1}^N$, standard robust optimization aims to minimize the sample loss function at the worst-case locations through the following adversarial loss:

$$ \min_\theta\ \sum_{i=1}^N \max_{\|\Delta\|_\infty \le \epsilon} L\big(f_\theta(x_i + \Delta), y_i\big), \qquad (2.4) $$

where $L(\cdot, \cdot)$ denotes the cross-entropy loss. However, due to the nonconvexity of the neural network classifier $f_\theta(\cdot)$ introduced by the nonlinear ReLU activations, both the adversarial polytope (2.2) and the training objective (2.4) are highly nonconvex. In addition, solving optimization problem (2.3) for each pair of input example and output class is computationally intractable.

Instead of solving the optimization problem directly, Wong & Kolter (2018) proposed an alternative training objective function based on convex relaxation, which can be efficiently optimized through a dual network. Specifically, they relaxed $\mathcal{Z}_\epsilon(x)$ into a convex outer adversarial polytope $\tilde{\mathcal{Z}}_\epsilon(x)$ by replacing the ReLU equation for each neuron $z = \max\{\hat{z}, 0\}$ with a set of linear inequalities,

$$ z \ge 0, \quad z \ge \hat{z}, \quad -u\hat{z} + (u - \ell) z \le -u\ell, \qquad (2.5) $$

where $\ell$ and $u$ denote the lower and upper bounds on the considered pre-ReLU activation.¹ Based on the relaxed outer bound $\tilde{\mathcal{Z}}_\epsilon(x)$, they propose the following alternative optimization problem,

$$ \min_{z_K}\ [z_K]_y - [z_K]_{y_{\text{targ}}}, \quad \text{subject to}\ z_K \in \tilde{\mathcal{Z}}_\epsilon(x_0), \qquad (2.6) $$

which is in fact a linear program. Since $\mathcal{Z}_\epsilon(x) \subseteq \tilde{\mathcal{Z}}_\epsilon(x)$ for any $x \in \mathcal{X}$, solving (2.6) for all output classes provides stronger robustness guarantees compared with (2.3), provided all the optimal objective values are positive.

¹The elementwise activation bounds can be computed efficiently using Algorithm 1 in Wong & Kolter (2018).
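The bounds $\ell$ and $u$ in the footnote can be computed in several ways. The sketch below uses plain interval arithmetic for a fully-connected network; this is a looser but easy-to-follow alternative to Algorithm 1 of Wong & Kolter (2018), which obtains tighter bounds via the dual network.

```python
import torch

def interval_bounds(weights, biases, x0, eps):
    """Propagate elementwise pre-activation bounds [l, u] through an MLP
    for all inputs with ||x - x0||_inf <= eps, using interval arithmetic."""
    lower, upper = x0 - eps, x0 + eps
    bounds = []
    for k, (W, b) in enumerate(zip(weights, biases)):
        center = (upper + lower) / 2
        radius = (upper - lower) / 2
        c = W @ center + b            # image of the box center
        r = W.abs() @ radius          # worst-case spread per output unit
        lower, upper = c - r, c + r   # pre-activation bounds for this layer
        bounds.append((lower, upper))
        if k < len(weights) - 1:      # ReLU on all but the last layer
            lower, upper = lower.clamp(min=0), upper.clamp(min=0)
    return bounds
```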
Moreover, they derived a guaranteed lower bound, $J_\epsilon\big(x, g_\theta(e_y - e_{y_{\text{targ}}})\big)$, on the optimal objective value of Equation 2.6 using duality theory, where $g_\theta(\cdot)$ is a $K$-layer feedforward dual network (Theorem 1 in Wong & Kolter (2018)). Finally, according to the properties of cross-entropy loss, they minimize the following objective to train the robust model, which serves as an upper bound of the adversarial loss (2.4):

$$ \min_\theta\ \frac{1}{N} \sum_{i=1}^N L\Big( -J_\epsilon\big(x_i, g_\theta(e_{y_i} \mathbf{1}^\top - I)\big),\ y_i \Big), \qquad (2.7) $$

where $g_\theta(\cdot)$ is regarded as a columnwise function when applied to a matrix. Although the proposed method in Wong & Kolter (2018) achieves certified robustness, its computational complexity is quadratic in the network size in the worst case, so it only scales to small networks. Recently, Wong et al. (2018) extended the training procedure to scale to larger networks by using nonlinear random projections. However, if the network size allows for both methods, we observe a small decrease in performance using the training method provided in Wong et al. (2018). Therefore, we only use the approximation techniques for the experiments on CIFAR10 (§4.2).
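In code, a training step of (2.7) amounts to treating the negated bounds as logits. The sketch below assumes a hypothetical helper robust_bounds (our naming, not a published API) that returns the matrix $J_\epsilon\big(x_i, g_\theta(e_{y_i}\mathbf{1}^\top - I)\big)$ — one entry per target class, with the $y_i$-th entry equal to zero — as a dual-network implementation in the style of Wong & Kolter (2018) would.

```python
import torch.nn.functional as F

def overall_robust_loss(model, x, y, eps, robust_bounds):
    """Surrogate robust loss of Eq. (2.7): cross-entropy on negated
    certified lower bounds. robust_bounds(model, x, y, eps) is assumed
    to return a [batch, num_classes] tensor J, where J[i, j] lower-bounds
    [z_K]_{y_i} - [z_K]_j over the eps-ball around x_i."""
    J = robust_bounds(model, x, y, eps)  # hypothetical helper
    return F.cross_entropy(-J, y)
```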
2.4 Cost-Sensitive Learning

Cost-sensitive learning (Domingos, 1999; Elkan, 2001; Liu & Zhou, 2006) was proposed to deal with the unequal misclassification costs and class imbalance problems commonly found in classification applications. The key observation is that cost-blind learning algorithms tend to favor the majority class, while the neglected minority class is often our primary interest. For example, in medical diagnosis, misclassifying a rare cancerous lesion as benign is extremely costly. Various cost-sensitive learning algorithms (Kukar & Kononenko, 1998; Zadrozny et al., 2003; Zhou & Liu, 2010; Khan et al., 2018) have been proposed in the literature, but only a few, limited to simple classifiers, have considered adversarial settings. Dalvi et al. (2004) studied the naive Bayes classifier for spam detection in the presence of a cost-sensitive adversary, and developed an adversary-aware classifier based on game theory. Asif et al. (2015) proposed a cost-sensitive robust minimax approach that hardens a linear discriminant classifier with robustness in the adversarial context. All of these methods are designed for simple linear classifiers, and cannot be directly extended to neural network classifiers. In addition, the robustness of their proposed classifiers is only examined experimentally against some specific adversary, so it does not provide any notion of certified robustness. Recently, Dreossi et al. (2018) advocated for the idea of using application-level semantics in adversarial analysis; however, they did not provide a formal method for training such a classifier. Our work provides a practical training method that hardens neural network classifiers with certified cost-sensitive robustness against adversarial perturbations.
3 Training a Cost-Sensitive Robust Classifier
The approach introduced in Wong & Kolter (2018) penalizes all adversarial class transformations equally, even though the consequences of adversarial examples usually depend on the specific class transformation. Here, we provide a formal definition of cost-sensitive robustness (§3.1), and propose an objective function for training a classifier that optimizes for cost-sensitive robustness (§3.2).

3.1 Certified Cost-Sensitive Robustness
Our approach uses a cost matrix $C$ that encodes the cost (i.e., potential harm to the model deployer) of different adversarial examples. First, we consider the case where there are $m$ classes and $C$ is an $m \times m$ binary matrix with $C_{jj'} \in \{0, 1\}$. The value $C_{jj'}$ indicates whether we care about an adversary transforming a seed input in class $j$ into one recognized by the model as being in class $j'$. If the adversarial transformation $j \rightarrow j'$ matters, $C_{jj'} = 1$; otherwise, $C_{jj'} = 0$. Let $\Omega_j = \{j' \in [m] : C_{jj'} \ne 0\}$ be the index set of output classes that induce cost with respect to seed class $j$.¹ For any $j \in [m]$, let $\delta_j = 0$ if $\Omega_j$ is an empty set, and $\delta_j = 1$ otherwise. We are only concerned with adversarial transformations from a seed class $j$ to target classes $j' \in \Omega_j$. For any example $x$ in seed class $j$, $x$ is said to be certified cost-sensitive robust if the lower bound $J_\epsilon(x, g_\theta(e_j - e_{j'})) \ge 0$ for all $j' \in \Omega_j$. That is, no adversarial perturbation within an $\ell_\infty$-norm ball around $x$ with radius $\epsilon$ can mislead the classifier to any target class in $\Omega_j$.

The cost-sensitive robust error on a dataset $\{x_i, y_i\}_{i=1}^N$ is defined as the number of examples that are not guaranteed to be cost-sensitive robust over the number of non-zero cost candidate seed examples:

$$ \text{cost-sensitive robust error} = 1 - \frac{\big| \{ i \in [N] : \delta_{y_i} = 1 \ \text{and}\ J_\epsilon(x_i, g_\theta(e_{y_i} - e_{j'})) \ge 0,\ \forall j' \in \Omega_{y_i} \} \big|}{\sum_{j \,|\, \delta_j = 1} N_j}, $$

where $|A|$ represents the cardinality of a set $A$, and $N_j$ is the total number of examples in class $j$.

Next, we consider the more general case where $C$ is an $m \times m$ real-valued cost matrix. Each entry of $C$ is a non-negative real number, which represents the cost of the corresponding adversarial transformation. To take into account the different potential costs among adversarial examples, we measure cost-sensitive robustness by the average certified cost of adversarial examples. The cost of an adversarial example $x$ in class $j$ is defined as the sum of all $C_{jj'}$ such that $J_\epsilon(x, g_\theta(e_j - e_{j'})) < 0$. Intuitively speaking, an adversarial example induces more cost if it can be adversarially misclassified as more target classes with high cost. Accordingly, the robust cost is defined as the total cost of adversarial examples divided by the total number of valued seed examples:

$$ \text{robust cost} = \frac{\sum_{j \,|\, \delta_j = 1} \sum_{i \,|\, y_i = j} \sum_{j' \in \Omega_j} C_{jj'} \cdot \mathbf{1}\big( J_\epsilon(x_i, g_\theta(e_j - e_{j'})) < 0 \big)}{\sum_{j \,|\, \delta_j = 1} N_j}, \qquad (3.1) $$

where $\mathbf{1}(\cdot)$ denotes the indicator function.

¹Given the vulnerability of standard classifiers to adversarial examples, it is not surprising that standard cost-sensitive classifiers are also ineffective against adversaries. The experiments described in Appendix B support this expectation.
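Given a precomputed bound matrix, both metrics reduce to a few array operations. The sketch below is our own illustration (names are ours, not from the paper's code); it assumes J[i, j] holds $J_\epsilon(x_i, g_\theta(e_{y_i} - e_j))$ for every test example $i$ and target class $j$, and that $C$ has a zero diagonal.

```python
import numpy as np

def cost_sensitive_metrics(J, y, C):
    """J: [N, m] certified lower bounds; y: [N] integer labels;
    C: [m, m] non-negative cost matrix with zero diagonal.
    Returns the cost-sensitive robust error and the robust cost (Eq. 3.1)."""
    valued = C[y] > 0                       # [N, m]: transformations with cost
    candidate = valued.any(axis=1)          # seeds whose class has delta_j = 1
    broken = (J < 0) & valued               # valued certifications that fail
    robust_error = broken.any(axis=1)[candidate].mean()
    robust_cost = (C[y] * broken).sum() / candidate.sum()
    return robust_error, robust_cost
```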
3.2 Cost-Sensitive Robust Optimization

Recall that our goal is to develop a classifier with certified cost-sensitive robustness, as defined in §3.1. Based on the guaranteed lower bound $J_\epsilon\big(x, g_\theta(e_y - e_{y_{\text{targ}}})\big)$ on Equation 2.6 and inspired by the cost-sensitive CE loss (Khan et al., 2018), we propose the following robust optimization with respect to a neural network classifier $f_\theta$:

$$ \min_\theta\ \frac{1}{N} \sum_{i \in [N]} L\big(f_\theta(x_i), y_i\big) + \alpha \sum_{j \in [m]} \frac{\delta_j}{N_j} \sum_{i \,|\, y_i = j} \log\Big( \sum_{j' \in \Omega_j} C_{jj'} \cdot \exp\big( -J_\epsilon(x_i, g_\theta(e_j - e_{j'})) \big) \Big), \qquad (3.2) $$

where $\alpha \ge 0$ denotes the regularization parameter. The first term in Equation 3.2 denotes the cross-entropy loss for standard classification, whereas the second term accounts for cost-sensitive robustness. Compared with the overall robustness training objective function (2.7), we include a regularization parameter $\alpha$ to control the trade-off between classification accuracy on original inputs and adversarial robustness.

To provide cost-sensitivity, the loss function selectively penalizes adversarial examples based on their cost. For binary cost matrices, the regularization term penalizes every cost-sensitive adversarial example equally, but has no impact for instances where $C_{jj'} = 0$. For real-valued costs, a larger value of $C_{jj'}$ increases the weight of the corresponding adversarial transformation in the training objective. The optimization problem (3.2) can be solved efficiently using gradient-based algorithms, such as stochastic gradient descent and Adam (Kingma & Ba, 2015).
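A sketch of objective (3.2), reusing the hypothetical robust_bounds helper from §2.3. The cost-weighted log-sum-exp term makes zero-cost transformations drop out automatically; for simplicity the per-class $1/N_j$ normalization is approximated by a batch mean, and the clamp exists only to keep the log finite in this illustration.

```python
import torch
import torch.nn.functional as F

def cost_sensitive_robust_loss(model, x, y, eps, C, alpha, robust_bounds):
    """Sketch of Eq. (3.2). C: [m, m] non-negative cost tensor;
    robust_bounds returns J with J[i, j] the certified lower bound
    for misclassifying example i as target class j."""
    clean = F.cross_entropy(model(x), y)        # standard CE term
    J = robust_bounds(model, x, y, eps)         # hypothetical helper
    costs = C[y]                                # [batch, m]: one cost row per example
    # log sum_{j' in Omega_j} C_{jj'} exp(-J_ij'); zero-cost classes vanish
    reg = torch.log((costs * torch.exp(-J)).sum(dim=1).clamp_min(1e-12))
    mask = (costs.sum(dim=1) > 0).float()       # delta_{y_i}
    return clean + alpha * (mask * reg).mean()  # batch mean approximates 1/N_j
```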
4 Experiments

We evaluate the performance of our cost-sensitive robust training method on models for two benchmark image classification datasets: MNIST (LeCun et al., 2010) and CIFAR10 (Krizhevsky & Hinton, 2009). We compare our results for various cost scenarios with overall robustness training (§2.3), where the adversary is restricted to perturbations within an $\ell_\infty$-norm ball.

Figure 1: Preliminary results on MNIST using the overall robust classifier: (a) learning curves of the classification error and overall robust error over the 60 training epochs; (b) heatmap of the robust test error for pairwise class transformations based on the best trained classifier.

Our goal in the experiments is to evaluate how well a variety of different types of cost matrices can be supported. MNIST and CIFAR10 are toy datasets, so there are no obvious cost matrices that correspond to meaningful security applications for these datasets. Instead, we select representative tasks and design cost matrices to capture them.

4.1 MNIST

For MNIST, we use the same convolutional neural network architecture (LeCun et al., 1998) as Wong & Kolter (2018), which includes two convolutional layers, with 16 and 32 filters respectively, and two fully-connected layers, consisting of 100 and 10 hidden units respectively. ReLU activations are applied to each layer except the last one. For both our cost-sensitive robust model and the overall robust model, we randomly split the 60,000 training samples into five folds of equal size, and train the classifier over 60 epochs on four of them using the Adam optimizer (Kingma & Ba, 2015) with batch size 50 and learning rate 0.001. We treat the remaining fold as a validation dataset for model selection. In addition, we use $\epsilon$-scheduling and learning rate decay, increasing $\epsilon$ linearly from a small initial value to the desired value over the first 20 epochs and decaying the learning rate by 0.5 every 10 epochs for the remaining epochs.
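As a concrete reference, a PyTorch sketch of this architecture follows. The kernel size, stride, and padding are our assumptions, taken from the small network of Wong & Kolter (2018), which halves the spatial resolution at each convolution.

```python
import torch.nn as nn

# Two conv layers (16 and 32 filters) and two fully-connected layers
# (100 and 10 units), ReLU everywhere except the output layer.
mnist_model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=4, stride=2, padding=1),   # 28x28 -> 14x14
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=4, stride=2, padding=1),  # 14x14 -> 7x7
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 100),
    nn.ReLU(),
    nn.Linear(100, 10),
)
```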
Baseline: Overall Robustness. Figure 1(a) illustrates the learning curves of both the classification error and the overall robust error during training based on the robust loss (2.7). The model with sufficiently low classification error and minimum overall robust error on the validation dataset is selected over the 60 training epochs, and the best classifier is evaluated on the 10,000 MNIST testing samples. We report the robust test error for every adversarial transformation in Figure 1(b) (for a model without any robustness training, essentially no examples can be certified). The $(i,j)$-th entry is a bound on the robustness of that seed-target transformation: the fraction of testing examples in class $i$ that cannot be certified robust against transformation into class $j$ for any $\epsilon$ norm-bounded attack. As shown in Figure 1(b), the vulnerability to adversarial transformations differs considerably among class pairs and appears correlated with perceptual similarity. For instance, far fewer seeds in class 1 fail certification for target class 9 than seeds in class 9 fail for target class 4.

Binary Cost Matrix.
Next, we evaluate the effectiveness of cost-sensitive robustness training in producing models that are more robust for adversarial transformations designated as valuable. We consider four types of tasks defined by different binary cost matrices that capture different sets of adversarial transformations: single pair (a particular seed class $s$ to a particular target class $t$); single seed (a particular seed class $s$ to any target class); single target (any seed class to a particular target class $t$); and multiple (multiple seed and target classes). For each setting, the cost matrix is defined as $C_{ij} = 1$ if $(i,j)$ is selected; otherwise, $C_{ij} = 0$. In general, we expect that the sparser the cost matrix, the more opportunity there is for cost-sensitive training to improve cost-sensitive robustness over models trained for overall robustness.

For the single pair task, we selected three representative adversarial goals: a low vulnerability pair (0, 2), a medium vulnerability pair (6, 5), and a high vulnerability pair (4, 9). We selected these pairs by using the robust error results on the overall-robustness trained model (Figure 1(b)) as a rough measure of transformation hardness. This is generally consistent with intuitions about the MNIST digit classes (e.g., 9 and 4 look similar, so it is harder to induce robustness against their adversarial transformation), as well as with visualization results produced by dimension reduction techniques, such as t-SNE (Maaten & Hinton, 2008).

Table 1: Comparisons between different robust defense models on the MNIST dataset against $\ell_\infty$ norm-bounded adversarial perturbations. The sparsity column gives the number of non-zero entries in the cost matrix over the total number of possible adversarial transformations. The candidates column is the number of potential seed examples for each task.

Task | Description | Sparsity | Candidates | Best α | Classification Error (baseline / ours) | Robust Error (baseline / ours)
single pair | (0,2) | 1/90 | 980 | 10.0 | ?.39% / 2.68% | 0.92% / 0.??%
single pair | (6,5) | 1/90 | 958 | 5.0 | ?.39% / 2.49% | 3.55% / 0.??%
single pair | (4,9) | 1/90 | 982 | 4.0 | ?.39% / 3.00% | 10.08% / 1.??%
single seed | digit 0 | 9/90 | 980 | 10.0 | ?.39% / 3.48% | 3.67% / 0.??%
single seed | digit 2 | 9/90 | 1032 | 1.0 | ?.39% / 2.91% | 14.34% / 3.??%
single seed | digit 8 | 9/90 | 974 | 0.4 | ?.39% / 3.37% | 22.28% / 5.??%
single target | digit 1 | 9/90 | 8865 | 4.0 | ?.39% / 3.29% | 2.23% / 0.??%
single target | digit 5 | 9/90 | 9108 | 2.0 | ?.39% / 3.24% | 3.10% / 0.??%
single target | digit 8 | 9/90 | 9026 | 1.0 | ?.39% / 3.52% | 5.24% / 0.??%
multiple | top 10 | 10/90 | 6024 | 0.4 | ?.39% / 3.34% | 11.14% / 7.??%
multiple | random 10 | 10/90 | 7028 | 0.4 | ?.39% / 3.18% | 5.01% / 2.??%
multiple | odd digit | 45/90 | 5074 | 0.2 | ?.39% / 3.30% | 14.45% / 9.??%
multiple | even digit | 45/90 | 4926 | 0.1 | ?.39% / 2.82% | 13.13% / 9.??%
Figure 2: Cost-sensitive robust error using the proposed model and the baseline model on MNIST for different binary tasks: (a) treating each digit as the seed class of concern; (b) treating each digit as the target class of concern.

Table 2: Comparison results of different robust defense models for tasks with a real-valued cost matrix.

Dataset | Task | Sparsity | Candidates | Best α | Classification Error (baseline / ours) | Robust Cost (baseline / ours)
MNIST | small-large | 45/90 | 10000 | 0.04 | ?.39% / 3.47% | 2.245 / 0.???
MNIST | large-small | 45/90 | 10000 | 0.04 | ?.39% / 3.13% | 3.344 / 1.???
CIFAR | vehicle | 40/90 | 4000 | 0.1 | ?.80% / 26.19% | 4.183 / 3.???
Similarly, for the single seed and single target tasks, we select three representative examples representing low, medium, and high vulnerability to include in Table 1, and provide full results for all the single-seed and single-target tasks for MNIST in Figure 2. For the multiple transformations task, we consider four variations: (i) the ten most vulnerable seed-target transformations; (ii) ten randomly-selected seed-target transformations; (iii) all the class transformations from any odd digit seed to any other class; (iv) all the class transformations from any even digit seed to any other class.

Table 1 summarizes the results, comparing the cost-sensitive robust error between the baseline model trained for overall robustness and a model trained using our cost-sensitive robust optimization. The cost-sensitive robust defense model is trained based on the loss function (3.2) and the corresponding cost matrix $C$. The regularization parameter $\alpha$ is tuned via cross validation (see Appendix A for details). We report the selected best $\alpha$, classification error, and cost-sensitive robust error on the testing dataset.

Our model achieves a substantial improvement in cost-sensitive robustness compared with the baseline model on all of the considered tasks, with no significant increase in normal classification error. The cost-sensitive robust error reduction varies across tasks, and is generally higher for sparse cost matrices. In particular, our classifier reduces the number of cost-sensitive adversarial examples from 198 to 12 on the single target task with digit 1 as the target class.

Real-valued Cost Matrices.
Loosely motivated by a check forging adversary who obtains value by changing the semantic interpretation of a number (Papernot et al., 2016), we consider two real-valued cost matrices: small-large, where only adversarial transformations from a smaller digit class to a larger one are valued, and the cost of a valued transformation is quadratic in the absolute difference between the seed and target class digits: $C_{ij} = (i - j)^2$ if $j > i$, otherwise $C_{ij} = 0$; and large-small, where only adversarial transformations from a larger digit class to a smaller one are valued: $C_{ij} = (i - j)^2$ if $i > j$, otherwise $C_{ij} = 0$. We tune $\alpha$ for the cost-sensitive robust model on the MNIST training dataset via cross validation, and set all the other parameters the same as in the binary case. The certified robust error for every adversarial transformation on the MNIST testing dataset is shown in Figure 3, and the classification error and robust cost are given in Table 2. Compared with the model trained for overall robustness (Figure 1(b)), our trained classifier achieves stronger robustness guarantees on the adversarial transformations that induce costs, especially those with larger costs.

Figure 3: Heatmaps of robust test error using our cost-sensitive robust classifier on MNIST for the real-valued cost tasks: (a) small-large; (b) large-small.

4.2 CIFAR10

We use the same neural network architecture for the CIFAR10 dataset as Wong et al. (2018), with four convolutional layers and two fully-connected layers. For memory and computational efficiency, we incorporate the approximation technique based on nonlinear random projections during the training phase (Wong et al., 2018). We train the models on a subset of randomly-selected training examples, and tune the regularization parameter $\alpha$ according to the performance on the remaining examples as a validation dataset. The tasks are similar to those for MNIST (§4.1).
Table 3: Cost-sensitive robust models for the CIFAR10 dataset against adversarial examples, $\epsilon = 2/255$.

Task | Description | Sparsity | Candidates | Best α | Classification Error (baseline / ours) | Robust Error (baseline / ours)
single pair | (frog, bird) | 1/90 | 1000 | 10.0 | ?.80% / 27.88% | 19.90% / 1.??%
single pair | (cat, plane) | 1/90 | 1000 | 10.0 | ?.80% / 28.63% | 9.30% / 2.??%
single seed | dog | 9/90 | 1000 | 0.2 | ?.80% / 30.69% | 57.20% / 28.??%
single seed | truck | 9/90 | 1000 | 0.8 | ?.80% / 31.55% | 35.60% / 15.??%
single target | deer | 9/90 | 9000 | 0.1 | ?.80% / 26.69% | 16.99% / 3.??%
single target | ship | 9/90 | 9000 | 0.1 | ?.80% / 24.80% | 9.42% / 3.??%
multiple | A-V | 24/90 | 6000 | 0.1 | ?.80% / 26.65% | 16.67% / 7.??%
multiple | V-A | 24/90 | 4000 | 0.2 | ?.80% / 27.60% | 12.07% / 8.??%

Table 3 shows results on the testing data based on different robust defense models with $\epsilon = 2/255$. For all of the aforementioned tasks, our models substantially reduce the cost-sensitive robust error while keeping a lower classification error than the baseline.

For the real-valued task, we are concerned with adversarial transformations from seed examples in the vehicle classes to other target classes. In addition, more cost is placed on transformations from vehicle to animal, which is 10 times larger than the cost from vehicle to vehicle. Figures 4(a) and 4(b) illustrate the pairwise robust test error using the overall robust model and the proposed classifier for this real-valued task on CIFAR10.

4.3 Varying Adversary Strength
We investigate the performance of our model against different levels of adversarial strength by varying the value of $\epsilon$ that defines the $\ell_\infty$ ball available to the adversary. Figure 5 shows the overall classification error and cost-sensitive robust error of our best trained model, compared with the baseline model, on the MNIST single seed task with digit 9 and the CIFAR10 single seed task with dog as the seed class of concern, as we vary the maximum $\ell_\infty$ perturbation distance.

Under all the considered attack models, the proposed classifier achieves better cost-sensitive adversarial robustness than the baseline, while maintaining similar classification accuracy on the original data points. As the adversarial strength increases, the improvement in cost-sensitive robustness over overall robustness becomes more significant.

Figure 4: Heatmaps of robust test error for the real-valued task on CIFAR10 using different robust classifiers: (a) baseline model; (b) our proposed cost-sensitive robust model.

Figure 5: Results for different adversary strengths $\epsilon$: (a) MNIST single seed task with digit 9 as the chosen class ($\epsilon \in \{0.1, 0.15, 0.2, 0.25\}$); (b) CIFAR10 single seed task with dog as the chosen class ($\epsilon \in \{2/255, 4/255, 6/255\}$).
5 Conclusion
By focusing on overall robustness, previous robustness training methods expend a large fraction of the capacity of the network on unimportant transformations. We argue that for most scenarios, the actual harm caused by an adversarial transformation often varies depending on the seed and target class, so robust training methods should be designed to account for these differences. By incorporating a cost matrix into the training objective, we develop a general method for producing a cost-sensitive robust classifier. Our experimental results show that our cost-sensitive training method works across a variety of different types of cost matrices, so we believe it can be generalized to other cost matrix scenarios that would be found in realistic applications.

There remains a large gap between the small models and limited attacker capabilities for which we can achieve certifiable robustness, and the complex models and unconstrained attacks that may be important in practice. The scalability of our techniques is limited to the toy models and simple attack norms for which certifiable robustness is currently feasible, so considerable progress is needed before they could be applied to realistic scenarios. However, we hope that considering cost-sensitive robustness instead of overall robustness is a step towards achieving more realistic robustness goals.
Availability
Our implementation, including code for reproducing all our experiments, is available as open source code at https://github.com/xiaozhanguva/Cost-Sensitive-Robustness.

Acknowledgements
We thank Eric Wong for providing the implementation of certified robustness we built on, as well as for insightful discussions. We thank Jianfeng Chi for helpful advice on implementing our experiments. This work was supported by grants from the National Science Foundation.

References
Kaiser Asif, Wei Xing, Sima Behpour, and Brian D. Ziebart. Adversarial cost-sensitive classification. In 31st Conference on Uncertainty in Artificial Intelligence, 2015.

Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International Conference on Machine Learning, 2018.

Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D. Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, Xin Zhang, Jake Zhao, and Karol Zieba. End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316, 2016.

Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy, 2017.

Nilesh Dalvi, Pedro Domingos, Sumit Sanghai, Deepak Verma, et al. Adversarial classification. In Tenth ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2004.

Pedro Domingos. MetaCost: A general method for making classifiers cost-sensitive. In Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1999.

Tommaso Dreossi, Somesh Jha, and Sanjit A. Seshia. Semantic adversarial deep learning. In International Conference on Computer Aided Verification, 2018.

Charles Elkan. The foundations of cost-sensitive learning. In International Joint Conference on Artificial Intelligence, 2001.

Logan Engstrom, Dimitris Tsipras, Ludwig Schmidt, and Aleksander Madry. A rotation and a translation suffice: Fooling CNNs with simple transformations. arXiv preprint arXiv:1712.02779, 2017.

Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015.

Matthias Hein and Maksym Andriushchenko. Formal guarantees on the robustness of a classifier against adversarial manipulation. In Advances in Neural Information Processing Systems, 2017.

Can Kanbak, Seyed-Mohsen Moosavi-Dezfooli, and Pascal Frossard. Geometric robustness of deep networks: Analysis and improvement. In Computer Vision and Pattern Recognition, 2018.

Salman H. Khan, Munawar Hayat, Mohammed Bennamoun, Ferdous A. Sohel, and Roberto Togneri. Cost-sensitive learning of deep feature representations from imbalanced data. IEEE Transactions on Neural Networks and Learning Systems, 29(8):3573–3587, 2018.

Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.

Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.

Matjaž Kukar and Igor Kononenko. Cost-sensitive learning with neural networks. In European Conference on Artificial Intelligence, 1998.

Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

Yann LeCun, Corinna Cortes, and CJ Burges. MNIST handwritten digit database. http://yann.lecun.com/exdb/mnist, 2010.

Xu-Ying Liu and Zhi-Hua Zhou. The influence of class imbalance on cost-sensitive learning: An empirical study. In Sixth International Conference on Data Mining, 2006.

Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 2008.

Saeed Mahloujifar, Dimitrios I. Diochnos, and Mohammad Mahmoody. The curse of concentration in robust learning: Evasion and poisoning attacks from concentration of measure. In AAAI Conference on Artificial Intelligence, 2019.

Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy, 2016.

Omkar M. Parkhi, Andrea Vedaldi, and Andrew Zisserman. Deep face recognition. In British Machine Vision Conference, 2015.

Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. In International Conference on Learning Representations, 2018.

Joshua Saxe and Konstantin Berlin. Deep neural network based malware detection using two dimensional binary program features. In 10th International Conference on Malicious and Unwanted Software (MALWARE), 2015.

Mahmood Sharif, Lujo Bauer, and Michael K. Reiter. On the suitability of $\ell_p$-norms for creating and preventing adversarial examples. In CVPR Workshop on Bright and Dark Sides of Computer Vision: Challenges and Opportunities for Privacy and Security, 2018.

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations, 2014.

Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. In International Conference on Learning Representations, 2018.

Shiqi Wang, Kexin Pei, Justin Whitehouse, Junfeng Yang, and Suman Jana. Formal security analysis of neural networks using symbolic intervals. In USENIX Security Symposium, 2018.

Eric Wong and Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning, 2018.

Eric Wong, Frank Schmidt, Jan Hendrik Metzen, and J. Zico Kolter. Scaling provable adversarial defenses. In Conference on Neural Information Processing Systems, 2018.

Chaowei Xiao, Jun-Yan Zhu, Bo Li, Warren He, Mingyan Liu, and Dawn Song. Spatially transformed adversarial examples. In International Conference on Learning Representations, 2018.

Bianca Zadrozny, John Langford, and Naoki Abe. Cost-sensitive learning by cost-proportionate example weighting. In Third IEEE International Conference on Data Mining, 2003.

Zhi-Hua Zhou and Xu-Ying Liu. On multi-class cost-sensitive learning. Computational Intelligence, 26(3):232–257, 2010.
Cost-Sensitive Robustness against Adversarial Examples
Supplemental Materials
A Parameter Tuning
For experiments on the MNIST dataset, we first perform coarse tuning of the regularization parameter $\alpha$ over a logarithmic grid, and select the most appropriate value, denoted by $\alpha_{\text{coarse}}$, with overall classification error below a fixed threshold and the lowest cost-sensitive robust error on the validation dataset. Then, we further finely tune $\alpha$ over a multiplicative grid centered at $\alpha_{\text{coarse}}$, and choose the best robust model according to the same criteria.

Figures 6(a) and 6(b) show the learning curves for the single seed task with digit 9 as the selected seed class based on the proposed cost-sensitive robust model with varying $\alpha$ (we show digit 9 because it is one of the most vulnerable seed classes). The results suggest that as the value of $\alpha$ increases, the corresponding classifier will have a lower cost-sensitive robust error but a higher classification error, which is what we expect from the design of (3.2).

We observe similar trends in the learning curves for the other tasks, so do not present them here. For the CIFAR10 experiments, a similar tuning strategy is implemented; the only difference is the threshold of overall classification error used for selecting the best $\alpha$.

Figure 6: Learning curves for the single seed task with digit 9 as the selected seed class on MNIST using the proposed model with varying $\alpha \in \{0.01, 0.1, 1.0, 10.0\}$: (a) learning curves of classification error; (b) learning curves of cost-sensitive robust error.
B Comparison with Standard Cost-Sensitive Classifier
As discussed in Section 2.4, prior work on cost-sensitive learning mainly focuses on the non-adversarial setting. In this section, we investigate the robustness of the cross-entropy based cost-sensitive classifier proposed in Khan et al. (2018), and compare the performance of their classifier with our proposed cost-sensitive robust classifier. Given a set of training examples $\{(x_i, y_i)\}_{i=1}^N$ and a cost matrix $C$ with each entry representing the cost of the corresponding misclassification, the evaluation metric for cost-sensitive learning is defined as the average cost of misclassifications, or more concretely

$$ \text{misclassification cost} = \frac{1}{N} \sum_{i \in [N]} C_{y_i \hat{y}_i}, \quad \text{where}\ \hat{y}_i = \arg\max_{j \in [m]} [f_\theta(x_i)]_j, $$

where $m$ is the total number of class labels and $f_\theta(\cdot)$ denotes the neural network classifier as introduced in Section 2.1. In addition, the cross-entropy based cost-sensitive training objective takes the following form:

$$ \min_\theta\ \frac{1}{N} \sum_{j \in [m]} \sum_{i \,|\, y_i = j} \log\Big( \sum_{j' \ne j} C_{jj'} \cdot \exp\big( [f_\theta(x_i)]_{j'} - [f_\theta(x_i)]_j \big) \Big). \qquad (B.1) $$
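A sketch of objective (B.1) and of the misclassification-cost metric; note that, unlike (3.2), no robustness term is involved, only the clean logits. It assumes $C$ has a zero diagonal, so that the $j' \ne j$ restriction is enforced by the cost weights themselves.

```python
import torch

def cost_sensitive_ce_loss(logits, y, C):
    """Sketch of Eq. (B.1): log sum_{j' != j} C_{jj'} exp(z_{j'} - z_j),
    averaged over the batch; C's zero diagonal removes the j' = y_i term."""
    margins = logits - logits.gather(1, y[:, None])  # z_{j'} - z_{y_i}
    weighted = C[y] * torch.exp(margins)             # [batch, m] cost-weighted
    return torch.log(weighted.sum(dim=1)).mean()

def misclassification_cost(logits, y, C):
    """Average cost of the classifier's errors: (1/N) sum_i C[y_i, y_hat_i]."""
    y_hat = logits.argmax(dim=1)
    return C[y, y_hat].float().mean()
```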
Table 4: Comparison results of different trained classifiers for the small-large real-valued task on MNIST.

Classifier | Classification Error | Misclassification Cost | Robust Cost
Baseline | ? | ? | ?
Cost-Sensitive Standard | ? | ? | ?
Overall Robustness | ? | ? | ?
Cost-Sensitive Robustness | ? | ? | ?

For this comparison, the cost matrix $C$ is designed as $C_{ij} = 0.?$ if $i > j$; $C_{ij} = 0$ if $i = j$; and $C_{ij} = (i - j)^2$ if $j > i$.