Cost-Sensitive Robustness against Adversarial Examples
Xiao Zhang
Department of Computer Science, University of Virginia
[email protected]

David Evans
Department of Computer Science, University of Virginia
[email protected]

Abstract
Several recent works have developed methods for training classifiers that are certifiably robust against norm-bounded adversarial perturbations. These methods assume that all adversarial transformations are equally important, which is seldom the case in real-world applications. We advocate for cost-sensitive robustness as the criterion for measuring the classifier's performance for tasks where some adversarial transformations are more important than others. We encode the potential harm of each adversarial transformation in a cost matrix, and propose a general objective function to adapt the robust training method of Wong & Kolter (2018) to optimize for cost-sensitive robustness. Our experiments on simple MNIST and CIFAR10 models with a variety of cost matrices show that the proposed approach can produce models with substantially reduced cost-sensitive robust error, while maintaining classification accuracy.
1 Introduction
Despite the exceptional performance of deep neural networks (DNNs) on various machine learning tasks such as malware detection (Saxe & Berlin, 2015), face recognition (Parkhi et al., 2015), and autonomous driving (Bojarski et al., 2016), recent studies (Szegedy et al., 2014; Goodfellow et al., 2015) have shown that deep learning models are vulnerable to misclassifying inputs, known as adversarial examples, that are crafted with targeted but visually-imperceptible perturbations. While several defense mechanisms have been proposed and empirically demonstrated to be successful against particular existing attacks (Papernot et al., 2016; Goodfellow et al., 2015), new attacks (Carlini & Wagner, 2017; Tramèr et al., 2018; Athalye et al., 2018) are repeatedly found that circumvent such defenses. To end this arms race, recent works (Wong & Kolter, 2018; Raghunathan et al., 2018; Wong et al., 2018; Wang et al., 2018) propose methods to certify examples to be robust against some specific norm-bounded adversarial perturbations for given inputs and to train models to optimize for certifiable robustness.

However, all of the aforementioned methods aim at improving the overall robustness of the classifier. This means that the methods to improve robustness are designed to prevent seed examples in any class from being misclassified as any other class. Achieving such a goal (at least for some definitions of adversarial robustness) requires producing a perfect classifier, and has, unsurprisingly, remained elusive. Indeed, Mahloujifar et al. (2019) proved that if the metric probability space is concentrated, overall adversarial robustness is unattainable for any classifier with initial constant error.

We argue that overall robustness may not be the appropriate criterion for measuring system performance in security-sensitive applications, since only certain kinds of adversarial misclassifications pose meaningful threats that provide value for potential adversaries. Whereas overall robustness places equal emphasis on every adversarial transformation, from a security perspective, only certain transformations matter. As a simple example, misclassifying a malicious program as benign results in more severe consequences than the reverse.

In this paper, we propose a general method for adapting provable defenses against norm-bounded perturbations to take into account the potential harm of different adversarial class transformations. Inspired by cost-sensitive learning (Domingos, 1999; Elkan, 2001) for non-adversarial contexts, we capture the impact of different adversarial class transformations using a cost matrix $C$, where each entry represents the cost of an adversary being able to take a natural example from the first class and perturb it so as to be misclassified by the model as the second class. Instead of reducing the overall robust error, our goal is to minimize the cost-weighted robust error (which we define for both binary and real-valued costs in $C$). The proposed method incorporates the specified cost matrix into the training objective function, which encourages stronger robustness guarantees on cost-sensitive class transformations, while maintaining the overall classification accuracy on the original inputs.

Contributions.
By encoding the consequences of different adversarial transformations into a cost matrix, we introduce the notion of cost-sensitive robustness (Section 3.1) as a metric to assess the expected performance of a classifier when facing adversarial examples. We propose an objective function for training a cost-sensitive robust classifier (Section 3.2). The proposed method is general in that it can incorporate any type of cost matrix, including both binary and real-valued. We demonstrate the effectiveness of the proposed cost-sensitive defense model for a variety of cost scenarios on two benchmark image classification datasets: MNIST (Section 4.1) and CIFAR10 (Section 4.2). Compared with the state-of-the-art overall robust defense model (Wong & Kolter, 2018), our model achieves significant improvements in cost-sensitive robustness for different tasks, while maintaining approximately the same classification accuracy on both datasets.
Notation.
We use lower-case boldface letters such as $x$ for vectors and capital boldface letters such as $A$ to represent matrices. Let $[m]$ be the index set $\{1, 2, \ldots, m\}$ and $A_{ij}$ be the $(i,j)$-th entry of matrix $A$. Denote the $i$-th natural basis vector, the all-ones vector, and the identity matrix by $e_i$, $\mathbf{1}$, and $I$, respectively. For any vector $x \in \mathbb{R}^d$, the $\ell_\infty$-norm of $x$ is defined as $\|x\|_\infty = \max_{i \in [d]} |x_i|$.

2 Background
In this section, we provide a brief introduction to related topics, including neural network classifiers, adversarial examples, defenses with certified robustness, and cost-sensitive learning.

2.1 Neural Network Classifiers
A $K$-layer neural network classifier can be represented by a function $f: \mathcal{X} \rightarrow \mathcal{Y}$ such that $f(x) = f_{K-1}(f_{K-2}(\cdots(f_1(x))))$ for any $x \in \mathcal{X}$. For $k \in \{1, 2, \ldots, K-2\}$, the mapping function $f_k(\cdot)$ typically consists of two operations: an affine transformation (either matrix multiplication or convolution) and a nonlinear activation. In this paper, we consider the rectified linear unit (ReLU) as the activation function. If we denote the feature vector of the $k$-th layer as $z_k$, then $f_k(\cdot)$ is defined as

$$ z_{k+1} = f_k(z_k) = \max\{W_k z_k + b_k, 0\}, \quad \forall k \in \{1, 2, \ldots, K-2\}, $$

where $W_k$ denotes the weight parameter matrix and $b_k$ the bias vector. The output function $f_{K-1}(\cdot)$ maps the feature vector in the last hidden layer to the output space $\mathcal{Y}$ solely through matrix multiplication: $z_K = f_{K-1}(z_{K-1}) = W_{K-1} z_{K-1} + b_{K-1}$, where $z_K$ can be regarded as the estimated score vector of input $x$ for the different possible output classes. In the following discussions, we use $f_\theta$ to represent the neural network classifier, where $\theta = \{W_1, \ldots, W_{K-1}, b_1, \ldots, b_{K-1}\}$ denotes the model parameters.

To train the neural network, a loss function $\sum_{i=1}^N L(f_\theta(x_i), y_i)$ is defined for a set of training examples $\{x_i, y_i\}_{i=1}^N$, where $x_i$ is the $i$-th input vector and $y_i$ denotes its class label. Cross-entropy loss is typically used for multiclass image classification. With proper initialization, all model parameters are then updated iteratively using backpropagation. For any input example $\tilde{x}$, the predicted label $\hat{y}$ is given by the index of the largest predicted score among all classes, $\arg\max_j [f_\theta(\tilde{x})]_j$.
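For concreteness, the following is a minimal PyTorch sketch of such a $K$-layer ReLU classifier. The layer sizes are illustrative placeholders only, not the architectures used in our experiments.

```python
import torch
import torch.nn as nn

class ReLUClassifier(nn.Module):
    """f(x) = f_{K-1}(...(f_1(x))): affine maps with ReLU, linear last layer."""
    def __init__(self, sizes=(784, 100, 100, 10)):  # illustrative sizes
        super().__init__()
        self.hidden = nn.ModuleList(
            nn.Linear(sizes[k], sizes[k + 1]) for k in range(len(sizes) - 2)
        )
        self.out = nn.Linear(sizes[-2], sizes[-1])  # f_{K-1}: affine only

    def forward(self, x):
        z = x.flatten(1)
        for layer in self.hidden:          # z_{k+1} = max{W_k z_k + b_k, 0}
            z = torch.relu(layer(z))
        return self.out(z)                 # z_K: score vector

# Predicted label: index of the largest score.
# y_hat = ReLUClassifier()(x).argmax(dim=1)
```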
2.2 Adversarial Examples

An adversarial example is an input, generated by some adversary, which is visually indistinguishable from an example drawn from the natural distribution, but is able to mislead the target classifier. Since "visually indistinguishable" depends on human perception, which is hard to define rigorously, we consider the most popular alternative: input examples with perturbations bounded in $\ell_\infty$-norm (Goodfellow et al., 2015). More formally, the set of adversarial examples with respect to a seed example $\{x_0, y\}$ and classifier $f_\theta(\cdot)$ is defined as

$$ \mathcal{A}_\epsilon(x_0, y; \theta) = \big\{ x \in \mathcal{X} : \|x - x_0\|_\infty \le \epsilon \ \text{and}\ \arg\max_j [f_\theta(x)]_j \ne y \big\}, \qquad (2.1) $$

where $\epsilon > 0$ denotes the maximum perturbation distance. Although $\ell_p$ distances are commonly used in adversarial examples research, they are not an adequate measure of perceptual similarity (Sharif et al., 2018), and other minimal geometric transformations can be used to find adversarial examples (Engstrom et al., 2017; Kanbak et al., 2018; Xiao et al., 2018). Nevertheless, there is considerable interest in improving robustness in this simple domain, and hope that as this research area matures we will find ways to apply results from studying simplified problems to more realistic ones.
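For intuition about the threat model in (2.1), the sketch below implements projected gradient ascent, a standard first-order heuristic for searching $\mathcal{A}_\epsilon(x_0, y; \theta)$. This is not part of the certification machinery discussed next: when it succeeds it exhibits an adversarial example, but when it fails it proves nothing, which is what motivates the certified defenses of Section 2.3.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x0, y, eps, alpha=0.01, steps=40):
    """Heuristic search for a point in A_eps(x0, y): repeatedly take a
    gradient-sign ascent step on the loss and project back onto the
    l_inf ball of radius eps around x0 (started at x0 for simplicity)."""
    x = x0.clone().detach().requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x), y)
        grad, = torch.autograd.grad(loss, x)
        with torch.no_grad():
            x += alpha * grad.sign()                 # ascent step
            x.copy_(x0 + (x - x0).clamp(-eps, eps))  # project onto the ball
            x.clamp_(0.0, 1.0)                       # stay in valid pixel range
    return x.detach()
```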
2.3 Defenses with Certified Robustness

A line of recent work has proposed defenses that are guaranteed to be robust against norm-bounded adversarial perturbations. Hein & Andriushchenko (2017) proved formal robustness guarantees against $\ell_2$-norm bounded perturbations for two-layer neural networks, and provided a training method based on a surrogate robust bound. Raghunathan et al. (2018) developed an approach based on semidefinite relaxation for training certified robust classifiers, but it was limited to two-layer fully-connected networks. Our work builds most directly on Wong & Kolter (2018), which can be applied to deep ReLU-based networks and achieves the state-of-the-art certified robustness on the MNIST dataset.

Following the definitions in Wong & Kolter (2018), an adversarial polytope $\mathcal{Z}_\epsilon(x)$ with respect to a given example $x$ is defined as

$$ \mathcal{Z}_\epsilon(x) = \big\{ f_\theta(x + \Delta) : \|\Delta\|_\infty \le \epsilon \big\}, \qquad (2.2) $$

which contains all the possible output vectors for the given classifier $f_\theta$ obtained by perturbing $x$ within an $\ell_\infty$-norm ball with radius $\epsilon$. A seed example $\{x_0, y\}$ is said to be certified robust with respect to maximum perturbation distance $\epsilon$ if the corresponding adversarial example set $\mathcal{A}_\epsilon(x_0, y; \theta)$ is empty. Equivalently, if we solve, for any output class $y_{\text{targ}} \ne y$, the optimization problem

$$ \min_{z_K}\ [z_K]_y - [z_K]_{y_{\text{targ}}}, \quad \text{subject to}\ z_K \in \mathcal{Z}_\epsilon(x_0), \qquad (2.3) $$

then according to the definition of $\mathcal{A}_\epsilon(x_0, y; \theta)$ in (2.1), $\{x_0, y\}$ is guaranteed to be robust provided that the optimal objective value of (2.3) is positive for every output class. To train a robust model on a given dataset $\{x_i, y_i\}_{i=1}^N$, standard robust optimization aims to minimize the sample loss function at the worst-case locations through the following adversarial loss:

$$ \min_\theta\ \sum_{i=1}^N \max_{\|\Delta\|_\infty \le \epsilon} L\big(f_\theta(x_i + \Delta), y_i\big), \qquad (2.4) $$

where $L(\cdot, \cdot)$ denotes the cross-entropy loss. However, due to the nonconvexity of the neural network classifier $f_\theta(\cdot)$ introduced by the nonlinear ReLU activations, both the adversarial polytope (2.2) and the training objective (2.4) are highly nonconvex. In addition, solving optimization problem (2.3) for each pair of input example and output class is computationally intractable.

Instead of solving the optimization problem directly, Wong & Kolter (2018) proposed an alternative training objective function based on convex relaxation, which can be efficiently optimized through a dual network. Specifically, they relaxed $\mathcal{Z}_\epsilon(x)$ into a convex outer adversarial polytope $\tilde{\mathcal{Z}}_\epsilon(x)$ by replacing the ReLU equation for each neuron $z = \max\{\hat{z}, 0\}$ with a set of linear inequalities,

$$ z \ge 0, \quad z \ge \hat{z}, \quad -u\hat{z} + (u - \ell) z \le -u\ell, \qquad (2.5) $$

where $\ell$ and $u$ denote the lower and upper bounds on the considered pre-ReLU activation.¹ Based on the relaxed outer bound $\tilde{\mathcal{Z}}_\epsilon(x)$, they propose the following alternative optimization problem,

$$ \min_{z_K}\ [z_K]_y - [z_K]_{y_{\text{targ}}}, \quad \text{subject to}\ z_K \in \tilde{\mathcal{Z}}_\epsilon(x_0), \qquad (2.6) $$

which is in fact a linear program. Since $\mathcal{Z}_\epsilon(x) \subseteq \tilde{\mathcal{Z}}_\epsilon(x)$ for any $x \in \mathcal{X}$, solving (2.6) for all output classes provides stronger robustness guarantees compared with (2.3), provided all the optimal objective values are positive.

¹The elementwise activation bounds can be computed efficiently using Algorithm 1 in Wong & Kolter (2018).
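The bounds $\ell$ and $u$ in the footnote can be computed in several ways. The sketch below uses plain interval arithmetic for a fully-connected network; this is a looser but easy-to-follow alternative to Algorithm 1 of Wong & Kolter (2018), which obtains tighter bounds via the dual network.

```python
import torch

def interval_bounds(weights, biases, x0, eps):
    """Propagate elementwise pre-activation bounds [l, u] through an MLP
    for all inputs with ||x - x0||_inf <= eps, using interval arithmetic."""
    lower, upper = x0 - eps, x0 + eps
    bounds = []
    for k, (W, b) in enumerate(zip(weights, biases)):
        center = (upper + lower) / 2
        radius = (upper - lower) / 2
        c = W @ center + b            # image of the box center
        r = W.abs() @ radius          # worst-case spread per output unit
        lower, upper = c - r, c + r   # pre-activation bounds for this layer
        bounds.append((lower, upper))
        if k < len(weights) - 1:      # ReLU on all but the last layer
            lower, upper = lower.clamp(min=0), upper.clamp(min=0)
    return bounds
```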
Moreover, they derived a guaranteed lower bound, $J_\epsilon\big(x, g_\theta(e_y - e_{y_{\text{targ}}})\big)$, on the optimal objective value of Equation 2.6 using duality theory, where $g_\theta(\cdot)$ is a $K$-layer feedforward dual network (Theorem 1 in Wong & Kolter (2018)). Finally, according to the properties of cross-entropy loss, they minimize the following objective to train the robust model, which serves as an upper bound of the adversarial loss (2.4):

$$ \min_\theta\ \frac{1}{N} \sum_{i=1}^N L\Big( -J_\epsilon\big(x_i, g_\theta(e_{y_i} \mathbf{1}^\top - I)\big),\ y_i \Big), \qquad (2.7) $$

where $g_\theta(\cdot)$ is regarded as a columnwise function when applied to a matrix. Although the proposed method in Wong & Kolter (2018) achieves certified robustness, its computational complexity is quadratic in the network size in the worst case, so it only scales to small networks. Recently, Wong et al. (2018) extended the training procedure to scale to larger networks by using nonlinear random projections. However, if the network size allows for both methods, we observe a small decrease in performance using the training method provided in Wong et al. (2018). Therefore, we only use the approximation techniques for the experiments on CIFAR10 (§4.2).
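In code, a training step of (2.7) amounts to treating the negated bounds as logits. The sketch below assumes a hypothetical helper robust_bounds (our naming, not a published API) that returns the matrix $J_\epsilon\big(x_i, g_\theta(e_{y_i}\mathbf{1}^\top - I)\big)$ — one entry per target class, with the $y_i$-th entry equal to zero — as a dual-network implementation in the style of Wong & Kolter (2018) would.

```python
import torch.nn.functional as F

def overall_robust_loss(model, x, y, eps, robust_bounds):
    """Surrogate robust loss of Eq. (2.7): cross-entropy on negated
    certified lower bounds. robust_bounds(model, x, y, eps) is assumed
    to return a [batch, num_classes] tensor J, where J[i, j] lower-bounds
    [z_K]_{y_i} - [z_K]_j over the eps-ball around x_i."""
    J = robust_bounds(model, x, y, eps)  # hypothetical helper
    return F.cross_entropy(-J, y)
```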
2.4 Cost-Sensitive Learning

Cost-sensitive learning (Domingos, 1999; Elkan, 2001; Liu & Zhou, 2006) was proposed to deal with the unequal misclassification costs and class imbalance problems commonly found in classification applications. The key observation is that cost-blind learning algorithms tend to favor the majority class, while the neglected minority class is often our primary interest. For example, in medical diagnosis, misclassifying a rare cancerous lesion as benign is extremely costly. Various cost-sensitive learning algorithms (Kukar & Kononenko, 1998; Zadrozny et al., 2003; Zhou & Liu, 2010; Khan et al., 2018) have been proposed in the literature, but only a few, limited to simple classifiers, have considered adversarial settings. Dalvi et al. (2004) studied the naive Bayes classifier for spam detection in the presence of a cost-sensitive adversary, and developed an adversary-aware classifier based on game theory. Asif et al. (2015) proposed a cost-sensitive robust minimax approach that hardens a linear discriminant classifier with robustness in the adversarial context. All of these methods are designed for simple linear classifiers, and cannot be directly extended to neural network classifiers. In addition, the robustness of their proposed classifiers is only examined experimentally against some specific adversary, so it does not provide any notion of certified robustness. Recently, Dreossi et al. (2018) advocated for the idea of using application-level semantics in adversarial analysis; however, they did not provide a formal method for training such a classifier. Our work provides a practical training method that hardens neural network classifiers with certified cost-sensitive robustness against adversarial perturbations.
3 Training a Cost-Sensitive Robust Classifier
The approach introduced in Wong & Kolter (2018) penalizes all adversarial class transformations equally, even though the consequences of adversarial examples usually depend on the specific class transformation. Here, we provide a formal definition of cost-sensitive robustness (§3.1), and propose an objective function for training a classifier that optimizes for cost-sensitive robustness (§3.2).

3.1 Certified Cost-Sensitive Robustness
Our approach uses a cost matrix $C$ that encodes the cost (i.e., potential harm to the model deployer) of different adversarial examples. First, we consider the case where there are $m$ classes and $C$ is an $m \times m$ binary matrix with $C_{jj'} \in \{0, 1\}$. The value $C_{jj'}$ indicates whether we care about an adversary transforming a seed input in class $j$ into one recognized by the model as being in class $j'$. If the adversarial transformation $j \rightarrow j'$ matters, $C_{jj'} = 1$; otherwise, $C_{jj'} = 0$. Let $\Omega_j = \{j' \in [m] : C_{jj'} \ne 0\}$ be the index set of output classes that induce cost with respect to seed class $j$.¹ For any $j \in [m]$, let $\delta_j = 0$ if $\Omega_j$ is an empty set, and $\delta_j = 1$ otherwise. We are only concerned with adversarial transformations from a seed class $j$ to target classes $j' \in \Omega_j$. For any example $x$ in seed class $j$, $x$ is said to be certified cost-sensitive robust if the lower bound $J_\epsilon(x, g_\theta(e_j - e_{j'})) \ge 0$ for all $j' \in \Omega_j$. That is, no adversarial perturbation within an $\ell_\infty$-norm ball around $x$ with radius $\epsilon$ can mislead the classifier to any target class in $\Omega_j$.

The cost-sensitive robust error on a dataset $\{x_i, y_i\}_{i=1}^N$ is defined as the number of examples that are not guaranteed to be cost-sensitive robust over the number of non-zero cost candidate seed examples:

$$ \text{cost-sensitive robust error} = 1 - \frac{\big| \{ i \in [N] : \delta_{y_i} = 1 \ \text{and}\ J_\epsilon(x_i, g_\theta(e_{y_i} - e_{j'})) \ge 0,\ \forall j' \in \Omega_{y_i} \} \big|}{\sum_{j \,|\, \delta_j = 1} N_j}, $$

where $|A|$ represents the cardinality of a set $A$, and $N_j$ is the total number of examples in class $j$.

Next, we consider the more general case where $C$ is an $m \times m$ real-valued cost matrix. Each entry of $C$ is a non-negative real number, which represents the cost of the corresponding adversarial transformation. To take into account the different potential costs among adversarial examples, we measure cost-sensitive robustness by the average certified cost of adversarial examples. The cost of an adversarial example $x$ in class $j$ is defined as the sum of all $C_{jj'}$ such that $J_\epsilon(x, g_\theta(e_j - e_{j'})) < 0$. Intuitively speaking, an adversarial example induces more cost if it can be adversarially misclassified as more target classes with high cost. Accordingly, the robust cost is defined as the total cost of adversarial examples divided by the total number of valued seed examples:

$$ \text{robust cost} = \frac{\sum_{j \,|\, \delta_j = 1} \sum_{i \,|\, y_i = j} \sum_{j' \in \Omega_j} C_{jj'} \cdot \mathbf{1}\big( J_\epsilon(x_i, g_\theta(e_j - e_{j'})) < 0 \big)}{\sum_{j \,|\, \delta_j = 1} N_j}, \qquad (3.1) $$

where $\mathbf{1}(\cdot)$ denotes the indicator function.

¹Given the vulnerability of standard classifiers to adversarial examples, it is not surprising that standard cost-sensitive classifiers are also ineffective against adversaries. The experiments described in Appendix B support this expectation.
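Given a precomputed bound matrix, both metrics reduce to a few array operations. The sketch below is our own illustration (names are ours, not from the paper's code); it assumes J[i, j] holds $J_\epsilon(x_i, g_\theta(e_{y_i} - e_j))$ for every test example $i$ and target class $j$, and that $C$ has a zero diagonal.

```python
import numpy as np

def cost_sensitive_metrics(J, y, C):
    """J: [N, m] certified lower bounds; y: [N] integer labels;
    C: [m, m] non-negative cost matrix with zero diagonal.
    Returns the cost-sensitive robust error and the robust cost (Eq. 3.1)."""
    valued = C[y] > 0                       # [N, m]: transformations with cost
    candidate = valued.any(axis=1)          # seeds whose class has delta_j = 1
    broken = (J < 0) & valued               # valued certifications that fail
    robust_error = broken.any(axis=1)[candidate].mean()
    robust_cost = (C[y] * broken).sum() / candidate.sum()
    return robust_error, robust_cost
```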
3.2 Cost-Sensitive Robust Optimization

Recall that our goal is to develop a classifier with certified cost-sensitive robustness, as defined in §3.1. Based on the guaranteed lower bound $J_\epsilon\big(x, g_\theta(e_y - e_{y_{\text{targ}}})\big)$ on Equation 2.6 and inspired by the cost-sensitive CE loss (Khan et al., 2018), we propose the following robust optimization with respect to a neural network classifier $f_\theta$:

$$ \min_\theta\ \frac{1}{N} \sum_{i \in [N]} L\big(f_\theta(x_i), y_i\big) + \alpha \sum_{j \in [m]} \frac{\delta_j}{N_j} \sum_{i \,|\, y_i = j} \log\Big( \sum_{j' \in \Omega_j} C_{jj'} \cdot \exp\big( -J_\epsilon(x_i, g_\theta(e_j - e_{j'})) \big) \Big), \qquad (3.2) $$

where $\alpha \ge 0$ denotes the regularization parameter. The first term in Equation 3.2 denotes the cross-entropy loss for standard classification, whereas the second term accounts for cost-sensitive robustness. Compared with the overall robustness training objective function (2.7), we include a regularization parameter $\alpha$ to control the trade-off between classification accuracy on original inputs and adversarial robustness.

To provide cost-sensitivity, the loss function selectively penalizes adversarial examples based on their cost. For binary cost matrices, the regularization term penalizes every cost-sensitive adversarial example equally, but has no impact for instances where $C_{jj'} = 0$. For real-valued costs, a larger value of $C_{jj'}$ increases the weight of the corresponding adversarial transformation in the training objective. The optimization problem (3.2) can be solved efficiently using gradient-based algorithms, such as stochastic gradient descent and Adam (Kingma & Ba, 2015).
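A sketch of objective (3.2), reusing the hypothetical robust_bounds helper from §2.3. The cost-weighted log-sum-exp term makes zero-cost transformations drop out automatically; for simplicity the per-class $1/N_j$ normalization is approximated by a batch mean, and the clamp exists only to keep the log finite in this illustration.

```python
import torch
import torch.nn.functional as F

def cost_sensitive_robust_loss(model, x, y, eps, C, alpha, robust_bounds):
    """Sketch of Eq. (3.2). C: [m, m] non-negative cost tensor;
    robust_bounds returns J with J[i, j] the certified lower bound
    for misclassifying example i as target class j."""
    clean = F.cross_entropy(model(x), y)        # standard CE term
    J = robust_bounds(model, x, y, eps)         # hypothetical helper
    costs = C[y]                                # [batch, m]: one cost row per example
    # log sum_{j' in Omega_j} C_{jj'} exp(-J_ij'); zero-cost classes vanish
    reg = torch.log((costs * torch.exp(-J)).sum(dim=1).clamp_min(1e-12))
    mask = (costs.sum(dim=1) > 0).float()       # delta_{y_i}
    return clean + alpha * (mask * reg).mean()  # batch mean approximates 1/N_j
```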
4 Experiments

We evaluate the performance of our cost-sensitive robust training method on models for two benchmark image classification datasets: MNIST (LeCun et al., 2010) and CIFAR10 (Krizhevsky & Hinton, 2009). We compare our results for various cost scenarios with overall robustness training (§2.3), where the adversary is restricted to perturbations within an $\ell_\infty$-norm ball.

Figure 1: Preliminary results on MNIST using the overall robust classifier: (a) learning curves of the classification error and overall robust error over the 60 training epochs; (b) heatmap of the robust test error for pairwise class transformations based on the best trained classifier.

Our goal in the experiments is to evaluate how well a variety of different types of cost matrices can be supported. MNIST and CIFAR10 are toy datasets, so there are no obvious cost matrices that correspond to meaningful security applications for these datasets. Instead, we select representative tasks and design cost matrices to capture them.

4.1 MNIST

For MNIST, we use the same convolutional neural network architecture (LeCun et al., 1998) as Wong & Kolter (2018), which includes two convolutional layers, with 16 and 32 filters respectively, and two fully-connected layers, consisting of 100 and 10 hidden units respectively. ReLU activations are applied to each layer except the last one. For both our cost-sensitive robust model and the overall robust model, we randomly split the 60,000 training samples into five folds of equal size, and train the classifier over 60 epochs on four of them using the Adam optimizer (Kingma & Ba, 2015) with batch size 50 and learning rate 0.001. We treat the remaining fold as a validation dataset for model selection. In addition, we use $\epsilon$-scheduling and learning rate decay, increasing $\epsilon$ linearly from a small initial value to the desired value over the first 20 epochs and decaying the learning rate by 0.5 every 10 epochs for the remaining epochs.
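As a concrete reference, a PyTorch sketch of this architecture follows. The kernel size, stride, and padding are our assumptions, taken from the small network of Wong & Kolter (2018), which halves the spatial resolution at each convolution.

```python
import torch.nn as nn

# Two conv layers (16 and 32 filters) and two fully-connected layers
# (100 and 10 units), ReLU everywhere except the output layer.
mnist_model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=4, stride=2, padding=1),   # 28x28 -> 14x14
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=4, stride=2, padding=1),  # 14x14 -> 7x7
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 100),
    nn.ReLU(),
    nn.Linear(100, 10),
)
```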
Baseline: Overall Robustness. Figure 1(a) illustrates the learning curves of both the classification error and the overall robust error during training based on the robust loss (2.7). The model with sufficiently low classification error and minimum overall robust error on the validation dataset is selected over the 60 training epochs, and the best classifier is evaluated on the 10,000 MNIST testing samples. We report the robust test error for every adversarial transformation in Figure 1(b) (for a model without any robustness training, essentially no examples can be certified). The $(i,j)$-th entry is a bound on the robustness of that seed-target transformation: the fraction of testing examples in class $i$ that cannot be certified robust against transformation into class $j$ for any $\epsilon$ norm-bounded attack. As shown in Figure 1(b), the vulnerability to adversarial transformations differs considerably among class pairs and appears correlated with perceptual similarity. For instance, far fewer seeds in class 1 fail certification for target class 9 than seeds in class 9 fail for target class 4.

Binary Cost Matrix.
Next, we evaluate the effectiveness of cost-sensitive robustness training in producing models that are more robust for adversarial transformations designated as valuable. We consider four types of tasks defined by different binary cost matrices that capture different sets of adversarial transformations: single pair (a particular seed class $s$ to a particular target class $t$); single seed (a particular seed class $s$ to any target class); single target (any seed class to a particular target class $t$); and multiple (multiple seed and target classes). For each setting, the cost matrix is defined as $C_{ij} = 1$ if $(i,j)$ is selected; otherwise, $C_{ij} = 0$. In general, we expect that the sparser the cost matrix, the more opportunity there is for cost-sensitive training to improve cost-sensitive robustness over models trained for overall robustness.

For the single pair task, we selected three representative adversarial goals: a low vulnerability pair (0, 2), a medium vulnerability pair (6, 5), and a high vulnerability pair (4, 9). We selected these pairs by using the robust error results on the overall-robustness trained model (Figure 1(b)) as a rough measure of transformation hardness. This is generally consistent with intuitions about the MNIST digit classes (e.g., 9 and 4 look similar, so it is harder to induce robustness against their adversarial transformation), as well as with visualization results produced by dimension reduction techniques, such as t-SNE (Maaten & Hinton, 2008).

Table 1: Comparisons between different robust defense models on the MNIST dataset against $\ell_\infty$ norm-bounded adversarial perturbations. The sparsity column gives the number of non-zero entries in the cost matrix over the total number of possible adversarial transformations. The candidates column is the number of potential seed examples for each task.

Task | Description | Sparsity | Candidates | Best α | Classification Error (baseline / ours) | Robust Error (baseline / ours)
single pair | (0,2) | 1/90 | 980 | 10.0 | ?.39% / 2.68% | 0.92% / 0.??%
single pair | (6,5) | 1/90 | 958 | 5.0 | ?.39% / 2.49% | 3.55% / 0.??%
single pair | (4,9) | 1/90 | 982 | 4.0 | ?.39% / 3.00% | 10.08% / 1.??%
single seed | digit 0 | 9/90 | 980 | 10.0 | ?.39% / 3.48% | 3.67% / 0.??%
single seed | digit 2 | 9/90 | 1032 | 1.0 | ?.39% / 2.91% | 14.34% / 3.??%
single seed | digit 8 | 9/90 | 974 | 0.4 | ?.39% / 3.37% | 22.28% / 5.??%
single target | digit 1 | 9/90 | 8865 | 4.0 | ?.39% / 3.29% | 2.23% / 0.??%
single target | digit 5 | 9/90 | 9108 | 2.0 | ?.39% / 3.24% | 3.10% / 0.??%
single target | digit 8 | 9/90 | 9026 | 1.0 | ?.39% / 3.52% | 5.24% / 0.??%
multiple | top 10 | 10/90 | 6024 | 0.4 | ?.39% / 3.34% | 11.14% / 7.??%
multiple | random 10 | 10/90 | 7028 | 0.4 | ?.39% / 3.18% | 5.01% / 2.??%
multiple | odd digit | 45/90 | 5074 | 0.2 | ?.39% / 3.30% | 14.45% / 9.??%
multiple | even digit | 45/90 | 4926 | 0.1 | ?.39% / 2.82% | 13.13% / 9.??%
Figure 2: Cost-sensitive robust error using the proposed model and the baseline model on MNIST for different binary tasks: (a) treating each digit as the seed class of concern; (b) treating each digit as the target class of concern.

Table 2: Comparison results of different robust defense models for tasks with a real-valued cost matrix.

Dataset | Task | Sparsity | Candidates | Best α | Classification Error (baseline / ours) | Robust Cost (baseline / ours)
MNIST | small-large | 45/90 | 10000 | 0.04 | ?.39% / 3.47% | 2.245 / 0.???
MNIST | large-small | 45/90 | 10000 | 0.04 | ?.39% / 3.13% | 3.344 / 1.???
CIFAR | vehicle | 40/90 | 4000 | 0.1 | ?.80% / 26.19% | 4.183 / 3.???
Similarly, for the single seed and single target tasks, we select three representative examples representing low, medium, and high vulnerability to include in Table 1, and provide full results for all the single-seed and single-target tasks for MNIST in Figure 2. For the multiple transformations task, we consider four variations: (i) the ten most vulnerable seed-target transformations; (ii) ten randomly-selected seed-target transformations; (iii) all the class transformations from any odd digit seed to any other class; (iv) all the class transformations from any even digit seed to any other class.

Table 1 summarizes the results, comparing the cost-sensitive robust error between the baseline model trained for overall robustness and a model trained using our cost-sensitive robust optimization. The cost-sensitive robust defense model is trained based on the loss function (3.2) and the corresponding cost matrix $C$. The regularization parameter $\alpha$ is tuned via cross validation (see Appendix A for details). We report the selected best $\alpha$, classification error, and cost-sensitive robust error on the testing dataset.

Our model achieves a substantial improvement in cost-sensitive robustness compared with the baseline model on all of the considered tasks, with no significant increase in normal classification error. The cost-sensitive robust error reduction varies across tasks, and is generally higher for sparse cost matrices. In particular, our classifier reduces the number of cost-sensitive adversarial examples from 198 to 12 on the single target task with digit 1 as the target class.

Real-valued Cost Matrices.
Loosely motivated by a check forging adversary who obtains value by changing the semantic interpretation of a number (Papernot et al., 2016), we consider two real-valued cost matrices: small-large, where only adversarial transformations from a smaller digit class to a larger one are valued, and the cost of a valued transformation is quadratic in the absolute difference between the seed and target class digits: $C_{ij} = (i - j)^2$ if $j > i$, otherwise $C_{ij} = 0$; and large-small, where only adversarial transformations from a larger digit class to a smaller one are valued: $C_{ij} = (i - j)^2$ if $i > j$, otherwise $C_{ij} = 0$. We tune $\alpha$ for the cost-sensitive robust model on the MNIST training dataset via cross validation, and set all the other parameters the same as in the binary case. The certified robust error for every adversarial transformation on the MNIST testing dataset is shown in Figure 3, and the classification error and robust cost are given in Table 2. Compared with the model trained for overall robustness (Figure 1(b)), our trained classifier achieves stronger robustness guarantees on the adversarial transformations that induce costs, especially those with larger costs.

Figure 3: Heatmaps of robust test error using our cost-sensitive robust classifier on MNIST for the real-valued cost tasks: (a) small-large; (b) large-small.

4.2 CIFAR10

We use the same neural network architecture for the CIFAR10 dataset as Wong et al. (2018), with four convolutional layers and two fully-connected layers. For memory and computational efficiency, we incorporate the approximation technique based on nonlinear random projections during the training phase (Wong et al., 2018). We train the models on a subset of randomly-selected training examples, and tune the regularization parameter $\alpha$ according to the performance on the remaining examples as a validation dataset. The tasks are similar to those for MNIST (§4.1).
Table 3: Cost-sensitive robust models for the CIFAR10 dataset against adversarial examples, $\epsilon = 2/255$.

Task | Description | Sparsity | Candidates | Best α | Classification Error (baseline / ours) | Robust Error (baseline / ours)
single pair | (frog, bird) | 1/90 | 1000 | 10.0 | ?.80% / 27.88% | 19.90% / 1.??%
single pair | (cat, plane) | 1/90 | 1000 | 10.0 | ?.80% / 28.63% | 9.30% / 2.??%
single seed | dog | 9/90 | 1000 | 0.2 | ?.80% / 30.69% | 57.20% / 28.??%
single seed | truck | 9/90 | 1000 | 0.8 | ?.80% / 31.55% | 35.60% / 15.??%
single target | deer | 9/90 | 9000 | 0.1 | ?.80% / 26.69% | 16.99% / 3.??%
single target | ship | 9/90 | 9000 | 0.1 | ?.80% / 24.80% | 9.42% / 3.??%
multiple | A-V | 24/90 | 6000 | 0.1 | ?.80% / 26.65% | 16.67% / 7.??%
multiple | V-A | 24/90 | 4000 | 0.2 | ?.80% / 27.60% | 12.07% / 8.??%

Table 3 shows results on the testing data based on different robust defense models with $\epsilon = 2/255$. For all of the aforementioned tasks, our models substantially reduce the cost-sensitive robust error while keeping a lower classification error than the baseline.

For the real-valued task, we are concerned with adversarial transformations from seed examples in the vehicle classes to other target classes. In addition, more cost is placed on transformations from vehicle to animal, which is 10 times larger than the cost from vehicle to vehicle. Figures 4(a) and 4(b) illustrate the pairwise robust test error using the overall robust model and the proposed classifier for this real-valued task on CIFAR10.

4.3 Varying Adversary Strength
We investigate the performance of our model against different levels of adversarial strength by varying the value of $\epsilon$ that defines the $\ell_\infty$ ball available to the adversary. Figure 5 shows the overall classification error and cost-sensitive robust error of our best trained model, compared with the baseline model, on the MNIST single seed task with digit 9 and the CIFAR10 single seed task with dog as the seed class of concern, as we vary the maximum $\ell_\infty$ perturbation distance.

Under all the considered attack models, the proposed classifier achieves better cost-sensitive adversarial robustness than the baseline, while maintaining similar classification accuracy on the original data points. As the adversarial strength increases, the improvement in cost-sensitive robustness over overall robustness becomes more significant.

Figure 4: Heatmaps of robust test error for the real-valued task on CIFAR10 using different robust classifiers: (a) baseline model; (b) our proposed cost-sensitive robust model.

Figure 5: Results for different adversary strengths $\epsilon$: (a) MNIST single seed task with digit 9 as the chosen class ($\epsilon \in \{0.1, 0.15, 0.2, 0.25\}$); (b) CIFAR10 single seed task with dog as the chosen class ($\epsilon \in \{2/255, 4/255, 6/255\}$).
5 Conclusion
By focusing on overall robustness, previous robustness training methods expend a large fraction of the capacity of the network on unimportant transformations. We argue that for most scenarios, the actual harm caused by an adversarial transformation often varies depending on the seed and target class, so robust training methods should be designed to account for these differences. By incorporating a cost matrix into the training objective, we develop a general method for producing a cost-sensitive robust classifier. Our experimental results show that our cost-sensitive training method works across a variety of different types of cost matrices, so we believe it can be generalized to other cost matrix scenarios that would be found in realistic applications.

There remains a large gap between the small models and limited attacker capabilities for which we can achieve certifiable robustness, and the complex models and unconstrained attacks that may be important in practice. The scalability of our techniques is limited to the toy models and simple attack norms for which certifiable robustness is currently feasible, so considerable progress is needed before they could be applied to realistic scenarios. However, we hope that considering cost-sensitive robustness instead of overall robustness is a step towards achieving more realistic robustness goals.
Availability
Our implementation, including code for reproducing all our experiments, is available as open source code at https://github.com/xiaozhanguva/Cost-Sensitive-Robustness.

Acknowledgements
We thank Eric Wong for providing the implementation of certified robustness we built on, as well as for insightful discussions. We thank Jianfeng Chi for helpful advice on implementing our experiments. This work was supported by grants from the National Science Foundation.

References
Kaiser Asif, Wei Xing, Sima Behpour, and Brian D. Ziebart. Adversarial cost-sensitive classification. In 31st Conference on Uncertainty in Artificial Intelligence, 2015.

Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International Conference on Machine Learning, 2018.

Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D. Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, Xin Zhang, Jake Zhao, and Karol Zieba. End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316, 2016.

Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy, 2017.

Nilesh Dalvi, Pedro Domingos, Sumit Sanghai, Deepak Verma, et al. Adversarial classification. In Tenth ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2004.

Pedro Domingos. MetaCost: A general method for making classifiers cost-sensitive. In Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1999.

Tommaso Dreossi, Somesh Jha, and Sanjit A. Seshia. Semantic adversarial deep learning. In International Conference on Computer Aided Verification, 2018.

Charles Elkan. The foundations of cost-sensitive learning. In International Joint Conference on Artificial Intelligence, 2001.

Logan Engstrom, Dimitris Tsipras, Ludwig Schmidt, and Aleksander Madry. A rotation and a translation suffice: Fooling CNNs with simple transformations. arXiv preprint arXiv:1712.02779, 2017.

Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015.

Matthias Hein and Maksym Andriushchenko. Formal guarantees on the robustness of a classifier against adversarial manipulation. In Advances in Neural Information Processing Systems, 2017.

Can Kanbak, Seyed-Mohsen Moosavi-Dezfooli, and Pascal Frossard. Geometric robustness of deep networks: Analysis and improvement. In Computer Vision and Pattern Recognition, 2018.

Salman H. Khan, Munawar Hayat, Mohammed Bennamoun, Ferdous A. Sohel, and Roberto Togneri. Cost-sensitive learning of deep feature representations from imbalanced data. IEEE Transactions on Neural Networks and Learning Systems, 29(8):3573–3587, 2018.

Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.

Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.

Matjaž Kukar and Igor Kononenko. Cost-sensitive learning with neural networks. In European Conference on Artificial Intelligence, 1998.

Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

Yann LeCun, Corinna Cortes, and CJ Burges. MNIST handwritten digit database. http://yann.lecun.com/exdb/mnist, 2010.

Xu-Ying Liu and Zhi-Hua Zhou. The influence of class imbalance on cost-sensitive learning: An empirical study. In Sixth International Conference on Data Mining, 2006.

Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 2008.

Saeed Mahloujifar, Dimitrios I. Diochnos, and Mohammad Mahmoody. The curse of concentration in robust learning: Evasion and poisoning attacks from concentration of measure. In AAAI Conference on Artificial Intelligence, 2019.

Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy, 2016.

Omkar M. Parkhi, Andrea Vedaldi, and Andrew Zisserman. Deep face recognition. In British Machine Vision Conference, 2015.

Aditi Raghunathan, Jacob Steinhardt, and Percy Liang. Certified defenses against adversarial examples. In International Conference on Learning Representations, 2018.

Joshua Saxe and Konstantin Berlin. Deep neural network based malware detection using two dimensional binary program features. In 10th International Conference on Malicious and Unwanted Software (MALWARE), 2015.

Mahmood Sharif, Lujo Bauer, and Michael K. Reiter. On the suitability of $\ell_p$-norms for creating and preventing adversarial examples. In CVPR Workshop on Bright and Dark Sides of Computer Vision: Challenges and Opportunities for Privacy and Security, 2018.

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations, 2014.

Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. In International Conference on Learning Representations, 2018.

Shiqi Wang, Kexin Pei, Justin Whitehouse, Junfeng Yang, and Suman Jana. Formal security analysis of neural networks using symbolic intervals. In USENIX Security Symposium, 2018.

Eric Wong and Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning, 2018.

Eric Wong, Frank Schmidt, Jan Hendrik Metzen, and J. Zico Kolter. Scaling provable adversarial defenses. In Conference on Neural Information Processing Systems, 2018.

Chaowei Xiao, Jun-Yan Zhu, Bo Li, Warren He, Mingyan Liu, and Dawn Song. Spatially transformed adversarial examples. In International Conference on Learning Representations, 2018.

Bianca Zadrozny, John Langford, and Naoki Abe. Cost-sensitive learning by cost-proportionate example weighting. In Third IEEE International Conference on Data Mining, 2003.

Zhi-Hua Zhou and Xu-Ying Liu. On multi-class cost-sensitive learning. Computational Intelligence, 26(3):232–257, 2010.
Cost-Sensitive Robustness against Adversarial Examples
Supplemental Materials
A Parameter Tuning
For experiments on the MNIST dataset, we first perform coarse tuning of the regularization parameter $\alpha$ over a logarithmic grid, and select the most appropriate value, denoted by $\alpha_{\text{coarse}}$, with overall classification error below a fixed threshold and the lowest cost-sensitive robust error on the validation dataset. Then, we further finely tune $\alpha$ over a multiplicative grid centered at $\alpha_{\text{coarse}}$, and choose the best robust model according to the same criteria.

Figures 6(a) and 6(b) show the learning curves for the single seed task with digit 9 as the selected seed class based on the proposed cost-sensitive robust model with varying $\alpha$ (we show digit 9 because it is one of the most vulnerable seed classes). The results suggest that as the value of $\alpha$ increases, the corresponding classifier will have a lower cost-sensitive robust error but a higher classification error, which is what we expect from the design of (3.2).

We observe similar trends in the learning curves for the other tasks, so do not present them here. For the CIFAR10 experiments, a similar tuning strategy is implemented; the only difference is the threshold of overall classification error used for selecting the best $\alpha$.

Figure 6: Learning curves for the single seed task with digit 9 as the selected seed class on MNIST using the proposed model with varying $\alpha \in \{0.01, 0.1, 1.0, 10.0\}$: (a) learning curves of classification error; (b) learning curves of cost-sensitive robust error.
B Comparison with Standard Cost-Sensitive Classifier
As discussed in Section 2.4, prior work on cost-sensitive learning mainly focuses on the non-adversarial setting. In this section, we investigate the robustness of the cross-entropy based cost-sensitive classifier proposed in Khan et al. (2018), and compare the performance of their classifier with our proposed cost-sensitive robust classifier. Given a set of training examples $\{(x_i, y_i)\}_{i=1}^N$ and a cost matrix $C$ with each entry representing the cost of the corresponding misclassification, the evaluation metric for cost-sensitive learning is defined as the average cost of misclassifications, or more concretely

$$ \text{misclassification cost} = \frac{1}{N} \sum_{i \in [N]} C_{y_i \hat{y}_i}, \quad \text{where}\ \hat{y}_i = \arg\max_{j \in [m]} [f_\theta(x_i)]_j, $$

where $m$ is the total number of class labels and $f_\theta(\cdot)$ denotes the neural network classifier as introduced in Section 2.1. In addition, the cross-entropy based cost-sensitive training objective takes the following form:

$$ \min_\theta\ \frac{1}{N} \sum_{j \in [m]} \sum_{i \,|\, y_i = j} \log\Big( \sum_{j' \ne j} C_{jj'} \cdot \exp\big( [f_\theta(x_i)]_{j'} - [f_\theta(x_i)]_j \big) \Big). \qquad (B.1) $$
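A sketch of objective (B.1) and of the misclassification-cost metric; note that, unlike (3.2), no robustness term is involved, only the clean logits. It assumes $C$ has a zero diagonal, so that the $j' \ne j$ restriction is enforced by the cost weights themselves.

```python
import torch

def cost_sensitive_ce_loss(logits, y, C):
    """Sketch of Eq. (B.1): log sum_{j' != j} C_{jj'} exp(z_{j'} - z_j),
    averaged over the batch; C's zero diagonal removes the j' = y_i term."""
    margins = logits - logits.gather(1, y[:, None])  # z_{j'} - z_{y_i}
    weighted = C[y] * torch.exp(margins)             # [batch, m] cost-weighted
    return torch.log(weighted.sum(dim=1)).mean()

def misclassification_cost(logits, y, C):
    """Average cost of the classifier's errors: (1/N) sum_i C[y_i, y_hat_i]."""
    y_hat = logits.argmax(dim=1)
    return C[y, y_hat].float().mean()
```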
Table 4: Comparison results of different trained classifiers for the small-large real-valued task on MNIST.

Classifier | Classification Error | Misclassification Cost | Robust Cost
Baseline | ? | ? | ?
Cost-Sensitive Standard | ? | ? | ?
Overall Robustness | ? | ? | ?
Cost-Sensitive Robustness | ? | ? | ?

For this comparison, the cost matrix $C$ is designed as $C_{ij} = 0.?$ if $i > j$; $C_{ij} = 0$ if $i = j$; and $C_{ij} = (i - j)^2$ if $j > i$.