Adversarial Self-Supervised Contrastive Learning
Minseon Kim, Jihoon Tack, Sung Ju Hwang
KAIST, AITRICS
{minseonkim, jihoontack, sjhwang82}@kaist.ac.kr

Abstract
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions, which are then used to augment the training of the model for improved robustness. While some recent works propose semi-supervised adversarial learning methods that utilize unlabeled data, they still require class labels. However, do we really need class labels at all for adversarially robust training of deep neural networks? In this paper, we propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples. Further, we present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data, which aims to maximize the similarity between a random augmentation of a data sample and its instance-wise adversarial perturbation. We validate our method,
Robust Contrastive Learning (RoCL), on multiple benchmark datasets, on which it obtains comparable robust accuracy to state-of-the-art supervised adversarial learning methods, and significantly improved robustness against black-box and unseen types of attacks. Moreover, with further joint fine-tuning with a supervised adversarial loss, RoCL obtains even higher robust accuracy than using self-supervised learning alone. Notably, RoCL also demonstrates impressive results in robust transfer learning.
The vulnerability of neural networks to imperceptibly small perturbations [1] has been a crucial challenge in deploying them to safety-critical applications, such as autonomous driving. Various studies have been proposed to ensure the robustness of trained networks against adversarial attacks [2, 3, 4], random noise [5], and corruptions [6, 7]. Perhaps the most popular approach to achieve adversarial robustness is adversarial learning, which trains the model with samples perturbed to maximize the loss on the target model. Starting from the Fast Gradient Sign Method [8], which applies a perturbation in the gradient direction, to Projected Gradient Descent [9], which maximizes the loss over multiple iterations, and TRADES [2], which trades off clean accuracy and adversarial robustness, adversarial learning has evolved substantially over the past few years. However, conventional adversarial learning methods all require class labels to generate adversarial attacks.

Recently, self-supervised learning [10, 11, 12, 13, 14], which trains the model on unlabeled data in a supervised manner by utilizing self-generated labels from the data itself, has become popular as a means of learning representations for deep neural networks. For example, predicting rotation angles [10] and solving randomly generated jigsaw puzzles [11] are examples of such self-supervised learning methods. Recently, instance-level identity preservation [12, 13] with contrastive learning has been shown to be very effective in learning rich representations for classification. Contrastive self-supervised learning frameworks such as [12, 13, 14] basically aim to maximize the similarity of a sample to its augmentations, while minimizing its similarity to other instances.

Figure 1: Overview of our adversarial contrastive self-supervised learning. (a) We generate instance-wise adversarial examples from an image transformed using a stochastic augmentation, which makes the model confuse the instance-level identity of the perturbed sample. (b) We then maximize the similarity between each transformed sample and its instance-wise adversary using contrastive learning. (c) After training, each sample will have significantly reduced adversarial vulnerability in the latent representation space.

In this work, we propose a contrastive self-supervised learning framework to train an adversarially robust neural network without any class labels. Our intuition is that we can fool the model by generating instance-wise adversarial examples (see Figure 1(a)). Specifically, we generate perturbations on augmentations of the samples to maximize their contrastive loss, such that the instance-level classifier becomes confused about the identities of the perturbed samples. Then, we maximize the similarity between clean samples and their adversarial counterparts using contrastive learning (Figure 1(b)), to obtain representations that suppress distortions caused by adversarial perturbations. This results in learning representations that are robust against adversarial attacks (Figure 1(c)). We refer to this novel adversarial self-supervised learning method as
Robust Contrastive Learning (RoCL). To the best of our knowledge, this is the first attempt to train robust neural networks without any labels, and to generate instance-wise adversarial examples. Recent works on semi-supervised adversarial learning [15, 16] or self-supervised adversarial learning [17] still require labeled instances to generate pseudo-labels on unlabeled instances or class-wise attacks for adversarial training, and thus cannot be considered fully unsupervised adversarial learning approaches. To verify the efficacy of the proposed RoCL, we suggest a robust linear evaluation protocol for self-supervised adversarial learning and validate our method on benchmark datasets (CIFAR-10 and CIFAR-100) against supervised adversarial learning approaches. The results show that RoCL obtains comparable accuracy to strong supervised adversarial learning methods such as TRADES [2], although it does not use any labels during training. Further, when we extend the method to utilize class labels and fine-tune the network trained with RoCL using a class-adversarial loss, we achieve even stronger robustness, without losing accuracy on clean samples. Moreover, we verify that the learned robust representations transfer well, showing impressive performance in transfer learning. In summary, the contributions of this paper are as follows:
• We propose a novel instance-wise adversarial perturbation method that does not require any labels, by making the model confuse the instance-level identity of a sample.
• We propose an adversarial self-supervised learning method that explicitly suppresses vulnerability in the representation space by maximizing the similarity between clean examples and their instance-wise adversarial perturbations.
• Our method obtains comparable robustness to supervised adversarial learning approaches without using any class labels on the target attack type, while achieving significantly better clean accuracy and robustness on unseen types of attacks and in transfer learning.
Adversarial robustness
Obtaining deep neural networks that are robust to adversarial attacks has been an active topic of research since Szegedy et al. [1] first showed their fragility to imperceptible distortions. Goodfellow et al. [8] proposed the fast gradient sign method (FGSM), which perturbs a target sample in its gradient direction to increase its loss, and also used the generated samples to train the model for improved robustness. Follow-up works [9, 18, 19, 20] proposed iterative variants of the gradient attack with improved adversarial learning frameworks. After these gradient-based attacks became standard in evaluating the robustness of deep neural networks, many more defenses followed, but Athalye et al. [21] showed that many of them appear robust only because they mask out the gradients, and proposed new types of attacks that circumvent gradient obfuscation. Recent works focus on the vulnerability of the latent representations, hypothesizing it as the main cause of the adversarial vulnerability of deep neural networks. TRADES [2] uses a Kullback-Leibler divergence loss between a clean example and its adversarial counterpart to push the decision boundary, to obtain a more robust latent space. Ilyas et al. [22] showed the existence of imperceptible features that help with the prediction of clean examples but are vulnerable to adversarial attacks. On the other hand, instead of empirically defending against adversarial attacks, guaranteeing robustness has become another route toward safe models. Li et al. [23] empirically proposed the "randomized smoothing" technique for certified robustness, and Cohen et al. [24] proved a robustness guarantee of randomized smoothing under $\ell_2$-norm adversarial attacks. Moreover, to improve the performance of randomized smoothing, Salman et al. [25] directly attack the smoothed classifier. A common requirement of existing adversarial learning techniques is the availability of class labels, since they are essential in generating adversarial attacks. Recently, semi-supervised adversarial learning approaches [15, 16] have proposed to use unlabeled data and achieved a large enhancement in adversarial robustness. Yet, they still require a portion of labeled data, and do not change the class-wise nature of the attack. Contrarily, in this work, we propose instance-wise adversarial attacks that do not require any class labels.

Self-supervised learning
As acquiring manual annotations on data can be costly, self-supervised learning, which generates supervised learning problems out of unlabeled data and solves them, is gaining increasing popularity. The convention is to train the network to solve a manually defined (pretext) task for representation learning, which is later used for a specific supervised learning task (e.g., image classification). Predicting the relative location of image patches [26, 27, 11] has been shown to be a successful pretext task, which opened the possibility of self-supervised learning. Gidaris et al. [10] propose to learn image features by training deep networks to recognize 2D rotation angles, which largely outperforms previous self-supervised learning approaches. Corrupting the given images with gray-scaling [28] or random cropping [29], and then restoring them to their original condition, has also been shown to work well. Recently, leveraging instance-level identity is becoming a popular paradigm for self-supervised learning due to its generality. Using the contrastive loss between two differently transformed images of one instance [12, 13, 30] has been shown to be highly effective in learning rich representations, achieving performance comparable to fully supervised models. Moreover, even when labels are available, the contrastive loss improves the performance of the model compared to using the cross-entropy loss [31].
Self-supervised learning and adversarial robustness
Recent works have shown that using unlabeled data can help the model obtain more robust representations [15]. Moreover, [32] shows that a model trained with self-supervision improves robustness. Fine-tuning a pretrained self-supervised model also helps robustness [17], and self-supervised adversarial training coupled with K-Nearest Neighbour classification improves the robustness of KNN [33]. However, to the best of our knowledge, none of these previous works explicitly targets adversarial robustness with fully unlabeled training. Contrarily, we propose a novel instance-wise attack, which leads the model to predict an incorrect instance in an instance-discrimination problem. This allows the trained model to obtain robustness that is on par with or even better than supervised adversarial learning methods.
We now describe how to obtain adversarial robustness in the representations without any class labels, using instance-wise attacks and adversarial self-supervised contrastive learning. Before describing our method, we first briefly review supervised adversarial training and self-supervised contrastive learning.
Adversarial robustness
We start with the definition of adversarial attacks under supervised settings. Let us denote the dataset $D = \{X, Y\}$, where $x \in X$ is a training sample and $y \in Y$ is the corresponding label, and a supervised learning model $f_\theta : X \rightarrow Y$, where $\theta$ denotes the parameters of the model. Given such a dataset and a model, adversarial attacks aim to find worst-case examples nearby by searching for the perturbation that maximizes the loss within a certain radius from the sample (e.g., norm balls). We can define such adversarial attacks as follows:

$x^{i+1} = \Pi_{B(x,\epsilon)}\left(x^{i} + \alpha\,\mathrm{sign}\left(\nabla_{x^{i}} \mathcal{L}_{CE}(\theta, x^{i}, y)\right)\right) \quad (1)$

where $B(x, \epsilon)$ is the $\ell_\infty$ norm-ball around $x$ with radius $\epsilon$, and $\Pi$ is the projection function onto the norm-ball. Here $\alpha$ is the step size of the attack and $\mathrm{sign}(\cdot)$ returns the sign of the vector. Further, $\mathcal{L}_{CE}$ is the cross-entropy loss for supervised training, and $i$ is the attack iteration index. This formulation generalizes across different types of gradient attacks. For example, Projected Gradient Descent (PGD) [9] starts from a random point within $x \pm \epsilon$ and performs $i$ gradient steps to obtain the attack $x^{i+1}$.

The simplest and most straightforward way to defend against such adversarial attacks is to minimize the loss on adversarial examples, which is often called adversarial learning. The adversarial learning framework proposed by Madry et al. [9] solves the following non-convex outer minimization and non-convex inner maximization problem, where $\delta$ is the perturbation of the adversarial image and $x + \delta$ is an adversarial example $x^{adv}$:

$\operatorname{argmin}_{\theta}\; \mathbb{E}_{(x,y)\sim D}\left[\max_{\delta \in B(x,\epsilon)} \mathcal{L}_{CE}(\theta, x + \delta, y)\right] \quad (2)$

In standard adversarial learning frameworks, including PGD [9], TRADES [2], and many others, generating such adversarial attacks requires a class label $y \in Y$. Thus, conventional adversarial attacks are inapplicable to unlabeled data.
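As a concrete reference, below is a minimal PyTorch-style sketch of the $\ell_\infty$ PGD attack in Eq. 1 and of the adversarial training objective in Eq. 2. It is an illustrative sketch under stated assumptions, not the exact implementation used in the paper: the model, data loader, optimizer, and the default $\epsilon$, $\alpha$, and step values are placeholders.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=7):
    """l_inf PGD attack (Eq. 1): repeatedly ascend the cross-entropy loss
    and project back onto the eps-ball around the clean input x."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()           # gradient ascent step
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project onto B(x, eps)
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()

# Adversarial training (Eq. 2): minimize the loss on the worst-case examples.
# for x, y in loader:
#     loss = F.cross_entropy(model(pgd_attack(model, x, y)), y)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```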
Self-supervised contrastive learning

The self-supervised contrastive learning framework [12, 13] aims to maximize the agreement between different augmentations of the same instance in the learned latent space, while minimizing the agreement between different instances. Let us define some notation and briefly recap SimCLR. To project an image into the latent space, SimCLR uses an encoder network $f_\theta(\cdot)$ followed by a projector $g_\pi(\cdot)$, a two-layer multi-layer perceptron (MLP) that projects the features into a latent vector $z$. SimCLR uses a stochastic data augmentation $t$, randomly selected from a family of augmentations $\mathcal{T}$, including random cropping, random color distortion, and random Gaussian blur. Applying any two transformations $t, t' \sim \mathcal{T}$ yields two samples $t(x)$ and $t'(x)$ that differ in appearance but retain the instance-level identity of the sample. We define the positive set of $t(x)$ as $\{x^{pos}\} = t'(x)$, obtained from the same original sample $x$, and the negative set $\{x^{neg}\}$ as the set containing the other instances $x'$. Then, the contrastive loss function $\mathcal{L}_{con}$ can be defined as follows:

$\mathcal{L}_{con,\theta,\pi}(x, \{x^{pos}\}, \{x^{neg}\}) := -\log \dfrac{\sum_{\{z^{pos}\}} \exp(\mathrm{sim}(z, z^{pos})/\tau)}{\sum_{\{z^{pos}\}} \exp(\mathrm{sim}(z, z^{pos})/\tau) + \sum_{\{z^{neg}\}} \exp(\mathrm{sim}(z, z^{neg})/\tau)} \quad (3)$

where $z$, $\{z^{pos}\}$, and $\{z^{neg}\}$ are the corresponding 128-dimensional latent vectors of $x$, $\{x^{pos}\}$, and $\{x^{neg}\}$, obtained by the encoder and projector as $z = g_\pi(f_\theta(x))$. Standard contrastive learning contains only a single sample in the positive set $\{x^{pos}\}$, namely $t'(x)$. Here $\mathrm{sim}(u, v) = u^{\top}v / \|u\|\|v\|$ denotes the cosine similarity between two vectors, and $\tau$ is a temperature parameter.

We show in Table 1 that standard contrastive learning, such as SimCLR, is vulnerable to adversarial attacks. To achieve robustness with such self-supervised contrastive learning frameworks, we need a way to adversarially train them, which we describe in the next subsection.

We now introduce a simple yet novel and effective approach to adversarially train a self-supervised learning model using unlabeled data, which we coin robust contrastive learning (RoCL). RoCL is trained without class labels by using instance-wise attacks, which make the model confuse the instance-level identity of a given sample. Then, we use a contrastive learning framework to maximize the similarity between a transformed example and the instance-wise adversarial example of another transformed example. Algorithm 1 in supplementary B summarizes our robust contrastive learning framework.
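For reference, the following is a minimal sketch of a SimCLR-style implementation of the contrastive loss in Eq. 3 for a batch of two views. The batching scheme, normalization, and default temperature are simplifying assumptions rather than the exact SimCLR code.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, tau=0.5):
    """Contrastive loss of Eq. 3 for a batch of two views.
    z1[i] and z2[i] are projector outputs of two augmentations of sample i;
    every other vector in the 2N-sized batch serves as a negative."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)         # (2N, d), unit norm
    sim = z @ z.t() / tau                                       # cosine similarity / tau
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float('-inf'))                  # exclude self-similarity
    # the positive of index i is i+N (and of i+N is i)
    target = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, target)                         # -log softmax at the positive
```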
Instance-wise adversarial attacks
Since class-wise adversarial attacks from existing approaches are inapplicable to the unlabeled case we target, we propose a novel instance-wise attack. Specifically, given an input instance, we generate a perturbation to fool the model by confusing its instance-level identity, such that the model mistakes it for another sample. This is done by generating a perturbation that maximizes the self-supervised contrastive loss for discriminating between instances, as follows:

$t(x)^{i+1} = \Pi_{B(t(x),\epsilon)}\left(t(x)^{i} + \alpha\,\mathrm{sign}\left(\nabla_{t(x)^{i}} \mathcal{L}_{con,\theta,\pi}(t(x)^{i}, \{t'(x)\}, \{t(x)^{neg}\})\right)\right) \quad (4)$

where $t(x)$ and $t'(x)$ are transformed images obtained with stochastic data augmentations $t, t' \sim \mathcal{T}$, and $t(x)^{neg}$ are the negative instances for $t(x)$, i.e., examples of other samples $x'$.
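Below is a minimal sketch of the instance-wise attack in Eq. 4: it is ordinary $\ell_\infty$ PGD, but it ascends a contrastive loss instead of the cross-entropy loss, so no labels are required. The `encoder`, `projector`, and `contrastive_loss` arguments (e.g., the `nt_xent_loss` sketched above) are assumed interfaces, not the authors' exact code.

```python
import torch

def instance_wise_attack(encoder, projector, contrastive_loss, t_x, t_prime_x,
                         eps=8/255, alpha=2/255, steps=7):
    """Instance-wise l_inf attack (Eq. 4): perturb t(x) so that the contrastive
    loss against its positive view t'(x) is maximized; the remaining samples
    in the batch act as negatives inside contrastive_loss."""
    x_adv = t_x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        z_adv = projector(encoder(x_adv))
        z_pos = projector(encoder(t_prime_x))
        loss = contrastive_loss(z_adv, z_pos)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()                  # ascend the loss
        x_adv = torch.min(torch.max(x_adv, t_x - eps), t_x + eps)     # project onto B(t(x), eps)
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```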
Figure 2: Adversarial training and evaluation steps for RoCL. (a) Robust contrastive learning training. (b) Linear evaluation (adversarial training optional). During adversarial training, we maximize the similarity between two differently transformed examples $\{t(x), t'(x)\}$ and their adversarial perturbation $t(x)^{adv}$. After the model is fully trained for robustness, we evaluate it on the target classification task by using a linear model in place of the projector. Here, we can either train the linear classifier only on clean examples, or adversarially train it with class-adversarial examples.

Robust Contrastive Learning (RoCL)
We now present a framework to learn robust representations via self-supervised contrastive learning. The adversarial learning objective for an instance-wise attack, following the min-max formulation of [9], is given as follows:

$\operatorname{argmin}_{\theta,\pi}\; \mathbb{E}_{x\sim D}\left[\max_{\delta \in B(t(x),\epsilon)} \mathcal{L}_{con,\theta,\pi}(t(x) + \delta, \{t'(x)\}, \{t(x)^{neg}\})\right] \quad (5)$

where $t(x) + \delta$ is the adversarial image $t(x)^{adv}$ generated by the instance-wise attack (Eq. 4). Note that we generate the adversarial example of $x$ from a stochastically transformed image $t(x)$, rather than from the original image $x$, which allows us to generate diverse attack samples. This adversarial learning framework is essentially the same as the supervised one, except that we train the model to be robust against m-way instance-wise adversarial attacks. The proposed regularization can also be interpreted as a denoiser, since the contrastive objective maximizes the similarity between the clean samples $t(x), t'(x)$ and the generated adversarial example $t(x)^{adv}$.

We generate label-free adversarial examples using the instance-wise attack in Eq. 4. Then we use the contrastive learning objective to maximize the similarity between clean examples and their instance-wise perturbations. This is done by a simple modification of the contrastive objective in Eq. 3, using the instance-wise adversarial examples as additional elements of the positive set. We can then formulate our Robust Contrastive Learning objective as follows:

$\mathcal{L}_{RoCL,\theta,\pi}(t(x), \{t'(x), t(x)^{adv}\}, \{t(x)^{neg}\}) := -\log \dfrac{\sum_{\{z^{pos}\}} \exp(\mathrm{sim}(z, z^{pos})/\tau)}{\sum_{\{z^{pos}\}} \exp(\mathrm{sim}(z, z^{pos})/\tau) + \sum_{\{z^{neg}\}} \exp(\mathrm{sim}(z, z^{neg})/\tau)} \quad (6)$

where $t(x)^{adv}$ is the adversarial perturbation of the augmented sample $t(x)$, and $t'(x)$ is another stochastic augmentation. The positive set in the latent space, $\{z^{pos}\}$, is composed of $z'$ and $z^{adv}$, the latent vectors of $t'(x)$ and $t(x)^{adv}$, respectively. The set $\{z^{neg}\}$ contains the latent vectors of the negative samples in $\{t(x)^{neg}\}$.
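Putting Eqs. 4-6 together, one RoCL training step can be sketched as below. `multi_pos_contrastive` stands for the loss of Eq. 6 with a multi-element positive set, and `instance_wise_attack` is the routine sketched above; these interfaces and the optimizer wiring are illustrative assumptions rather than the reference implementation.

```python
import torch

def rocl_training_step(encoder, projector, optimizer, multi_pos_contrastive,
                       t_x, t_prime_x, eps=8/255, alpha=2/255, steps=7):
    """One RoCL update (Eqs. 5-6): generate a label-free adversarial example
    of t(x), add it to the positive set, and minimize the contrastive loss."""
    # 1) instance-wise attack on the transformed image (Eq. 4)
    pair_loss = lambda za, zp: multi_pos_contrastive(za, [zp])
    x_adv = instance_wise_attack(encoder, projector, pair_loss,
                                 t_x, t_prime_x, eps, alpha, steps)
    # 2) contrastive loss with {t'(x), t(x)_adv} as positives (Eq. 6)
    z     = projector(encoder(t_x))
    z_pos = projector(encoder(t_prime_x))
    z_adv = projector(encoder(x_adv))
    loss = multi_pos_contrastive(z, [z_pos, z_adv])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```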
Linear Evaluation of RoCL

With RoCL, we can adversarially train the model without any class labels (Figure 2a). Yet, since the model is trained for instance-wise classification, it cannot be directly used for class-level classification. Thus, existing self-supervised learning models leverage linear evaluation [28, 34, 35, 12], which learns a linear layer $l_\psi(\cdot)$ on top of the fixed embedding $f_\theta(\cdot)$ (Figure 2b) with clean examples. While RoCL achieves impressive robustness with this standard evaluation (Table 1), to properly evaluate robustness against a specific type of attack, we propose a new evaluation protocol which we refer to as robust linear evaluation (r-LE). r-LE trains a linear classifier with class-level adversarial examples of a specific attack (e.g., $\ell_\infty$), with the encoder kept fixed, as follows:

$\operatorname{argmin}_{\psi}\; \mathbb{E}_{(x,y)\sim D}\left[\max_{\delta \in B(x,\epsilon)} \mathcal{L}_{CE}(\psi, x + \delta, y)\right] \quad (7)$

where $\mathcal{L}_{CE}$ is the cross-entropy loss, optimizing only the parameters $\psi$ of the linear model. While we propose r-LE as an evaluation measure, it can also be used as an efficient means of obtaining an adversarially robust network from a network pretrained with self-supervised learning.

Table 1: Experimental results with white-box attacks on ResNet18 and ResNet50 trained on CIFAR-10. r-LE denotes robust linear evaluation, and SCL is supervised contrastive learning [31], which uses the labels in the contrastive loss. Baselines with * are models trained with our data augmentation. AT denotes supervised adversarial training [9], and SS denotes the self-supervised loss. tInf is test inference with the transformation smoothed classifier using 30 iterations. Rot+pretrained is the model of [17], which fine-tunes a network trained with rotation-prediction self-supervised learning; for a fair comparison, we report the single self-supervised pretrained version with the ResNet50-v2 model, and + is the performance reported in [17]. All models are trained with $\ell_\infty$ attacks; thus $\ell_\infty$ is the seen adversarial attack, and the $\ell_2$ and $\ell_1$ attacks are unseen.
Transformation smoothed inference
We further propose a simple inference method for robust representations. Previous works [24, 25] proposed smoothed classifiers, which obtain smooth decision boundaries for the final classifier by taking an expectation over classifiers applied to Gaussian-noise-perturbed samples. This addresses the problem of sharp classifiers obtained with supervised learning, which may misclassify points even under small perturbations. Inspired by this, and observing that our objective enforces all differently transformed images to assemble in adjacent areas, we propose a novel transformation smoothed classifier for RoCL, which predicts the class $c$ by taking an expectation over the transformations $t \sim \mathcal{T}$ for a given input $x$:

$S(x) = \operatorname{argmax}_{c \in Y}\; \mathbb{E}_{t\sim\mathcal{T}}\left[\,\mathbb{1}\{\,l_\psi(f_\theta(t(x))) = c\,\}\,\right] \quad (8)$

We verify the efficacy of our RoCL in various settings against both supervised adversarial learning methods and self-supervised pretraining with adversarial fine-tuning. We report the results of white-box and black-box attacks in Tables 1 and 2, respectively, using ResNet18 and ResNet50 [36] trained on CIFAR-10 [37]. We evaluate multiple versions of our model in diverse scenarios: models trained with self-supervised learning only (RoCL, RoCL+rLE), models that use RoCL for pretraining and perform further standard adversarial training with class-wise attacks (RoCL+AT, RoCL+TRADES, RoCL+AT+SS), and RoCL with the transformation smoothed classifier. For all baselines and our method, we train with the same attack strength of $\epsilon = 16/255$. For results on CIFAR-100 and details of the evaluation setup, please see supplementary C.
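The transformation smoothed classifier in Eq. 8 can be approximated at test time by a Monte-Carlo majority vote over sampled augmentations, as in the sketch below. `classifier` (the encoder plus linear head) and `augment` (a stochastic transformation drawn from $\mathcal{T}$) are assumed interfaces, and 30 samples mirrors the tInf setting reported for Table 1.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def transformation_smoothed_predict(classifier, augment, x, n_samples=30):
    """Eq. 8: predict the class with the highest vote count over
    n_samples stochastic transformations t ~ T of the input batch x."""
    votes = None
    for _ in range(n_samples):
        logits = classifier(augment(x))                               # l(f(t(x)))
        one_hot = F.one_hot(logits.argmax(dim=1), num_classes=logits.size(1))
        votes = one_hot if votes is None else votes + one_hot
    return votes.argmax(dim=1)                                        # argmax_c of the vote counts
```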
To our knowledge, our RoCL is the first attempt to achieve robustness in a fully self-supervised learning setting, since existing approaches used self-supervised learning only as a pretraining step before supervised adversarial training. We first compare RoCL against SimCLR [12], a vanilla self-supervised contrastive learning model. The results show that the vanilla model is extremely vulnerable to adversarial attacks. In contrast, RoCL achieves high robustness against the target $\ell_\infty$ attacks, outperforming the supervised adversarial training of Madry et al. [9] and obtaining performance comparable to TRADES [2]. This is an impressive result, demonstrating that it is possible to train adversarially robust models without any labeled data. Note that while we used the same number of instances in this experiment, in practice we can use any number of unlabeled data to train the model, which may lead to larger performance improvements. To show that this is not merely the effect of using augmented samples for self-supervised learning, we applied the same set of augmentations to TRADES (TRADES*), which obtains worse performance than the original TRADES. Moreover, RoCL obtains much better clean accuracy and significantly higher robustness than the supervised adversarial learning approaches against unseen types of attacks (see the $\ell_2$ and $\ell_1$ results in Table 1) and black-box attacks (Table 2). This makes RoCL more appealing than the baselines, and suggests that our approach of instance-wise attacks with suppression of distortion in the latent representation space is a more fundamental solution for robustness against general types of attacks. This point becomes clearer when comparing RoCL against RoCL with robust linear evaluation (RoCL+rLE), which trains the linear classifier with class-wise adversaries: RoCL+rLE improves robustness against the target $\ell_\infty$ attacks, but degrades robustness on unseen types of attacks. Finally, with our proposed transformation smoothed classifier, RoCL obtains even stronger performance on unseen types of attacks (Table 1, the last three rows).

Table 2: Performance of RoCL against black-box attacks on CIFAR-10. Each column denotes the black-box source model used to generate the $\ell_\infty$ adversarial images with $\epsilon \in \{8/255, 16/255\}$.
Table 3: Experimental results of transfer learning with ResNet18 trained on CIFAR-10 and CIFAR-100, respectively. We compare with the frozen transferred model of [38], a modified WRN 32-10 [39]; + denotes the performance reported in [38].

source       target       Method            A_nat    ℓ∞
CIFAR-100    CIFAR-10     Transfer+ [38]    72.05    17.70
                          RoCL              73.93    18.62
CIFAR-10     CIFAR-100    Transfer+ [38]    41.59    11.63
                          RoCL              45.84    15.33
Table 4: Attack target image ablation ($\ell_\infty$ train / $\ell_\infty$ test), comparing the original image $x$ and the transformed image $t'(x)$ as the attack target.

Table 5: Attack iteration ablation ($\ell_\infty$ train / $\ell_\infty$ test).

Iterations    20       40       100
RoCL          40.27    39.80    39.74
Table 6: Ablation on the attack loss type ($\ell_\infty$ train / $\ell_\infty$ test), comparing different losses for generating the attack.

Self-supervised based adversarial fine-tuning
Existing works [40, 17] have shown that pretraining the network with supervised or self-supervised learning improves adversarial robustness. This is also confirmed by our results in Table 1, which show that models fine-tuned from our method obtain even better robustness and higher clean accuracy than models trained from scratch. We observe that using the self-supervised loss during adversarial fine-tuning further improves robustness. Moreover, our method outperforms, in robustness, the adversarially fine-tuned single self-supervised adversarially pretrained model of previous work [17].
Comparison to semi-supervised learning
Recently, semi-supervised learning approaches [15, 16] have been shown to largely enhance the adversarial robustness of deep networks by exploiting unlabeled data. However, they eventually require labeled data to generate pseudo-labels on the unlabeled samples and to generate class-wise adversaries. They also assume the availability of a larger dataset to improve robustness on the target dataset, and require extremely large computational resources. Compared to these semi-supervised methods, RoCL requires about 1/4 of the training time with the same computational resources. Moreover, our method acquires sufficiently high clean accuracy and robustness over the course of training (Fig. 3(c)), which takes 25 hours with two RTX 2080 GPUs.
Results on black box attacks
We also validate our models in the black-box attack setting. We generate adversarial examples from the AT and TRADES models and test them against the different target models. As shown in Table 2, our model is superior to TRADES [2] on black-box images generated from AT, and even shows comparable performance to the AT [9] model on black-box images generated from TRADES.
Transformation smoothed classifier
The transformation smoothed classifier can boost the accuracy not only of our models but also of other models that use stochastic transformations during training. However, compared to TRADES in Table 1, we obtain a larger margin in robustness through the transformation smoothed classifier. Intuitively, since we enforce differently transformed samples to agree, applying different transformations to one sample places the latent vectors in a similar region of the representation space. Therefore, we can consider a transformation ball around the samples, which acts like the Gaussian ball in [24]. Accordingly, compared to TRADES, we obtain a smoother classifier and a better gain in robustness, not only on white-box attacked images but also on black-box images (Fig. 3(d)). As shown in Fig. 3(d), robustness increases with the number of iterations. Notably, the transformation smoothed classifier incurs no trade-off in clean accuracy (Table 1).

Figure 3: (a, b) Visualizations of the embeddings of instance-wise adversarial examples and clean examples for SimCLR and RoCL. (c) The learning curve of ResNet18 RoCL. (d) The transformation smoothed classifier performance on AT's black-box attack over iterations.
Transfer learning
Representations learned without labels have the benefit of being applicable to downstream tasks beyond classification. We demonstrate the effectiveness of our approach on transfer learning in Table 3. Surprisingly, our model shows high clean accuracy and high robustness on both CIFAR-100 and CIFAR-10 when the source model is trained on CIFAR-10 and CIFAR-100, respectively, without any additional loss. Since our method learns rich robust representations, our model shows even better transfer results than fully supervised robust transfer learning [38]. More transfer learning results are provided in supplementary A.

Instance-wise attack verification
We verify our instance-wise attacks on SimCLR (Fig. 3(a)). Our attacks generate confusing samples (red-edged markers) that lie far apart from their identical instances, even though the other views of the same instance, which are identical samples under different transformations, are placed in adjacent regions. However, if we train the model with RoCL (Fig. 3(b)), the instance-wise attacks are gathered together with the transformed images of the same identity.
Attack target image ablation
Since RoCL is an instance-discriminative model, we can use any image with the same identity as the target of the attack. As shown in Table 4, even when we use the original $x$ as the attack target, RoCL still shows high natural accuracy along with high robustness. This is because the key to our method is matching the instance-level identity, which is not biased by the transformation. Therefore, our method shows stable performance with any kind of target that shares the same identity.

Attack loss ($\mathcal{L}$)

Various distance measures can be used to compute the distance between two samples in the representation space. Here, we apply four different distance functions: mean squared error (MSE), cosine similarity, Manhattan distance (MD), and the contrastive loss. The results in Table 6 show that the contrastive loss is the most effective attack loss.
In this paper, we tackled the novel problem of learning robust representations without any class labels. We first proposed an instance-wise attack that makes the model confuse the instance-level identity of a given sample. Then, we proposed a robust contrastive learning framework to suppress adversarial vulnerability by maximizing the similarity between a transformed sample and its instance-wise adversary. Furthermore, we demonstrated an effective transformation smoothed classifier that boosts performance at test time. We validated our method on multiple benchmarks with different neural architectures, on which it obtained comparable robustness to the supervised baselines on the targeted attack without any labels. Notably, RoCL obtained significantly better clean accuracy and better robustness under black-box attacks, unseen attacks, and transfer learning, which makes it more appealing as a general defense mechanism. Our work opens the door to further work on unsupervised adversarial learning, which we believe is a more fundamental solution for achieving adversarial robustness with deep neural networks.

Broader Impact
Adversarial examples have raised awareness about blindly trusting deep learning models. Because of them, it is necessary to consider the vulnerability of deep learning, and various studies have sought to make deep learning more secure. In the pursuit of more robust models, new and stronger attacks continue to be introduced, which is an additional concern for the research field; however, adversarial training offers a way to defend against such attacks, and even as advanced attacks appear, it remains necessary to find ways to deal with these vulnerabilities. In such a paradigm, we should not allow vulnerabilities even for models that learn without labels. We believe this is a first prominent step toward safer deep learning in the field of self-supervised learning.
References

[1] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, "Intriguing properties of neural networks," arXiv preprint arXiv:1312.6199, 2013.
[2] H. Zhang, Y. Yu, J. Jiao, E. P. Xing, L. E. Ghaoui, and M. I. Jordan, "Theoretically principled trade-off between robustness and accuracy," in Proceedings of the 36th International Conference on Machine Learning, 2019.
[3] F. Tramèr and D. Boneh, "Adversarial training and robustness for multiple perturbations," in Advances in Neural Information Processing Systems, pp. 5858–5868, 2019.
[4] D. Madaan, J. Shin, and S. J. Hwang, "Adversarial neural pruning with latent vulnerability suppression," arXiv preprint arXiv:1908.04355, 2019.
[5] S. Zheng, Y. Song, T. Leung, and I. Goodfellow, "Improving the robustness of deep neural networks via stability training," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4480–4488, 2016.
[6] D. Hendrycks and T. Dietterich, "Benchmarking neural network robustness to common corruptions and perturbations," in International Conference on Learning Representations, 2019.
[7] D. Yin, R. G. Lopes, J. Shlens, E. D. Cubuk, and J. Gilmer, "A Fourier perspective on model robustness in computer vision," in Advances in Neural Information Processing Systems, pp. 13255–13265, 2019.
[8] I. J. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and harnessing adversarial examples," in International Conference on Learning Representations, 2015.
[9] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, "Towards deep learning models resistant to adversarial attacks," in International Conference on Learning Representations, 2018.
[10] S. Gidaris, P. Singh, and N. Komodakis, "Unsupervised representation learning by predicting image rotations," in International Conference on Learning Representations, 2018.
[11] M. Noroozi and P. Favaro, "Unsupervised learning of visual representations by solving jigsaw puzzles," in European Conference on Computer Vision, pp. 69–84, Springer, 2016.
[12] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, "A simple framework for contrastive learning of visual representations," arXiv preprint arXiv:2002.05709, 2020.
[13] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, "Momentum contrast for unsupervised visual representation learning," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020.
[14] Z. Wu, Y. Xiong, S. X. Yu, and D. Lin, "Unsupervised feature learning via non-parametric instance discrimination," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3733–3742, 2018.
[15] Y. Carmon, A. Raghunathan, L. Schmidt, J. C. Duchi, and P. S. Liang, "Unlabeled data improves adversarial robustness," in Advances in Neural Information Processing Systems, pp. 11190–11201, 2019.
[16] R. Stanforth, A. Fawzi, P. Kohli, et al., "Are labels required for improving adversarial robustness?," in Advances in Neural Information Processing Systems, 2019.
[17] T. Chen, S. Liu, S. Chang, Y. Cheng, L. Amini, and Z. Wang, "Adversarial robustness: From self-supervised pre-training to fine-tuning," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020.
[18] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, "DeepFool: a simple and accurate method to fool deep neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2574–2582, 2016.
[19] A. Kurakin, I. Goodfellow, and S. Bengio, "Adversarial examples in the physical world," arXiv preprint arXiv:1607.02533, 2016.
[20] N. Carlini and D. Wagner, "Towards evaluating the robustness of neural networks," in 2017 IEEE Symposium on Security and Privacy (S&P), pp. 39–57, IEEE, 2017.
[21] A. Athalye, N. Carlini, and D. Wagner, "Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples," in Proceedings of the 35th International Conference on Machine Learning, 2018.
[22] A. Ilyas, S. Santurkar, D. Tsipras, L. Engstrom, B. Tran, and A. Madry, "Adversarial examples are not bugs, they are features," in Advances in Neural Information Processing Systems, pp. 125–136, 2019.
[23] B. Li, C. Chen, W. Wang, and L. Carin, "Certified adversarial robustness with additive noise," in Advances in Neural Information Processing Systems, pp. 9459–9469, 2019.
[24] J. Cohen, E. Rosenfeld, and Z. Kolter, "Certified adversarial robustness via randomized smoothing," in Proceedings of the 36th International Conference on Machine Learning, pp. 1310–1320, 2019.
[25] H. Salman, J. Li, I. Razenshteyn, P. Zhang, H. Zhang, S. Bubeck, and G. Yang, "Provably robust deep learning via adversarially trained smoothed classifiers," in Advances in Neural Information Processing Systems, pp. 11289–11300, 2019.
[26] A. Dosovitskiy, P. Fischer, J. T. Springenberg, M. Riedmiller, and T. Brox, "Discriminative unsupervised feature learning with exemplar convolutional neural networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 9, pp. 1734–1747, 2015.
[27] C. Doersch, A. Gupta, and A. A. Efros, "Unsupervised visual representation learning by context prediction," in Proceedings of the IEEE International Conference on Computer Vision, pp. 1422–1430, 2015.
[28] R. Zhang, P. Isola, and A. A. Efros, "Colorful image colorization," in European Conference on Computer Vision, pp. 649–666, Springer, 2016.
[29] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros, "Context encoders: Feature learning by inpainting," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2536–2544, 2016.
[30] Y. Tian, C. Sun, B. Poole, D. Krishnan, C. Schmid, and P. Isola, "What makes for good views for contrastive learning," arXiv preprint arXiv:2005.10243, 2020.
[31] P. Khosla, P. Teterwak, C. Wang, A. Sarna, Y. Tian, P. Isola, A. Maschinot, C. Liu, and D. Krishnan, "Supervised contrastive learning," arXiv preprint arXiv:2004.11362, 2020.
[32] D. Hendrycks, M. Mazeika, S. Kadavath, and D. Song, "Using self-supervised learning can improve model robustness and uncertainty," in Advances in Neural Information Processing Systems, pp. 15637–15648, 2019.
[33] K. Chen, H. Zhou, Y. Chen, X. Mao, Y. Li, Y. He, H. Xue, W. Zhang, and N. Yu, "Self-supervised adversarial training," arXiv preprint arXiv:1911.06470, 2019.
[34] P. Bachman, R. D. Hjelm, and W. Buchwalter, "Learning representations by maximizing mutual information across views," in Advances in Neural Information Processing Systems, pp. 15509–15519, 2019.
[35] A. Kolesnikov, X. Zhai, and L. Beyer, "Revisiting self-supervised visual representation learning," CoRR, vol. abs/1901.09005, 2019.
[36] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
[37] A. Krizhevsky, G. Hinton, et al., "Learning multiple layers of features from tiny images," 2009.
[38] A. Shafahi, P. Saadatpanah, C. Zhu, A. Ghiasi, C. Studer, D. Jacobs, and T. Goldstein, "Adversarially robust transfer learning," in International Conference on Learning Representations, 2020.
[39] S. Zagoruyko and N. Komodakis, "Wide residual networks," arXiv preprint arXiv:1605.07146, 2016.
[40] D. Hendrycks, K. Lee, and M. Mazeika, "Using pre-training can improve model robustness and uncertainty," in Proceedings of the 36th International Conference on Machine Learning, 2019.

Supplementary: Adversarial Self-Supervised Contrastive Learning
Organization
The supplementary file is organized as follows. In Section A, we describe the experimental details, including descriptions of the datasets and the evaluation process. We then provide an algorithm which summarizes RoCL in Section B. We further report RoCL results on both CIFAR 10 and CIFAR 100 against PGD attacks and CW attacks in Section C. Finally, we perform ablation studies of RoCL in Section D.
A Experimental Setup
A.1 Training details and datasets

CIFAR 10
The CIFAR 10 [37] dataset consists of 60,000 RGB images of size 32 × 32 from ten general object classes, with 5,000 training images and 1,000 test images per class. We use ResNet18 and ResNet50 [36] for this dataset without any linear layers. For the projector network, we set the output dimension to 128. For the adversarial training parameters, we set the perturbation radius $\epsilon$, the step size $\alpha$, and the number of iterations $K = 7$. We train the model with batch size $B = 256$ and regularization weight $\lambda$ for 1000 epochs.
The CIFAR 100 [37] dataset consists of 60,000 RGB images of size 32 × 32 from 100 general object classes, with 500 training images and 100 test images per class. For the experiments on this dataset, we use ResNet18 [36] without any linear layers, and use a projector with a 128-dimensional output. We set the perturbation radius $\epsilon$, the step size $\alpha$, and the number of iterations $K = 7$. We train the model with batch size $B = 256$ and regularization weight $\lambda$ for 1000 epochs.

A.2 Evaluation

Linear evaluation setup
In the linear evaluation phase, we train the linear layer $\psi$ on top of the frozen encoder $f_\theta$. We train the linear layer for 150 epochs with a learning rate of 0.2, dropped by a factor of 10 at epochs 30, 50, and 100. We use the stochastic gradient descent (SGD) optimizer with momentum 0.9 and weight decay 5e-4, and train the linear layer with the cross-entropy (CE) loss.

Robust linear evaluation setup
For robust linear evaluation, we train the linear layer $\psi$ on top of the frozen encoder $f_\theta$, as done for linear evaluation. We train the linear layer for 150 epochs with a learning rate of 0.02; the learning rate schedule and the optimizer setup are the same as for linear evaluation. We use the projected gradient descent (PGD) attack to generate class-wise adversarial examples, performing the $\ell_\infty$ attack for 10 steps.
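A minimal sketch of the robust linear evaluation described above (Eq. 7 of the main text): the encoder is frozen and only the linear head is trained on class-wise PGD examples generated through the composed model. The `pgd_attack` routine is the $\ell_\infty$ PGD sketched earlier; the assumption that the encoder returns a flat feature vector, and the omission of the learning-rate schedule, are simplifications.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def robust_linear_eval(encoder, feat_dim, num_classes, loader, device,
                       epochs=150, lr=0.02):
    """Train only the linear head on adversarial examples (r-LE, Eq. 7)."""
    for p in encoder.parameters():
        p.requires_grad_(False)                                # freeze the encoder
    head = nn.Linear(feat_dim, num_classes).to(device)
    model = nn.Sequential(encoder, head).to(device).eval()
    opt = torch.optim.SGD(head.parameters(), lr=lr,
                          momentum=0.9, weight_decay=5e-4)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            x_adv = pgd_attack(model, x, y, steps=10)          # class-wise l_inf attack
            loss = F.cross_entropy(model(x_adv), y)
            opt.zero_grad(); loss.backward(); opt.step()
    return head
```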
Robustness evaluation setup

For the evaluation of adversarial robustness, we use white-box projected gradient descent (PGD) attacks with 20, 40, and 100 steps. We evaluate under $\ell_\infty$, $\ell_2$, and $\ell_1$ attacks on CIFAR 10 and CIFAR 100.

Transfer learning setup
We first briefly describe robust transfer learning and our experimental setting. Shafahi et al. [38] suggest that an adversarially trained model can be transferred to another model to improve its robustness. They use a modified WRN 32-10 to train the fully supervised adversarial model, initialize the student network with the adversarially trained teacher network, and utilize a distillation loss together with the cross-entropy loss to train the student network's linear layer on top of the encoder. We follow the experimental settings of Shafahi et al. [38] and train only the linear layer with the cross-entropy loss; we do not use the distillation loss, in order to evaluate the robustness of the encoder trained with RoCL alone (ResNet18). We train the linear layer on CIFAR 100 on top of the frozen encoder trained on CIFAR 10, and likewise train the linear layer on CIFAR 10 on top of the frozen encoder trained on CIFAR 100. We train the linear layer for 100 epochs with a learning rate of 0.2, using stochastic gradient descent (SGD) for optimization.

B Algorithm of RoCL
We present the algorithm for RoCL in Algorithm 1. During training, we generate instance-wise adversarial examples using the contrastive loss, and then train the model using two differently transformed images and their instance-wise adversarial perturbations. We also include a regularization term, defined as a contrastive loss between the adversarial examples and the clean transformed examples.
Algorithm 1
Robust Contrastive Learning (RoCL)
Input:
Dataset $D$, model parameters $\theta$, model $f$, projector parameters $\pi$, projector $g$, constant $\lambda$
for all iter in number of training iterations do
    for all $x$ in minibatch $B = \{x_1, \ldots, x_m\}$ do
        Generate adversarial examples from transformed inputs   ▷ instance-wise attacks
        $t(x)^{i+1} = \Pi_{B(t(x),\epsilon)}\left(t(x)^{i} + \alpha\,\mathrm{sign}\left(\nabla_{t(x)^{i}} \mathcal{L}_{con,\theta,\pi}(t(x)^{i}, \{t'(x)\}, \{t(x)^{neg}\})\right)\right)$
    end for
    $\mathcal{L}_{total} = \frac{1}{N}\sum_{i=1}^{N}\left[\mathcal{L}_{RoCL,\theta,\pi}(t(x)_i, \{t'(x)_i, t(x)^{adv}_i\}, \{t(x)^{neg}\}) + \lambda\,\mathcal{L}_{con,\theta,\pi}(t(x)^{adv}_i, \{t'(x)_i\}, \{t(x)^{neg}\})\right]$   ▷ contrastive loss
    Optimize the weights $\theta, \pi$ over $\mathcal{L}_{total}$
end for

C Results on CIFAR 10 and CIFAR 100
While we only report the performance of RoCL on CIFAR 10 in the main paper, as the baselines we mainly compare against only experimented on this dataset, we further report the performance of RoCL on CIFAR 100 (Table 7) and against CW attacks [20] (Table 8). We observe that RoCL consistently achieves comparable performance to supervised adversarial learning methods, even on the CIFAR 100 dataset. Moreover, when employing the robust linear evaluation, RoCL acquires better robustness than with the standard linear evaluation. Finally, the transformation smoothed classifier further boosts the performance of RoCL on both datasets.
Table 7: Experimental results with white-box attacks on ResNet18 trained on the CIFAR 10 and CIFAR 100 datasets. r-LE denotes robust linear evaluation, and AT denotes supervised adversarial training [9]. All models are trained with $\ell_\infty$ attacks; thus $\ell_\infty$ is the seen adversarial attack, and the $\ell_2$ and $\ell_1$ attacks are unseen.
Table 8: Experimental results with white-box CW attacks [20] on ResNet18 trained on CIFAR 10 and CIFAR 100. r-LE denotes robust linear evaluation. All models are trained with $\ell_\infty$ attacks.

D Ablation
In this section, we report the results of several ablation studies of our RoCL model. For all experiments, we train the backbone network for 500 epochs and train the linear layer for 100 epochs, which yields models with sufficiently high clean accuracy and robustness. We first examine the effect of the target image used when generating the instance-wise adversarial examples. Along with the instance-wise attack, the regularization term in Algorithm 1 can also affect the final performance of the model; the regularization weight $\lambda$ is set separately for CIFAR 10 and CIFAR 100. We also examine the effect of $\lambda$ and the batch size $B$ on the CIFAR 10 dataset.

D.1 Adversarial contrastive learning
We examine the effect of the transformation function on the instance-wise attack and on the regularization. For each input instance $x$, we generate three transformed images $t(x)$, $t'(x)$, and $t(x)^{adv}$ and use them as the positive set. The results in Table 9 demonstrate that using any transformed image of the same identity for the instance-wise attack is equally effective. In contrast, for the regularization, using an image transformed with a different transformation function from the one used to generate the attack helps obtain improved clean accuracy and robustness.

Instance-wise attack
To generate instance-wise attacks, we can choose which image of the same identity to use as the attack target. Since the original transformed image $t(x)$ and the image $t'(x)$ transformed with another transformation have the same identity, we can use either of them for the instance-wise attack. To find the optimal perturbation that maximizes the contrastive loss between the adversarial example and images of the same identity, we vary $X$ in the following equation:

$t(x)^{i+1} = \Pi_{B(t(x),\epsilon)}\left(t(x)^{i} + \alpha\,\mathrm{sign}\left(\nabla_{t(x)^{i}} \mathcal{L}_{con,\theta,\pi}(t(x)^{i}, \{X\}, \{t(x)^{neg}\})\right)\right) \quad (9)$

where $X$ is either $t'(x)$ or $t(x)$.

Regularization
To regularize the learning, we compute the contrastive loss between the adversarial examples and clean samples with the same instance-level identity. We vary $Y$ in the regularization term to examine which identity is the most effective:

$\lambda\, \mathcal{L}_{con,\theta,\pi}(t(x)^{adv}_i, \{Y\}, \{t(x)^{neg}\}) \quad (10)$

where $Y$ can be $t'(x)$ or $t(x)$.
Table 9: Experimental results with white-box attacks on ResNet18 trained on the CIFAR 10 and CIFAR 100 datasets, varying the image used for the instance-wise attack ($X$) and for the regularization ($Y$) between $t'(x)$ and $t(x)$. All models are trained with $\ell_\infty$ attacks.

D.2 Lambda λ and batch size B

We observe that $\lambda$, which controls the amount of regularization in the robust contrastive loss, and the batch size used for computing the contrastive loss are two important hyperparameters for our robust contrastive learning framework. We examine the effect of these two hyperparameters in Table 10 and Table 11, and observe that the optimal $\lambda$ differs for each batch size $B$.

Table 10: Ablation of $\lambda$ with white-box attacks on ResNet18 trained on the CIFAR 10 dataset. All models are trained with $\ell_\infty$ attacks.
Table 11: Ablation of the batch size $B$ with white-box attacks on ResNet18 trained on the CIFAR 10 dataset. All models are trained with $\ell_\infty$ attacks.