Cross-modal Adversarial Reprogramming
Paarth Neekhara, Shehzeen Hussain, Jinglong Du, Shlomo Dubnov, Farinaz Koushanfar, Julian McAuley
Abstract
With the abundance of large-scale deep learning models, it has become possible to repurpose pre-trained networks for new tasks. Recent works on adversarial reprogramming have shown that it is possible to repurpose neural networks for alternate tasks without modifying the network architecture or parameters. However, these works only consider original and target tasks within the same data domain. In this work, we broaden the scope of adversarial reprogramming beyond the data modality of the original task. We analyze the feasibility of adversarially repurposing image classification neural networks for Natural Language Processing (NLP) and other sequence classification tasks. We design an efficient adversarial program that maps a sequence of discrete tokens into an image which can be classified to the desired class by an image classification model. We demonstrate that by using highly efficient adversarial programs, we can reprogram image classifiers to achieve competitive performance on a variety of text and sequence classification benchmarks without retraining the network.
1. Introduction
Transfer learning (Raina et al., 2007) and adversarial reprogramming (Elsayed et al., 2019) are two closely related techniques used for repurposing well-trained neural network models for new tasks. Neural networks, when trained on a large dataset for a particular task, learn features that can be useful across multiple related tasks. Transfer learning aims at exploiting this learned representation for adapting a pre-trained neural network to an alternate task. Typically, the last few layers of a neural network are modified to map to a new output space, followed by fine-tuning the network parameters on the dataset of the target task. Such techniques are especially useful when there is a limited amount of training data available for the target task.

Adversarial reprogramming shares the same objective as transfer learning with an additional constraint: the network architecture or parameters cannot be modified. Instead, the adversary can only adapt the input and output interfaces of the network to perform the new adversarial task. This more constrained problem setting of adversarial reprogramming poses a security challenge to neural networks. An adversary can potentially re-purpose cloud-hosted machine learning (ML) models for new tasks, thereby leading to theft of computational resources. Additionally, the attacker may reprogram models for tasks that violate the code of ethics of the service provider.

* Equal contribution. Department of Computer Science, UC San Diego; Department of Electrical and Computer Engineering, UC San Diego; Department of Computer Music, UC San Diego. Correspondence to: Shehzeen Hussain <[email protected]>, Paarth Neekhara <[email protected]>.
For example, an adversary can repurpose a cloud-hosted ML API for solving captchas to create spam accounts.

Prior works on adversarial reprogramming (Elsayed et al., 2019; Neekhara et al., 2019; Kloberdanz, 2020; Tsai et al., 2020) have demonstrated success in repurposing Deep Neural Networks (DNNs) for new tasks using computationally inexpensive input and label transformation functions. One interesting finding of (Elsayed et al., 2019) is that neural networks can be reprogrammed even if the training data for the new task has no resemblance to the original data. The authors empirically demonstrate this by repurposing ImageNet (Deng et al., 2009) classifiers on MNIST (LeCun & Cortes, 2010) digits with shuffled pixels, showing that transfer learning does not fully explain the success of adversarial reprogramming. These results suggest that neural circuits hold properties that can be useful across multiple tasks which are not necessarily related. Hence, neural network reprogramming not only poses a security threat, but also holds the promise of more reusable and efficient ML systems by enabling shared compute of the neural network backbone during inference time.

In existing work on adversarial reprogramming, the target adversarial task has the same data domain as the original task. Recent work has shown that network architectures based on the transformer model can achieve state-of-the-art results on language (Vaswani et al., 2017), audio (Ren et al., 2019) and vision (Dosovitskiy et al., 2021) benchmarks, suggesting that transformer networks serve as good inductive biases in various domains.
Given this commonality between the neural architectures in different domains, an interesting question that arises is whether we can perform cross-modal adversarial reprogramming: for example, can we repurpose a vision transformer model for a language task?

Cross-modal adversarial reprogramming increases the scope of target tasks for which a neural network can be repurposed. In this work, we develop techniques to adversarially reprogram image classification networks for discrete sequence classification tasks. We propose a simple and computationally inexpensive adversarial program that embeds a sequence of discrete tokens into an image, and propose techniques to train this adversarial program subject to a label remapping defined between the labels of the original and new task. We demonstrate that we can reprogram a number of image classification neural networks based on both Convolutional Neural Network (CNN) (LeCun et al., 1998) and Vision Transformer (Dosovitskiy et al., 2021) architectures to achieve competitive performance on a number of sequence classification benchmarks. Additionally, we show that it is possible to conceal the adversarial program as a perturbation in a real-world image, thereby posing a stronger security threat. The technical contributions of this paper are summarized below:

• We propose Cross-modal Adversarial Reprogramming, a novel approach to repurpose ML models originally trained for image classification to perform sequence classification tasks.
To the best of our knowledge, this is the first work that expands adversarial reprogramming beyond the data domain of the original task.

• We demonstrate the feasibility of our method by repurposing four image classification networks for six different sequence classification benchmarks covering sentiment, topic, and DNA sequence classification. Our results show that a computationally inexpensive adversarial program can leverage the learned neural circuits of the victim model and outperform word-frequency based classifiers trained from scratch on several tasks studied in our work.

• We demonstrate for the first time the threat imposed by adversarial reprogramming to the transformer model architecture by repurposing the Vision Transformer model for six different sequence classification tasks. The reprogrammed transformer model outperforms alternate architectures on five out of six tasks studied in our work.
2. Background and Related Work
Neural networks have been shown to be vulnerable to adversarial examples (Goodfellow et al., 2015; Papernot et al., 2015; 2016; Moosavi-Dezfooli et al., 2017; Yang et al., 2018; Tu et al., 2020; Hussain et al., 2021), which are slightly perturbed inputs that cause victim models to make a mistake. Adversarial reprogramming was introduced by (Elsayed et al., 2019) as a new form of adversarial threat that allows an adversary to repurpose neural networks to perform new tasks, which are different from the tasks they were originally trained for. The proposed technique trains a single adversarial perturbation that can be added to all inputs in order to re-purpose the target model for an attacker's chosen task. The adversary achieves this by first defining a hard-coded one-to-one label remapping function that maps the output labels of the adversarial task to the label space of the classifier, and then learning a corresponding adversarial reprogramming function that transforms an input from the input space of the new task to the input space of the classifier. The authors demonstrated the feasibility of their attack algorithm by reprogramming ImageNet classification models for classifying MNIST and CIFAR-10 data in a white-box setting, where the attacker has access to the victim model parameters.

While the above attack does not require any changes to the victim model parameters or architecture, the adversarial program proposed by (Elsayed et al., 2019) is only applicable to tasks where the input space of the original and adversarial task is continuous. To understand the feasibility of the attack in a discrete data domain, (Neekhara et al., 2019) proposed methods to repurpose text classification neural networks for alternate tasks, which operate on sequences from a discrete input space. The attack algorithm used a context-based vocabulary remapping method that performs a computationally inexpensive input transformation to reprogram a victim classification model for a new set of sequences.
This work was also the first in designing algorithms for training such an input transformation function in both white-box and black-box settings, where the adversary may or may not have access to the victim model's architecture and parameters. They demonstrated the success of their proposed reprogramming functions by adversarially re-purposing various text-classification models including Long Short Term Memory networks (LSTM) (Hochreiter & Schmidhuber, 1997), bi-directional LSTMs (Graves et al., 2005) and CNNs (Zhang et al., 2015) for alternate text classification tasks.

Recent works (Kloberdanz, 2020; Tsai et al., 2020) have argued that reprogramming techniques can be viewed as an efficient training method and can be a superior alternative to transfer learning. In particular, (Tsai et al., 2020) argue that one of the major limitations of current transfer learning
[Panels: victim image classifier C applied to f_θ(t) and f′_θ(t); adversarial program f′_θ with embedding lookup (θ) on the input "I am very excited"; label remapping f_L from original labels (Joy, Anger, Sorrow, Ocean, Sky, Sand, Bridge, …) to new labels; prediction: Joy.]
Figure 1.
Schematic overview of our proposed cross-modal adversarial reprogramming method: the adversarial reprogramming function f_θ embeds a sequence of discrete tokens t into an image. The image can also be concealed as an additive perturbation to some real-world image x_c using the alternate reprogramming function f′_θ. Finally, the victim model is queried with the generated image and the predicted label is mapped to the target label using the label remapping function f_L.

techniques is the requirement of large amounts of target-domain data, which is needed to fine-tune pre-trained neural networks. They demonstrated the advantage of instead using reprogramming techniques to repurpose existing ML models for alternate tasks, which can be done even when training data is scarce. The authors designed a black-box adversarial reprogramming method that can be trained iteratively from input-output model responses, and demonstrated its success in repurposing ImageNet models for medical imaging tasks such as classification of autism spectrum disorders, melanoma detection, etc.

All of these existing reprogramming techniques are only able to reprogram ML models when the data domains of the target adversarial task and the original task are the same. We address this limitation in our work by designing adversarial input transformation functions that allow image classification models to be reprogrammed for sequence classification tasks such as natural language and protein sequence classification.

While Convolutional Neural Networks (CNNs) have long achieved state-of-the-art performance on vision benchmarks, the recently proposed Vision Transformers (ViTs) (Dosovitskiy et al., 2021) have been shown to outperform CNNs on several image classification tasks. Transformers (Vaswani et al., 2017) are known for achieving state-of-the-art performance in natural language processing (NLP).
In order to train transformers for image classification tasks, the authors (Dosovitskiy et al., 2021) divide an image into patches and provide the sequence of linear embeddings of these patches as input to a transformer. Image patches are treated the same way as tokens (words) in an NLP application, and the model is trained on image classification in a supervised manner. The authors report that when ViTs are trained on large-scale image datasets, they are competitive with, and even outperform, state-of-the-art models on multiple image recognition benchmarks.

Since transformers can model both language and vision data in a similar manner, that is, as a sequence of embeddings, we are curious to investigate whether a vision transformer can be reprogrammed for a text classification task. In the process, we find that CNN network architectures can also be reprogrammed to achieve competitive performance on discrete sequence classification tasks. In the next section, we discuss our cross-modal adversarial reprogramming approach.
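As a concrete illustration, the patch-embedding step described above can be sketched as follows; the patch size, embedding dimension, and projection matrix here are illustrative choices of ours, not the paper's actual configuration:

```python
import numpy as np

def patch_embed(image, proj, p=16):
    """Split an image (h, w, c) into non-overlapping p x p patches,
    flatten each patch, and linearly embed it -- the way ViT turns
    an image into a sequence of tokens."""
    h, w, c = image.shape
    patches = [image[i:i + p, j:j + p, :].reshape(-1)
               for i in range(0, h, p)
               for j in range(0, w, p)]
    # Each flattened patch of size p*p*c is projected to a d-dim embedding.
    return np.stack(patches) @ proj  # shape: (num_patches, d)
```

A classification ViT would then prepend a learnable class token and add position embeddings before the transformer encoder layers.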
3. Methodology
Consider a victim image classifier C trained for mapping images x ∈ X to a label l_X ∈ L_X. That is, C : x ↦ l_X. An adversary wishes to repurpose this victim image classifier for an alternate text classification task C′ of mapping sequences t ∈ T to a label l_T ∈ L_T. That is, C′ : t ↦ l_T.

To achieve this goal, the adversary needs to learn appropriate mapping functions between the input and output spaces of the original and the new task. We solve this by first defining a label remapping f_L that maps the label spaces of the two tasks, f_L : l_X ↦ l_T; and then learning a corresponding adversarial program f_θ that maps a sequence t ∈ T to an image x ∈ X, i.e., f_θ : t ↦ x, such that f_L(C(f_θ(t))) acts as the target classifier C′.

We assume a white-box adversarial reprogramming setting where the adversary has complete knowledge of the architecture and model parameters of the victim image classifier. In the next few sections we describe the adversarial program f_θ, the label remapping function, and the training procedure to learn the adversarial program.

The goal of our adversarial program is to map a sequence of discrete tokens t ∈ T to an image x ∈ X. Without loss of generality, we assume X = [−1, 1]^{h×w×c} to be the scaled input space of the image classifier C, where h, w are the height and width of the input image and c is the number of channels. The tokens in the sequence t belong to some vocabulary list V_T. We can represent the sequence t as t = t_1, t_2, …, t_N, where t_i is the vocabulary index of the i-th token of sequence t in the vocabulary list V_T.

When designing the adversarial program it is important to consider the computational cost of the reprogramming function f_θ.
This is because if a classification model that performs equally well can be trained from scratch for the classification task C′ and is computationally cheaper than the reprogramming function, it would defeat the purpose of adversarial reprogramming.

Keeping the above in mind, we design a reprogramming function that looks up embeddings of the tokens t_i and arranges them as contiguous patches of size p × p in an image that is fed as input to the classifier C. Mathematically, the reprogramming function f_θ is parameterized by a learnable embedding tensor θ ∈ R^{|V_T| × p × p × c} and performs the transformation f_θ : t ↦ x as per Algorithm 1.

Algorithm 1
Adversarial Program f_θ
Input: Sequence t = t_1, t_2, …, t_N
Output: Reprogrammed image x ∈ [−1, 1]^{h×w×c}
Parameters: Embedding tensor θ ∈ R^{|V_T| × p × p × c}

x ← 0^{h×w×c}
for each t_k in t do
    i ← ⌊(k × p) / w⌋ × p
    j ← (k × p) mod w
    x[i : i+p, j : j+p, :] ← tanh(θ[t_k, :, :, :])
end for
return x

The patch size p and image dimensions h, w determine the maximum length of the sequence t that can be encoded into the image. We pad all input sequences t up to the maximum allowed sequence length with a padding token to fill up the reprogrammed image, and clip any sequences longer than the maximum allowed length from the end. More details about the hyper-parameters can be found in our experiments section.

Concealing the adversarial perturbation:
Most past works on adversarial reprogramming have considered an unconstrained attack setting, where the reprogrammed image does not necessarily need to resemble a real-world image. However, as noted by (Elsayed et al., 2019), it is possible to conceal the reprogrammed image in a real-world image by constraining the output of the reprogramming function. We can conceal the reprogrammed image as an additive perturbation to some real-world base image x_c by defining an alternate reprogramming function f′_θ as follows:

f′_θ(t) = Clip_{[−1,1]}(x_c + ε · f_θ(t))    (1)

Since the output of the original reprogramming function f_θ is bounded between [−1, 1], we can control the L∞ norm of the added perturbation using the parameter ε ∈ [0, 1].

Computational Complexity:
As depicted in Figure 1, during inference, the adversarial program only looks up embeddings of the tokens in the sequence t and arranges them in an image tensor, which can optionally be added onto a base image. Asymptotically, the time complexity of this adversarial program is linear in the length of the sequence t. Since there are no matrix-vector multiplications involved in the adversarial program, it is computationally equivalent to just the embedding layer of a sequence-based neural classifier. Therefore the inference cost of the adversarial program is significantly less than that of a sequence-based neural classifier. Table 1 in our supplementary material compares the wall-clock inference time of our adversarial program and the various sequence-based neural classifiers used in our experiments on a fixed-length sequence.

Label Remapping: Past works (Elsayed et al., 2019; Neekhara et al., 2019; Tsai et al., 2020) on adversarial reprogramming assume that the number of labels in the target task is smaller than the number of labels in the original task. In our work, we relax this constraint and propose label remapping functions for both of the following scenarios:
1. Target task has fewer labels than the original task: Initial works on adversarial reprogramming defined a one-to-one mapping between the labels of the original and new task (Elsayed et al., 2019; Neekhara et al., 2019). However, recent work (Tsai et al., 2020) found that mapping multiple source labels to one target label improves performance over one-to-one mapping. Our preliminary experiments on cross-modal reprogramming confirm this finding; however, we differ in the way the final score of a target label l_t is aggregated: (Tsai et al., 2020) obtained the final score for a target label as the mean of the scores of the mapped original labels. We found that aggregating the score by taking the maximum rather than the mean over the mapped original labels leads to faster training. Another advantage of using max reduction is that during inference, we can directly map the original predicted label to our target label without requiring access to the probability scores of any other label.

Consider a target task label l_t, mapped to a subset of labels L_{S_t} ⊂ L_S of the original task under the many-to-one label remapping function f_L. We obtain the score for this target task label as the maximum of the scores assigned to each label l_i ∈ L_{S_t} by classifier C.
That is,

Z′_{l_t}(t) = max_{l_i ∈ L_{S_t}} Z_{l_i}(f_θ(t)),    (2)

where Z_k(x) and Z′_k(t) represent the score (before softmax) assigned to some label k by classifiers C and C′, respectively.

To define the label remapping f_L, instead of randomly assigning m source labels to a target label, we first obtain the model predictions on the base image x_c (or a zero image in the case of an unbounded attack) and sort the labels by the obtained scores; we then assign the highest-scoring source labels to each target label using a round-robin strategy until we have assigned m source labels to each target label.

Note that while we need access to individual class scores during training (where we assume a white-box attack setting), during inference we can simply map the highest predicted label to the target label using the label remapping function f_L without having to know the actual scores assigned to different labels.
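A minimal sketch of this round-robin many-to-one assignment and the max-aggregation of Equation 2; the function and variable names are ours, not from the paper's codebase:

```python
import numpy as np

def round_robin_map(base_scores, num_targets, m):
    """Sort source labels by their score on the base image and assign
    the top-scoring ones to target labels in round-robin order,
    until each target label has m source labels."""
    order = np.argsort(-np.asarray(base_scores))
    mapping = {t: [] for t in range(num_targets)}
    for rank, src in enumerate(order[:num_targets * m]):
        mapping[rank % num_targets].append(int(src))
    return mapping

def target_scores(source_logits, mapping):
    """Eq. (2): a target label's score is the max over its mapped source logits."""
    return np.array([max(source_logits[s] for s in mapping[t])
                     for t in sorted(mapping)])
```

At inference time, only the argmax over the source logits and the inverse of `mapping` are needed, so the individual class scores are not required.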
2. Original task has fewer labels than the target task:
In this scenario, we map the probability distribution over the original labels to class scores for the target label space using a learnable linear transformation. That is,

Z′(t) = θ′ · softmax(Z(f_θ(t))),    (3)

where Z′(t) is a vector representing class scores (logits) for the target label space, and θ′ ∈ R^{|L_T| × |L_X|} are the learnable parameters of the linear transformation that are optimized along with the parameters of the reprogramming function f_θ. Note that unlike the previous scenario, in this setting we assume that we have access to the probability scores of the original labels during both training and inference.

Optimization Objective:
To train the parameters θ of our adversarial program, we use a cross-entropy loss between the target label and the model score predictions obtained as per Equation 2 or Equation 3. We also incorporate an L2 regularization loss for better generalization on the test set and to encourage a more imperceptible perturbation in the case of our bounded attack. Therefore our final optimization objective is the following:

P_{l_t} = softmax(Z′(t))_{l_t}

E(θ) = −Σ_{t ∈ T} log(P_{l_t}) + λ‖θ‖₂²

Here λ is the regularization hyper-parameter and P_{l_t} is the predicted class probability of the correct label l_t for sequence t. We use mini-batch gradient descent with the Adam optimizer (Kingma & Ba, 2014) to solve the above optimization problem on the dataset of the target task.
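Putting the pieces together, a NumPy sketch of the reprogramming function (Algorithm 1) and the per-example objective above; the image size, patch size, and λ below are illustrative values, and the Adam update itself is omitted:

```python
import numpy as np

def f_theta(tokens, theta, h=384, w=384, p=16):
    """Algorithm 1: paste tanh-squashed token embeddings into the image
    as contiguous p x p patches, left-to-right, top-to-bottom."""
    c = theta.shape[-1]
    x = np.zeros((h, w, c))
    for k, t_k in enumerate(tokens):
        i = ((k * p) // w) * p  # pixel row of the k-th patch
        j = (k * p) % w         # pixel column of the k-th patch
        x[i:i + p, j:j + p, :] = np.tanh(theta[t_k])
    return x

def loss(target_logits, label, theta, lam=1e-4):
    """Cross-entropy on the remapped target scores (Eq. 2 or 3)
    plus the L2 penalty on the program parameters."""
    z = target_logits - target_logits.max()
    log_p = z - np.log(np.exp(z).sum())
    return -log_p[label] + lam * np.sum(theta ** 2)
```

In practice `theta` (and `theta_prime` in the second label-remapping scenario) would be updated by backpropagating this loss through the frozen victim classifier.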
4. Experiments
To demonstrate cross-modal adversarial reprogramming, we perform experiments on four neural architectures trained on the ImageNet dataset. We choose both CNNs and the recently proposed Vision Transformers (ViT) (Dosovitskiy et al., 2021) as our victim image classifiers. While CNNs have long achieved state-of-the-art performance on computer-vision benchmarks, the recently proposed ViTs have been shown to outperform CNNs on several image classification tasks. We choose the ViT-Base (Dosovitskiy et al., 2021), ResNet-50 (He et al., 2016), InceptionNet-V3 (Szegedy et al., 2016) and EfficientNet-B7 (Tan & Le, 2019) architectures. The details of these architectures are listed in Table 1. We perform experiments on both pre-trained and randomly initialized networks.
Table 1. Victim image classification networks used for adversarial reprogramming experiments. We include the number of parameters of each model and also the Top-1 and Top-5 test accuracy achieved on the ImageNet benchmark. [Columns: Model, Abbr., Type, Accuracy (%).]
In this work, we repurpose the aforementioned image classifiers for several discrete sequence classification tasks. We wish to analyze the performance of cross-modal adversarial reprogramming for different applications such as understanding language and analyzing sequential biomedical data. Biomedical datasets, e.g. for splice-junction detection in genes, often have fewer training samples than language-based datasets, and we aim to understand whether such limitations can adversely affect our proposed reprogramming technique.

Sentiment analysis and topic classification are popular NLP tasks. However, analyzing the underlying semantics of the sequence is often not necessary for solving these tasks, since word-frequency based statistics can serve as strong discriminatory features. In contrast, tasks like DNA-sequence classification require analyzing the sequential semantics of the input, and simple frequency analysis of the unigrams or n-grams does not achieve competitive performance on these tasks. To evaluate the effectiveness of adversarial reprogramming in both of these scenarios, we consider the following tasks and datasets in our experiments:

Table 2. Statistics of the datasets used for our reprogramming tasks. We also include the test accuracy of both neural network based and TF-IDF based benchmark classifiers trained from scratch on the train set. [Columns: Dataset, Task, Type, Dataset Statistics, Accuracy (%) of Neural Methods and TF-IDF.]

4.2.1. Sentiment Classification
1. Yelp Polarity Dataset (Yelp) (Zhang et al., 2015): This is a dataset consisting of reviews from Yelp for the task of sentiment classification, categorized into binary classes of positive and negative sentiment.

2. Large Movie Review Dataset (IMDB) (Maas et al., 2011): This is a dataset for binary sentiment classification of positive and negative sentiment from highly polar IMDB movie reviews.

4.2.2. Topic Classification
1. AG's News Dataset (AG) (Zhang et al., 2015): This is a collection of more than 1 million news articles gathered from more than 2000 news sources, with 4 classes: World, Sports, Business, Sci/Tech.

2. DBPedia Ontology Dataset (DBPedia) (Zhang et al., 2015): This dataset consists of 14 non-overlapping categories from DBpedia 2014. The samples consist of the category and abstract of each Wikipedia article.

4.2.3. DNA Sequence Classification
1. Splice-junction Gene Sequences (Splice): This dataset (Noordewier et al., 1990; Dua & Graff, 2017) was curated for training ML models to detect splice junctions in DNA sequences. In DNA, there are two kinds of splice junction regions: Exon-Intron (EI) junctions and Intron-Exon (IE) junctions. This dataset contains sample DNA sequences of 60 base-pair length categorized into 3 classes: "EI", which contain an exon-intron junction; "IE", which contain an intron-exon junction; and "N", which contain neither EI nor IE regions.

2. Histone Protein Occupancy in DNA (H3): This dataset from (Pokholok et al., 2005; Ngoc Giang et al., 2016) indicates whether certain DNA sequences wrap around H3 histone proteins. Each sample is a sequence with a length of 500 nucleobases. Positive samples contain DNA regions wrapping around histone proteins, while negative samples do not contain such DNA regions.

The statistics of these datasets are included in Table 2. To benchmark the performance that can be achieved on these tasks, we train various classifiers from scratch on the datasets for each task. We consider both neural network based classification models and frequency-based statistical models (such as TF-IDF) as our benchmarks. We use word-level tokens for sentiment and topic classification tasks and nucleobase-level tokens for DNA sequence classification tasks.

The TF-IDF methods can work on either unigrams or n-grams for creating the feature vectors from the input data. For the n-gram model, we consider n-grams up to length 3 and choose the value of n that achieves the highest classification accuracy on the hold-out set. We train a Stochastic Gradient Descent (SGD) classifier to classify the feature vector as one of the target classes. Additionally, we train DNN based text-classifiers: Bidirectional Long Short Term Memory networks (Bi-LSTM) (Graves et al., 2005; Hochreiter & Schmidhuber, 1997) and 1D CNN (Kim, 2014) models from scratch on the above tasks.
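For reference, a bare-bones unigram TF-IDF featurizer in the spirit of this baseline; the actual benchmark uses n-grams up to length 3 and an SGD classifier on top, and the tokenization and IDF smoothing details here are our own choices:

```python
import math
from collections import Counter

def tfidf_features(docs):
    """Compute unigram TF-IDF vectors over a small corpus."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    n = len(docs)
    idf = {tok: math.log(n / df[tok]) + 1.0 for tok in vocab}
    feats = []
    for toks in tokenized:
        tf = Counter(toks)
        feats.append([tf[tok] / len(toks) * idf[tok] for tok in vocab])
    return feats, vocab
```

These vectors would then be fed to a linear classifier trained with SGD to produce the TF-IDF baselines in Table 2.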
We use randomly initialized token embeddings for all classification models, which are trained along with the network parameters. For Bi-LSTMs, we combine the outputs of the first and last time step for prediction. For the convolutional neural network we follow the same architecture as (Kim, 2014). The hyper-parameter details of these classifiers and architectures have been included in Table 2 of the supplementary material.

We report the accuracies of the above-mentioned classifiers on the test set in Table 2. We find that while both neural and frequency-based TF-IDF methods work well on sentiment and topic classification tasks, neural networks significantly outperform frequency-based methods on DNA sequence classification tasks. This is presumably because the latter require structural analysis of the sequence rather than relying on keywords.

Table 3. Results (% accuracy on the test set) of adversarial reprogramming experiments targeting four image classification models for the six sequence classification tasks. In the unbounded attack setting, we target both pre-trained and randomly initialized image classifiers. In the bounded attack setting, the output of the reprogramming function is concealed as a perturbation (with L∞ norm of 0.1) to a randomly selected ImageNet image shown in Figure 2. [Columns: Task, p, and per-model accuracy (ViT, RN-50, IN-V3, EN-B4) under the Unbounded (Pre-trained, Randomly Initialized) and Bounded (L∞ = 0.1, Pre-trained) settings.]

Reprogramming details: The ViT-Base model utilized in our work is trained on images of size 384 × 384 and works on image patches of size 16 × 16. For all our experiments, we fix the input image size to be 384 × 384. When we use a patch of size 16 × 16 for encoding a single token in our sequence, it allows for a maximum of 576 tokens to be encoded into a single image. In our initial experiments we found that using larger patch sizes for smaller sequences leads to higher performance on the target task, since it encodes a sequence in a spatially larger area of the image. Therefore, we choose our patch size as the largest possible multiple of 16 that can encode the longest sequence in our target task dataset. We list the patch size p used for different tasks in Table 3.

Training hyper-parameters:
We train each adversarial program on a single Titan 1080i GPU. We scale the learning rate used for the unbounded attacks by a factor of ε⁻¹ for our bounded attacks (Equation 1). We use the same L2 regularization hyper-parameter λ for all our experiments, and train the adversarial program for a maximum of 100k mini-batch iterations in the unbounded attack setting and 200k mini-batch iterations in the bounded attack setting. We map 10 original labels to each target label in the scenario where there are fewer labels for the target task than for the original task. We point the readers to our codebase for precise implementation. Code to be released upon publication.
5. Results
Experimental results of our proposed cross-modal reprogramming method are reported in Table 3. In these experiments, the original task has more labels than the target task, so we use the label remapping function given by Equation 2. We first consider the unbounded attack setting, where the output of the adversarial program does not need to be concealed in a real-world image. For these experiments, we use the reprogramming function f_θ described in Algorithm 1. We also note that the primary evaluation of past reprogramming works (Elsayed et al., 2019; Neekhara et al., 2019; Tsai et al., 2020) is done in an unbounded attack setting.

When attacking pre-trained image classifiers, we achieve competitive performance (as compared to benchmark classifiers trained from scratch, reported in Table 2) across several tasks for all victim image classification models. To assess the importance of pre-training the victim model on the original dataset, we also experiment with reprogramming untrained, randomly initialized networks.

Randomly initialized neural networks can potentially have rich structure which the reprogramming functions can exploit. Prior works (Matthews et al., 2018; Lee et al., 2018) have shown that wide neural networks can behave as Gaussian processes, where training specific weights in the intermediate layers is not necessary to perform many different tasks. However, in our experiments, we find that for CNN-based image classifiers, reprogramming pre-trained neural networks performs significantly better than reprogramming randomly initialized networks for all tasks. This is consistent with the findings of prior reprogramming work (Elsayed et al., 2019), which reports that adversarial reprogramming in the image domain is more effective when it targets pre-trained CNNs.
For the ViT model, we find that we are able to obtain competitive performance on sentiment and topic classification tasks when reprogramming either randomly initialized or pre-trained models. Particularly, we find that reprogramming untrained vision transformers provides the highest accuracy on the IMDB classification task. However, for DNA sequence classification tasks (Splice and H3) that require structural analysis of the sequence rather than token-frequency statistics, we find that reprogramming a pre-trained vision transformer model performs significantly better than a randomly initialized transformer model.

The ViT model outperforms other architectures on 5 out of 6 tasks in the unbounded attack setting. In particular, for the task of splice-junction detection in gene sequences, reprogramming a pre-trained ViT model outperforms both TF-IDF and neural classifiers trained from scratch. For sentiment analysis and topic classification tasks, which primarily require keyword detection, some reprogramming methods achieve performance competitive with the benchmark methods reported in Table 2.

Additionally, to assess the importance of the victim classifier for solving the target task, we study the extent to which the task can be solved without the victim classifier, using only the adversarial reprogramming function with a linear classification head. We present the results and details of this experiment in Table 3 of our supplementary material.

Figure 2. Example outputs of our adversarial reprogramming function in both unbounded (top) and bounded (bottom) attack settings while reprogramming two different pre-trained image classifiers for a DNA sequence classification task (H3). Panels: Unbounded ViT; Unbounded ResNet-50; Bounded (L∞=0.1) ViT; Bounded (L∞=0.1) ResNet-50; Base Image. Input sequence (DNA H3 task): ACTCAGTCAGAAAACTGAATTTAGTTGATATGGGACCGCTCCAAGGTAGGAGAATACTAGATCAAGTAAAGCAACCGCACTAGTGCCTTTTTCAAACAAGGTGGTTTGATGAGGAGGCTTTCTACAATCCTAGAAATATAAGACATCTG….
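For the bounded examples in Figure 2, the program output is composed with a real base image under an L∞ budget. A minimal sketch follows; the tanh parameterization is our assumption about the form of Equation 1, not necessarily the authors' exact formulation:

```python
import torch

def conceal(base_image, program_output, eps=0.1):
    """Compose the adversarial program's output with a base image x_c so that
    the perturbation's L-infinity norm is at most eps (pixel values in [0, 1])."""
    delta = eps * torch.tanh(program_output)      # ||delta||_inf <= eps by construction
    return torch.clamp(base_image + delta, 0.0, 1.0)

x_c = torch.rand(3, 224, 224)                     # stand-in for the ImageNet base image
adv = conceal(x_c, torch.randn(3, 224, 224))      # reprogrammed image near x_c
```

Because `tanh` bounds the perturbation before it is added, the reprogrammed image is guaranteed to stay within the `eps` ball around the base image regardless of the program's raw output.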
Concealing the adversarial perturbation:
To conceal the output of the adversarial program in a real-world image, we follow the adversarial reprogramming function defined in Equation 1. We randomly select an image from the ImageNet dataset (shown in Figure 2) as the base image x_c and train adversarial programs targeting different image classifiers for the same base image. We present the results at L∞ = 0.1 (on a 0 to 1 pixel value scale) distortion between the reprogrammed image and the base image x_c on the right side of Table 3. It can be seen that, at the cost of some drop in performance, it is possible to perform adversarial reprogramming such that the input sequence is concealed in a real-world image. Figure 1 in our supplementary material shows the accuracy on three target tasks for different magnitudes of allowed perturbation, while reprogramming a pre-trained ViT model.

In a practical attack scenario, the adversary may only have access to a victim image classifier with fewer labels than the target task labels. To evaluate adversarial reprogramming in this scenario, we constrain the adversary's access to the class-probability scores of just q labels of the ImageNet classifier. We choose the most frequent q ImageNet labels as the original labels that can be accessed by the adversary, and perform our experiments on the two tasks from our datasets which have the highest number of labels: AG News (4 labels) and DBPedia (14 labels). We use the label remapping function given by Equation 3, and learn a linear transformation to map the predicted probability distribution over the q original labels to the target task label scores. We demonstrate that we are able to perform adversarial reprogramming even in this more constrained setting. We achieve similar performance as compared to our many-to-one label remapping scenario reported in Table 3 when q is close to the number of labels in the target task. This is because we learn an additional mapping function for the output interface, which can potentially lead to better optimization.
However, as a downside, this setting requires access to all q class-probability scores for predicting the adversarial label, whereas in the previous many-to-one label remapping scenario we only need to know the highest-scored original label to map it to one of the adversarial labels.

Accuracy (%): Dataset | q | ViT | RN-50 | IN-V3 | EN-B4
Table 4.
Results of adversarial reprogramming in the scenario when the target task has more labels than the original task. The access of the adversary is constrained to class-probabilities of q labels of the original (ImageNet) task. This evaluation is done on pre-trained networks in an unbounded attack setting.
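The learned output interface for this constrained setting can be sketched as a single linear layer over the q accessible class probabilities; the exact form of Equation 3 is assumed here, and the sizes (q = 4 accessible ImageNet labels, DBPedia's 14 target labels) are illustrative:

```python
import torch
import torch.nn as nn

Q, N_TGT = 4, 14   # q accessible ImageNet labels; DBPedia has 14 target labels

# Learned output interface: a linear map from the victim's probability
# distribution over the q accessible labels to target-task label scores.
label_map = nn.Linear(Q, N_TGT)

orig_probs = torch.softmax(torch.randn(8, Q), dim=-1)   # stand-in victim scores
target_scores = label_map(orig_probs)                   # (8, N_TGT)
```

The linear layer is trained jointly with the adversarial program; unlike the many-to-one remapping, predicting a label at test time requires all q probability scores rather than only the argmax.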
6. Conclusion
We propose Cross-modal Adversarial Reprogramming, which for the first time demonstrates the possibility of repurposing pre-trained image classification models for sequence classification tasks. We demonstrate that computationally inexpensive adversarial programs can repurpose neural circuits to non-trivially solve tasks that require structural analysis of sequences. Our results suggest the potential of training more flexible neural models that can be reprogrammed for tasks across different data modalities and data structures. More importantly, this work reveals a broader security threat to public ML APIs that warrants the need for rethinking existing security primitives.
References
Deng, J., Dong, W., Socher, R., Li, L., Li, K., and Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., and Houlsby, N. An image is worth 16x16 words: Transformers for image recognition at scale. In ICLR, 2021.

Dua, D. and Graff, C. UCI machine learning repository, 2017. URL http://archive.ics.uci.edu/ml.

Elsayed, G. F., Goodfellow, I., and Sohl-Dickstein, J. Adversarial reprogramming of neural networks. In ICLR, 2019.

Goodfellow, I., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. In ICLR, 2015.

Graves, A., Fernández, S., and Schmidhuber, J. Bidirectional LSTM networks for improved phoneme classification and recognition. In International Conference on Artificial Neural Networks: Formal Models and Their Applications, 2005.

He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In CVPR, 2016.

Hochreiter, S. and Schmidhuber, J. Long short-term memory. Neural Computation, 1997.

Hussain, S., Neekhara, P., Jere, M., Koushanfar, F., and McAuley, J. Adversarial deepfakes: Evaluating vulnerability of deepfake detectors to adversarial examples. In WACV, 2021.

Kim, Y. Convolutional neural networks for sentence classification. In EMNLP, 2014.

Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.

Kloberdanz, E. Reprogramming of neural networks: A new and improved machine learning technique. Masters Thesis, 2020.

LeCun, Y. and Cortes, C. MNIST handwritten digit database. 2010. URL http://yann.lecun.com/exdb/mnist/.

LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998.

Lee, J., Sohl-Dickstein, J., Pennington, J., Novak, R., Schoenholz, S., and Bahri, Y. Deep neural networks as Gaussian processes. In ICLR, 2018.

Maas, A. L., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., and Potts, C. Learning word vectors for sentiment analysis. In Human Language Technologies, 2011.

Matthews, A., Hron, J., Rowland, M., Turner, R. E., and Ghahramani, Z. Gaussian process behaviour in wide deep neural networks. In ICLR, 2018.

Moosavi-Dezfooli, S., Fawzi, A., Fawzi, O., and Frossard, P. Universal adversarial perturbations. In CVPR, 2017.

Neekhara, P., Hussain, S., Dubnov, S., and Koushanfar, F. Adversarial reprogramming of text classification neural networks. In EMNLP, 2019.

Ngoc Giang, N., Tran, V., Ngo, D., Phan, D., Lumbanraja, F., Faisal, M. R., Abapihi, B., Kubo, M., and Satou, K. DNA sequence classification by convolutional neural network. Journal of Biomedical Science and Engineering, 2016.

Noordewier, M. O., Towell, G. G., and Shavlik, J. W. Training knowledge-based neural networks to recognize genes in DNA sequences. In NIPS, 1990.

Papernot, N., McDaniel, P. D., Jha, S., Fredrikson, M., Celik, Z. B., and Swami, A. The limitations of deep learning in adversarial settings. arXiv:1511.07528, 2015.

Papernot, N., McDaniel, P. D., Goodfellow, I. J., Jha, S., Celik, Z. B., and Swami, A. Practical black-box attacks against deep learning systems using adversarial examples. arXiv:1602.02697, 2016.

Pokholok, D., Harbison, C., Levine, S., Cole, M., Hannett, N., Lee, T., Bell, G., Walker, K., Rolfe, P., Herbolsheimer, E., Zeitlinger, J., Lewitter, F., Gifford, D., and Young, R. Genome-wide map of nucleosome acetylation and methylation in yeast. Cell, 2005.

Raina, R., Battle, A., Lee, H., Packer, B., and Ng, A. Y. Self-taught learning: Transfer learning from unlabeled data. In ICML, 2007.

Ren, Y., Ruan, Y., Tan, X., Qin, T., Zhao, S., Zhao, Z., and Liu, T.-Y. FastSpeech: Fast, robust and controllable text to speech. In NeurIPS, 2019.

Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. Rethinking the inception architecture for computer vision. In CVPR, 2016.

Tan, M. and Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In ICML, 2019.

Tsai, Y.-Y., Chen, P.-Y., and Ho, T.-Y. Transfer learning without knowing: Reprogramming black-box machine learning models with scarce data and limited resources. In ICML, 2020.

Tu, J., Ren, M., Manivasagam, S., Liang, M., Yang, B., Du, R., Cheng, F., and Urtasun, R. Physically realizable adversarial examples for lidar object detection. In CVPR, 2020.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. Attention is all you need. In NIPS, 2017.

Yang, P., Chen, J., Hsieh, C., Wang, J., and Jordan, M. I. Greedy attack and Gumbel attack: Generating adversarial examples for discrete data. arXiv, 2018.

Zhang, X., Zhao, J. J., and LeCun, Y. Character-level convolutional networks for text classification. In NIPS, 2015.

Cross-modal Adversarial Reprogramming - Supplementary Material
1. Wall-clock inference time of reprogramming function
In Table 1, we report the wall-clock inference time for the adversarial program and the benchmark text classifiers studied in our work, for a sequence of length 500. Both the Bi-LSTM and CNN models use 256 hidden units and an embedding size of 256. We use a single-layer Bi-LSTM network and a 1-D CNN with convolution filters of size 3, 4, and 5, based on the architecture proposed in (Kim, 2014). For the adversarial program, the patch size is X and the output image size is X. We average the inference time over sequences for these evaluations. It can be seen that the adversarial program is significantly faster than both the Bi-LSTM and 1D-CNN models for both CPU and GPU implementations in PyTorch. The CPU used for these evaluations is an Intel Xeon CPU and the GPU is an Nvidia Titan 1080i.

Model | CPU | GPU
Adversarial Program | 7.9 ms | 0.2 ms
Bi-LSTM | 161.5 ms | 13.9 ms
1D CNN | 383.2 ms | 2.2 ms

Table 1. Wall-clock inference time (in milliseconds) for the adversarial program and the benchmark text classifiers studied in our work for a sequence of length 500.
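These timings can be reproduced in spirit with a simple wall-clock measurement over repeated forward passes. The Bi-LSTM below is a stand-in configured per the description above (single layer, 256 hidden units, embedding size 256), not the exact benchmark model:

```python
import time
import torch
import torch.nn as nn

# Stand-in single-layer Bi-LSTM over a length-500 sequence of 256-dim embeddings.
model = nn.LSTM(input_size=256, hidden_size=256, bidirectional=True)
x = torch.randn(500, 1, 256)            # (seq_len, batch, embedding)

with torch.no_grad():
    model(x)                            # warm-up pass, excluded from timing
    start = time.perf_counter()
    for _ in range(10):
        model(x)                        # average over repeated forward passes
    elapsed_ms = (time.perf_counter() - start) / 10 * 1000.0
```

On a GPU, the forward call is asynchronous, so a `torch.cuda.synchronize()` before each timestamp is needed to obtain meaningful numbers.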
2. Hyper-parameter details of benchmarkclassifiers
For training the benchmark neural text classifiers, we use the Bi-LSTM and 1D-CNN models with a softmax classification head. For both of these models, the token embedding layer is randomly initialized and trained with the model parameters. We use the Adam optimizer and perform mini-batch gradient descent using a batch size of 32 for both of these models, for a maximum of 200k mini-batch iterations. For the 1D-CNN models, we use filters of size 3, 4, and 5 for the three convolutional layers. Other hyper-parameter details of these models are listed in Table 2.
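A sketch of such a benchmark 1D-CNN text classifier; a Kim (2014)-style parallel-branch arrangement of the size-3, 4, and 5 filters is assumed here, and the vocabulary and class counts are illustrative:

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """1D-CNN text classifier: parallel convolutions with filter sizes 3, 4, 5
    over token embeddings, max-pooled over time and linearly classified."""
    def __init__(self, vocab=1000, emb=256, hidden=256, n_classes=4):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb, hidden, k) for k in (3, 4, 5)])
        self.fc = nn.Linear(3 * hidden, n_classes)

    def forward(self, tokens):                       # tokens: (B, L) int ids
        h = self.emb(tokens).transpose(1, 2)         # (B, emb, L) for Conv1d
        pooled = [c(h).relu().max(dim=-1).values for c in self.convs]
        return self.fc(torch.cat(pooled, dim=-1))    # (B, n_classes)
```

Max-pooling over the time dimension makes the classifier invariant to where in the sequence a salient n-gram occurs, which suits the keyword-detection tasks discussed in the main paper.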
Model Hidden Units Emb. Size
Table 2.
Hyper-parameter details for the neural sequence classifiers used as benchmark classifiers in our work. LR: Learning Rate. Emb. Size: Embedding Size.
3. Perturbation amount vs. Accuracy in Bounded attacks
Figure 1 shows the performance of cross-modal adversarial reprogramming at different magnitudes of allowed perturbation in the bounded attack setting, while attacking the ViT model. We use the same base image x_c as used in all of our bounded attack experiments. Figure 1.
Accuracy vs. L∞ norm of the perturbation while reprogramming the ViT model for three target tasks covering emotion, topic, and DNA sequence classification.
4. Assessing the importance of Victim Model
To assess the importance of the victim model for solving the target task, we perform an experiment to understand the extent to which the target task can be solved by using only the adversarial reprogramming function with a linear classification head on top. To perform this experiment, we take the mean of all token embeddings in a sequence and pass it as input to a linear layer that predicts the class scores for the target task. Not surprisingly, this classifier works well for sentiment and topic classification tasks, which can be solved reliably using word-frequency based methods (as reported in Table 2 of the main paper). However, compared to our reprogramming methods (reported in Table 3 of the main paper), the classifier significantly underperforms on the two DNA sequence classification tasks that require understanding the underlying semantics of the sequence. This suggests that while the embedding layer of the adversarial program can sufficiently capture word-frequency statistics and independently solve tasks that require keyword detection, the pre-trained victim model is essential for tasks that require analysing the structure and semantics of the sequence.

Dataset | Patch Size | Accuracy (%)
Yelp | 16 | 89.86
IMDB | 16 | 87.75
AG | 16 | 91.39
DBPedia | 32 | 97.01
Splice | 48 | 50.41
H3 | 16 | 75.12

Table 3. Performance of a classifier that uses only an embedding layer and a classification head on the datasets used in our study.
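The baseline described here reduces to a mean-pooled embedding followed by a linear head; a minimal sketch with illustrative vocabulary, embedding, and class sizes:

```python
import torch
import torch.nn as nn

class MeanEmbeddingClassifier(nn.Module):
    """Victim-free baseline: mean of all token embeddings in a sequence,
    passed to a linear layer that predicts target-task class scores."""
    def __init__(self, vocab=1000, emb=64, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.head = nn.Linear(emb, n_classes)

    def forward(self, tokens):                     # tokens: (B, L) int ids
        return self.head(self.emb(tokens).mean(dim=1))

clf = MeanEmbeddingClassifier()
scores = clf(torch.randint(0, 1000, (2, 30)))      # two sequences of 30 tokens
```

Because mean pooling discards token order entirely, this baseline can only exploit token-frequency statistics, which is consistent with its weak results on the order-sensitive DNA tasks in Table 3.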
References
Kim, Y. Convolutional neural networks for sentence classification. In EMNLP, 2014.