Is Deep Learning Safe for Robot Vision? Adversarial Examples against the iCub Humanoid
Marco Melis, Ambra Demontis, Battista Biggio, Gavin Brown, Giorgio Fumera, Fabio Roli
PRA Lab, Department of Electrical and Electronic Engineering, University of Cagliari, Italy (http://pralab.diee.unica.it/en); Pluribus One, Italy; School of Computer Science, University of Manchester, UK
Accepted for publication at the ICCV 2017 Workshop on Vision in Practice on Autonomous Robots (ViPAR).
Abstract
Deep neural networks have been widely adopted in recent years, exhibiting impressive performances in several application domains. It has however been shown that they can be fooled by adversarial examples, i.e., images altered by a barely-perceivable adversarial noise, carefully crafted to mislead classification. In this work, we aim to evaluate the extent to which robot-vision systems embodying deep-learning algorithms are vulnerable to adversarial examples, and propose a computationally efficient countermeasure to mitigate this threat, based on rejecting classification of anomalous inputs. We then provide a clearer understanding of the safety properties of deep networks through an intuitive empirical analysis, showing that the mapping learned by such networks essentially violates the smoothness assumption of learning algorithms. We finally discuss the main limitations of this work, including the creation of real-world adversarial examples, and sketch promising research directions.
1. Introduction
After decades of research spent in exploring different approaches, ranging from search algorithms, expert and rule-based systems to more modern machine-learning algorithms, several problems involving the use of artificial intelligence have finally been tackled through a paradigm shift based on data-driven artificial-intelligence technologies. In fact, due to the increasing popularity and use of the modern Internet, along with the powerful computing resources available nowadays, it has been possible to extract meaningful knowledge from the huge amount of data collected online, from images to videos, text and speech data [7]. Deep learning algorithms have provided an important resource in this respect. Their flexibility to deal with different kinds of input data, along with their learning capacity, have made them a powerful instrument to successfully tackle challenging applications, reporting impressive performance on several tasks in computer vision, speech recognition and human-robot interaction [11, 21].

Despite their undiscussed success in several real-world applications, several open problems remain to be addressed. Research work has been investigating how to interpret decisions taken by deep learning algorithms, unveiling the patterns learned by deep networks at each layer [30, 17]. Although significant progress has been made in this direction, and it is now clear that such networks gradually learn more abstract concepts (e.g., from detecting elementary shapes in images to more abstract notions of objects or animals), a relevant effort is still required to gain deeper insights. This is also important to understand why such algorithms may be vulnerable to the presence of adversarial examples, i.e., input data that are slightly modified to mislead classification by the addition of an almost-imperceptible adversarial noise [16, 19]. The presence of adversarial examples has been shown on a variety of tasks, including object recognition in images, handwritten digit recognition, and face recognition [24, 25, 10, 19, 20].

Figure 1: Architecture of the iCub robot-vision system [21]. After image acquisition, a region of interest containing the object is cropped and processed by the ImageNet deep network [13]. The deep features extracted from the penultimate layer of such network (fc7) are then used as input to the classification algorithm to perform the recognition task, in which the probabilities that the object belongs to each (known) class are reported. A human annotator can then validate or correct decisions, and the classification algorithm can be updated accordingly; for instance, to learn that an object belongs to a never-before-seen class.

In this work, we are the first to show that robot-vision systems based on deep learning algorithms are also vulnerable to this potential threat. This is a crucial problem, as embodied agents have a more direct, physical interaction with humans than virtual agents, and the damage caused by adversarial examples in this context can thus be much more concerning. To demonstrate this vulnerability, we focus on a case study involving the iCub humanoid robot (Sect. 2) [18, 21]. A peculiarity of humanoid robots is that they have to be able to learn in an online fashion, from the stimuli received during their exploration of the surrounding environment.
For this reason, a crucial requirement for them is to completely embody the acquired knowledge, along with a reliable and efficient learning paradigm. As discussed in previous work [21], this goal conflicts with the current state of deep learning algorithms, which are too computationally and power demanding to be fully embodied by a humanoid robot. For this reason, the authors in [21] have proposed to use a pre-trained deep network for object recognition to perform feature extraction (essentially considering as the feature vector for the detected object one of the last convolutional layers in the deep network), and then to train a multiclass classifier on such feature representation.

The first contribution of this work is to show the vulnerability of these kinds of robot-vision systems to adversarial examples. To this end, we propose an alternative algorithm for the generation of adversarial examples (Sect. 3), which extends previous work on the evasion of binary classifiers to the multiclass case [4]. In contrast to previous work dealing with the generation of adversarial examples based on minimum-distance perturbations [25, 10, 19, 20], our algorithm enables creating adversarial examples misclassified with higher confidence, under a maximum input perturbation, for which devising proper countermeasures is also more difficult. This allows one to assess classifier security more thoroughly, by evaluating the probability of evading detection as a function of the maximum input perturbation. Notably, it also allows manipulating only a region of interest in the input image, such that creating real-world adversarial examples becomes easier; e.g., one may only modify some image pixels corresponding to a sticker that can be subsequently applied to the object of interest.

The second contribution of this work is the proposal of a computationally-efficient countermeasure, inspired by work on classification with the reject option and open-set recognition, to mitigate the threat posed by adversarial examples. Its underlying idea is to detect and reject the so-called blind-spot evasion points, i.e., samples which are sufficiently far from known training data. This countermeasure is particularly suited to our case study, as it requires modifying only the learning algorithm applied on top of the deep feature representation, i.e., only the output layer (Sect. 4).

We then report an empirical evaluation (Sect. 5) showing that the iCub humanoid is vulnerable to adversarial examples, and to which extent our proposed countermeasure can improve its security. In particular, although it does not completely address the vulnerability of such a system to adversarial examples, it requires one to significantly increase the amount of perturbation on the input images to reach a comparable probability of misleading a correct object recognition. To better understand the reason behind this phenomenon, we provide a further, simple and intuitive empirical analysis, showing that the mapping learned by the deep network used for deep feature extraction essentially violates the smoothness assumption of learning techniques in the input space. This means that, in practice, for a sufficiently high amount of perturbation, the proposed algorithm creates adversarial examples that are mapped onto a region of the deep feature space which is densely populated by training examples of a different class.
Accordingly, only modifying the classification algorithm on top of the pre-trained deep features (without re-training the underlying deep network) may not be sufficient in this case. We conclude this paper discussing related work (Sect. 6) and relevant future research directions (Sect. 7).

2. The iCub Humanoid

Our case study focuses on the iCub humanoid, as it provides a cognitive humanoid robotic platform well suited to our task [18, 21]. In particular, the visual recognition system of this humanoid relies on deep learning technologies to interact with the surrounding environment, enabling it to detect and to recognize known objects, i.e., objects that have been verbally annotated in a previous session by a human teacher. Furthermore, iCub is capable of performing online learning, i.e., after classification, it asks the human teacher whether the corresponding decision is correct. If the decision is wrong (e.g., in the case of an object belonging to a never-before-seen class), the human teacher can provide feedback to the robot, which in turn updates its classification model through online or incremental learning techniques (e.g., by expanding the set of known object classes). This is a clear example of how a robot can learn from experience to improve its capabilities, i.e., a key aspect of why embodying knowledge within robots is of crucial relevance for these tasks [21]. However, given the limited hardware and power resources of the humanoid, it is clear that retraining the whole deep learning infrastructure becomes too computationally demanding. For this reason, the visual system of iCub exploits the pre-trained ImageNet deep network [13] only for extracting a set of deep features (from one of the highest convolutional layers), and uses this feature vector to represent the object detected by iCub in the input image. As depicted in Fig. 1, this deep feature vector is then classified using a separate classifier, which can be retrained online in an efficient manner when feedback from the human annotator is received. In particular, in [21] this classifier is implemented using a one-versus-all scheme to combine a set of c linear classifiers, c being the number of known classes. Let us denote the pixel values of the input image (in raster-scan order) with x ∈ X ⊆ R^d (where d = 128 × 128 × 3, i.e., the number of pixel values of the cropped RGB image), and the discriminant functions of the aforementioned one-versus-all linear classifiers as f_1(x), ..., f_c(x). Accordingly, the predicted class c* is determined as the class whose discriminant function for that sample is maximum:

c* = argmax_{k=1,...,c} f_k(x).    (1)

The linear classifiers used for this purpose include Support Vector Machines (SVMs) and Recursive Least Squares (RLS) classifiers, as both can be efficiently updated online [21]. Notably, previous work has shown that replacing the softmax layer in deep networks with a multiclass SVM can be effective also in different applications [27].
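To make the decision rule in Eq. (1) concrete, the following minimal Python sketch (not the authors' code; extract_fc7 is a hypothetical stub standing in for the pre-trained ImageNet network) classifies an fc7 feature vector with c one-versus-all linear discriminant functions f_k(z) = w_k^T z + b_k.

```python
import numpy as np

def extract_fc7(image):
    """Hypothetical stub: returns the m = 4096 fc7 activations of the
    pre-trained ImageNet network for a cropped input image."""
    raise NotImplementedError  # provided by the deep network in the real system

def predict(z, W, b):
    """One-versus-all linear decision rule of Eq. (1): c* = argmax_k f_k(z)."""
    scores = W @ z + b               # f_1(z), ..., f_c(z)
    return int(np.argmax(scores)), scores

# Toy example: c = 7 known classes, m = 4096 deep features.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(7, 4096)), np.zeros(7)
z = rng.normal(size=4096)            # stands in for extract_fc7(cropped_image)
c_star, scores = predict(z, W, b)
print("predicted class:", c_star)
```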
3. Adversarial Security Evaluation
We discuss here our proposal to assess the security of robot-vision systems to adversarial examples. As in previous work addressing the issue of evaluating the security of machine-learning algorithms [1, 12, 5, 4, 28], our underlying idea is to evaluate the maximum recognition accuracy degradation against an increasing maximum admissible level of perturbation of the input images. This is rather different from previous work in which adversarial examples correspond to minimally-perturbed samples that are wrongly classified [25, 10, 19, 20]. As we will see in our experiments, besides providing a more complete evaluation of system security against adversarial examples, our attack strategy also highlights additional interesting insights on system security, including the identification of vulnerabilities in the feature representation (rather than in the classification algorithm itself) through the creation of adversarial examples that are indistinguishable from training samples of a different class.

Our approach is based on extending the work in [4] for evasion of binary classifiers to the multiclass case. To this end, we define two possible evasion settings, i.e., ways of creating adversarial examples, which further differentiate our technique from previous work on the creation of minimally-perturbed adversarial examples [25, 10, 19, 20]. In particular, we consider an error-generic and an error-specific evasion setting. In the error-generic scenario, the attacker is interested in misleading classification, regardless of the output class predicted by the classifier for the adversarial examples; e.g., for a known terrorist the goal may be to evade detection by a video surveillance system, regardless of the identity that may be erroneously associated to his/her face. Conversely, in the error-specific setting, the attacker still aims to mislead classification, but requires the adversarial examples to be misclassified as a specific, target class; e.g., imagine an attacker aiming to impersonate a specific user. (In [20], the authors defined targeted and indiscriminate attacks depending on whether the attacker aims to cause specific or generic errors, similarly to our settings. Here we do not follow their naming convention, as it causes confusion with the interpretation of targeted and indiscriminate attacks introduced in previous work [1, 12, 5].)

The two settings can be formalized in terms of two distinct optimization problems, though using the same formulation for the objective function Ω(x):

Ω(x) = f_k(x) − max_{l≠k} f_l(x).    (2)

This function essentially represents the difference between a preselected discriminant function (associated to class k) and the competing one, i.e., the one exhibiting the highest value at x among the remaining c − 1 classes (i.e., all classes {1, ..., c} except k). We discuss below how class k is chosen in the two considered settings.

Error-generic Evasion.
In this case, the optimization problem can be formulated as:

min_{x'} Ω(x'),    (3)
s.t. d(x, x') ≤ d_max,    (4)
x_lb ⪯ x' ⪯ x_ub,    (5)

where f_k(x) in the objective function Ω(x) (Eq. 2) denotes the discriminant function associated to the true class of the source sample x, and d(x, x') ≤ d_max represents a constraint on the maximum input perturbation d_max between x (i.e., the input image) and the corresponding modified adversarial example x', given in terms of a distance in the input space. Normally, the ℓ2 distance between pixel values is used as the function d(·, ·), but other metrics can also be adopted (e.g., one may use an ℓ1-based constraint to inject a sparse adversarial noise rather than the slight image blurring caused by the ℓ2-based constraint) [8, 22]. The box constraint x_lb ⪯ x' ⪯ x_ub (where u ⪯ v means that each element of u has to be not greater than the corresponding element in v) is optional, and can be used to bound the input values of the adversarial example x'; e.g., each pixel value in images is bounded between 0 and 255. Nevertheless, the box constraint can also be used to manipulate only some pixels in the image. For example, if some pixels should not be manipulated, one can set the corresponding values of x_lb and x_ub equal to those of x. This is of crucial importance for creating real-world adversarial examples, as it allows one to avoid manipulating pixels which do not belong to the object of interest. For instance, this may enable one to create an "unusual" sticker to be attached to an adversarial object, similarly to the idea exploited in [24] for the creation of wearable objects used to fool face recognition systems.

Figure 2: Error-specific (left) and error-generic (right) evasion of a multiclass SVM with the Radial Basis Function (RBF) kernel. Decision boundaries among the three classes (blue, red and green points) are shown as black solid lines. In the error-specific case, the initial (blue) sample is shifted towards the green class (selected as the target one). In the error-generic case, instead, it is shifted towards the red class, as it is the closest class to the initial sample. The ℓ2 distance constraint is also shown as a gray circle.
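Before turning to error-specific evasion, the feasible domain in Eqs. (4)-(5) can be illustrated with the short sketch below (an assumption on our part, not the paper's exact operator): a simple alternating projection onto the intersection of the ℓ2-ball of radius d_max around x and the box [x_lb, x_ub]. Setting x_lb = x_ub = x outside a region of interest restricts the manipulation to that region, as discussed above.

```python
import numpy as np

def project(x_adv, x, d_max, x_lb, x_ub, n_iter=10):
    """Approximate projection onto {x': d(x, x') <= d_max} intersected with
    {x_lb <= x' <= x_ub} (l2 distance), via alternating clipping and rescaling."""
    v = x_adv.astype(float).copy()
    for _ in range(n_iter):
        v = np.clip(v, x_lb, x_ub)          # box constraint, Eq. (5)
        delta = v - x
        norm = np.linalg.norm(delta)
        if norm > d_max:                    # l2 constraint, Eq. (4)
            v = x + delta * (d_max / norm)
    return v
```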
Error-specific Evasion. The problem of error-specific evasion is formulated as the error-generic evasion problem in Eqs. (3)-(5), with the only differences that: (i) the objective function is maximized; and (ii) f_k denotes the discriminant function associated to the targeted class, i.e., the class which the adversarial example should be assigned to.

An example of the different behavior exhibited by the two attacks is given in Fig. 2. Both attacks are constructed using the simple gradient-based algorithm given as Algorithm 1. The basic idea is to update the adversarial example by following the steepest descent (or ascent) direction (depending on whether we are considering error-generic or error-specific evasion), and to use a projection operator Π to keep the updated point within the feasible domain (given by the intersection of the box and the ℓ2 constraint).

Algorithm 1: Computation of Adversarial Examples
Input: x: the input image; η: the step size; r ∈ {−1, +1}: variable set to −1 (+1) for error-generic (error-specific) evasion; ε > 0: a small number.
Output: x': the adversarial example.
  x' ← x
  repeat
    x ← x', and x' ← Π(x + r η ∇Ω(x))
  until |Ω(x') − Ω(x)| ≤ ε
  return x'

Gradient computation.
One key issue of the aforementioned algorithm is the computation of the gradient of Ω(x), which involves the gradients of the discriminant functions f_i(x) for i ∈ {1, ..., c}. It is not difficult to see that this can be computed using the chain rule, decoupling the gradient of the discriminant function of the classifier trained on the deep feature space from the gradient of the deep network used for feature extraction, as ∇f_i(x) = ∂f_i(z)/∂z · ∂z/∂x, where z ∈ R^m denotes the set of deep features. In our case study, these are the m = 4,096 values extracted from layer fc7 (see Fig. 1). Notably, the gradient of the deep network ∂z/∂x is readily available through automatic differentiation, as also highlighted in previous work [25, 10, 19, 20], whereas the availability of the gradient ∂f_i(z)/∂z depends on whether the chosen classifier is differentiable or not. Several of the most used classifiers are differentiable, including, e.g., SVMs with differentiable kernels (we refer the reader to [4] for further details). Nevertheless, if the classifier is not differentiable (e.g., as in the case of decision trees), one may use a surrogate differentiable classifier to approximate it, as also suggested in [4, 8, 22].
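Putting the pieces together, the following sketch (a hedged illustration under the assumptions above, not the authors' implementation) mirrors Algorithm 1: grad_omega(x) is assumed to return ∇Ω(x) obtained via the chain rule through the deep network (e.g., by automatic differentiation), omega(x) evaluates Eq. (2), and project is the operator Π sketched earlier, with x, d_max and the box bounds already bound (e.g., via functools.partial).

```python
def evade(x, omega, grad_omega, project, eta=1.0, r=-1, eps=1e-4, max_iter=500):
    """Projected gradient descent/ascent on Omega(x), as in Algorithm 1.
    r = -1 for error-generic evasion (minimize Omega),
    r = +1 for error-specific evasion (maximize Omega)."""
    x_adv = x.copy()
    for _ in range(max_iter):
        x_prev = x_adv
        # gradient step followed by projection onto the feasible domain
        x_adv = project(x_prev + r * eta * grad_omega(x_prev))
        if abs(omega(x_adv) - omega(x_prev)) <= eps:   # stopping criterion
            break
    return x_adv
```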
4. Classifier Security to Adversarial Examples
If the evasion algorithm drives the adversarial examples deeply into regions populated by known training classes (as shown in Fig. 2), there is not much one can do to correctly identify them from the rest of the data by only re-training or modifying the classifier, i.e., by modifying the shape of the decision boundaries in the feature space.
Figure 3: Conceptual representation of our idea for improving iCub security to adversarial examples, using multiclass SVMs with RBF kernels (SVM-RBF), without reject option (no defense, left), with reject option (middle), and with modified thresholds to increase the rejection rate (right). Rejected samples are highlighted with black contours. The adversarial example (black star) is misclassified as a red sample by SVM-RBF (left plot), while SVM-RBF with reject option correctly identifies it as an adversarial example (middle plot). Rejection thresholds can be modified to increase classifier security (right plot), though at the expense of misclassifying more legitimate (i.e., non-manipulated) samples.

We propose to consider this problem as an intrinsic vulnerability of the feature representation: if the feature vector of an adversarial example becomes indistinguishable from those of the training samples of a different class, it can only be detected by using a different feature representation (i.e., in the case of iCub, this would require at least re-training the underlying deep network responsible for deep feature extraction). However, this is not always the case, especially in high-dimensional spaces, or if classes are separated with a sufficiently high margin. In this case, as depicted in Fig. 3, there may be very large regions of the feature space which are only scarcely populated by data, although being associated (potentially also with high confidence) to known classes by the learning algorithm. Accordingly, adversarial examples may quite reasonably end up in such regions while also successfully fooling detection. These samples are often referred to as blind-spot evasion samples, as they are capable of misleading classification, but in regions of the space which are far from the rest of the training data [12, 25]. Conversely to the case of indistinguishable adversarial examples, blind-spot adversarial examples can be detected by only modifying the classifier (i.e., without re-training the underlying deep network used by iCub). Accordingly, we propose to consider the problem of blind-spot adversarial examples as an intrinsic vulnerability of the classification algorithm. Different approaches have been proposed based on modifying the classifier, ranging from 1.5-class classification (based on the combination of anomaly detectors and two-class classifiers) [3] to open-set recognition techniques [23, 2].

We propose here a more direct approach, based on the same idea underlying the notion of classification with a reject option, and leveraging some concepts from open-set recognition. (Here, we only refer to the classifier trained on top of the deep feature representation as the classification algorithm; this definition excludes the pre-trained deep network used for feature extraction in iCub, as it is not re-trained online.) In particular, we consider SVMs with RBF kernels to implement the multiclass classifier in our case study, as these SVMs belong to the so-called class of Compact Abating Probability (CAP) models [23] (i.e., classifiers whose discriminant function decreases while getting farther from the training data). Then, by applying a simple rejection mechanism on their discriminant function, we can identify samples which are far enough from the rest of the training data, i.e., blind-spot adversarial examples. Our idea is thus to modify the decision rule in Eq.
(1) as:

c* = argmax_{k=1,...,c} f_k(x), only if f_{c*}(x) > 0,    (6)

otherwise x is classified as an adversarial example (i.e., as belonging to a novel class). In practice this means that, if no classifier assigns the sample to an existing class (i.e., no value of f_k is positive), then we simply categorize it as an adversarial example. In our specific case study, iCub may reject classification and ask the human annotator to label the example correctly. Notably, the threshold of each discriminant function (i.e., the biases of the one-versus-all SVMs) can be adjusted to tune the trade-off between the rejection rate of adversarial examples and the fraction of incorrectly-rejected samples (which are not adversarially manipulated), as shown in Fig. 3.
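A minimal sketch of the reject option in Eq. (6) follows (an illustration, not the authors' code); the optional per-class thresholds generalize the zero threshold and correspond to shifting the SVM biases to trade off rejection of adversarial examples against false rejections of legitimate samples, as in Fig. 3.

```python
import numpy as np

def predict_with_reject(scores, thresholds=None):
    """scores: array [f_1(x), ..., f_c(x)] of the one-versus-all SVMs.
    Returns the predicted class index, or -1 to flag a rejected sample
    (i.e., a candidate adversarial example to be relabeled by the annotator)."""
    scores = np.asarray(scores, dtype=float)
    if thresholds is None:
        thresholds = np.zeros_like(scores)   # Eq. (6): reject if the top score is not positive
    k = int(np.argmax(scores))
    return k if scores[k] > thresholds[k] else -1
```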
5. Experimental Analysis
In this section we report the results of the security evaluation performed on the iCub system (see Sect. 2), along with a few adversarial examples showing how the proposed evasion algorithm can be exploited to create real-world attack samples. We then provide a conceptual representation and an empirical analysis to explain why neural networks
are easily fooled and how our defense mechanism can improve their security in this context.

Figure 4: Example images (one per class) from the iCubWorld28 dataset (categories: cup, dishwashing detergent, laundry detergent, plate, soap, sponge, sprayer), and subset of classes used in the iCubWorld7 dataset (highlighted in red).
Experimental Setup.
Our analysis has been performed using the iCubWorld28 dataset [21], consisting of 28 different classes which include 7 different objects (cup, plate, etc.) of 4 different kinds each (e.g., cup1, cup2, etc.), as shown in Fig. 4. Each object was shown to iCub, which automatically detected it and cropped the corresponding object image. Four acquisition sessions were performed on four different days, yielding the images used for the training and test sets. As shown in [21], it is very difficult for iCub to distinguish such slight category distinctions, like different kinds of cups. For this reason, we also consider here a reduced dataset, iCubWorld7, consisting only of 7 different objects, each of a different kind. The selected objects are highlighted in red in Fig. 4.

We implement the classification algorithm using three different multiclass SVM versions, all based on a one-versus-all scheme: a linear SVM (denoted with SVM in the following); an SVM with the RBF kernel (SVM-RBF); and an SVM with the RBF kernel implementing our defense mechanism based on rejection of adversarial examples (SVM-adv, Sect. 4). The regularization parameter C and the RBF kernel parameter γ have been set equal for all one-versus-all SVMs in each multiclass classifier, by maximizing recognition accuracy over a grid of candidate values through 3-fold cross-validation.
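For reference, a minimal scikit-learn sketch of this setup is given below (with placeholder parameter grids, since the exact grids used in the experiments are not reproduced here): a one-versus-all SVM with RBF kernel trained on the fc7 deep features, with C and γ selected by 3-fold cross-validation.

```python
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Placeholder grids; C and gamma are selected by 3-fold cross-validation.
param_grid = {"estimator__C": [0.1, 1, 10, 100],
              "estimator__gamma": [1e-4, 1e-3, 1e-2]}
clf = GridSearchCV(OneVsRestClassifier(SVC(kernel="rbf")),
                   param_grid, cv=3, scoring="accuracy")
# clf.fit(Z_train, y_train)  # Z_train: fc7 deep features, y_train: object labels
```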
Baseline Performance. In Fig. 5 we report a box plot showing the empirical probability distributions of the accuracy achieved by the SVM classifier on increasingly larger object identification tasks, as suggested in [21]. To this end, we randomly select 300 subsets of increasing size from the iCubWorld28 dataset (day4 acquisitions), and then train and test the classifier on each subset. The achieved accuracy is considered an observation for estimating the empirical distributions.
Figure 5: Box plots of the recognition accuracies measured for linear SVM predictors trained on random subsets of objects of increasing size (whiskers with maximum 1.5 interquartile range). Dotted super-imposed curves represent the minimum accuracy guaranteed within a fixed confidence level (70%, 80%, 90%, 95%).

The minimum accuracy value for which the fraction of observations in the estimated distribution was higher than a specific confidence threshold is indicated as a dotted line. Notably, the reported performances for the linear SVM are almost identical to those reported in [21], where a different algorithm is used. Similar performances (omitted for brevity) are obtained using SVM-RBF.

Security Evaluation against Adversarial Examples.
We now investigate the security of iCub in the presence of adversarial examples. In this experiment, we consider the first 100 examples per class for both the iCubWorld28 and iCubWorld7 datasets to build the training and test sets. The recognition accuracy against an increasing maximum admissible ℓ2 perturbation (i.e., d_max value) is reported in Fig. 6 for both the error-specific (top row plots) and error-generic (bottom row plots) attack scenarios. For error-specific evasion, we average our results not only on different training-test set splits, but also by considering a different target class in each repetition. While SVM and SVM-RBF show a comparable decrease of accuracy at increasing d_max, SVM-adv is able to strongly improve security in most of the cases (as the corresponding curve decreases more gracefully). Notably, the performance of SVM-adv even increases for low values of d_max. A plausible reason is that, even if all testing images are only slightly modified in input space, they immediately become blind-spot adversarial examples, ending up in a region which is far from the rest of the data. As the input perturbation increases, such samples are gradually drifted inside a different class, becoming indistinguishable from the samples of that class.

To further improve the security of iCub to adversarial examples, we set the rejection threshold of SVM-adv to a more conservative value, increasing the false negative rate of each base classifier by 5% (estimated on a validation set). This results in a significant security improvement, as shown in the rightmost plots of Fig. 6. However, as expected, this comes at the expense of misclassifying more legitimate (i.e., non-manipulated) samples.

Figure 6: Recognition accuracy of iCub (using the three different classifiers SVM, SVM-RBF, and SVM-adv) against an increasing maximum admissible ℓ2 input perturbation d_max, for iCubWorld28 (left column) and iCubWorld7 (middle and right columns), using error-specific (top row) and error-generic (bottom row) adversarial examples. Baseline accuracies (in the absence of perturbation) are reported at d_max = 0.
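The evaluation protocol behind the curves in Fig. 6 can be sketched as follows (a simplified illustration reusing the attack and prediction functions sketched in the previous sections; craft_fn and predict_fn are placeholder names for those components).

```python
import numpy as np

def security_curve(X_test, y_test, d_max_values, craft_fn, predict_fn):
    """craft_fn(x, y, d_max) -> adversarial image; predict_fn(x) -> class label.
    Returns recognition accuracy on adversarial examples for each d_max."""
    accuracies = []
    for d_max in d_max_values:
        correct = [predict_fn(craft_fn(x, y, d_max)) == y
                   for x, y in zip(X_test, y_test)]
        accuracies.append(np.mean(correct))
    return np.array(accuracies)   # accuracy vs. d_max, as plotted in Fig. 6
```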
Real-world Adversarial Examples. In Fig. 7 we report a few adversarial examples generated using an error-specific evasion attack on the iCubWorld28 data. Notably, the adversarial perturbation required to evade the system can barely be perceived by the human eye. As an important real-world application of the proposed attack algorithm, in the bottom right plots of Fig. 7 we report an adversarial example generated by manipulating only a subset of the image pixels, corresponding to the label of the detergent. In this case, the perturbation becomes easier to spot for a human, but localizing the noise in a region of interest allows the attacker to construct a practical, real-world adversarial object, by simply attaching an "adversarial" sticker to the original object before showing it to the iCub humanoid robot.

Figure 7: Plots in the top row show an adversarial example from class laundry-detergent3, modified to be misclassified as cup3, using an error-specific evasion attack, for increasing levels of input perturbation (reported in the title of the plots). Plots in the bottom row show the minimally-perturbed adversarial example that evades detection (i.e., the sample that evades detection with minimum d_max), along with the corresponding noise applied to the initial image (amplified to make it clearly visible), for the case in which all pixels can be manipulated (first and second plot), and for the case in which modifications are constrained to the label of the detergent (i.e., simulating a sticker that can be applied to the real-world adversarial object).
Why are Deep Nets Fooled?
Our analysis shows that the iCub vision system can also be fooled by adversarial examples, even by only adding a slightly-noticeable noise to the input image. To better understand the root causes of this phenomenon, we now provide an empirical analysis of the sensitivity of the feature mapping induced by the ImageNet deep network used by iCub, comparing the ℓ2 distance corresponding to random and adversarial perturbations in the input space with the one measured in the deep feature space. To this end, we perturb each training image, either randomly or along the adversarial direction, by the same fixed ℓ2 distance in the input space, and then measure the ℓ2 distance between the deep feature vectors of the initial and perturbed images. The average distance in deep space measured for randomly-perturbed images turns out to be far smaller than that measured for the adversarially-perturbed ones. This means that random perturbations in the input space only result in a very small shift in the deep space, while even light alterations of an image along the adversarial direction cause a large shift in deep space, which in turn highlights a significant instability of the deep feature space mapping induced by the ImageNet network. In other words, this means that images in the input space are very close to the decision boundary along the adversarial (gradient) direction, as conceptually represented in Fig. 8. Note that this is a general issue for deep networks, not only specific to ImageNet [26, 9, 25, 10, 19, 20].

Figure 8: Conceptual representation of the vulnerability of the deep feature space mapping. The left and right plots respectively represent images in input space and the corresponding deep feature vectors. Randomly-perturbed versions of an input image are shown as gray points, while the adversarially-perturbed image is shown as a red point. Although these points are at the same distance from the input image in the input space, the adversarial example is much farther in the deep feature space. This also means that images are very close to the decision boundary in input space, although in an adversarial direction that is difficult to guess at random due to the high dimensionality of the input space.

It should thus be clear that even a well-crafted modification of the last layers of the network, as in our proposed defense mechanism SVM-adv, can only mitigate this vulnerability. Indeed, it remains intimately related to the stability of the deep feature space mapping, which can only be addressed by imposing specific constraints while training the deep neural network; e.g., by imposing that small shifts in the input space correspond to small changes in the deep space, as recently proposed in [31]. Another possible countermeasure to improve the stability of such mapping is to enforce classification of samples within a minimum margin, by modifying the neurons' activation functions and, potentially, considering a different regularizer for the objective function optimized by the deep network. In this respect, it would be interesting to investigate in more detail the intimate connections between robustness to adversarial input noise and regularization, as highlighted in [29, 22].
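The sensitivity analysis described above can be sketched as follows (an illustrative reconstruction, not the authors' code; extract_fc7 and grad_omega are the assumed feature extractor and adversarial-gradient function from the previous sketches).

```python
import numpy as np

def feature_shift(x, direction, radius, extract_fc7):
    """l2 shift induced in fc7 space by an input-space perturbation of
    fixed l2 norm `radius` along `direction`."""
    d = direction / np.linalg.norm(direction)
    x_pert = x + radius * d    # same input-space distance for every direction
    return np.linalg.norm(extract_fc7(x_pert) - extract_fc7(x))

# Comparison for one image x, with the same radius in both cases:
# shift_random      = feature_shift(x, np.random.randn(*x.shape), radius, extract_fc7)
# shift_adversarial = feature_shift(x, grad_omega(x), radius, extract_fc7)
```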
6. Related Work
Previous work has investigated the problem of adversarial examples in deep networks [25, 10, 19, 20, 26, 9], focusing however on minimally-perturbed adversarial examples, i.e., examples that simply lie inside the decision region of a known class, even if they remain far from the corresponding training examples; on the contrary, our approach is based on creating maximally-perturbed (indistinguishable) adversarial examples, misclassified with high confidence. Different techniques aimed at improving the security of deep networks have also been proposed. Some of them attempt to reduce classifier vulnerability by directly detecting and rejecting adversarial examples [2, 15]. The technique of [2] is based on open-set recognition: it rejects samples whose distance from the centroids of all the classes exceeds a given threshold. However, it has not been evaluated using adversarial examples carefully generated to evade the classifier. In [15], adversarial examples are detected using the output of the first convolutional layers. A different approach has been proposed in [31]: it aims at improving the stability of the deep feature space mapping by retraining the network using an objective function that penalizes examples (images) that are close in input space but lie far apart in deep feature space. This approach has however been investigated only against small image distortions.
7. Conclusions and Future Work
Deep learning has shown groundbreaking performance in several real-world application domains, encompassing areas like computer vision, speech recognition and language processing, among others. Despite these impressive performances, recent work has shown how deep neural networks can be fooled by well-crafted adversarial examples affected by a barely-perceivable adversarial noise. In this work, we have developed a novel algorithm for the generation of adversarial examples which enables a more complete evaluation of the security of a learning algorithm, and we have applied it to investigate the security of the robot-vision system of the iCub humanoid. Even if we do not restrict ourselves to the manipulation of pixels belonging to the object of interest in
the image (which could lead one to more easily generate the corresponding real-world adversarial object, e.g., by means of the application of specific stickers to objects), we have shown how our algorithm enables this additional possibility. Notably, even if we have not constructed any real-world adversarial object during our experiments, recent work has shown that the artifacts introduced by printing images and re-acquiring them through a camera are irrelevant, and do not eliminate the problem of the existence of adversarial examples [14]. Similarly, other work has shown how to evade face recognition systems based on deep learning by using adversarial glasses and other accessories [24]. These recent findings clearly give a much higher practical relevance to the problem of adversarial examples.

We have demonstrated and quantified the vulnerability of iCub to the presence of adversarial manipulations of the input images, and suggested a simple countermeasure to mitigate the threat posed by such an issue. We have additionally shown that, while blind-spot adversarial examples can be detected using our defense mechanism, to further improve the security of iCub against indistinguishable adversarial examples, re-training the classification algorithm on top of a pre-trained deep neural network is not sufficient.
To this end, different strategies to enforce the deep network to learn a more stable deep feature representation (in which small perturbations to the input data correspond to small perturbations in the deep feature space) should also be adopted, like the one proposed in [31].

Other interesting research directions for this work include evaluating the security of robot-vision systems against other threats, including the threat of data poisoning [12, 6, 5], in which a malicious human annotator may provide a few wrong labels to the humanoid to completely mislead its learning process and force it to misclassify as many objects as possible. In general, a comprehensive, standardized framework for evaluating the security of such systems while also providing more formal verification procedures is still lacking, and we believe that this constitutes a fundamental requirement for the complete transition of deep-learning-based systems into safety-critical applications, like robots performing life-critical tasks and self-driving cars.

References
[1] M. Barreno, B. Nelson, A. Joseph, and J. Tygar. The security of machine learning. Mach. Learn., 81:121–148, 2010.
[2] A. Bendale and T. E. Boult. Towards open set deep networks. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pages 1563–1572, 2016.
[3] B. Biggio, I. Corona, Z.-M. He, P. P. K. Chan, G. Giacinto, D. S. Yeung, and F. Roli. One-and-a-half-class multiple classifier systems for secure learning against evasion attacks at test time. In F. Schwenker, F. Roli, and J. Kittler, editors, Multiple Classifier Systems, volume 9132 of Lecture Notes in Computer Science, pages 168–180. Springer International Publishing, 2015.
[4] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, G. Giacinto, and F. Roli. Evasion attacks against machine learning at test time. In H. Blockeel et al., editors, ECML PKDD, Part III, vol. 8190 of LNCS, pages 387–402. Springer Berlin Heidelberg, 2013.
[5] B. Biggio, G. Fumera, and F. Roli. Security evaluation of pattern classifiers under attack. IEEE Trans. Knowl. and Data Eng., 26(4):984–996, April 2014.
[6] B. Biggio, B. Nelson, and P. Laskov. Poisoning attacks against support vector machines. In J. Langford and J. Pineau, editors, Int'l Conf. on Machine Learning (ICML), pages 1807–1814. Omnipress, 2012.
[7] N. Cristianini. Intelligence reinvented. New Scientist, 232(3097):37–41, 2016.
[8] A. Demontis, P. Russu, B. Biggio, G. Fumera, and F. Roli. On security and sparsity of linear classifiers for adversarial settings. In A. Robles-Kelly et al., editors, Joint IAPR Int'l Workshop on Structural, Syntactic, and Statistical Patt. Rec., vol. 10029 of LNCS, pages 322–332, Cham, 2016. Springer International Publishing.
[9] R. Feinman, R. R. Curtin, S. Shintre, and A. B. Gardner. Detecting adversarial samples from artifacts. arXiv preprint arXiv:1703.00410, 2017.
[10] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In International Conf. on Learning Representations, 2015.
[11] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6):82–97, 2012.
[12] L. Huang, A. D. Joseph, B. Nelson, B. Rubinstein, and J. D. Tygar. Adversarial machine learning. In ACM Workshop on Artificial Intelligence and Security (AISec), pages 43–57, Chicago, IL, USA, 2011.
[13] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097–1105. Curran Associates, Inc., 2012.
[14] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial examples in the physical world. arXiv:1607.02533, 2016.
[15] X. Li and F. Li. Adversarial examples detection in deep networks with convolutional filter statistics. CoRR, abs/1612.07767, 2016.
[16] Y. Luo, X. Boix, G. Roig, T. Poggio, and Q. Zhao. Foveation-based mechanisms alleviate adversarial examples. arXiv preprint arXiv:1511.06292, 2015.
[17] A. Mahendran and A. Vedaldi. Understanding deep image representations by inverting them. In IEEE Conf. Computer Vision and Patt. Rec., pages 5188–5196, 2015.
[18] G. Metta, G. Sandini, D. Vernon, L. Natale, and F. Nori. The iCub humanoid robot: an open platform for research in embodied cognition. In Proc. of the 8th Workshop on Performance Metrics for Intelligent Systems, pages 50–56. ACM, 2008.
[19] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard. DeepFool: a simple and accurate method to fool deep neural networks. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pages 2574–2582, 2016.
[20] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami. The limitations of deep learning in adversarial settings. In IEEE European Symposium on Security and Privacy (EuroS&P), pages 372–387. IEEE, 2016.
[21] G. Pasquale, C. Ciliberto, F. Odone, L. Rosasco, L. Natale, and I. dei Sistemi. Teaching iCub to recognize objects using deep convolutional neural networks. In MLIS@ICML, pages 21–25, 2015.
[22] P. Russu, A. Demontis, B. Biggio, G. Fumera, and F. Roli. Secure kernel machines against evasion attacks. In ACM Workshop on Artificial Intelligence and Security (AISec), pages 59–69, New York, NY, USA, 2016. ACM.
[23] W. Scheirer, L. Jain, and T. Boult. Probability models for open set recognition. IEEE Trans. Patt. An. Mach. Intell., 36(11):2317–2324, 2014.
[24] M. Sharif, S. Bhagavatula, L. Bauer, and M. K. Reiter. Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In Conf. Computer and Comm. Sec., pages 1528–1540. ACM, 2016.
[25] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In ICLR, 2014.
[26] T. Tanay and L. Griffin. A boundary tilting perspective on the phenomenon of adversarial examples. arXiv:1608.07690, 2016.
[27] Y. Tang. Deep learning using support vector machines. In ICML Workshop on Representational Learning, arXiv:1306.0239, Atlanta, USA, 2013.
[28] N. Šrndić and P. Laskov. Practical evasion of a learning-based classifier: A case study. In IEEE Symp. Sec. & Privacy, pages 197–211, Washington, DC, USA, 2014. IEEE CS.
[29] H. Xu, C. Caramanis, and S. Mannor. Robustness and regularization of support vector machines. JMLR, 10:1485–1510, July 2009.
[30] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In ECCV, pages 818–833. Springer, 2014.
[31] S. Zheng, Y. Song, T. Leung, and I. Goodfellow. Improving the robustness of deep neural networks via stability training. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 2016.