Trustworthy Convolutional Neural Networks: A Gradient Penalized-based Approach
Nicholas Halliwell
Inria, Sophia Antipolis, France [email protected]
Freddy Lecue
CortAIx, Thales, Montreal, Canada / Inria, Sophia Antipolis, France [email protected]
Abstract
Convolutional neural networks (CNNs) are commonly used for image classification. Saliency methods are examples of approaches that can be used to interpret CNNs post hoc, identifying the most relevant pixels for a prediction by following the gradient flow. Even though CNNs can correctly classify images, the underlying saliency maps can be erroneous in many cases. This can result in skepticism as to the validity of the model or its interpretation. We propose a novel approach for training trustworthy CNNs by penalizing parameter choices that result in inaccurate saliency maps generated during training. We add a penalty term for inaccurate saliency maps produced when the predicted label is correct, a penalty term for accurate saliency maps produced when the predicted label is incorrect, and a regularization term penalizing overly confident saliency maps. Experiments show increased classification performance, user engagement, and trust.
1 Introduction

Convolutional neural networks (CNNs) are used in computer vision for tasks such as object detection [25, 23, 24], image classification [14], and visual question answering [4, 5]. The success of these models has created a need for model transparency. Due to their complex structure, CNNs are often difficult to interpret. Features learned by CNNs can be visualized, but understanding the meaning of these hidden representations can be difficult for non-experts.

Indeed, there are many approaches to interpreting the output of machine learning models [27, 26, 17, 3, 18, 12, 11, 7]. Saliency methods offer an intuitive way to understand what a CNN has learned. These algorithms provide post hoc interpretations by highlighting a set of pixels or superpixels in the input image that are most relevant for a prediction: small changes to these highlighted pixels result in the biggest change in predicted score. One of the first saliency methods, Gradients (sometimes called Vanilla Gradients), computes the gradient of the class score with respect to the input image [35]. Guided BackPropagation [39] imputes the gradient of layers using ReLUs, backpropagating only positive gradients. Class Activation Mapping (CAM) uses a specific CNN architecture to discriminate regions of the input [44]. Grad-CAM generalizes CAM to any CNN architecture [31]. Many other saliency methods exist [33, 34, 42, 37, 40, 19] to visually interpret CNN predictions. Existing work shows that not all of these methods are robust [2], and that saliency maps can be manipulated [8]. HINT [32] encourages deep neural networks to be sensitive to the same input regions as humans. The authors of [28] add a penalty term for explanation predictions far away from their respective ground truth. For future versions, potential baselines include [47, 6, 13].

On the task of image classification, saliency methods allow the user to visually inspect the highlighted regions of the image. The overlap between the saliency map and the object of interest can be easily and quickly compared by users. This forms an intuitive way to interpret what a model has learned. Saliency methods, however, can attribute relevant pixels outside the object of interest, producing
what we term an inaccurate saliency map. Indeed, any of the saliency methods mentioned above can encounter some form of this issue. This indicates the model has chosen a poor set of parameters. On the contrary, an accurate saliency map attributes pixel values over the object of interest.

Convolutional neural networks are most commonly optimized by minimizing the cross entropy between the target distribution and the predicted target distribution. This loss function is unaware that a choice of parameters giving a correct prediction may result in inaccurate saliency maps shown to the user. Conversely, parameters may be learned that give an incorrect prediction but result in a visually accurate saliency map. Figure 1b shows an example where the predicted label is incorrect, and the saliency map is inaccurate. Figure 1c shows an example where the predicted label is correct, and the saliency map is accurate.

We propose a loss function that unifies traditional loss functions with post hoc interpretation methods. This function includes a penalty term for inaccurate saliency maps generated when the predicted class is correct, a penalty term for accurate saliency maps generated when the prediction is incorrect, and a penalty term for overly confident saliency maps. Indeed, this involves computing predicted saliency maps on the forward pass during training. We demonstrate these penalty terms can be added to the existing loss of a pre-trained model to continue training, or used in a transfer learning framework to improve post hoc saliency maps.

(a) Input Image: Cockroach (b) VGG-16: Tick (c) Proposed: Cockroach
Figure 1: VGG-16 incorrectly classifies the input image, and the post hoc saliency map is inaccurate. The trustworthy CNN correctly classifies the image and produces an accurate saliency map.
2 Background

Consider a convolutional neural network f that classifies an image x_i into one of c ∈ C classes. Let y_c be the true label c of an image x_i. Let \hat{y} be x_i's predicted label from f, or equivalently \hat{y} = f(x_i)_c. The cross entropy is thus given by

CE(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{C} -[y_i = c] \log(f(x_i)_c)    (1)

Two saliency methods of particular interest are Grad-CAM and Guided Grad-CAM [31]. Let A be the activation maps of some convolutional layer, a given activation map denoted A^k. To produce a saliency map for some x_i, Grad-CAM computes the gradients of the target class with respect to layer l's activation maps. Global average pooling is performed on the gradients to serve as weights for each activation map. These weights, denoted \alpha_k^c in equation 2, represent the importance of a given activation map k for some target class c ∈ C. (We recognize it is also possible that the saliency method itself is not robust; in this paper we focus on model-induced errors.)

\alpha_k^c = \frac{1}{W \times H} \sum_i \sum_j \frac{\partial y^c}{\partial A_{ij}^k}    (2)

After computing a weighted combination of the activation maps, the resulting output is passed through a ReLU. The ReLU is used to focus only on the features that have a positive influence on the true class c, i.e., pixels whose intensity should be increased in order to increase y^c [31]. The saliency map output by Grad-CAM is thus given by

L^c_{Grad-CAM} = ReLU\left( \sum_k \alpha_k^c A^k \right)    (3)

Guided Grad-CAM [31] combines the output from Grad-CAM (equation 3) with the output from Guided Backpropagation [39] through element-wise multiplication. This is done for two reasons: first, Guided Backpropagation alone is not class discriminative; second, Grad-CAM fails to produce high resolution (fine-grained) visualizations. Merging the two saliency methods produces saliency maps that are both high resolution and class discriminative [31].

3 Problem Statement

Despite state-of-the-art classification performance achieved by convolutional neural networks, loss-minimizing parameters may result in saliency maps that do not highlight relevant pixels over the object of interest. Indeed, the saliency map depends on the learned parameters, yet parameters are learned without knowing whether the resulting saliency maps are visually accurate. Additionally, showing an inaccurate saliency map to a practitioner does not provide insight on how to change model parameters to correctly highlight pixels over the object of interest.

As an example, take a pre-trained VGG-16 model [36], trained on the ImageNet dataset [29]. We use Grad-CAM on selected images to identify relevant pixels. Figure 2 shows four different cases that can be encountered when using saliency methods for interpretations. Figure 2b shows case 1, where the predicted label is correct, and the resulting saliency map is visually accurate. Figure 2g shows case 2, where the predicted class is incorrect, and the resulting saliency map is accurate. Figure 2l shows case 3, where the predicted class is correct, and the resulting saliency map is inaccurate. Figure 2q shows case 4, where the predicted class is incorrect and the resulting saliency map is inaccurate. Models giving inaccurate predictions (Figures 2g, 2q) and/or inaccurate saliency maps (Figures 2l, 2q) will cause users to lose trust in the model. Currently, convolutional neural networks are optimized ignoring how the saliency map will look post hoc. To our knowledge, no method exists to train convolutional neural networks to produce visually accurate saliency maps.
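Before turning to the proposed loss, the sketch below makes equations 2 and 3 concrete by computing a Grad-CAM map for one image. This is a minimal PyTorch illustration; the function name grad_cam, its arguments, and the final normalization to [0, 1] are our own choices, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, layer, image, target_class):
    """Minimal Grad-CAM sketch for one (1, channels, H, W) image tensor."""
    activations, gradients = [], []
    # Capture the layer's activation maps A^k on the forward pass and
    # the gradients dy^c / dA^k on the backward pass.
    fwd = layer.register_forward_hook(lambda m, inp, out: activations.append(out))
    bwd = layer.register_full_backward_hook(lambda m, gin, gout: gradients.append(gout[0]))

    score = model(image)[0, target_class]  # class score y^c
    model.zero_grad()
    score.backward()
    fwd.remove()
    bwd.remove()

    A, dA = activations[0], gradients[0]       # (1, K, H', W') each
    # Equation 2: global average pooling of the gradients gives alpha_k^c.
    alpha = dA.mean(dim=(2, 3), keepdim=True)  # (1, K, 1, 1)
    # Equation 3: ReLU of the weighted combination of activation maps.
    cam = F.relu((alpha * A).sum(dim=1))       # (1, H', W')
    return cam / (cam.max() + 1e-8)            # scale into [0, 1]
```

Guided Grad-CAM would additionally multiply this map, upsampled to the input resolution, element-wise with the Guided Backpropagation map.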
We propose a loss function that penalizes inaccurate saliency maps, resulting in model parameters that produce visually accurate saliency maps post hoc and improved classification performance, ensuring better user trust.

4 Approach

We define a trustworthy CNN as one that produces accurate predictions and visually accurate post hoc saliency maps, as determined by user evaluation.
To identify parameters that produce both accurate predictions and accurate saliency maps, constraints must be added to the cross entropy loss. Saliency maps produced post hoc can be visually accurate while the model classifies the observation incorrectly. Additionally, visually inaccurate saliency maps can be produced while the model classifies the observation correctly. Lastly, visually inaccurate saliency maps can be observed while the model incorrectly classifies the observation. The loss function must consider the saliency maps produced from the parameter choices at each step taken by the optimizer.

[Figure 2 panels: input images with true classes Brambling, Vulture, Pajama, and Sports car; columns show the input image, the baseline, and the proposed model at several learning rates. The baseline mispredicts Vulture as Kite and Sports car as Racer; the proposed models predict all four classes correctly.]

Figure 2: Input image shown with post hoc saliency maps from a VGG-16 baseline, and our proposed gradient-penalized Trustworthy CNN model shown with various learning rates.

Take a saliency map \hat{L}^c_{(\cdot),i} generated by a saliency method on the forward pass of training. We average the predicted saliency map across all dimensions. This average, given by equation 4, is used to gauge the confidence of the predicted saliency map:

\hat{S}_i = \frac{1}{W \times H} \sum_{w=1}^{W} \sum_{h=1}^{H} \hat{L}^c_{(\cdot),i}(w, h)    (4)

Adding the constraint on overly confident saliency maps generated during training does not penalize interactions between the saliency maps and predicted labels. Further constraints are needed to account for the predicted saliency map being accurate when the predicted label is incorrect, and the predicted saliency map being inaccurate when the predicted label is correct. Equations 5 and 6 are added for the interaction between the predicted class labels and the predicted saliency map. Large-gradient saliency maps with corresponding incorrect predicted labels are penalized, along with small-gradient saliency maps with corresponding correct predicted labels:

R_1 = CE(y_i, \hat{y}_i)(1 - \hat{S}_i)    (5)

R_2 = \hat{S}_i (1 - CE(y_i, \hat{y}_i))    (6)

The final loss function used for all plots and tables in this work is given by equation 7. We use a scalar \lambda \in [0, 1] to establish a dependence between \hat{S}_i and the cross entropy CE:

L(y, \hat{y}, \hat{S}) = \sum_{i=1}^{n} \lambda\, CE(y_i, \hat{y}_i) + (1 - \lambda)\, \hat{S}_i + R_1 + R_2    (7)

The loss function we plan on using in future versions is given by

R_1 = CE(y_i, \hat{y}_i)(1 - PWCE(S_i, \hat{S}_i))    (8)

R_2 = PWCE(S_i, \hat{S}_i)(1 - CE(y_i, \hat{y}_i))    (9)

L(y, \hat{y}, S, \hat{S}) = \sum_{i=1}^{n} \lambda\, CE(y_i, \hat{y}_i) + (1 - \lambda)\, PWCE(S_i, \hat{S}_i) + R_1 + R_2    (10)

where PWCE is the pixel-wise cross entropy between the ground truth saliency map and the predicted saliency map.
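As a sketch of how equations 4–7 might be implemented, the snippet below computes the loss for a batch given the predicted logits and the saliency maps generated on the forward pass. It is a minimal PyTorch rendering, not the authors' code: the function and variable names are ours, the value of lam is illustrative, and the division of the cross entropy by |C| follows the scaling the experiments use to keep the term roughly within [0, 1].

```python
import torch
import torch.nn.functional as F

def trustworthy_loss(logits, targets, saliency_maps, lam=0.5):
    """Sketch of the loss in equation 7.

    logits:        (n, C) predicted class scores.
    targets:       (n,) integer class labels.
    saliency_maps: (n, W, H) maps from the saliency method, scaled to [0, 1].
    """
    num_classes = logits.shape[1]
    # Per-example cross entropy (equation 1), divided by |C| so the term
    # stays roughly within [0, 1], as the paper notes for its experiments.
    ce = F.cross_entropy(logits, targets, reduction="none") / num_classes
    # Equation 4: mean attribution of each predicted saliency map, S_hat_i.
    s_hat = saliency_maps.flatten(start_dim=1).mean(dim=1)
    # Equations 5 and 6: interaction penalties between labels and maps.
    r1 = ce * (1.0 - s_hat)
    r2 = s_hat * (1.0 - ce)
    # Equation 7: lambda trades off the cross entropy against the map term.
    return (lam * ce + (1.0 - lam) * s_hat + r1 + r2).sum()
```

Note that for \hat{S}_i to influence the parameter updates, the saliency computation itself must remain differentiable with respect to the weights (e.g., building the Grad-CAM gradients with create_graph=True); that detail is elided in the sketch.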
To optimize the loss proposed in equation 7, we freeze the weights of all other layers in the network. We use stochastic gradient descent in all our experiments, although any of its variants can be used. Naturally, this loss function will be most effective in two settings: updating previously learned parameters of a pre-trained model, or learning the parameters of a newly added layer in a transfer learning framework. Consider the following example, where a practitioner identifies a layer in a convolutional network that has learned a noticeable systematic error. Our loss function allows practitioners to update that layer's weights and eliminate the error without having to re-train the model from scratch. In the case of transfer learning, a new layer can be added, and parameters can be learned that will produce accurate saliency maps post hoc.

There are no restrictions on which saliency method can be used to produce the saliency maps \hat{L}^c_{(\cdot),i} generated during training, provided the generated output, when averaged, is between zero and one. (We recognize \hat{S}_i and CE are not guaranteed to be between zero and one. In our experiments, however, we find that the cross entropy term in equation 1 and the regularization term in equation 4 are between zero and one when each term is divided by the number of classes |C|.) Regarding the choice of saliency method, some choices make more intuitive sense than others. For example, Guided Backpropagation [39] and Deconvolutions [42] are not class discriminative, and therefore should not be chosen.

5 Experiments

One interesting application of the proposed loss function is the field of transfer learning [16]: knowledge from models trained on a specific task is applied to an entirely different task. In our first experiment we extend a pre-trained MobileNetV2 [30] with a new convolutional layer (with a fixed number of filters, kernel size, and stride), remove all layers after it, and add a softmax layer. All other layer weights are frozen. The baseline model uses the cross entropy loss given by equation 1. We compare this against two trustworthy CNN models trained using equation 7: the first uses Grad-CAM to generate saliency maps during training, the second uses Guided Grad-CAM. All models are trained for the same number of epochs with the same batch size, learning rate, and \lambda. We compare post hoc saliency maps relative to the baseline model using the structural similarity index (SSIM) given by equation 11:

SSIM(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}    (11)

where c_1 = (0.01 L)^2 and c_2 = (0.03 L)^2, and L is defined by the dynamic range of the pixel values.

As a second experiment, we apply our loss function to VGG-16 [36], trained on the ImageNet dataset [29] for an image classification task. Again we train two trustworthy models, one using Grad-CAM to generate saliency maps during training, the other using Guided Grad-CAM. We demonstrate improved post hoc saliency maps as evaluated by users, and improved classification performance. We compare this to a VGG-16 baseline trained using only the cross entropy loss. We use a subset of ImageNet images, and update the parameters of VGG-16 from the block5_conv3 layer, chosen as it is the closest convolutional layer to the softmax layer. We freeze the weights of all other layers. We compare the classification performance of all models using accuracy, precision, and recall. We train different models varying the learning rate and \lambda, performing a grid search over three learning rates and three values of \lambda.
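As an illustration of the single-layer update and the SSIM score of equation 11, here is a minimal sketch. The helper names and the learning rate are assumptions, and the SSIM below is computed globally over each map rather than with the usual sliding window.

```python
import numpy as np
import torch

def freeze_all_but(model, layer):
    """Freeze every parameter except those of one chosen layer, then build
    an SGD optimizer over the remaining trainable parameters."""
    for p in model.parameters():
        p.requires_grad = False
    for p in layer.parameters():
        p.requires_grad = True
    return torch.optim.SGD(layer.parameters(), lr=1e-3)  # lr is illustrative

def ssim(x, y, L=1.0):
    """Global SSIM (equation 11) between two saliency maps in [0, L],
    with the standard constants c1 = (0.01 L)^2 and c2 = (0.03 L)^2."""
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```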
Post hoc saliency maps are evaluated with a user experiment, detailed below.

To compare the post hoc saliency maps across models, a scoring metric is needed. Two commonly used metrics are localization error [31] and the pointing game [31, 43]. Both methods require ground truth object labels. We do not assume the data has any object annotations, hence these metrics cannot be used.

To score the post hoc saliency maps of the trustworthy gradient-penalized models against their respective baselines, we conduct a user experiment. We take the best performing set of hyperparameters from the grid search, and use all models in the experiment to generate saliency maps for user evaluation. We randomly sample test set images for user evaluation. We show users the input image and a post hoc saliency map from each model, and ask "Which image best highlights the object of interest?"
Table 1: Classification Performance – Transfer Learning

                                                Accuracy  Precision  Recall
Trustworthy CNN w/ Guided Grad-CAM (R_1 = 0)    98.6%     98.6%      98.6%
Trustworthy CNN w/ Guided Grad-CAM (R_2 = 0)    98.4%     98.4%      98.4%

Classification performance on the transfer learning task. R_1 = 0 denotes models trained with equation 5 set to zero; R_2 = 0 denotes models trained with equation 6 set to zero.

Table 2: Classification Performance – VGG-16

                                     Accuracy         Precision  Recall
VGG-16 Baseline                      66% (56%, 75%)   49.7%      50.8%
Trustworthy CNN w/ Grad-CAM          70% (61%, 78%)   54.6%      55.7%
VGG-16 Baseline                      66% (56%, 75%)   49.7%      50.8%
Trustworthy CNN w/ Guided Grad-CAM   70% (61%, 78%)   54.6%      55.7%

Classification performance shown for gradient-penalized trustworthy models and baselines on the test set images shown during the user experiment. A confidence interval (lower, upper) is included for accuracy.
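The reported intervals are consistent with a normal-approximation (Wald) confidence interval for a proportion. A minimal sketch, where the number of user-evaluated images n is an assumption since it is not restated here:

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """95% normal-approximation (Wald) interval for a proportion p_hat
    estimated from n user-evaluated images."""
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return max(0.0, p_hat - half), min(1.0, p_hat + half)

# With an assumed n = 100, proportion_ci(0.70, 100) gives roughly
# (0.61, 0.79), close to the intervals reported for the trustworthy models.
```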
The classification performance for all models can be found in Table 1. We observe the trustworthy MobileNetV2 model trained using Guided Grad-CAM outperformed the baseline and the trustworthy Grad-CAM model. We find that both trustworthy CNN models outperformed their respective baseline models trained with just the cross entropy term.

We also compute the SSIM between the saliency maps of the baseline model and each trustworthy CNN. When R_1 = 0, the SSIM for the trustworthy Grad-CAM and Guided Grad-CAM models drops, and when R_2 = 0 it increases. This metric shows the post hoc saliency maps differ visually from the baseline; however, it fails to identify which saliency maps are more visually correct.

Table 2 shows the accuracy, precision, and recall of all models on the subset of test set images shown to users in the experiment. We use the best performing set of hyperparameters for user evaluation. We find that both trustworthy models outperform their respective baselines. We note the classification performance of the two trustworthy models is equal, likely due to the chosen value of \lambda.

We find the SSIM between the baseline VGG-16 and the trustworthy Grad-CAM model, as well as between the baseline VGG-16 and the trustworthy Guided Grad-CAM model, to be high. According to this metric, the saliency maps generated by the gradient-penalized models should be very similar to those of the base model. Through our user experiment, however, we find this not to be the case. This is further discussed in Section 6.

Table 3 further breaks down the user experiment, showing the percentage of images that fall into each case. Users decided both trustworthy models outperform the baseline in all scenarios except case 2. Recall case 2 occurs when the predicted label is incorrect and the saliency map is accurate. Ideally, images fall into case 1, and fewer images fall into cases 2, 3, and 4.

Table 3: User Experiment Breakdown – VGG-16

                                     Case 1           Case 2
VGG-16 Baseline                      18% (7%, 28%)    16% (6%, 26%)
Trustworthy CNN w/ Grad-CAM          44% (30%, 58%)   16% (6%, 26%)
VGG-16 Baseline                      16% (6%, 26%)    10% (2%, 18%)
Trustworthy CNN w/ Guided Grad-CAM   50% (36%, 64%)   18% (7%, 29%)

                                     Case 3           Case 4
VGG-16 Baseline                      44% (30%, 58%)   16% (6%, 26%)
Trustworthy CNN w/ Grad-CAM          22% (10%, 33%)   12% (3%, 21%)
VGG-16 Baseline                      48% (34%, 62%)   20% (9%, 31%)
Trustworthy CNN w/ Guided Grad-CAM   18% (7%, 29%)    8% (0%, 20%)
For the trustworthy CNN trained with Grad-CAM, 44% of images shown to users were found to have accurate predictions and more accurate saliency maps (relative to the baseline). A confidence interval (lower, upper) is included. Recall that Case 1 is the percentage of observations with predicted labels correct and resulting saliency maps accurate; Case 2, predicted labels incorrect and resulting saliency maps accurate; Case 3, predicted labels correct and resulting saliency maps inaccurate; Case 4, predicted labels incorrect and resulting saliency maps inaccurate. A high percentage is desirable for Case 1; low percentages are desirable for Cases 2, 3, and 4.

6 Discussion

We find the trustworthy models trained with Guided Grad-CAM outperform all other models in terms of predicting correct labels and accurate post hoc saliency maps. In the transfer learning experiment, we find the SSIM decreases on gradient-penalized models when equation 5 is set to zero. Additionally, the SSIM increases when equation 6 is set to zero. This shows the inherent trade-off between the two terms.

On the ImageNet dataset, users found the baseline VGG-16 models to produce inaccurate saliency maps. These models are not trustworthy. The most common error from the baseline models was producing accurate predictions with inaccurate saliency maps (case 3 in Table 3). This is not surprising, considering the cross entropy loss is saliency-map unaware. Users will question the legitimacy of a model when inaccurate saliency maps are produced. Our approach offers improved classification results and more accurate saliency maps, resulting in increased user trust.

One noticeable limitation of the proposed loss is that only one convolutional layer can be updated at a time during training. This is partially due to the limitations of some saliency methods. Grad-CAM and Guided Grad-CAM [31], for example, generate saliency maps using the gradients of one specific layer only; hence the gradients of a single layer are used to compute the saliency map. This layer, however, may not fully represent what the entire model has learned.
One difficulty in scoring saliency maps is that two saliency maps can correctly highlight the object of interest while assigning equal attribution values to different parts of the same object. An example is demonstrated in Figure 3 using two hypothetical models.

(a) Input Image (b) Hypothetical Model 1 (c) Hypothetical Model 2

Figure 3: Two saliency maps with equal attribution values.

For some input image, both models output saliency maps with equal total attribution values, but pixels are attributed to different locations on the object. The model in Figure 3b attributes the face of the marmot, and the model in Figure 3c attributes a portion of the face and the body. These two saliency maps have exactly the same total attribution value when averaged, and the SSIM between Figures 3b and 3c fails to reflect that they look significantly different. It is unclear which saliency map is more visually accurate.
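The effect can be reproduced numerically. A minimal sketch with two hypothetical 10×10 maps that place the same total attribution on different object parts, so a mean-based score (equation 4) cannot tell them apart:

```python
import numpy as np

# Two hypothetical saliency maps over a 10x10 image. Each assigns the same
# total attribution (10 pixels at full intensity) to different object parts.
map_face = np.zeros((10, 10))
map_face[2:4, 2:7] = 1.0           # face only
map_face_body = np.zeros((10, 10))
map_face_body[2:7, 2:4] = 1.0      # part of the face and the body

# Both average to 0.1, so an averaged-attribution score cannot distinguish
# them, even though the highlighted regions differ.
print(map_face.mean(), map_face_body.mean())  # 0.1 0.1
```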
7 Conclusion

In this work, we combine post hoc interpretability methods with traditional loss functions to learn trustworthy model parameters. We propose a loss function that penalizes inaccurate saliency maps during training. Further constraining the loss function used by convolutional neural networks increases classification performance, and users found the post hoc saliency maps to be more accurate. This gives a more dependable model. Future work involves extending this method to other tasks (image captioning, object tracking, etc.) and other deep learning architectures.
Broader Impact
Users receiving an automated decision from a convolutional neural network will benefit from this research: our approach provides a way to increase user trust in models previously treated as black boxes. Using this approach, parameters of a pre-trained model can be updated, or parameters of a new layer can be learned in a transfer learning framework. Errors from an existing model can be identified and fixed. Practitioners wanting to eliminate a race or gender bias from a model will not have to retrain the model from scratch. This will save the electricity used by the machine(s) to train.

We do not believe anyone is put at a disadvantage by this research. A failure of this system would mean the model would no longer be convincing to users, and thus no different from the original black box model.
References

[1] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek Gordon Murray, Benoit Steiner, Paul A. Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: A system for large-scale machine learning. In Kimberly Keeton and Timothy Roscoe, editors, 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016, pages 265–283. USENIX Association, 2016.

[2] Julius Adebayo, Justin Gilmer, Michael Muelly, Ian J. Goodfellow, Moritz Hardt, and Been Kim. Sanity checks for saliency maps. In Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicolò Cesa-Bianchi, and Roman Garnett, editors, Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pages 9525–9536, 2018.

[3] Marco Ancona, Enea Ceolini, Cengiz Öztireli, and Markus Gross. Towards better understanding of gradient-based attribution methods for deep neural networks. In 6th International Conference on Learning Representations, ICLR 2018, 2018.

[4] Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and Devi Parikh. VQA: Visual question answering. In 2015 IEEE International Conference on Computer Vision, ICCV 2015, pages 2425–2433. IEEE Computer Society, 2015.

[5] Rémi Cadène, Corentin Dancette, Hedi Ben-younes, Matthieu Cord, and Devi Parikh. RUBi: Reducing unimodal biases for visual question answering. In Wallach et al. [41], pages 839–850.

[6] Chun-Hao Chang, Elliot Creager, Anna Goldenberg, and David Duvenaud. Explaining image classifiers by counterfactual generation. In 7th International Conference on Learning Representations, ICLR 2019. OpenReview.net, 2019.

[7] Chaofan Chen, Oscar Li, Daniel Tao, Alina Barnett, Cynthia Rudin, and Jonathan Su. This looks like that: Deep learning for interpretable image recognition. In Wallach et al. [41], pages 8928–8939.

[8] Ann-Kathrin Dombrowski, Maximilian Alber, Christopher J. Anders, Marcel Ackermann, Klaus-Robert Müller, and Pan Kessel. Explanations can be manipulated and geometry is to blame. In Wallach et al. [41], pages 13567–13578.

[9] Jeremy Elson, John R. Douceur, Jon Howell, and Jared Saul. Asirra: a CAPTCHA that exploits interest-aligned manual image categorization. In Peng Ning, Sabrina De Capitani di Vimercati, and Paul F. Syverson, editors, Proceedings of the 2007 ACM Conference on Computer and Communications Security, CCS 2007, Alexandria, Virginia, USA, October 28-31, 2007, pages 366–374. ACM, 2007.

[10] Steve Hanneke and Samory Kpotufe. On the value of target data in transfer learning. In Wallach et al. [41], pages 9867–9877.

[11] Sara Hooker, Dumitru Erhan, Pieter-Jan Kindermans, and Been Kim. A benchmark for interpretability methods in deep neural networks. In Wallach et al. [41], pages 9734–9745.

[12] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie J. Cai, James Wexler, Fernanda B. Viégas, and Rory Sayres. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In Jennifer G. Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, volume 80 of Proceedings of Machine Learning Research, pages 2673–2682. PMLR, 2018.

[13] Pieter-Jan Kindermans, Kristof T. Schütt, Maximilian Alber, Klaus-Robert Müller, Dumitru Erhan, Been Kim, and Sven Dähne. Learning how to explain neural networks: PatternNet and PatternAttribution. In 6th International Conference on Learning Representations, ICLR 2018, 2018.

[14] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In Peter L. Bartlett, Fernando C. N. Pereira, Christopher J. C. Burges, Léon Bottou, and Kilian Q. Weinberger, editors, Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012, Lake Tahoe, Nevada, United States, pages 1106–1114, 2012.

[15] Joshua Lee, Prasanna Sattigeri, and Gregory W. Wornell. Learning new tricks from old dogs: Multi-source transfer learning from pre-trained networks. In Wallach et al. [41], pages 4372–4382.

[16] Fei-Fei Li, Robert Fergus, and Pietro Perona. One-shot learning of object categories. IEEE Trans. Pattern Anal. Mach. Intell., 28(4):594–611, 2006.

[17] Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett, editors, Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA, pages 4765–4774, 2017.

[18] Grégoire Montavon, Sebastian Bach, Alexander Binder, Wojciech Samek, and Klaus-Robert Müller. Explaining nonlinear classification decisions with deep Taylor decomposition. CoRR, abs/1512.02479, 2015.

[19] Grégoire Montavon, Sebastian Lapuschkin, Alexander Binder, Wojciech Samek, and Klaus-Robert Müller. Explaining nonlinear classification decisions with deep Taylor decomposition. Pattern Recognit., 65:211–222, 2017.

[20] Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Trans. Knowl. Data Eng., 22(10):1345–1359, 2010.

[21] Doina Precup and Yee Whye Teh, editors. Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, volume 70 of Proceedings of Machine Learning Research. PMLR, 2017.

[22] Maithra Raghu, Chiyuan Zhang, Jon M. Kleinberg, and Samy Bengio. Transfusion: Understanding transfer learning for medical imaging. In Wallach et al. [41], pages 3342–3352.

[23] Joseph Redmon, Santosh Kumar Divvala, Ross B. Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. CoRR, abs/1506.02640, 2015.

[24] Joseph Redmon and Ali Farhadi. YOLO9000: Better, faster, stronger. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, pages 6517–6525. IEEE Computer Society, 2017.

[25] Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. CoRR, abs/1804.02767, 2018.

[26] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Balaji Krishnapuram, Mohak Shah, Alexander J. Smola, Charu C. Aggarwal, Dou Shen, and Rajeev Rastogi, editors, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, pages 1135–1144. ACM, 2016.

[27] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. Anchors: High-precision model-agnostic explanations. In Sheila A. McIlraith and Kilian Q. Weinberger, editors, Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pages 1527–1535. AAAI Press, 2018.

[28] Laura Rieger, Chandan Singh, W. James Murdoch, and Bin Yu. Interpretations are useful: Penalizing explanations to align neural networks with prior knowledge. CoRR, abs/1909.13584, 2019.

[29] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael S. Bernstein, Alexander C. Berg, and Fei-Fei Li. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis., 115(3):211–252, 2015.

[30] Mark Sandler, Andrew G. Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, pages 4510–4520. IEEE Computer Society, 2018.

[31] Ramprasaath R. Selvaraju, Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, and Dhruv Batra. Grad-CAM: Why did you say that? Visual explanations from deep networks via gradient-based localization. CoRR, abs/1610.02391, 2016.

[32] Ramprasaath Ramasamy Selvaraju, Stefan Lee, Yilin Shen, Hongxia Jin, Shalini Ghosh, Larry P. Heck, Dhruv Batra, and Devi Parikh. Taking a HINT: Leveraging explanations to make vision and language models more grounded. In 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, pages 2591–2600. IEEE, 2019.

[33] Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. Learning important features through propagating activation differences. In Precup and Teh [21], pages 3145–3153.

[34] Avanti Shrikumar, Peyton Greenside, Anna Shcherbina, and Anshul Kundaje. Not just a black box: Learning important features through propagating activation differences. CoRR, abs/1605.01713, 2016.

[35] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. In Yoshua Bengio and Yann LeCun, editors, 2nd International Conference on Learning Representations, ICLR 2014, Workshop Track Proceedings, 2014.

[36] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In Yoshua Bengio and Yann LeCun, editors, 3rd International Conference on Learning Representations, ICLR 2015, 2015.

[37] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda B. Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. CoRR, abs/1706.03825, 2017.

[38] Jie Song, Yixin Chen, Xinchao Wang, Chengchao Shen, and Mingli Song. Deep model transferability from attribution maps. In Wallach et al. [41], pages 6179–6189.

[39] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin A. Riedmiller. Striving for simplicity: The all convolutional net. In Yoshua Bengio and Yann LeCun, editors, 3rd International Conference on Learning Representations, ICLR 2015, Workshop Track Proceedings, 2015.

[40] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In Precup and Teh [21], pages 3319–3328.

[41] Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d'Alché-Buc, Emily B. Fox, and Roman Garnett, editors. Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, 2019.

[42] Matthew D. Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. In David J. Fleet, Tomás Pajdla, Bernt Schiele, and Tinne Tuytelaars, editors, Computer Vision - ECCV 2014 - 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part I, volume 8689 of Lecture Notes in Computer Science, pages 818–833. Springer, 2014.

[43] Jianming Zhang, Zhe L. Lin, Jonathan Brandt, Xiaohui Shen, and Stan Sclaroff. Top-down neural attention by excitation backprop. In Bastian Leibe, Jiri Matas, Nicu Sebe, and Max Welling, editors, Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part IV, volume 9908 of Lecture Notes in Computer Science, pages 543–559. Springer, 2016.

[44] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, pages 2921–2929. IEEE Computer Society, 2016.

[45] Fuzhen Zhuang, Xiaohu Cheng, Ping Luo, Sinno Jialin Pan, and Qing He. Supervised representation learning: Transfer learning with deep autoencoders. In Qiang Yang and Michael J. Wooldridge, editors, Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015, pages 4119–4125. AAAI Press, 2015.

[46] Fuzhen Zhuang, Zhiyuan Qi, Keyu Duan, Dongbo Xi, Yongchun Zhu, Hengshu Zhu, Hui Xiong, and Qing He. A comprehensive survey on transfer learning. CoRR, abs/1911.02685, 2019.

[47] Luisa M. Zintgraf, Taco S. Cohen, Tameem Adel, and Max Welling. Visualizing deep neural network decisions: Prediction difference analysis. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, 2017.