Dependency Decomposition and a Reject Option for Explainable Models
Jan Kronenberger, Computer Science Institute, Ruhr West University of Applied Sciences, [email protected]
Anselm Haselhoff, Computer Science Institute, Ruhr West University of Applied Sciences, [email protected]
Abstract
Deploying machine learning models in safety-related domains (e.g. autonomous driving, medical diagnosis) demands approaches that are explainable, robust against adversarial attacks and aware of the model uncertainty. Recent deep learning models perform extremely well in various inference tasks, but the black-box nature of these approaches leads to a weakness regarding the three requirements mentioned above. Recent advances offer methods to visualize features, describe attribution of the input (e.g. heatmaps), provide textual explanations or reduce dimensionality. However, are explanations for classification tasks dependent or are they independent of each other? For instance, is the shape of an object dependent on the color? What is the effect of using the predicted class for generating explanations and vice versa? In the context of explainable deep learning models, we present the first analysis of dependencies regarding the probability distribution over the desired image classification outputs and the explaining variables (e.g. attributes, texts, heatmaps). Therefore, we perform an Explanation Dependency Decomposition (EDD). We analyze the implications of the different dependencies and propose two ways of generating the explanation. Finally, we use the explanation to verify (accept or reject) the prediction.
1. Introduction
In recent years the success of deep neural networks was mainly driven by technological advances in network architectures and learning algorithms to improve the performance of the model. However, when using these models in real-world applications, an interaction with humans is inevitable. This interaction leads to new considerations regarding ethical and legal issues as well as aspects of user acceptance. Especially in safety-critical applications the models need to be explainable. Therefore, decisions, predictions or recommendations of the machine learning model must be comprehensible for the user. For example, an autonomous vehicle detecting a stop sign or executing a full braking maneuver should justify its decision. Predictions of classical methods like decision trees can be explained by introspection, and temporal fusion methods (e.g. Kalman filtering) are often enriched with an uncertainty measure (error covariance). In contrast, deep learning models are often treated as black-box models without any explanation. Even probabilistic outputs of neural networks, which could be treated as a confidence measurement, are often poorly calibrated [10], as the confidence estimate must match the accuracy. Recently, Kendall and Gal [16] proposed Bayesian deep learning to model the needed uncertainty.

Understanding the internal representation that neural networks are constructing is one step into the direction of understanding their way of reasoning and thus reducing the chance of malfunctions or successful attacks. Early work on hierarchical representations was already presented by Hubel and Wiesel in 1962 [15]. Their approach of complex cells, which are activated by simpler cells, leads to an internal representation of differently complex objects and properties within the network. These internal representations can be used to justify decisions and make them comprehensible [39]. They can also help to develop decision-making systems that are not biased, unfair or even racist [35].

In this work we focus on methods that provide explanations in the form of attribution of the input (e.g. visual concepts) or textual explanations. Fig. 1 shows an example, where an auxiliary network is used for explaining the prediction besides the base-network that performs the intrinsic task (e.g. classifying traffic signs). The explanation could be a sentence containing the attributes (color, shape, symbols, etc.) that have contributed to the decision process or a region in the input space that was most important for the prediction. The appearance of attributes can also be used to verify the prediction by using their joint probability and optionally reject the decision.

First, we introduce a loss function to jointly learn explaining variables (attributes) and classification outputs. The loss can be mapped to a broad class of explainable models (e.g. [13, 28]). Second, we decompose or factorize the dependencies of attributes and classification outputs in different ways, to analyze the relations between explanations and the intrinsic task. In addition, appropriate network architectures and a method for attribute-based verification including a reject option are introduced. The evaluation is based on the German Traffic Sign Recognition Benchmark (GTSRB, [33]).

Figure 1: Overview of the approach. In addition to a base-network that performs the intrinsic task (e.g. classifying traffic signs), an auxiliary model is added to explain the decision. The CNN extracts features for the classification of traffic signs and the explaining attributes. The main contribution of this work is an analysis of different dependencies between the explaining attributes and the classifier. Finally, the explaining attributes are used to verify the decision (predicted class).
2. Related Work
Convolutional neural networks already create internal representations as proposed by [2, 43]. In order to make these representations usable for the human user, they can be synthesized in different ways. Previous approaches solve this problem by visualization of features [3, 7, 21, 24, 25, 26, 28, 30, 40, 41], attribution [8, 11, 14, 17, 18, 31, 32, 34, 42] or by descriptive sentences [13, 37, 38]. However, these approaches have several disadvantages. Visualizations of activations and filters may lead to a comprehensive explanation of the internal representations. However, they are difficult to understand by the user because the visualizations are of high dimensionality. Image caption systems are often more likely to generate generic captions and in some cases are not tailored to give explanations for a specific decision [23]. We address this problem by using general as well as specific attributes. Another alternative is to use semantic parts to justify decisions [42]. These semantic parts describe small objects or visual concepts and are assigned to the individual classes by a voting system. Finding a meaningful description of a class can be very difficult, since the differences between individual examples within a class can be very large, while different classes are sometimes very similar [12]. To create meaningful synthetic training data for such applications, more extensive methods are required [27].

Choosing the attributes to explain the prediction is a hard task as it requires prior knowledge. While we have chosen very simple handcrafted attributes in our work, like in other approaches [42], it is also possible to have them selected unsupervised by the network [14]. However, this high variety of attributes might not be usable for easy explanations, as labeling the unsupervised attributes is difficult due to the possibility of false interpretation.

Having comprehensible decision-making systems is very important because simple prediction systems can be fooled quite easily [1, 25] by modifying the input. These manipulations can be either modifications of the real object (e.g. contamination on traffic signs, graffiti, . . . ) or adversarial attacks which are designed to confuse the DNNs by adding a small amount of noise [6, 9, 20]. However, while it is already possible to prevent certain adversarial attacks by denoising the input data [22], modifications of the real objects are difficult to recognize because the network compensates such errors and still predicts the most likely class. With our method such unclear inputs can be detected and rejected if necessary attributes are missing.
3. Methods
In this paper we pose the supervised learning problem in terms of a probability distribution p(y | x; θ), where y denotes the class in the set of all classes Y, x denotes the input data or image and θ is the parameter vector of the network. In explainable models we are interested in jointly learning the distribution of the output class and the explaining variables z; we call these variables attributes. The joint probability distribution of the class and attributes given the input data can be written as

p(y, z | x; θ).    (1)

In general the parameter vector θ is determined using maximum likelihood estimation. We use a cross-entropy loss between the empirical data distribution p̂_data and the model distribution p given by

L(θ) = − E_{y,z,x ∼ p̂_data} log p(y, z | x; θ).    (2)

This section presents the details of the proposed method regarding the decomposition of different dependencies between attributes and class-labels. We start with a reference model, where no explaining variables are included. Based on this reference, the dependencies are integrated into the probabilistic model and the assumptions as well as the implications are discussed. In the following sections we distinguish between parameters of the feature extractor θ_f (convolutional layers), classifier θ_y and explanation or attributes θ_z, respectively. Up to now we have used the parameter vector θ = θ_{y,z,f} to denote all model parameters, that is θ_{y,z,f} = (θ_y, θ_z, θ_f). The joint probability p(y, z | x; θ) can be decomposed as either

p(y, z | x; θ) = p(y | x; θ) p(z | y, x; θ), or    (3)
p(y, z | x; θ) = p(y | z, x; θ) p(z | x; θ).    (4)

Variable y is dependent on z or vice versa. p(z | y, x; θ) from Eq. 3 can be decomposed into the single attributes contained in z:

p(z | y, x; θ) = p(z_e | y, x; θ) ∏_{k=1}^{e−1} p(z_k | pa(z_k), y, x; θ).    (5)

Eq. 4 can be decomposed in a similar fashion. In this context pa(z_k) describes the parental variables of z_k [4]. Depending on the model, we used either Eq. 3 or Eq. 4 with different parental variables. Tab. 1 summarizes the dependencies and the used equations for all utilized models.

As a reference model M-REF we use a standard classification model, without any explaining variables. This model is used to verify the proposition of [36] that additional attributes used for explanation may limit the learning freedom of the network and thus lead to worse results. Fig. 2a shows the schematic representation of the network divided into the input data x, the convolution network, the classifier (fully connected layer), and the class-output y.

Table 1: Overview of the dependencies of the different models.

Model     Base Equation             pa(z_k)
M-REF     p(y | x; θ)               n/a
M-FI      p(y | x; θ) p(z | x; θ)   ∅
M-IACD    Eq. 3                     y
M-DACD    Eq. 3                     {z_{k+1}, . . . , z_e}
M-CDIA    Eq. 4                     ∅
M-CDDA    Eq. 4                     {z_{k+1}, . . . , z_e}
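To make the joint objective of Eq. 2 concrete, the following minimal PyTorch-style sketch combines a cross-entropy loss for the class output with binary cross-entropy losses for conditionally independent binary attributes. The function name, tensor shapes, attribute count and equal weighting are illustrative assumptions and not the exact implementation used in the paper.

import torch
import torch.nn.functional as F

def joint_loss(class_logits, attr_logits, y_true, z_true, attr_weight=1.0):
    """Cross-entropy for the class plus binary cross-entropy for the attributes.

    class_logits: (N, num_classes) raw scores for y
    attr_logits:  (N, num_attributes) raw scores for the binary attributes z
    y_true:       (N,) integer class labels
    z_true:       (N, num_attributes) 0/1 attribute labels
    """
    # negative log-likelihood of the class factor, e.g. p(y | x) or p(y | z, x)
    loss_y = F.cross_entropy(class_logits, y_true)
    # negative log-likelihood of the attribute factors p(z_k | ...)
    loss_z = F.binary_cross_entropy_with_logits(attr_logits, z_true.float())
    return loss_y + attr_weight * loss_z

# Example batch with 43 classes and 32 binary attributes (illustrative sizes):
class_logits = torch.randn(8, 43)
attr_logits = torch.randn(8, 32)
y_true = torch.randint(0, 43, (8,))
z_true = torch.randint(0, 2, (8, 32))
loss = joint_loss(class_logits, attr_logits, y_true, z_true)

For the class-dependent decompositions of Eq. 3 and Eq. 4, only the inputs of the respective heads change; the loss itself keeps this additive form.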
(M-FI)
The simplest way of integrating explanations is to assume full independence. The model M-FI was developed under the assumption that the attributes are independent of each other in the same way in which the class is independent of the attributes. The implication of these assumptions is that, given the input data x, the attributes z are not providing any additional information to solve the classification task y and vice versa. It is assumed that during the joint training process the attributes and classes can have an influence on the parameter adjustment of the feature extractor θ_f. In contrast, the parameters of the model that are dedicated to the attributes θ_z are not adjusted by the classification loss and vice versa. The corresponding network structure is visualized in Fig. 2b.

(M-IACD, M-DACD)
When designing an explainable model we want to get an explanation for a specific decision of the model. In this case the assumption that the explaining attributes are independent of the class may be oversimplified. To incorporate the class information we use the model
M-IACD, where we only assume independent attributes. Thus, the explaining part of the model is given the ability to focus on class-specific explanations. The joint training process of the class and attributes may be unstable, since the attribute model has to use the noisy class outputs. To overcome this problem we apply teacher forcing using the ground-truth labels during the first iterations of the training process. The network structure is shown in Fig. 2c and it is obvious that the explaining part of the model can influence all parameters of the network, including the classifier. This aspect is possibly restricting the classification performance.

Using
M-IACD as a base-model, thus preserving the class-dependency of the attributes, in addition a dependency among the attributes can be considered (cp. Fig. 2d). This model has the minimal amount of assumptions possible, as does model
M-CDDA. As an example, the model can capture the dependence of object shape and color attributes.

Figure 2: Overview of the used models: (a) M-REF, (b) M-FI, (c) M-IACD, (d) M-DACD, (e) M-CDIA, (f) M-CDDA. While M-REF is used to measure the influence of the additional attributes on the pristine classifier y, the other models differ according to their dependencies. Model M-FI has independent classifiers. The models M-IACD and M-DACD have a dependency of the attributes z on the class y. With model M-DACD the attributes additionally depend on each other. In the models M-CDIA and M-CDDA the dependencies are reversed. For the sake of clarity, the individual links between the convolution layers and the attribute classifiers have been merged.
(M-CDIA, M-CDDA)
In the previous models we have assumed a specific order while decomposing the joint probability distribution p(y, z | x; θ). The goal was to get an explanation for a specific decision of a classifier. A different way to look at the problem is to give a possible explanation first (what kind of attributes are visible in the input space?) and define a classifier to leverage explanations for an enhanced classification performance. Thus, we still obtain an explainable model, but with a slight shift of the objective towards the classification performance. This type of model, with a classifier dependent on the attributes and independence among attributes, is denoted as model M-CDIA. The corresponding network structure is given in Fig. 2e.

The last decomposition of p(y, z | x; θ) is based on model M-CDIA, but without making any assumptions. Likewise, a dependence of the class on the attributes is used and in addition the dependence among the attributes is preserved. The structure of model
M-CDDA is shown in Fig. 2f.
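The practical difference between the decompositions is mainly where the class head and the attribute heads take their inputs from. The following sketch, written in PyTorch with hypothetical layer names and sizes, contrasts an M-CDIA-style head (class conditioned on the attribute predictions, Eq. 4) with an M-IACD-style head (attributes conditioned on the class prediction, Eq. 3). It illustrates the wiring only and is not the authors' exact architecture.

import torch
import torch.nn as nn

class CDIAHead(nn.Module):
    """Class depends on independent attributes: p(y | z, x) p(z | x), cf. Eq. 4 / M-CDIA."""
    def __init__(self, feat_dim, num_attrs, num_classes):
        super().__init__()
        self.attr_head = nn.Linear(feat_dim, num_attrs)                   # p(z | x)
        self.class_head = nn.Linear(feat_dim + num_attrs, num_classes)    # p(y | z, x)

    def forward(self, features):
        attr_logits = self.attr_head(features)
        # the classifier sees the features and the (soft) attribute predictions
        class_in = torch.cat([features, torch.sigmoid(attr_logits)], dim=1)
        return self.class_head(class_in), attr_logits

class IACDHead(nn.Module):
    """Attributes depend on the class: p(y | x) p(z | y, x), cf. Eq. 3 / M-IACD."""
    def __init__(self, feat_dim, num_attrs, num_classes):
        super().__init__()
        self.class_head = nn.Linear(feat_dim, num_classes)                # p(y | x)
        self.attr_head = nn.Linear(feat_dim + num_classes, num_attrs)     # p(z | y, x)

    def forward(self, features, y_teacher=None):
        class_logits = self.class_head(features)
        # teacher forcing: use the ground-truth class (one-hot) in early epochs, else the prediction
        class_probs = y_teacher if y_teacher is not None else torch.softmax(class_logits, dim=1)
        attr_logits = self.attr_head(torch.cat([features, class_probs], dim=1))
        return class_logits, attr_logits

# usage: y_logits, z_logits = CDIAHead(256, 32, 43)(torch.randn(8, 256))

A single shared backbone can feed either head; only the conditioning direction between the class logits and the attribute logits changes between the decompositions.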
Our models represent their explanation by the presence and absence of attributes. The explanation made by the DNN can be seen as an image-specific explanation

(ŷ, ẑ) = argmax_{y,z} p(y, z | x; θ),    (6)

as they explain the resulting decision with the attributes visible in the image. The output of the DNN (ŷ, ẑ) is the most plausible combination of a class and the associated explanation. The explanation doesn't need any knowledge about the internal dependencies, but may be affected by them.

The predicted attributes ẑ can not only be utilized for explaining the decision process, but also to verify, support or reject a prediction of the base-network. In our application we can define sufficient and necessary conditions (denoted as C) for each class based on the attributes. Necessary attributes must be recognized when a class is predicted. Furthermore, we can directly deduce a class if sufficient attributes of exclusively that class are recognized. For example, given only a detected bicycle symbol we can directly induce the class Bicycle lane. In general we can define at least some necessary conditions that could be used to support the decision. A simple categorization of the outputs for attribute-based verification of a prediction ŷ is given by Belnap [5], who has introduced the categories

1. True - we only have information about ŷ being true (no information about ŷ being false).
2. Both - we have information about ŷ being true or false (uncertain).
3. False - we only have information about ŷ being false.
4. None - we have no information.

Depending on the application at hand we could utilize different Belnap categories to define a reject option. We use the Belnap category True as a strong condition to accept a prediction, and the three other categories define a reject. In order to apply the Belnap categories we use the predicted attributes ẑ to define a set of possible classes Ŷ. The set includes all classes that meet the defined conditions C. The reject option or verification can then be expressed by

h(ŷ, Ŷ) = accept, if ŷ ∈ Ŷ ∧ Ŷ \ {ŷ} = ∅; reject, otherwise.
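A minimal sketch of this verification step is given below. Here Ŷ is approximated as the set of classes for which a detected attribute is sufficient; the class names, attribute names and condition table are illustrative assumptions and the paper's conditions C may be richer.

from typing import Dict, Set, Tuple

# Hypothetical sufficient attributes per class (attributes that occur
# exclusively for that class); the real condition set C in the paper is richer.
SUFFICIENT: Dict[str, Set[str]] = {
    "speed_limit_80": {"digit_8"},
    "bicycle_lane": {"bicycle_symbol"},
    "bumpy_road": {"uneven_symbol"},
}

def possible_classes(z_hat: Set[str]) -> Set[str]:
    """Set of classes supported by the detected attributes (the set Y-hat)."""
    return {c for c, suff in SUFFICIENT.items() if suff & z_hat}

def verify(y_hat: str, z_hat: Set[str]) -> Tuple[str, Set[str]]:
    """Accept only if the predicted class is the single class supported by the attributes."""
    y_possible = possible_classes(z_hat)
    if y_hat in y_possible and not (y_possible - {y_hat}):
        return "accept", y_possible
    return "reject", y_possible

# The three outcomes of Fig. 4 (attribute names are illustrative):
print(verify("speed_limit_80", {"round", "red_white", "digit_8", "digit_0"}))              # accept
print(verify("speed_limit_80", {"round", "red_white", "digit_0"}))                          # reject: Y-hat empty
print(verify("bicycle_lane", {"triangular", "red_white", "bicycle_symbol", "uneven_symbol"}))  # reject: ambiguous

The three calls mirror the accept, empty-set and ambiguous cases of Fig. 4 discussed in the experiments.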
4. Experiments
We use the dataset GTSRB [33] for our experiments. It consists of 43 different classes showing German traffic signs. The a priori distribution of the classes is compensated by augmentation. Furthermore, the data is normalized to have zero mean and a standard deviation of one, to compensate for different lighting conditions and a bias in exposure. For the classes we have determined the following attributes, sorted by complexity:

• Simple: Main and border color (white, red, blue, black, yellow)
• Medium: Shapes (round, triangular, square and octagonal)
• Complex: Numbers (0, . . . , 9) and symbols (car, truck, stop, animal, ice, children, people, construction site, attention, traffic lights, bicycle, narrow point, uneven)

In order to cover different variations of attributes, synthetic data, as shown in Fig. 3, is included. These synthetic samples account for approximately 39 % of the dataset. Due to the selected attributes, it is possible that several attributes are only available for a single class (for example, the sign "priority road" is the only one with a square shape and a yellow main color). In order to prevent the respective classifiers from accidentally learning the wrong property, the synthetic data is used to create square signs with different main colors.

Figure 3: Some examples of synthetic data. The colors, shapes and symbols are sampled from the original dataset.
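As an illustration of how these attribute groups can be turned into training targets, the following sketch builds a multi-hot attribute vector for a class from a hand-written attribute table. The vocabulary layout and the two per-class entries are assumptions for demonstration, not the complete annotation used in the paper.

# Flat attribute vocabulary built from the simple, medium and complex groups above.
COLORS = ["white", "red", "blue", "black", "yellow"]
SHAPES = ["round", "triangular", "square", "octagonal"]
NUMBERS = [str(d) for d in range(10)]
SYMBOLS = ["car", "truck", "stop", "animal", "ice", "children", "people",
           "construction_site", "attention", "traffic_lights", "bicycle",
           "narrow_point", "uneven"]
VOCAB = COLORS + SHAPES + NUMBERS + SYMBOLS

# Hypothetical per-class attribute annotations (only two classes shown).
CLASS_ATTRIBUTES = {
    "speed_limit_80": ["white", "red", "round", "8", "0"],
    "priority_road": ["yellow", "white", "square"],
}

def attribute_vector(class_name):
    """Multi-hot target vector z for one class, aligned with VOCAB."""
    present = set(CLASS_ATTRIBUTES[class_name])
    return [1 if a in present else 0 for a in VOCAB]

print(sum(attribute_vector("speed_limit_80")))  # 5 active attributes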
The architecture of the convolution network is based on AlexNet [19] with a reduced input size and a reduced number of parameters; the reduction stems mainly from smaller fully-connected layers, and the exact parameter count varies slightly between the models, depending on the complexity of the dependencies. As the internal representations become more complicated with the depth of the network [29, 40], the attributes are classified based on different feature layers of the network. Simple attributes, such as the main or border color, are determined directly after the second convolution layer. After the third layer the shapes are predicted. The remaining complex attributes (symbols, numbers) are determined after the fourth layer.
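A rough sketch of this multi-depth read-out with an AlexNet-like backbone reduced to four convolution blocks is given below; the channel sizes, attribute-group sizes, input resolution and layer names are assumptions chosen for illustration, not the exact network of the paper.

import torch
import torch.nn as nn

class MultiDepthAttributeNet(nn.Module):
    """Backbone with attribute heads at increasing depths (colors -> shapes -> symbols/numbers)."""
    def __init__(self, num_classes=43, n_colors=10, n_shapes=4, n_complex=23):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.block2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.block3 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.block4 = nn.Sequential(nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.color_head = nn.Linear(64, n_colors)      # simple attributes after the 2nd block
        self.shape_head = nn.Linear(128, n_shapes)     # shapes after the 3rd block
        self.complex_head = nn.Linear(128, n_complex)  # symbols and numbers after the 4th block
        self.class_head = nn.Linear(128, num_classes)

    def forward(self, x):
        f2 = self.block2(self.block1(x))
        f3 = self.block3(f2)
        f4 = self.block4(f3).flatten(1)
        colors = self.color_head(self.pool(f2).flatten(1))
        shapes = self.shape_head(self.pool(f3).flatten(1))
        complex_attrs = self.complex_head(f4)
        return self.class_head(f4), torch.cat([colors, shapes, complex_attrs], dim=1)

# usage with an assumed 64x64 input: y_logits, z_logits = MultiDepthAttributeNet()(torch.randn(2, 3, 64, 64))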
The experiments show that adding an auxiliary model for explaining a DNN does not have a significantly negative effect on the accuracy. The models have to be compared to the pristine classifier M-REF (cp. Tab. 2). However, the predictions of the additional attributes can have a very positive effect on improving the prediction of the class. Models
M-CDDA and
M-CDIA areexamples that use a classifier with a dependency on the ex-planation and therefore can improve the classification per-formance. This suggests that the prediction of the class ben-efits from the additional knowledge about the attributes andthe additional parameters available in the network. The ac-curacy of the explanations of these two models is close tothe optimal model
M-DACD. In contrast, the models that focus on class-specific explanations (
M-DACD and
M-IACD) provide less accurate class predictions. This property may be due to the fact that the attributes depend on the class and therefore their training has an influence on the classifier network. However, as expected, the best performance regarding the explanation is achieved (
M-DACD ). Finally,using full independence of classes and explanations doesn’tchange the performance of the classifier compared to thereference model. The explanations delivered by this kindof model are not of comparable quality given by all othermodels. The results for all models are presented in the firsttwo rows of Tab. 2.
Table 2: The accuracy of class predictions and explanations is evaluated for the different models. In addition the reject option based on the attributes is evaluated only on the accepted predictions. The rejection rate defines the number of samples with uncertain predictions (no decision possible).

Model                                  M-REF   M-DACD   M-IACD   M-FI    M-CDDA   M-CDIA
Evaluation of the class prediction and explanation
Accuracy of ŷ (%)                      –       92.03    90.75    92.34   –        –
Accuracy of ẑ (%)                      n/a     –        –        84.95   89.78    –
Evaluation of the class prediction with reject option
Accuracy of ŷ (%)                      n/a     –        95.33    98.14   –        –
Rejection rate (%)                     n/a     –        –        15.05   10.22    –

The second part of Tab. 2 describes the accuracy of the predictions with the option of rejecting a decision. The accuracy increases for all models, while a certain fraction of the decisions is rejected (see the rejection rates in Tab. 2). The model M-FI has the highest rejection rate because the classifier and the explaining model are fully independent. Therefore, it is likely that the explanation and the class prediction deliver contrary results. However, the accuracy of the class prediction increases considerably when using the reject option. The best accuracy with respect to the class prediction is again obtained with model M-CDIA. This model has a moderate rejection rate compared to the lowest rejection rate of model M-DACD. It is worth mentioning that this kind of reject option justifies an acceptance or rejection. In Fig. 4 three different outcomes of the verification are shown. In the first example the decision of the classifier and the explanation agree. Therefore, the decision of the classifier can be accepted. There are multiple reasons for getting a reject. In the second example one necessary attribute (the digit 8) that is required for the class is missing. In this case Ŷ is empty (category None) because no sufficient condition for any class is met and the decision is rejected. The third example shows a reject based on the category
Both. In this case two sufficient attributes for different classes are detected. Irrespective of whether the predicted class is in the set of all possible classes Ŷ, at least one other class is also supported. This contradiction leads to a rejection of the decision.
5. Conclusion
We have presented an analysis of different dependency decompositions for explainable models and a method to verify the decision of the base-network that performs the intrinsic task. The results have shown that additional attributes can be used to support, verify, and explain the predictions of the network with an option of rejecting the decision. Furthermore, with the right dependencies, the usage of explaining attributes even leads to an increase in accuracy. The thesis of [36] could be confirmed for some of our models. Although the accuracy of model
M-IACD is slightly below the reference model
M-REF, the other models show a comparable or increased accuracy. From the increased performance of the models
M-CDDA and
M-CDIA it can be concluded that there are dependencies between the attributes.

Figure 4: Examples of three decisions using the reject option. First example - input image; ŷ: speed limit 80; ẑ: •, red & white, 8, 0; Ŷ: {speed limit 80} → Decision: accept. Second example - ŷ: speed limit 80; ẑ: •, red & white, 0; Ŷ: ∅ → Decision: reject. Third example - ŷ: bicycle lane; ẑ: ▲, red & white, bicycle, uneven; Ŷ: {bumpy road, bicycle lane} → Decision: reject.

To obtain reliable classifications with a decent explanation it is recommended to use models where the classifier can benefit from an attribute dependency and still has a low rejection rate (e.g.
M-CDIA). Adding the reject option for ambiguous inputs leads to an increase in accuracy for all of our models. In addition the verification process delivers a justification for performing a reject.

As the attributes were gathered supervised, they may not cover the best representation possible from the network's point of view, but they are guaranteed to be understandable by a human. Unsupervised explanation generation systems [13, 37] may have the problem of generating explanations that are not directly interpretable. However, the introduced dependency decomposition can also be used or combined with unsupervised procedures (e.g. [14]).
References

[1] N. Akhtar, J. Liu, and A. Mian. Defense against universal adversarial perturbations. arXiv preprint arXiv:1711.05929, 2017.
[2] B. Alsallakh, A. Jourabloo, M. Ye, X. Liu, and L. Ren. Do convolutional neural networks learn class hierarchy? TVCG, 24(1):152-162, 2018.
[3] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and W. Samek. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE, 10(7):e0130140, 2015.
[4] D. Barber. Bayesian Reasoning and Machine Learning. Cambridge University Press, New York, NY, USA, 2012.
[5] N. D. Belnap. A Useful Four-Valued Logic, pages 5-37. Springer Netherlands, Dordrecht, 1977.
[6] K. T. Co, L. Muñoz-González, and E. C. Lupu. Procedural noise adversarial examples for black-box attacks on deep neural networks. arXiv preprint arXiv:1810.00470, 2019.
[7] D. Erhan, Y. Bengio, A. Courville, and P. Vincent. Visualizing higher-layer features of a deep network. Technical report, Université de Montréal, 2009.
[8] R. C. Fong and A. Vedaldi. Interpretable explanations of black boxes by meaningful perturbation. In ICCV, pages 3449-3457, 2017.
[9] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. ICLR, abs/1412.6572, 2015.
[10] C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger. On calibration of modern neural networks. In ICML, 2017.
[11] E. M. Hand and R. Chellappa. Attributes for improved attributes: A multi-task network utilizing implicit and explicit relationships for facial attribute classification. In AAAI, 2017.
[12] X. He and Y. Peng. Fine-grained visual-textual representation learning. TCSVT, pages 1-1, 2019.
[13] L. A. Hendricks, Z. Akata, M. Rohrbach, J. Donahue, B. Schiele, and T. Darrell. Generating visual explanations. In ECCV, 2016.
[14] C. Huang, C. C. Loy, and X. Tang. Unsupervised learning of discriminative attributes and visual representations. CVPR, pages 5175-5184, 2016.
[15] D. H. Hubel and T. N. Wiesel. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. The Journal of Physiology, 160(1):106-154, 1962.
[16] A. Kendall and Y. Gal. What uncertainties do we need in Bayesian deep learning for computer vision? In NIPS, 2017.
[17] P.-J. Kindermans, S. Hooker, J. Adebayo, M. Alber, K. T. Schütt, S. Dähne, D. Erhan, and B. Kim. The (un)reliability of saliency methods. NIPS, 2017.
[18] P.-J. Kindermans, K. T. Schütt, M. Alber, K.-R. Müller, D. Erhan, B. Kim, and S. Dähne. Learning how to explain neural networks: PatternNet and PatternAttribution. arXiv e-prints, arXiv:1705.05598, May 2017.
[19] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, pages 1097-1105, 2012.
[20] A. Kurakin, I. J. Goodfellow, and S. Bengio. Adversarial examples in the physical world. ICLR, abs/1607.02533, 2017.
[21] S. Lapuschkin, A. Binder, G. Montavon, K.-R. Müller, and W. Samek. The LRP toolbox for artificial neural networks. Journal of Machine Learning Research, 17(114):1-5, 2016.
[22] F. Liao, M. Liang, Y. Dong, T. Pang, J. Zhu, and X. Hu. Defense against adversarial attacks using high-level representation guided denoiser. In CVPR, pages 1778-1787, 2018.
[23] A. Lindh, R. J. Ross, A. Mahalunkar, G. Salton, and J. D. Kelleher. Generating diverse and meaningful captions. In ICANN, pages 176-187. Springer, 2018.
[24] A. Nguyen, J. Clune, Y. Bengio, A. Dosovitskiy, and J. Yosinski. Plug & play generative networks: Conditional iterative generation of images in latent space. In CVPR, pages 3510-3520, July 2017.
[25] A. Nguyen, J. Yosinski, and J. Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In CVPR, 2014.
[26] C. Olah, A. Mordvintsev, and L. Schubert. Feature visualization. Distill, 2017. https://distill.pub/2017/feature-visualization.
[27] X. Peng, Z. Tang, F. Yang, R. S. Feris, and D. Metaxas. Jointly optimize data augmentation and network training: Adversarial data augmentation in human pose estimation. In CVPR, pages 2226-2234, 2018.
[28] M. T. Ribeiro, S. Singh, and C. Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In HLT-NAACL Demos, 2016.
[29] M. Riesenhuber and T. Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11):1019, 1999.
[30] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In ICCV, pages 618-626, 2017.
[31] K. Simonyan, A. Vedaldi, and A. Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps, 2013.
[32] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. A. Riedmiller. Striving for simplicity: The all convolutional net. ICLR, abs/1412.6806, 2015.
[33] J. Stallkamp, M. Schlipsing, J. Salmen, and C. Igel. Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural Networks, 32:323-332, 2012.
[34] M. Sundararajan, A. Taly, and Q. Yan. Axiomatic attribution for deep networks, 2017.
[35] E. Thelisson, K. Padh, and L. E. Celis. Regulatory mechanisms and algorithms towards trust in AI/ML. In IJCAI, 2017.
[36] R. Turner. A model explanation system. In MLSP, 2016.
[37] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. CVPR, pages 3156-3164, 2015.
[38] K. Xu, J. Ba, R. Kiros, K. Cho, A. C. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. In ICML, 2015.
[39] Y. Xu, L. Qin, X. Liu, J. Xie, and S.-C. Zhu. A causal and-or graph model for visibility fluent reasoning in tracking interacting objects. In CVPR, pages 2178-2187, 2018.
[40] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, editors, ECCV, pages 818-833, Cham, 2014. Springer International Publishing.
[41] M. D. Zeiler, G. W. Taylor, and R. Fergus. Adaptive deconvolutional networks for mid and high level feature learning. ICCV, pages 2018-2025, 2011.
[42] Z. Zhang, C. Xie, J. Wang, L. Xie, and A. L. Yuille. DeepVoting: A robust and explainable deep network for semantic part detection under partial occlusion. In CVPR, pages 1372-1380, 2018.
[43] B. Zhou, A. Khosla, À. Lapedriza, A. Oliva, and A. Torralba. Object detectors emerge in deep scene CNNs. ICLR, abs/1412.6856, 2014.