Do Explanations Reflect Decisions? A Machine-centric Strategy to Quantify the Performance of Explainability Algorithms
Zhong Qiu Lin*, Mohammad Javad Shafiee*, Stanislav Bochkarev, Michael St. Jules, Xiao Yu Wang, Alexander Wong
Vision & Image Processing Group, Systems Design Engineering, University of Waterloo
Waterloo Artificial Intelligence Institute, Waterloo, ON
DarwinAI Corp., Waterloo, ON
* Equal contribution
Abstract
There has been a significant surge of interest recently in the research community around the concept of explainable artificial intelligence (XAI), where the goal is to produce an interpretation for a decision made by a machine learning algorithm. Of particular interest is the interpretation of how deep neural networks make decisions, given the complexity and ‘black box’ nature of such networks. Given the infancy of the field, there has been very limited exploration into the assessment of the performance of explainability methods, with most evaluations centered around subjective visual interpretation of the produced interpretations. In this study, we explore a more machine-centric strategy for quantifying the performance of explainability methods on deep neural networks via the notion of decision-making impact analysis. More specifically, we quantify the importance of the critical factors identified for a given decision made by a network based on the impact on network decisions and confidences in the absence of these critical factors. For scenarios where we wish to study impact on directed erroneous decisions (e.g., under adversarial distractions), we additionally quantify the importance of the identified critical factors based on their coverage of the adversarially impacted factors. We introduce two quantitative performance metrics: i) Impact Score, which assesses the percentage of critical factors with either strong confidence-reduction impact or decision-changing impact, and ii) Impact Coverage, which assesses the percentage coverage of adversarially impacted factors in the input. A comprehensive analysis using this approach was conducted on several state-of-the-art explainability methods (LIME, SHAP, Expected Gradients, GSInquire) on a ResNet-50 deep convolutional neural network using a subset of ImageNet for the task of image classification. Experimental results show that, for both general and adversarial distraction scenarios, the critical regions identified by LIME within the tested images had the lowest impact on the decision-making process of the network.

The significant advances in deep learning [8], in particular deep neural networks, have led to a rise in adoption across industry. This has also led to a tremendous rise in research in the area of deep learning and its application to a wide variety of tasks, leading to state-of-the-art performance across
various tasks such as visual perception [18, 6, 11], speech recognition [2], and natural language processing [22, 5]. However, as the proliferation of deep learning continues, there is now growing interest as well as concern over how deep neural networks make decisions, particularly for life-critical applications such as autonomous driving and clinical decision support. Given the sheer complexity of deep neural networks and how information propagates through such networks to form a decision, deep learning has often been viewed as a ‘black box’ machine learning method, and it is very difficult to interpret and understand its decision-making process or the key factors involved in a decision. This makes deep learning challenging to leverage, particularly in regulated spaces where interpretability and transparency are a necessity (e.g., finance and healthcare). Furthermore, this challenge of interpretability also makes it very difficult for machine learning engineers and scientists to understand biases and error scenarios of a trained network in order to improve upon it, as well as situations where the network is deciding based on unintended patterns in the dataset [10]. This is particularly critical given the recent rise of adversarial examples [19, 1], which are designed specifically to cause deep neural networks to make erroneous decisions; understanding how networks behave is very important for devising better ways to defend against them. As such, the ability to explain the decision-making process of deep neural networks can be critical for enabling the development of improved, more dependable deep learning, as well as for enabling the use of deep learning in a more trustworthy manner in mission-critical scenarios.

Due to this critical need for increased transparency and interpretability in deep learning, there has been a considerable increase in research interest in explainability methods for interpreting the decision-making process of a deep neural network. In the field of computer vision, such explainability methods typically manifest their interpretations of the decision-making process of visual perception neural networks in the form of visual saliency maps that highlight critical regions deemed by the method as important in making the decision. While such visual interpretations aim to give new insights into the way deep neural networks make decisions, much of the evaluation of the visual interpretations produced by explainability methods has been largely subjective; as such, ironically, it is up to the interpretation of the human observer, and it is thus difficult to judge whether these identified critical regions are in fact reflective of what the deep neural network is leveraging to make decisions. While this current gap in the exploration of quantitative performance assessment of explainability methods, in terms of their impact on decisions made by deep neural networks, is understandable given how new this area of research is, it hinders the level of human trust in not just the deep neural networks but also in the explainability methods themselves.
In fact, quantitative methods to assess the performance of explainability methods are critical not only for trust in the decisions made, but also for the choice of method for deployment and for research development, especially since different explainability methods can produce drastically different explanations given the same input data and the same model, making it difficult to know whether algorithmic extensions of such explainability approaches actually improve interpretability.

In this study, we explore a more machine-centric strategy for quantifying the performance of explainability methods on deep neural networks via the notion of decision-making impact analysis. More specifically, we introduce a new performance metric (which we will refer to as the Impact Score) for quantifying how well the critical factors identified by an explainability method reflect a given decision made by a network, based on the impact on network decisions and confidences in the absence of these critical factors. For scenarios where we wish to assess impact on directed erroneous decisions (e.g., under adversarial distractions), we introduce an additional performance metric (which we will refer to as
Impact Coverage) for quantifying the coverage of the identified critical factors over the adversarially impacted factors. Based on these metrics, we conduct a comprehensive analysis of the performance of four different state-of-the-art methods from the recent research literature on the task of image classification, to study how such methods compare against each other in terms of how much impact the critical regions identified in the explanations produced by each method actually have on the decision-making process, under both general and adversarial scenarios. To the best of the authors' knowledge, this is the first systematic study to quantitatively assess the performance of several state-of-the-art explainability methods based on how impactful their explanations are to decisions made by a network under both normal and adversarial scenarios.
The explainability methods in the current research literature can generally be divided into two main categories [20], which we will refer to as proxy strategies and direct strategies [17, 16, 14, 15, 4, 21]. In direct strategies, the decision-making process of a deep neural network is interpreted by studying the internal behaviour of the network directly and then surfacing that information as an explanation for the decision-making process of the network. The most well-known of the proxy methods is LIME [12], which takes advantage of a linear proxy model to approximate the behaviour of the targeted machine learning model and then interprets the original model based on the learnt proxy. Proxy approaches are considered ‘black box’ approaches, where the explainability method does not have direct access to the inner workings of the network and the proxy model approximates it given only the input and the output of the network. On the other hand, direct explainability algorithms are usually considered ‘white box’ methods, as they require access to the inner workings of a deep neural network, such as gradients and activations at different layers for a given input, to identify the key factors within the input that are critical to the decision-making process. For example, by leveraging information about gradients, it is possible to quantify how much of a change in the input data would turn the decision of the network to another output, and thereby measure the importance of each input in the decision-making process. Notable gradient-based direct explainability approaches include Integrated Gradients [17], Guided Backpropagation [16], Guided Grad-CAM [14], SmoothGrad [15], and Expected Gradients [4].
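To make the distinction concrete, the following is a minimal sketch (our own illustration, not code from this paper) of the simplest ‘white box’ direct approach: a plain input-gradient saliency map for a pretrained ResNet-50 in PyTorch. The file name is a placeholder, and methods such as Integrated Gradients or SmoothGrad can be viewed as refinements of this basic gradient signal.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Pretrained classifier; the reference network N used later in this paper is also a ResNet-50.
model = models.resnet50(pretrained=True).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

x = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
x.requires_grad_(True)

# Forward pass, then backpropagate the score of the predicted class to the input pixels.
logits = model(x)
pred = logits.argmax(dim=1).item()
logits[0, pred].backward()

# The per-pixel gradient magnitude acts as a (very crude) saliency map.
saliency = x.grad.abs().max(dim=1)[0].squeeze(0)  # shape: (224, 224)
```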
Much of the research literature around explainability, particularly for visual perception tasks such as image classification, has revolved around subjective visual interpretation of the explanations produced by the explainability method. This usually takes the form of visual saliency maps, where salient regions in the map produced using the explainability method of choice are considered critical regions influencing the decision made by a network. However, due to the purely qualitative nature of such visual assessments, it is very challenging to get a good sense of how well an explainability method is performing, how useful or meaningful the provided explanation is relative to its influence over the network's decision and its associated confidence, and, more importantly, how well it performs compared to other explainability methods. As such, this can limit progress in the field of explainable artificial intelligence, since subjective visual assessment provides no basis for benchmarking.

More recently, there have been explorations into human-centric strategies for quantifying explainability performance in the case of visual perception, where the visual saliency map produced using a given explainability method for a given image is compared with a visual attention map created from gaze information collected from human subjects [7]. While such an approach is a step towards quantification of the explanations produced by explainability methods, one of its biggest limitations is the underlying assumption that a deep neural network makes decisions in a similar manner to human subjects, which is often not true. As such, this human-centric approach to quantifying explainability performance provides very little insight into the actual driving factors of the decision-making process of deep neural networks. Furthermore, this approach requires considerable human gaze information to be collected, which is simply impractical for most real-world scenarios.

To address the limitations of human-centric strategies for quantifying the performance of explainability methods, we take a drastically different direction by instead exploring a more machine-centric strategy, where we quantify performance based on the decision-making behaviour of the network itself. More specifically, we aim to quantify the performance of explainability methods on deep neural networks via the notion of decision-making impact analysis, where we study the quantitative impact of the critical factors identified by an explainability method for a given decision made by a network, based on the changes in the decisions and associated confidences of the network itself.

In the sections below, we first define a performance metric for quantifying the impact of critical factors identified by an explainability method on decisions, and the confidence in those decisions, as made by a given deep neural network. Next, we introduce an additional performance metric for directed erroneous decision scenarios based on the concept of impact coverage.
To facilitate the quantitative assessment of the performance of a given explainability method, the first step is to define and formulate a performance metric for performing such an assessment. Motivated by a machine-centric strategy for the quantitative performance assessment of a given explainability method on a particular deep neural network, we aim to develop metrics that quantify the importance of the critical factors identified by the explainability method for a given decision made by a network, based on the impact these factors have on network decisions and the associated confidences. We consider the critical factors c identified by an explainability method M to be important to a decision y made by a deep neural network N for a given input x if either of the following conditions is met:

• Decision-level impact: the decision made by the deep neural network changes in the absence of the critical factors.
• Confidence-level impact: the confidence z of the deep neural network in its decision changes by τ% in the absence of the critical factors.

The motivation behind this definition of importance for the critical factors identified by a given explainability method is that, if the critical factors are indeed crucial to the decision-making process of the deep neural network, then the absence of these critical factors in the given input should impact the network such that it either becomes significantly less confident in its current decision, or becomes so unconfident that its confidence in another decision is higher, leading the network to make a different decision altogether.

Impact Score. In this study, we formulate the performance metric I, which we will refer to as the Impact Score, as follows. Let the relationships between the critical factors c, the explainability method M, the input x, the decision y, the confidence in the decision z, and the network N be expressed by the following equations:

{y, z} = N(x),    (1)
c = M(x, N),    (2)

where c ∈ x. Based on this, we can define the input in the absence of c as identified by M as

x′ = x − c,    (3)

and the decision given x′ as input to N as

{y′, z′} = N(x′).    (4)

Therefore, in the general scenario, based on the conditions defined above that the critical factors c for a given input x as identified by M must meet to be deemed important, we can define the Impact Score I across a set of n inputs X = {x_1, x_2, . . . , x_n} as

I = (1/n) Σ_{i=1}^{n} 1[(y′_i ≠ y_i) ∨ (z′_i ≤ τ z_i)],    (5)

where i denotes the i-th input. In this study, we set τ = 0.5 to indicate that the network has lost half of the confidence it had in its original decision. Finally, we also introduce a stricter variant of the above Impact Score, denoted by I_strict, where we only consider decision-level impact:

I_strict = (1/n) Σ_{i=1}^{n} 1[y′_i ≠ y_i].    (6)
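The following is a minimal sketch (our own illustration, not code from the paper) of how the Impact Score in Eqs. (5)–(6) can be computed for an image classifier, assuming a PyTorch model and per-image binary masks marking the critical factors c; zeroing out the masked pixels is used here as one simple way to realize the ‘absence’ of those factors in Eq. (3).

```python
import torch

def impact_scores(model, images, masks, tau=0.5):
    """Impact Score I (Eq. 5) and strict Impact Score I_strict (Eq. 6).

    images: float tensor of shape (n, C, H, W)
    masks:  binary tensor of shape (n, 1, H, W); 1 marks the critical factors c.
    """
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(images), dim=1)
        conf, labels = probs.max(dim=1)                 # original confidence z and decision y

        masked = images * (1 - masks)                   # x' = x - c (critical factors removed)
        probs_masked = torch.softmax(model(masked), dim=1)
        conf_masked = probs_masked.gather(1, labels.unsqueeze(1)).squeeze(1)  # z' for the original y
        labels_masked = probs_masked.argmax(dim=1)      # new decision y'

        decision_change = labels_masked != labels       # decision-level impact
        confidence_drop = conf_masked <= tau * conf     # confidence-level impact
        impact = (decision_change | confidence_drop).float().mean().item()
        impact_strict = decision_change.float().mean().item()
    return impact, impact_strict
```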
Impact Coverage. In the scenario where we wish to study impact in directed erroneous decisions (e.g., decisions made under the influence of adversarial examples), we introduce an additional approach to quantitatively assessing the performance of the different explainability methods, since the critical factors that the network leverages to make a decision are largely known a priori to the evaluation (e.g., in the case of an adversarial patch, the critical region that is important to the decision-making process is the adversarial patch itself). More specifically, we can further quantify the importance of the identified critical factors c based on the amount of coverage of the adversarially impacted factors in x by the critical factors c.

Let us define the Impact Coverage metric I_coverage across a set of n inputs X = {x_1, x_2, . . . , x_n} based on the intersection-over-union between the adversarially impacted factors and the critical factors across the given set of inputs:

I_coverage = (1/n) Σ_{i=1}^{n} |a_i ∩ c_i| / |a_i ∪ c_i|,    (7)

where a_i denotes the adversarially impacted factors in input x_i. As such, the Impact Coverage metric is designed to be high when there is heavy overlap between the identified critical factors and the adversarially impacted factors, rewarding strong alignment between the explanation produced by the explainability method and the actual factors impacting the decision.
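A corresponding sketch for Eq. (7), again our own illustration rather than the authors' code, assuming binary NumPy masks for the adversarial patch region a and the identified critical region c:

```python
import numpy as np

def impact_coverage(critical_masks, patch_masks):
    """Mean intersection-over-union between identified critical factors c
    and adversarially impacted factors a, as in Eq. (7)."""
    ious = []
    for c, a in zip(critical_masks, patch_masks):
        c, a = c.astype(bool), a.astype(bool)
        union = np.logical_or(a, c).sum()
        inter = np.logical_and(a, c).sum()
        ious.append(inter / union if union > 0 else 0.0)
    return float(np.mean(ious))
```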
Figure 1: Example of a decision change due to the absence of critical regions in the decision-making process. (left) original image; (center) identified critical region; (right) prediction confidences for decisions made with the original image and in the absence of the critical regions. The absence of the critical regions led to a change in decision, which means the explanation reflects impact on the decision.

The conducted experiments and the explainability methods used in this study are described below.

For the first experiment, we quantitatively evaluate the performance of several state-of-the-art explainability methods using the two variants of the Impact Score (i.e., I and I_strict) for each explainability method M, using a ResNet-50 deep convolutional neural network designed for the task of image classification as the reference network N. A subset of the ImageNet [13] dataset is leveraged as the input set X. More specifically, we leveraged a subset of 410 different images from the ImageNet dataset, all of which were correctly classified, for consistency purposes. As such, this experiment tasks the different explainability methods with identifying critical regions within a natural image that are important to the class prediction made by the network, such that in the absence of such critical regions the confidence of the network in the predicted class is either significantly reduced or an altogether different class is predicted. An example of a decision change that resulted from the absence of critical regions identified by an explainability method during the decision-making process is shown in Fig. 1. The purpose of this first experiment is to quantitatively evaluate explainability performance under a more general scenario, where decisions are made by the network on untampered data inputs, and is representative of the general use case.

For the second experiment, we quantitatively evaluate the performance of several state-of-the-art explainability methods using the two variants of the Impact Score (i.e., I and I_strict), as well as I_coverage, for each explainability method M in the presence of visual ‘distractions’ in the form of adversarial patches, to better study the impact on directed erroneous decisions. More specifically, we leverage the adversarial patches from the work of Brown et al. [3]. For generating the adversarial patch, we fix the reference network N from Experiment 1 and apply adversarial training on the same subset of the ImageNet [13] dataset as in Experiment 1. We then randomly overlay the resulting adversarial patches (with random translation and random rotation of the patch) on the same subset of images, with patch scales ranging from 0.3 to 0.7 (a simple sketch of this overlay step is given below). An example of a directed erroneous decision due to an adversarially impacted area is shown in Fig. 2. We compute I, I_strict, and I_coverage for each patch scale over the test images whose prediction classes change to the adversarially targeted classes. With the adversarial patch being the control variable, the critical region that is important to the decision-making process is largely known a priori to be the adversarial patch itself, and as such I_coverage provides an additional quantitative indicator of the ability of the explainability method to identify the adversarially impacted areas within the images that have a direct impact on the decisions made by the deep neural network.

In this study, the proposed Impact Score and Impact Coverage are leveraged to perform a comprehensive analysis of several state-of-the-art explainability methods from the research literature. More specifically, the methods under study are: i) LIME [12], ii) SHAP [9], iii) Expected Gradients [4], and iv) GSInquire [21]. These methods were selected as they represent a good coverage of both popular and state-of-the-art methods from both the proxy and direct categories of explainability methods.

Figure 2: Example of a directed erroneous decision due to an adversarially impacted area.
(left) original untampered image; (center) tampered image with an adversarial patch; (right) prediction confidences of decisions made with the untampered image and with the adversarially tampered image. The adversarial patch led to a change in decision.
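As a rough illustration of the patch overlay procedure described above (not the authors' exact implementation), the sketch below pastes an adversarial patch onto an image at a random position and rotation; interpreting the scale as the fraction of image area covered by the patch is an assumption here.

```python
import numpy as np
from PIL import Image

def overlay_patch(image, patch, scale, rng):
    """Overlay `patch` on `image` at a random position and rotation.
    `scale` is taken here as the fraction of the image area the patch covers."""
    img = image.copy()
    w, h = img.size
    side = int(np.sqrt(scale * w * h))            # patch side length for the requested area fraction
    p = patch.resize((side, side)).rotate(rng.uniform(0, 360), expand=True)
    x = int(rng.integers(0, max(1, w - p.size[0])))
    y = int(rng.integers(0, max(1, h - p.size[1])))
    img.paste(p, (x, y), p if p.mode == "RGBA" else None)
    return img

# Example usage (file names are placeholders):
# rng = np.random.default_rng(0)
# patched = overlay_patch(Image.open("input.jpg"), Image.open("patch.png"), scale=0.4, rng=rng)
```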
The experimental results for the two experiments conducted in this study are presented below.
Experiment 1:
The quantitative performance of the four tested explainability methods, as determined by the proposed Impact Scores in the first experiment, is shown in Table 1. A number of interesting observations can be made. First, it can be observed that LIME achieved the lowest I and I_strict scores, indicating that the critical regions identified by LIME had the lowest impact on the actual decision-making process of the network in identifying the class for a given image when compared to the other tested methods, with differences in I and I_strict between SHAP and LIME of over 6% and over 5%, respectively. Second, it can be observed that there is a progressive increase in decision-making impact from SHAP to Expected Gradients, with significant absolute increases in I and I_strict of over 7% and over 7.5%, respectively. While both SHAP and Expected Gradients approximate Shapley values, this significant improvement achieved by Expected Gradients over SHAP can be attributed to the incorporation of ideas behind three of the most recent state-of-the-art concepts in explainability (SHAP, Integrated Gradients [17], and SmoothGrad [15]) within a common expected value formulation, leading to the identification of more impactful critical regions. Third, it can be observed that GSInquire achieved the highest I and I_strict scores amongst the tested methods, achieving significant absolute increases of close to 25% and close to 3% in I and I_strict, respectively, when compared to Expected Gradients. What is interesting about this observation is that the improvement of GSInquire in I is significantly higher than its improvement in I_strict, which indicates that a much larger number of tested images experienced a significant confidence-level impact in the absence of the critical regions identified by GSInquire when compared to those identified by the other methods, while the improvement in decision-level impact is significant but less drastic. Example images, the critical regions identified by the tested explainability methods, and the prediction confidences with and in the absence of the identified critical regions are shown in Fig. 3. It can be observed that, for one of the example images where both Expected Gradients and GSInquire identified decision-impacting critical regions while SHAP and LIME did not (middle row), the absence of the critical regions that SHAP identified not only did not lead to a decision change by the network, but instead led to an increase in prediction confidence for the original decision, which is illustrative of an explanation that does not reflect the decision-making process of the network. Furthermore, as illustrated by the example image in the first row, no explainability method is perfect and the identified critical regions may not have decision-level impact (in this example, while decisions did not change in the absence of the identified critical regions, the critical regions identified by GSInquire led to the highest prediction confidence change amongst the tested methods).

Table 1: Performance of tested explainability methods based on impact on network decisions.
Method                   I         I_strict
LIME [12]                38.05%    35.12%
SHAP [9]                 44.15%    40.24%
Expected Gradients [4]   51.22%    47.80%
GSInquire [21]           76.10%    50.73%

Figure 3: Example images, the corresponding critical regions identified by tested explainability methods, and prediction confidences with and in absence of the identified critical regions.

Experiment 2:
The quantitative performance of the four tested explainability methods, as determined by the proposed Impact Score and Impact Coverage in the second experiment, is shown in Table 2. A number of interesting observations can be made. First, it can be observed that LIME achieved the lowest I, I_strict, and I_coverage scores across all adversarial patch scales, indicating that the critical regions identified by LIME have the lowest impact, as well as the lowest coverage of the adversarially impacted areas in the test images, amongst the tested methods. Second, it can be observed that both SHAP and Expected Gradients had similar I, I_strict, and I_coverage scores, while GSInquire had significantly higher I, I_strict, and I_coverage scores than both SHAP and Expected Gradients across all adversarial patch scales. Example adversarially modified erroneous images via adversarial patches, and the corresponding critical regions identified by the tested explainability methods as being important to the decision made by the network, are shown in Figure 4. It can be observed that both Expected Gradients and GSInquire were able to identify more of the adversarially impacted areas, with GSInquire achieving the best identification coverage of the adversarially impacted areas.

Table 2: Performance of tested explainability methods at different adversarial patch scales.
Scale    LIME [12]                      SHAP [9]                       Expected Gradients [4]         GSInquire [21]
         I_coverage  I  I_strict        I_coverage  I  I_strict        I_coverage  I  I_strict        I_coverage  I  I_strict

Figure 4: Example adversarially modified erroneous images via adversarial patches at different scales, and the corresponding critical regions identified by tested explainability methods as being important to the decision made by the network. (Patch scale, ground truth / adversarial label: 0.30 Television / Monitor; 0.40 Suit / Cup; 0.50 Necklace / Cup; 0.60 Sweatshirt / Monitor; 0.70 Cup / Necklace.)

In this study, we explored a more machine-centric strategy for quantifying the performance of explainability methods on deep convolutional neural networks by quantifying the importance of the critical factors identified by an explainability method for a given decision made by a network. This is accomplished by studying the impact of the identified factors on the decision and the confidence in the decision, and additionally the coverage of the adversarially impacted factors in the directed erroneous decision scenario. A comprehensive analysis using this approach showed that, in the case of visual perception tasks such as image classification, some of the most popular and widely used methods, such as LIME and SHAP, may produce explanations that are not as reflective as expected of what the deep neural network is leveraging to make decisions. Newer methods such as Expected Gradients and GSInquire performed significantly better in general scenarios, with GSInquire also performing significantly better in adversarial distraction scenarios, though there remains significant room for improvement, illustrating the importance of such quantitative metrics for benchmarking methods so that we can better understand where current approaches stand and where we can improve. While by no means perfect, the hope is that the proposed machine-centric strategy helps push the conversation forward towards better metrics for evaluating explainability methods in a manner that gives insights to guide network error mitigation as well as improve trust in deep neural networks. Future work involves studying the quantitative performance of explainability methods under different use cases, such as speech recognition and natural language processing tasks, as well as the extension of the proposed Impact Score to incorporate a wider range of factors for more thorough quantitative assessment.

References
[1] Naveed Akhtar and Ajmal Mian. Threat of adversarial attacks on deep learning in computer vision: A survey. CoRR, abs/1801.00553, 2018.
[2] Dario Amodei, Sundaram Ananthanarayanan, Rishita Anubhai, Jingliang Bai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Qiang Cheng, Guoliang Chen, et al. Deep speech 2: End-to-end speech recognition in English and Mandarin. In International Conference on Machine Learning, pages 173–182, 2016.
[3] Tom B. Brown, Dandelion Mané, Aurko Roy, Martín Abadi, and Justin Gilmer. Adversarial patch. arXiv preprint arXiv:1712.09665, 2017.
[4] Gabriel Erion, Joseph D. Janizek, Pascal Sturmfels, Scott Lundberg, and Su-In Lee. Learning explainable models using attribution priors. arXiv preprint arXiv:1906.10670, 2019.
[5] Matt Gardner, Joel Grus, Mark Neumann, Oyvind Tafjord, Pradeep Dasigi, Nelson Liu, Matthew Peters, Michael Schmitz, and Luke Zettlemoyer. AllenNLP: A deep semantic natural language processing platform. arXiv preprint arXiv:1803.07640, 2018.
[6] Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks. In CVPR, volume 1, page 3, 2017.
[7] Qiuxia Lai, Wenguan Wang, Salman Khan, Jianbing Shen, Hanqiu Sun, and Ling Shao. Human vs. machine attention in neural networks: A comparative study. arXiv preprint arXiv:1906.08764, 2019.
[8] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436, 2015.
[9] Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pages 4765–4774, 2017.
[10] Timothy Niven and Hung-Yu Kao. Probing neural network comprehension of natural language arguments. arXiv preprint arXiv:1907.07355, 2019.
[11] Joseph Redmon and Ali Farhadi. YOLO9000: Better, faster, stronger. arXiv preprint, 2017.
[12] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016.
[13] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
[14] Ramprasaath R. Selvaraju, Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, and Dhruv Batra. Grad-CAM: Why did you say that? arXiv preprint arXiv:1611.07450, 2016.
[15] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
[16] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
[17] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning, pages 3319–3328. JMLR.org, 2017.
[18] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2818–2826, 2016.
[19] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. CoRR, 2013.
[20] Erico Tjoa and Cuntai Guan. A survey on explainable artificial intelligence (XAI): Towards medical XAI. arXiv preprint arXiv:1907.07374, 2019.
[21] Alexander Wong, Mohammad Javad Shafiee, Brendan Chwyl, and Francis Li. FermiNets: Learning generative machines to generate efficient neural networks via generative synthesis. arXiv preprint arXiv:1809.05989, 2018.
[22] Tom Young, Devamanyu Hazarika, Soujanya Poria, and Erik Cambria. Recent trends in deep learning based natural language processing. IEEE Computational Intelligence Magazine, 13(3):55–75, 2018.