A Survey on Explainable Artificial Intelligence (XAI): Towards Medical XAI
Erico Tjoa and Cuntai Guan, Fellow, IEEE

Erico Tjoa and Cuntai Guan were with the School of Computer Science and Engineering, Nanyang Technological University, Singapore. Erico Tjoa was also affiliated with the HealthTech Division, Alibaba Group Holding Limited.
Abstract—Recently, artificial intelligence and machine learning in general have demonstrated remarkable performance in many tasks, from image processing to natural language processing, especially with the advent of deep learning. Along with research progress, they have encroached upon many different fields and disciplines. Some of them, such as the medical sector, require a high level of accountability and thus transparency. Explanations for machine decisions and predictions are thus needed to justify their reliability. This requires greater interpretability, which often means we need to understand the mechanism underlying the algorithms. Unfortunately, the black-box nature of deep learning is still unresolved, and many machine decisions are still poorly understood. We provide a review of interpretabilities suggested by different research works and categorize them. The different categories show different dimensions of interpretability research, from approaches that provide "obviously" interpretable information to studies of complex patterns. By applying the same categorization to interpretability in medical research, it is hoped that (1) clinicians and practitioners can subsequently approach these methods with caution, (2) insights into interpretability will arise with more consideration for medical practices, and (3) initiatives to push forward data-based, mathematically and technically grounded medical education are encouraged.
Index Terms—Explainable Artificial Intelligence, Survey, Machine Learning, Interpretability, Medical Information System.
I. INTRODUCTION

MACHINE LEARNING (ML) has grown large in both research and industrial applications, especially with the success of deep learning (DL) and neural networks (NN), so large that its impact and possible after-effects can no longer be taken for granted. In some fields, failure is not an option: even a momentarily dysfunctional computer vision algorithm in an autonomous vehicle easily leads to fatality. In the medical field, human lives are clearly on the line. Detection of a disease at its early phase is often critical to the recovery of patients or to preventing the disease from advancing to more severe stages. While machine learning methods, artificial neural networks, brain-machine interfaces and related subfields have recently demonstrated promising performance in medical tasks, they are hardly perfect [1]–[9]. Interpretability and explainability of ML algorithms have thus become pressing issues: who is accountable if things go wrong? Can we explain why things go wrong? If things are working well, do we know why and how to leverage them further? Many papers have suggested different measures
and frameworks to capture interpretability, and the topic of explainable artificial intelligence (XAI) has become a hotspot in the ML research community. Popular deep learning libraries have started to include their own explainable AI libraries, such as PyTorch Captum and TensorFlow tf-explain. Furthermore, the proliferation of interpretability assessment criteria (such as reliability, causality and usability) helps the ML community keep track of how algorithms are used and how their usage can be improved, providing guide posts for further developments [10]–[12]. In particular, it has been demonstrated that visualization is capable of helping researchers detect erroneous reasoning in classification problems that many previous researchers possibly have missed [13].

The above said, there seems to be a lack of uniform adoption of interpretability assessment criteria across the research community. There have been attempts to define the notions of "interpretability" and "explainability", along with "reliability", "trustworthiness" and other similar notions, without clear exposition on how they should be incorporated into the great diversity of implementations of machine learning models; consider [10], [14]–[18]. In this survey, we will instead use "explainability" and "interpretability" interchangeably, considering a research work to be related to interpretability if it shows any attempt (1) to explain the decisions made by algorithms, (2) to uncover the patterns within the inner mechanism of an algorithm, or (3) to present the system with coherent models or mathematics, and we will include even loose attempts to raise the credibility of machine algorithms.

In this work, we survey research works related to the interpretability of ML or computer algorithms in general, categorize them, and then apply the same categories to interpretability in the medical field. The categorization is especially aimed at giving clinicians and practitioners a perspective on the use of interpretable algorithms that are available in diverse forms. The trade-off between the ease of interpretation and the need for specialized mathematical knowledge may create a bias in preference for one method over another without justification based on medical practices. This may further provide a ground for specialized education in the medical sector aimed at realizing the potential that resides within these algorithms. We also find that many journal papers in the machine learning and AI community are algorithm-centric. They often assume that the algorithms used are obviously interpretable without conducting human subject tests to verify their interpretability; see column HSI of tables I and II. Note that assuming that a model is obviously interpretable is not necessarily wrong, and, in some cases, human tests might be irrelevant (for example, pre-defined models based on commonly accepted knowledge specific to the content-subject may be considered interpretable without human subject tests).
In the tables, we also include a column to indicate whether the interpretability method applies to artificial NNs, since the issue of interpretability has recently been gathering attention due to their black-box nature. We will not attempt to cover all related works, many of which are already presented in the research papers and surveys we cite [1], [2], [15]–[30]. We extend the so-called integrated interpretability [16] by including considerations for subject-content-dependent models. Compared to [17], we also overview the mathematical formulation of common or popular methods, revealing the great variety of approaches to interpretability. Our categorization draws a starker borderline between the different views of interpretability that seem to be difficult to reconcile. In a sense, our survey is more suitable for technically oriented readers due to some mathematical details, although casual readers may find useful references for relevant popular items, from which they may develop an interest in this young research field. Conversely, algorithm users who need interpretability in their work might develop an inclination to understand what is previously hidden in the thick veil of mathematical formulation, which might ironically undermine reliability and interpretability. Clinicians and medical practitioners already having some familiarity with mathematical terms may get a glimpse of how some proposed interpretability methods might be risky and unreliable. The survey [30] views interpretability in terms of the extraction of relational knowledge, more specifically, by scrutinizing the methods under the neural-symbolic cycle. It presents the framework as a sub-category within the interpretability literature. We include it under verbal interpretability, though the framework does demonstrate that methods in other categories can be perceived under verbal interpretability as well. The extensive survey [18] provides a large list of research works categorized under transparent models and models requiring post-hoc analysis, with multiple sub-categories. Our survey, on the other hand, aims to overview the state of interpretable machine learning as applied to the medical field.

This paper is arranged as follows. Section II introduces generic types of interpretability and their sub-types. In each section, where applicable, we provide challenges and future prospects related to the category. Section III applies the categorization of interpretabilities in section II to the medical field and lists a few risks of machine interpretability in the medical field. Before we proceed, it is also imperative to point out that the issue of accountability and interpretability has spawned discussions and recommendations [31]–[33], and even entered the sphere of ethics and law enforcement [34], engendering movements to protect society from possible misuses and harms in the wake of the increasing use of AI.
II. TYPES OF INTERPRETABILITY
There has yet to be a widely adopted standard for understanding ML interpretability, though there have been works proposing frameworks for interpretability [10], [13], [35]. In fact, different works use different criteria, and they are justifiable in one way or another.
Fig. 1. Overview of the categorization with illustration. Orange box: interpretability interface demarcating the separation between interpretable information and the cognitive process required to understand it. Grey box: algorithm output/product that is proposed to provide interpretability. Black arrow: computing or comprehension process. The perceptive interpretability methods generate items that are usually considered immediately interpretable. On the other hand, methods that provide interpretability via mathematical structure generate outputs that require one more layer of cognitive processing interface before reaching the interpretable interface. The eye and ear icons represent human senses interacting with the items generated for interpretability.

Reference [36] suggests network dissection for the interpretability of visual representations and offers a way to quantify it as well. The interactive websites [37], [38] have suggested a unified framework to study interpretabilities that have thus far been studied separately. The paper [39] defines a unified measure of feature importance in the SHAP (SHapley Additive exPlanations) framework. Here, we categorize existing interpretabilities and present a non-exhaustive list of works in each category.

The two major categories presented here, namely perceptive interpretability and interpretability by mathematical structures, appear to present different polarities within the notion of interpretability. As an example of the difficulty with perceptive interpretability, when a piece of visual evidence is given erroneously, the underlying mathematical structure may not seem to provide useful clues about the mistake. On the other hand, a mathematical analysis of patterns may provide information in high dimensions. Such patterns can only be easily perceived once they are brought into lower dimensions, abstracting away some fine-grained information that we could not yet prove is not discriminative with measurable certainty.
A. Perceptive Interpretability
We include in this category interpretabilities that can be humanly perceived, often ones that will be considered obvious. For example, as shown in fig. 2(A2), an algorithm that classifies an image into the cat category can be considered obviously interpretable if it provides a segmented patch showing the cat as the explanation. We should note that this alone might, on the other hand, be considered insufficient, because (1) it still does not un-blackbox the algorithm and (2) it ignores the possibility that background objects are used for the decision. The following are the sub-categories of perceptive interpretability.
A.1) Saliency
Fig. 2. (A1) Using LIME to generate an explanation for text classification. Headache and sneeze are assigned positive values, meaning both factors contribute positively to the model prediction flu. On the other hand, weight and no fatigue contribute negatively to the prediction. (A2) LIME is used to generate the super-pixels for the classification cat. (A3) ADC modality of a slice of an MRI scan from the ISLES 2017 segmentation competition. The reddish intensity region reflects a possible explanation for the choice of segmentation (segmentation not shown). (B) Optimized images that maximize the activation of a neuron in the indicated layers. In shallower layers, simple patterns activate neurons strongly, while in deeper layers, more complex features such as dog faces and ears do. Figure (B) is obtained from https://distill.pub/2018/building-blocks/ with permission from Chris Olah.
The saliency method explains the decision of an algorithm by assigning values that reflect the importance of input components in their contribution to that decision. These values can take the form of probabilities and super-pixels such as heatmaps. For example, fig. 2(A1) shows how a model predicts that a patient suffers from flu from a series of factors, and LIME [14] explains the choice by highlighting the importance of the particular symptoms that indicate that the illness should indeed be flu. Similarly, [40] computes scores reflecting the n-grams activating convolution filters in NLP (Natural Language Processing). Fig. 2(A2) demonstrates the output that LIME provides as the explanation for the classification cat, and fig. 2(A3) demonstrates a kind of heatmap that shows the contribution of pixels to the segmentation result (segmentation result not shown; this figure is only for demonstration). More formally, given that a model $f$ makes a prediction $y = f(x)$ for input $x$, for some metric $v$, a large magnitude of $v(x_i)$ typically indicates that the component $x_i$ is a significant reason for the output $y$.

Saliency methods via decomposition have been developed. In general, they decompose signals propagated within the algorithm and selectively rearrange and process them to provide interpretable information. Class Activation Map (CAM) has been a popular method to generate heat/saliency/relevance maps (from now on, we use the terms interchangeably) that correspond to discriminative features for classification [41]–[43]. The original implementation of CAM [41] produces heatmaps using $f_k(x, y)$, the pixel-wise activation of unit $k$ across spatial coordinates $(x, y)$ in the last convolutional layer, weighted by $w_{c,k}$, the coefficient corresponding to unit $k$ for class $c$. CAM at pixel $(x, y)$ is thus given by $M_c(x, y) = \sum_k w_{c,k} f_k(x, y)$.
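As a rough illustration of the CAM formula above, the following minimal sketch computes $M_c$ for a torchvision ResNet-18, used here only as a stand-in architecture with global average pooling followed by a single fully connected layer; the random input, the chosen layer name and the normalization step are illustrative assumptions rather than part of the original CAM implementation.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Minimal CAM sketch: M_c(x, y) = sum_k w_{c,k} f_k(x, y)
# Assumes a ResNet-18 (older torchvision: pretrained=False instead of weights=None).
model = models.resnet18(weights=None).eval()

features = {}
def hook(module, inp, out):
    # f_k(x, y): activations of the last convolutional block, shape (1, K, H, W)
    features["maps"] = out.detach()

model.layer4.register_forward_hook(hook)

img = torch.randn(1, 3, 224, 224)             # stand-in for a preprocessed image
logits = model(img)
c = logits.argmax(dim=1).item()               # class of interest

f_k = features["maps"][0]                     # (K, H, W)
w_ck = model.fc.weight[c]                     # (K,) class-c weights of the final linear layer
cam = torch.einsum("k,khw->hw", w_ck, f_k)    # weighted sum over channels
cam = F.relu(cam)
cam = F.interpolate(cam[None, None], size=img.shape[-2:], mode="bilinear",
                    align_corners=False)[0, 0]
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # heatmap in [0, 1]
```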
Similarly, the widely used Layer-wise Relevance Propagation (LRP) is introduced in [44]. Papers that use LRP to construct saliency maps for interpretability include [13], [45]–[50]. It is also applicable to video processing [51]. A short summary of LRP is given in [52]. LRP is considered a decomposition method [53]. Indeed, the importance scores are decomposed such that the sum of the scores in each layer equals the output. In short, the relevance score is the pixel-wise intensity at the input layer $R^{(0)}$, where $R_i^{(l)} = \sum_j \frac{a_i^{(l)} w_{ij}^{+}}{\sum_{i'} a_{i'}^{(l)} w_{i'j}^{+}} R_j^{(l+1)}$ is the relevance score of neuron $i$ at layer $l$, with the input layer being at $l = 0$. Each pixel $(x, y)$ at the input layer is assigned the importance value $R^{(0)}(x, y)$, although some combinations of relevance scores $\{R_c^{(l)}\}$ at an inner layer $l$ over different channels $\{c\}$ have been demonstrated to be meaningful as well (though possibly less precise; see the tutorial on its website, heatmapping.org). LRP can be understood in the Deep Taylor Decomposition framework [54]. The code implementation can also be found on the aforementioned website.

The Automatic Concept-based Explanations (ACE) algorithm [55] uses super-pixels as explanations. Other decomposition methods that have been developed include DeepLIFT and gradient*input [56], Prediction Difference Analysis [57] and [40]. Peak Response Mapping [58] is generated by back-propagating peak signals. Peak signals are normalized and treated as probabilities, and the method can be seen as decomposition into probability transitions. In [59], removed correlation ρ is proposed as a metric to measure the quality of signal estimators. The paper then proposes PatternNet and PatternAttribution, which backpropagate parameters optimized against ρ, resulting in saliency maps as well. SmoothGrad [60] improves gradient-based techniques by adding noise. Do visit the related website that displays numerous visual comparisons of saliency methods; be mindful of how some heatmaps highlight apparently irrelevant regions.

For natural language processing or sentiment analysis, a saliency map can also take the form of heat scores over words in texts, as demonstrated by [61] using LRP and by [62]. In the medical field (see a later section), [6], [43], [63]–[69] have studied methods employing saliency and visual explanations. Note that we also sub-categorize LIME as a method that uses optimization and sensitivity as its underlying mechanisms, and many research works on interpretability span more than one sub-category.

Challenges and Future Prospects. As seen, the formulas for CAM and LRP are given on a heuristic: certain ways of interaction between weights and the strength of activation of some units within the models will eventually produce the interpretable information. The intermediate processes are not amenable to scrutiny. For example, taking one of the weights and changing its value does not easily reveal any useful information. How these prescribed ways translate into interpretable information may also benefit from stronger evidence, especially evidence beyond visual verification of localized objects. Signal methods to investigate ML models (see a later section) exist, but such methods probing the above saliency methods have not been attempted systematically, possibly opening up a different research direction.
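Referring back to the LRP redistribution rule above, the following is a minimal sketch of the z+ variant for a small fully connected network; the toy architecture, random input and the omission of bias terms are simplifying assumptions, not the reference implementation from heatmapping.org.

```python
import torch
import torch.nn as nn

# Minimal LRP (z+ rule) sketch: relevance is redistributed layer by layer,
# R^(l)_i = sum_j [ a^(l)_i w+_ij / (sum_i' a^(l)_i' w+_i'j) ] R^(l+1)_j
torch.manual_seed(0)
layers = [nn.Linear(20, 10), nn.ReLU(), nn.Linear(10, 1)]
x = torch.rand(20)

# forward pass, storing the input a^(l) of every layer
acts = [x]
for layer in layers:
    acts.append(layer(acts[-1]))

relevance = acts[-1].clone()                      # R at the output to be explained
for i in reversed([j for j, l in enumerate(layers) if isinstance(l, nn.Linear)]):
    a = acts[i]                                   # a^(l), input to this Linear layer
    w_pos = layers[i].weight.clamp(min=0)         # w+_ij
    z = w_pos @ a + 1e-9                          # denominators sum_i' a^(l)_i' w+_i'j
    s = relevance / z
    relevance = a * (w_pos.t() @ s)               # R^(l)_i
# `relevance` now holds an importance score for each of the 20 input components
```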
A.2) Signal Method
Methods of interpretability that observe the stimulation of neurons or a collection of neurons are called signal methods [70]. On the one hand, the activated values of neurons can be manipulated or transformed into interpretable forms. For example, the activations of neurons in a layer can be used to reconstruct an image similar to the input. This is possible because neurons store information systematically [71]: feature maps in deeper layers activate more strongly to complex features, such as human faces or keyboards, while feature maps in shallower layers show simple patterns such as lines and curves. Note: an example of a feature map is the output of a convolutional filter in a Convolutional Neural Network (CNN). On the other hand, parameters or even the input data might be optimized with respect to the activation values of particular neurons using methods known as activation optimization (see a later section). The following are the relevant sub-categories.
Feature Maps and Inversions for Input Reconstruction. A feature map often looks like a highly blurred image with most regions showing zero (or low) intensity, except for the patch that a human could roughly discern as a detected feature. Sometimes, these discernible features are considered interpretable, as in [71]. However, they might be too distorted. Then, how else can a feature map be related to a humanly perceptible feature? An inverse convolution map can be defined: for example, if the feature map in layer 2 is computed in the network via $y_2 = f_2(f_1(x))$, where $x$ is the input and $f_1(\cdot)$ consists of 7x7 convolutions of stride 2 followed by max-pooling, and likewise $f_2(\cdot)$, then [71] reconstructs an image using a deconvolution network by approximately inverting the trained convolutional network, $\tilde{x} = \mathrm{deconv}(y_2) = \hat{f}_1^{-1}(\hat{f}_2^{-1}(y_2))$, which is an approximation because layers such as max-pooling have no unique inverse. It is shown that $\tilde{x}$ does appear like a slightly blurred version of the original image, which is distinguishable to the human eye. Inversion of image representations within the layers has also been used to demonstrate that CNN layers do store important information of an input image accurately [72], [73]. Guided backpropagation [74] modifies the way backpropagation is performed to achieve inversion by zeroing negative signals from both the output and input signals backwards through a layer. Indeed, inversion-based methods do use saliency maps for visualization of the activated signals.

Activation Optimization. Besides transforming the activations of neurons, signal methods also include finding input images that optimize the activation of a neuron or a collection of neurons. This is called activation maximization. Starting with noise as an input $x$, the noise is slowly adjusted to increase the activation of a selected (collection of) neuron(s) $\{a_k\}$. In simple mathematical terms, the task is to find $x^{*} = \arg\max_x \|\{a_k\}\|$, where the optimization is performed over the input $x$ and $\|\cdot\|$ is a suitable metric measuring the combined strength of the activations. Finally, the optimized input that maximizes the activation of the neuron(s) can emerge as something visually recognizable. For example, the image could be a surreal, fuzzy combination of swirling patterns and parts of dog faces, as shown in fig. 2(B).

Research works on activation maximization include [75] on the MNIST dataset, and [76] and [77], which use a regularization function. In particular, [37] provides an excellent interactive interface (feature visualization) demonstrating activation-maximized images for GoogLeNet [78]. GoogLeNet has a deep architecture, from which we can see how neurons in deeper layers store complex features while shallower layers store simple patterns; see fig. 2(B). To bring this one step further, the semantic dictionary is used in [38] to provide a visualization of activations within a higher-level organization and semantically more meaningful arrangements.
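The following is a minimal activation-maximization sketch along the lines described above, assuming a torchvision GoogLeNet; the chosen layer, channel index, step count and the small L2 regularizer are illustrative assumptions rather than the settings used in the cited works.

```python
import torch
import torchvision.models as models

# Minimal activation maximization: x* = argmax_x ||{a_k}||, starting from noise.
model = models.googlenet(weights=None).eval()
for p in model.parameters():
    p.requires_grad_(False)

acts = {}
model.inception4e.register_forward_hook(lambda m, i, o: acts.update(out=o))

x = torch.randn(1, 3, 224, 224, requires_grad=True)   # noise input to be optimized
opt = torch.optim.Adam([x], lr=0.05)
channel = 10                                           # hypothetical unit of interest

for _ in range(200):
    opt.zero_grad()
    model(x)
    # maximize the mean activation of one channel; a small L2 term keeps x bounded
    loss = -acts["out"][0, channel].mean() + 1e-4 * x.norm()
    loss.backward()
    opt.step()
# x now (very roughly) visualizes what this channel responds to
```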
Other Observations of Signal Activations. Ablation studies [79], [80] also examine the roles of neurons in shallower and deeper layers. In essence, some neurons are corrupted and the output of the corrupted neural network is compared to that of the original network.
Challenges and Future Prospects. Signal methods might have revealed some parts of the black-box mechanisms, but many questions still remain. What do we do with the (partially) reconstructed images and the images that optimize activation? We might have learned how to approximately invert signals to recover images; can this help improve interpretability further? The components and parts in the intermediate process that reconstruct the approximate images might contain important information; will we be able to utilize them in the future? How is explaining the components in this "inverse space" more useful than explaining signals that are forward propagated? Similarly, how does looking at intermediate signals that lead to activation optimization help us pinpoint the role of a collection of neurons? Optimization of highly parameterized functions notoriously gives non-unique solutions. Can we be sure that an optimization that yields a combination of surreal dog faces will not yield other strange images with minor alterations? In the process of answering these questions, we may find the hidden clues required to get closer to interpretable AI.
A.3) Verbal Interpretability
This form of interpretability takes the form of verbal chunks that humans can grasp naturally. Examples include sentences that indicate causality, as shown in the examples below. Logical statements can be formed from the proper concatenation of predicates, connectives etc. An example of a logical statement is the conditional statement. Conditional statements are statements of the form A → B, in other words, if A then B. An ML model from which logical statements can be extracted directly has been considered obviously interpretable. The survey [30] shows how interpretability methods in general can be viewed under such a symbolic and relational system. In the medical field, see for example [81], [82].

Similarly, decision sets or rule sets have been studied for interpretability [83]. The following is a single line in a rule set, directly quoted from the paper: "rainy and grumpy or calm → dairy or vegetables". Each line in a rule set contains a clause with an input in disjunctive normal form (DNF) mapped to an output in DNF as well. The example above is formally written (rainy ∧ grumpy) ∨ calm → dairy ∨ vegetables. Comparing three different variables, it is suggested that the interpretability of explanations in the form of rule sets is most affected by cognitive chunks and explanation size, and little affected by variable repetition. Here, a cognitive chunk is defined as a clause of inputs in DNF, and the number of (repeated) cognitive chunks in a rule set is varied. The explanation size is self-explanatory (a longer/shorter line in a rule set, or more/fewer lines in a rule set). MUSE [84] also produces explanations in the form of decision sets, where an interpretable model is chosen to approximate the black-box function and optimized against a number of metrics, including direct optimization of interpretability metrics.

It is not surprising that verbal segments are provided as the explanation in NLP problems. An encoder-generator framework [85] extracts segments like "a very pleasant ruby red-amber color" to justify a 5 out of 5-star rating for a product review. Given a sequence of words $x = (x_1, ..., x_l)$ with $x_k \in \mathbb{R}^d$, the explanation is given as the subset of the sentence that summarizes why the rating is justified. The subset can be expressed as the binary sequence $(z_1, ..., z_l)$, where $z_k = 1$ (0) indicates that $x_k$ is (not) in the subset. Then $z$ follows a probability distribution with $p(z|x)$ decomposed, by assuming independence, into $\prod_k p(z_k|x)$, where $p(z_k|x) = \sigma_z(W^z[\overrightarrow{h}_k, \overleftarrow{h}_k] + b^z)$, with $\overrightarrow{h}_t, \overleftarrow{h}_t$ being the usual hidden units in the recurrent cell (forward and backward respectively). Similar segments are generated using a filter-attribute probability density function to improve the relation between the activation of certain filters and specific attributes [86]. Earlier works on Visual Question Answering (VQA) [87]–[89] are concerned with the generation of texts discussing objects appearing in images.

Challenges and Future Prospects. While texts appear to provide explanations, the underlying mechanisms used to generate the texts are not necessarily explained. For example, NNs and the common variants/components used in text-related tasks, such as the RNN (recurrent NN) and LSTM (long short-term memory), are still black-boxes that are hard to troubleshoot in the case of wrong predictions. There have been fewer works that probe into the inner signals of LSTM and RNN neural networks.
This is a possible research direction, although a problem similar to that mentioned in the previous sub-subsection may arise (what to do with the intermediate signals?). Furthermore, while word embeddings are often optimized with the usual loss minimization, there does not seem to be a coherent explanation of the process and shape of the optimized embedding. There may be some clues regarding the optimization residing within the embedding, and thus successfully interpreting the shape of an embedding may help shed light on the mechanism of the algorithm.
B. Interpretability via Mathematical Structure
Mathematical structures have been used to reveal the mechanisms of ML and NN algorithms. In the previous section, deeper layers of NNs were shown to store complex information while shallower layers store simpler information [71]. TCAV [95] has been used to show a similar trend, as suggested by fig. 3(A2). Other methods include clustering, such as t-SNE (t-Distributed Stochastic Neighbor Embedding), shown in fig. 3(C), and subspace-related methods; for example, correlation-based Singular Vector Canonical Correlation Analysis (SVCCA) [96] is used to find the significant directions in the subspace of the input for accurate prediction, as shown in fig. 3(B). Information theory has been used to study interpretability by considering the Information Bottleneck principle [97], [98]. The rich ways in which mathematical structures add to interpretability pave the way for a comprehensive view of the interpretability of algorithms, hopefully providing a ground for unifying the different views under a coherent framework in the future.
B.1) Pre-defined Model
To study a system of interest, especially a complex system with not-well-understood behaviour, mathematical formulas such as parametric models can help simplify the task. With a proper hypothesis, relevant terms and parameters can be designed into the model. Interpretation of the terms comes naturally if the hypothesis is either consistent with available knowledge or at least developed with good reasons. When the systems are better understood, these formulas can be improved by the inclusion of more complex components. In the medical field (see a later section), an example is kinetic modelling. Machine learning can be used to compute the parameters defined in the models. Other methods exist, such as integrating commonly available methodologies with subject-specific content. For example, Generative Discriminative Models [99] combine ridge regression and the least squares method to handle variables for analyzing Alzheimer's disease and schizophrenia.
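As a toy illustration of this pre-defined-model idea, one can fix a simple parametric form in advance and let the fitting procedure estimate only its interpretable parameters; the exponential form and the synthetic data below are illustrative assumptions, not a validated kinetic model.

```python
import numpy as np
from scipy.optimize import curve_fit

# Pre-defined parametric model: the functional form is fixed by hypothesis,
# and fitting only estimates parameters that each have a direct reading.
def model(t, amplitude, rate):
    return amplitude * np.exp(-rate * t)

t = np.linspace(0, 10, 50)
y = model(t, 2.0, 0.3) + 0.05 * np.random.randn(t.size)   # synthetic observations

params, _ = curve_fit(model, t, y, p0=[1.0, 0.1])
amplitude, rate = params   # e.g. "rate" could be read as a decay/washout constant
```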
TABLE I
List of journal papers arranged according to the interpretability methods used, how interpretability is presented or the suggested means of interpretability. The tabulation provides a non-exhaustive overview of interpretability methods, placing some derivative methods under the umbrella of the main methods they derive from. HSI: Human Study on Interpretability; ✓ means there is a human study designed to verify whether the suggested methods are interpretable by the human subjects. ANN: ✓ means the work explicitly introduces a new artificial neural network architecture, modifies existing networks or performs tests on neural networks.

Perceptive Interpretability — Saliency

Mechanism: Decomposition
  CAM with global average pooling [41], [90] (HSI ✗, ANN ✓)
  + Grad-CAM [42], generalizes CAM, utilizing gradient (HSI ✓, ANN ✓)
  + Guided Grad-CAM and Feature Occlusion [67] (HSI ✗, ANN ✓)
  + Respond-CAM [43] (HSI ✗, ANN ✓)
  + Multi-layer CAM [91] (HSI ✗, ANN ✓)
  LRP (Layer-wise Relevance Propagation) [13], [52] (HSI ✗, ANN N.A.)
  + Image classification, PASCAL VOC 2009 etc. [44] (HSI ✗, ANN ✓)
  + Audio classification, AudioMNIST [46] (HSI ✗, ANN ✓)
  + LRP on DeepLight, fMRI data from the Human Connectome Project [47] (HSI ✗, ANN ✓)
  + LRP on CNN and on BoW (bag of words)/SVM [48] (HSI ✗, ANN ✓)
  + LRP on a compressed-domain action recognition algorithm [49] (HSI ✗, ANN ✗)
  + LRP on video deep learning, selective relevance method [51] (HSI ✗, ANN ✓)
  + BiLRP [50] (HSI ✗, ANN ✓)
  DeepLIFT [56] (HSI ✗, ANN ✓)
  Prediction Difference Analysis [57] (HSI ✗, ANN ✓)
  Slot Activation Vectors [40] (HSI ✗, ANN ✓)
  PRM (Peak Response Mapping) [58] (HSI ✗, ANN ✓)

Mechanism: Sensitivity
  LIME (Local Interpretable Model-agnostic Explanations) [14] (HSI ✓, ANN ✓)
  + MUSE with LIME [84] (HSI ✓, ANN ✓)
  + Guideline-based Additive eXplanation, optimizes complexity, similar to LIME [92] (HSI ✓, ANN ✓)
  + Image corruption and testing Region of Interest statistically [65] (HSI ✗, ANN ✓)
  + Attention map with autofocus convolutional layer [66] (HSI ✗, ANN ✓)

Perceptive Interpretability — Signal

Mechanism: Inversion
  DeconvNet [71] (HSI ✗, ANN ✓)
  Inverting representation with natural image prior [72] (HSI ✗, ANN ✓)
  Inversion using CNN [73] (HSI ✗, ANN ✓)
  Guided backpropagation [74], [90] (HSI ✗, ANN ✓)

Mechanism: Optimization
  Activation maximization/optimization [37] (HSI ✗, ANN ✓)
  + Activation maximization on DBN (Deep Belief Network) [75] (HSI ✗, ANN ✓)
  + Activation maximization, multifaceted feature visualization [76] (HSI ✗, ANN ✓)
  Visualization via regularized optimization [77] (HSI ✗, ANN ✓)
  Semantic dictionary [38] (HSI ✗, ANN ✓)

Perceptive Interpretability — Verbal
  Decision trees (HSI N.A., ANN N.A.)
  Propositional logic, rule-based [81] (HSI ✗, ANN ✗)
  Sparse decision list [82] (HSI ✗, ANN ✗)
  Decision sets, rule sets [83], [84] (HSI ✓, ANN ✗)
  Encoder-generator framework [85] (HSI ✗, ANN ✓)
  Filter Attribute Probability Density Function [86] (HSI ✗, ANN ✗)
  MUSE (Model Understanding through Subspace Explanations) [84] (HSI ✓, ANN ✓)
Linearity. The simplest interpretable pre-defined model is a linear combination of variables, $y = \sum_i a_i x_i$, where $a_i$ is the degree to which $x_i$ contributes to the prediction $y$. A linear combination model with $x_i \in \{0, 1\}$ has been referred to as an additive feature attribution method [39]. If the model performs well, this can be considered highly interpretable. However, many models are highly non-linear. In such cases, studying interpretability via linear properties (for example, using a linear probe; see below) is useful in several ways, including the ease of implementation. When the linear property appears to be insufficient, non-linearity can be introduced; it is typically not difficult to replace a linear component $\vec{w} \cdot \vec{a}$ within the system with a non-linear version $f(\vec{w}, \vec{a})$.

A linear probe is used in [100] to extract information from each layer of a neural network. More technically, assume we have a deep learning classifier $F(x) \in [0, 1]^D$, where $F_i(x) \in [0, 1]$ is the probability that input $x$ is classified into class $i$ out of $D$ classes. Given a set of features $H_k$ at layer $k$ of a neural network, the linear probe $f_k$ at layer $k$ is defined as a linear classifier $f_k : H_k \to [0, 1]^D$, i.e. $f_k(h_k) = \mathrm{softmax}(W h_k + b)$. In other words, the probe tells us how well the information from layer $k$ alone can predict the output, and each of these predictive probes is a linear classifier by design. The paper then plots the error rate of the prediction made by each $f_k$ against $k$ and demonstrates that these linear classifiers generally perform better at deeper layers, that is, at larger $k$.
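A minimal sketch of such a linear probe is given below; the feature dimension, number of classes and random stand-in activations are placeholders, with the layer-k features assumed to be precomputed and frozen.

```python
import torch
import torch.nn as nn

# Linear probe: f_k(h_k) = softmax(W h_k + b), trained on frozen layer-k features H_k.
feats_k = torch.randn(1000, 512)            # stand-in for layer-k activations H_k
labels = torch.randint(0, 10, (1000,))      # stand-in labels, D = 10 classes

probe = nn.Linear(512, 10)                  # softmax is folded into the loss below
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(probe(feats_k), labels)
    loss.backward()
    opt.step()

# the probe's accuracy indicates how linearly decodable the labels are at layer k
acc = (probe(feats_k).argmax(dim=1) == labels).float().mean().item()
```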
Generalized Additive Models. The linear model is generalized by the Generalized Additive Model (GAM) [101], [102], with standard form $g(E[y]) = \beta_0 + \sum_j f_j(x_j)$, where $g$ is the link function. The equation is general, and the specific implementations of $f_j$ and the link function depend on the task. The familiar General Linear Model (GLM) is a GAM with the specific implementation of linear $f_j$ and $g$ being the identity. Modifications can be duly implemented. As a natural extension to the model, interaction terms between variables $f_{ij}(x_i, x_j)$ are used [103]; we can certainly extend this indefinitely. ProtoAttend [104] uses probabilities as weights in the linear component of the NN. Such a model is considered inherently interpretable by the authors. In the medical field, see [81], [99], [105], [106].

Content-subject-specific models. Some algorithms are considered obviously interpretable within their fields. Models are designed based on existing knowledge or empirical evidence, and thus interpretation of the models is innately embedded into the system. ML algorithms can then be incorporated in rich and diverse ways, for example, through parameter fitting. The following lists just a few works to illustrate the diversity of usage of ML algorithms. Deep Tensor Neural Networks are used for quantum many-body systems [107]. An atomistic neural network architecture for quantum chemistry is used in [108], where each atom is like a node in a graph with a set of feature vectors. The specifics depend on the neural network used, but this model is considered inherently interpretable. A neural network has been used for programmable wireless environments (PWE) [109]. The TS approximation [110] is a fuzzy network approximation of other neural networks. The approximate fuzzy system is constructed with choices of components that can be adapted to the context of interpretation. The paper itself uses a sigmoid-based membership function, which it considers interpretable. A so-called model-based reinforcement learning is suggested to be interpretable after the addition of high-level knowledge about the system, realized as a Bayesian structure [111].
Challenges and Future Prospects. The challenge of formulating the "correct" model exists regardless of the machine learning trend. It might be interesting if a system were found that fundamentally operates on a specific machine learning model. Backpropagation-based deep NNs (DNNs) are themselves inspired by the brain, but they do not operate at a fundamental level of similarity (nor is there any guarantee that such a model exists). Where interpretability is concerned, having a fundamental similarity to real, existing systems may push forward our understanding of machine learning models in unprecedented ways. Otherwise, in the standard uses of machine learning algorithms, different optimization paradigms are still being discovered. Having an optimization paradigm that is specialized for specific models may contribute to a new aspect of interpretable machine learning.
B.2) Feature Extraction
We give an intuitive explanation via a hypothetical example of a classifier for heart-attack prediction. Suppose we are given 100-dimensional features including the eating pattern, job and residential area of a subject. A kernel function can be used to find out that the strong predictor for heart attack is a 100-dimensional vector that is significant along the following axes: eating pattern, exercise frequency and sleeping pattern. Then this model is considered interpretable, because we can link heart-attack risk with healthy habits rather than, say, socio-geographical factors. More information can be drawn from the next most significant predictor, and so on.
Correlation. The methods discussed in this section include the use of correlation in a general sense. This naturally includes covariance matrices and correlation coefficients after transformation by kernel functions. A kernel function transforms high-dimensional vectors such that the transformed vectors better distinguish different features in the data. For example, Principal Component Analysis transforms vectors into the principal components (PC), which can be ordered by the eigenvalues of the singular-value-decomposed (SVD) covariance matrix. The PC with the highest eigenvalue is roughly the most informative feature. Many kernel functions have been introduced, including Canonical Correlation Analysis (CCA) [112]. CCA provides the set of features that transforms the original variables into pairs of canonical variables, where each pair is a pair of variables that are "best correlated" but not correlated to other pairs. Quoting [113], "such features can inherently characterize the object and thus it can better explore the insights and finer details of the problems at hand". In the previous sections, interpretability research using correlation includes [59].

SVCCA combines CCA and SVD to analyze interpretability [96]. Consider an input dataset $X = \{x_1, ..., x_m\}$, where each input $x_i$ is possibly multi-dimensional. Denote the activation of neuron $i$ at layer $l$ as $z_i^l = (z_i^l(x_1), ..., z_i^l(x_m))$. Note that one such output is defined over the entire input dataset. SVCCA finds the relation between two layers of a network, $l_k = \{z_i^{l_k} \mid i = 1, ..., m_k\}$ for $k = 1, 2$, by taking $l_1$ and $l_2$ as the input (in general, $l_k$ does not have to be the entire layer). SVCCA uses SVD to extract the most informative components $l'_k$ and uses CCA to transform $l'_1$ and $l'_2$ such that $\bar{l}'_1 = W_X l'_1$ and $\bar{l}'_2 = W_Y l'_2$ have the maximum correlation $\rho = \{\rho_1, ..., \rho_{\min(m_1, m_2)}\}$. One of the SVCCA experiments on CIFAR-10 demonstrates that only the 25 most significant axes in $l'_k$ are needed to obtain nearly the full accuracy of a full network with 512 dimensions. Besides, the similarity between the two compared layers is defined to be $\bar{\rho} = \frac{1}{\min(m_1, m_2)} \sum_i \rho_i$.
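A minimal SVCCA-style sketch under simplifying assumptions is shown below: the two activation matrices are random stand-ins (rows are datapoints, columns are neurons), SVD keeps the top directions of each layer, and the canonical correlations are read off as the singular values of the product of the two orthonormalized bases.

```python
import numpy as np

def svd_reduce(A, k):
    # keep the k most informative directions of a (datapoints x neurons) matrix
    A = A - A.mean(axis=0)
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * s[:k]

def cca_correlations(A, B):
    # canonical correlations rho_1 >= rho_2 >= ... between two centered matrices
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    qa, _ = np.linalg.qr(A)
    qb, _ = np.linalg.qr(B)
    return np.linalg.svd(qa.T @ qb, compute_uv=False)

L1 = np.random.randn(5000, 256)     # stand-in activations of layer l1
L2 = np.random.randn(5000, 512)     # stand-in activations of layer l2
rho = cca_correlations(svd_reduce(L1, 25), svd_reduce(L2, 25))
similarity = rho.mean()             # mean canonical correlation between the two layers
```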
The successful development of generative adversarial networks (GANs) [114]–[116] for generative tasks has spawned many derivative works. GAN-based models have been able to generate new images indistinguishable from synthetic images and to perform many other tasks, including transferring style from one set of images to another or even producing new designs for products and arts. Studies related to interpretability exist. For example, [117] uses an encoder-decoder system to perform multi-stage PCA. A generative model is used to show that the natural image distribution modelled using probability density is fundamentally difficult to interpret [118]. This is demonstrated through the use of a GAN for the estimation of the image distribution density. The resulting density shows preferential accumulation of density for images with certain features (for example, images featuring a small object with few foreground distractions) in the pixel space. The paper then suggests that interpretability is improved once the density is embedded in a deep feature space, for example, from a GAN. In this sense, interpretability is offered by a better correlation between the density of images and the correct identification of the objects. Consider also the GAN-based works they cite.

Clustering. Algorithms such as t-SNE have been used to cluster input images based on their activations of neurons in a network [76], [119]. The core idea relies on the distance between the objects being considered.
Fig. 3. (A1) The TCAV [95] method finds the hyperplane (CAV) that separates concepts of interest. (A2) Accuracies of CAVs applied to different layers support the idea that deeper NN layers contain more complex concepts, and shallower layers contain simpler concepts. (B) SVCCA [96] finds the most significant subspace (directions) that contains the most information. The graph shows that as few as 25 directions out of 500 are enough to produce the accuracy of the full network. (C) t-SNE clusters images in a meaningful arrangement; for example, dog images are close together. Figures (A1, A2) are used with permission from the author Been Kim; figures (B, C) from Maithra Raghu and Jascha Sohl-Dickstein.

If the distance between two objects is short in some measurement space, then they are similar. This possibly appeals to the notion of human learning by the Law of Association.
It differs from correlation-based methods, which provide metrics relating the change of one variable to another, where the two related objects can originate from completely different domains; clustering simply presents their similarity, more sensibly in a similar domain or in subsets thereof. In [119], the activations $\{f_{fc7}(x)\}$ of the 4096-dimensional layer fc7 in the CNN are collected over all inputs $\{x\}$. Then $\{f_{fc7}(x)\}$ is fed into t-SNE to be arranged and embedded into two dimensions for visualization (each point is then visually represented by its input image $x$). Activation atlases are introduced in [120], which similarly use t-SNE to arrange some activations $\{f_{act}(x)\}$, except that each point is represented by the averaged activations of feature visualization. In meta-material design [121], design patterns and optical responses are encoded into latent variables to be characterized by a Variational Auto-Encoder (VAE). Then, t-SNE is used to visualize the latent space. In the medical field (also see a later section), we have [122], [123] (which uses the Laplacian Eigenmap for interpretability) and [124] (which introduces a low-rank representation method for Autistic Spectrum Diagnosis).

Challenges and Future Prospects. This section exemplifies the difficulty in integrating mathematics and human intuition. Having extracted "relevant" or "significant" features, sometimes we are still left with a combination of high-dimensional vectors. Further analysis comes in the form of correlations or other metrics that attempt to show similarities or proximity. The interpretation may stay a mathematical artifact, but there is a potential that the separation of concepts attained by these methods can be used to reorganize a black-box model from within. It might be an interesting research direction that lacks justification in terms of real-life application; however, progress in unraveling black-boxes may be a high-risk, high-return investment.
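As a small illustration of the activation-clustering workflow discussed above, the sketch below embeds a matrix of (stand-in) high-dimensional activations into two dimensions with t-SNE; the layer, dimensionality and t-SNE settings are placeholders.

```python
import numpy as np
from sklearn.manifold import TSNE

# Each row is one input's activation vector at some chosen layer (e.g. an fc7-like layer).
acts = np.random.randn(2000, 4096)
emb = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(acts)
# emb[:, 0], emb[:, 1] can be scattered, with each point drawn as its input image,
# so that inputs the network treats as similar land close together.
```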
B.3) Sensitivity
We group together methods that rely on localization, gradients and perturbations under the category of sensitivity. These methods rely on the notion of small changes $dx$ in calculus and the neighborhood of a point in metric spaces.

Sensitivity to input noises or the neighborhood of data points. Some methods rely on the locality of some input $x$. Let a model $f(\cdot)$ predict $f(x)$ accurately for some $x$. Denote by $x + \delta$ a slightly noisy version of $x$. The model is locally faithful if $f(x + \delta)$ produces a correct prediction; otherwise, the model is unfaithful, and clearly such instability reduces its reliability. Reference [125] introduces meta-predictors as interpretability methods and emphasizes the importance of the variation of the input $x$ to a neural network in explaining the network. They define explanation and local explanation in terms of the response of the blackbox $f$ to some input. Amongst the many studies conducted, they provide experimental results on the effect of varying the input, such as via deletion of some regions of the input.
Likewise, when random pixels of an image are deleted (hence the data point is shifted to its neighborhood in the feature space) and the resulting change in the output is tested [56], pixels that are important to the prediction can be determined. In text classification, [126] provides explanations in the form of partitioned graphs. The explanation is produced in three main steps, where the first step involves sampling perturbed versions of the data using a VAE.

Testing with Concept Activation Vectors (TCAV) has also been introduced as a technique to interpret the low-level representation of a neural network layer [95]. First, the concept activation vector (CAV) is defined. Given input $x \in \mathbb{R}^n$ and a feedforward layer $l$ having $m$ neurons, the activation at that layer is given by $f_l : \mathbb{R}^n \to \mathbb{R}^m$. If we are interested in a concept $C$, for example the striped pattern, then, using TCAV, we supply a set $P_C$ of examples corresponding to the striped pattern (zebras, clothing patterns etc.) and a set of negative examples $N$. This collection is used to train a binary classifier $v_C^l \in \mathbb{R}^m$ for layer $l$ that partitions $\{f_l(x) : x \in P_C\}$ and $\{f_l(x) : x \in N\}$. In other words, a kernel function extracts features by mapping out a set of activations that has relevant information about the stripe-ness. The CAV is thus defined as the normal vector to the hyperplane that separates the positive examples from the negative ones, as shown in fig. 3(A1). TCAV then computes the directional derivative $S_{C,k,l}(x) = \nabla h_{l,k}(f_l(x)) \cdot v_C^l$ to obtain the sensitivity of the model w.r.t. the concept $C$, where $h_{l,k}$ is the logit function for class $k$ of $C$ at layer $l$.
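The following minimal sketch walks through the two TCAV steps just described (CAV training, then the directional derivative); the activation matrices are random stand-ins, and the single linear `head` is a hypothetical replacement for the layers of the real network above layer l.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression

m = 64                                        # number of neurons at layer l
acts_pos = np.random.randn(200, m) + 1.0      # f_l(x) for x in P_C (stand-in)
acts_neg = np.random.randn(200, m)            # f_l(x) for x in N   (stand-in)

# 1) CAV: normal of the hyperplane separating concept activations from negatives.
clf = LogisticRegression(max_iter=1000).fit(
    np.vstack([acts_pos, acts_neg]),
    np.r_[np.ones(len(acts_pos)), np.zeros(len(acts_neg))])
cav = torch.tensor(clf.coef_[0], dtype=torch.float32)
cav = cav / cav.norm()

# 2) Conceptual sensitivity S_{C,k,l}(x) = grad h_{l,k}(f_l(x)) . v_C^l
head = nn.Linear(m, 10)                       # stand-in for the network above layer l
f_l_x = torch.randn(1, m, requires_grad=True) # activation of one input x at layer l
k = 3                                         # class of interest
head(f_l_x)[0, k].backward()                  # gradient of the class-k logit
sensitivity = (f_l_x.grad[0] * cav).sum().item()
# positive sensitivity: the concept direction pushes the class-k logit up
```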
LIME [14] optimizes over models $g \in G$, where $G$ is a set of interpretable models, by minimizing a locality-aware loss and complexity. In other words, it seeks the optimal model $\xi(x) = \arg\min_{g \in G} L(f, g, \pi_x) + \Omega(g)$, where $\Omega$ is the complexity and $f$ is the true function we want to model. An example of the loss function is $L(f, g, \pi_x) = \sum_{z, z' \in Z} \pi_x(z)[f(z) - g(z')]^2$, with $\pi_x(z)$ being, for example, based on the Euclidean distance and $Z$ the vicinity of $x$. From the equation, it can be seen that the desired $g$ will be close to $f$ in the vicinity $Z$ of $x$, because $f(z) \approx g(z')$ for $z, z' \in Z$. In other words, noisy inputs $z, z'$ do not add too much loss.

A gradient-based explanation vector $\xi(x) = \frac{\partial}{\partial x} P(Y \neq g(x) \mid X = x)$ is introduced by [127] for a Bayesian classifier $g(x) = \arg\min_{c \in \{1, ..., C\}} P(Y \neq c \mid X = x)$, where $x$ and $\xi$ are d-dimensional. For any $i = 1, ..., d$, a high absolute value of $[\xi(x)]_i$ means that component $i$ contributes significantly to the decision of the classifier. If it is positive, the higher the value, the less likely $x$ contributes to the decision $g(x)$.

The ACE algorithm [55] uses TCAV to compute saliency scores and generates super-pixels as explanations. Grad-CAM [42] is a saliency method that uses the gradient as its sensitivity measure. In [128], the influence function is used. While theoretical, the paper also practically demonstrates how understanding the underlying mathematics can help develop perturbed training points for adversarial attacks.

Sensitivity to dataset. A model is possibly sensitive to the training dataset $\{x_i\}$ as well. The influence function is also used to understand the effect of removing $x_i$ for some $i$, and shows the consequent possibility of adversarial attack [128]. Studies on adversarial training examples can be found in the paper and its citations, where seemingly random, insignificant noises can degrade machine decisions considerably. The representer theorem is introduced for studying the extent of the effect $x_i$ has on a decision made by a deep NN [129].

Challenges and Future Prospects. There seems to be a concern with the locality and globality of the concepts. As mentioned in [95], to achieve global quantification for interpretability, an explanation must be given for a set of examples or the entire class rather than "just explain individual data inputs". From our understanding, TCAV is a perturbation method by virtue of the stable continuity in the usual derivative, and it is global because the whole subset of the dataset with label k of concept C has been shown to be well-distinguished by TCAV. However, we may want to point out that despite the claim to globality, it is possible to view the success of TCAV as local, since it is only global within each label k rather than within all of the dataset considered at once. From the point of view of image processing, the neighborhood of a data point (an image) in the feature space poses a rather subtle question. For example, after rotating and stretching the image or deleting some pixels, how does the position of the image in the feature space change? Is there any way to control the effect of random noises and improve the robustness of machine prediction in a way that is sensible to human perception? The transition in the feature space from one point to another point belonging to a different class is also unexplored. The current trend recognizes that regions in the input space with significant gradients provide interpretability, because deforming these regions quickly degrades the prediction (conversely, the particular values at these regions are important to reach a certain prediction). The gradient is also at the core of loss optimization, making it the natural target for further studies.
B.4) Optimization

We have described several research works that seek to attain interpretability via optimization methods. Some have optimization at the core of their algorithm but leave interpretability to visual observation, while others optimize interpretability mathematically.
Quantitatively maximizing interpretability. To approximate a function $f$, as previously mentioned, LIME [14] performs optimization by finding an optimal model $\xi \in G$ so that $f(z) \approx \xi(z')$ for $z, z' \in Z$, where $Z$ is the vicinity of $x$, so that local fidelity is said to be achieved. Concurrently, the complexity $\Omega(\xi)$ is minimized. A minimized $\Omega$ means the model's interpretability is maximized. MUSE [84] takes in a black-box model, predictions and user-input features to output decision sets based on optimization w.r.t. fidelity, interpretability and unambiguity. The available measures of interpretability that can be optimized include size, feature overlap etc. (refer to table 2 of its appendix).

Activation Optimization. Activation optimization is used in research works such as [37], [75]–[77], as explained in a previous section. The interpretability relies on direct observation of the neuron-activation-optimized images. While the quality of the optimized images is not evaluated, the fact that parts of coherent images emerge with respect to a (collection of) neuron(s) does demonstrate some organization of information in the neural networks.
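Referring back to LIME's locality-aware objective above, the sketch below fits a simple weighted linear surrogate around one input; the Gaussian proximity kernel, the ridge surrogate and the toy black-box are illustrative stand-ins, not the settings of the original implementation.

```python
import numpy as np
from sklearn.linear_model import Ridge

def explain_locally(f, x, num_samples=500, sigma=1.0):
    # sample the vicinity Z of x, weight samples by proximity pi_x(z),
    # and fit a simple surrogate g so that f(z) ~ g(z) locally
    z = x + 0.1 * np.random.randn(num_samples, x.size)
    weights = np.exp(-np.linalg.norm(z - x, axis=1) ** 2 / sigma ** 2)   # pi_x(z)
    g = Ridge(alpha=1.0)                      # low-complexity surrogate (small Omega(g))
    g.fit(z, f(z), sample_weight=weights)     # minimizes the locality-aware loss
    return g.coef_                            # per-feature local importance

f = lambda z: (z ** 2).sum(axis=1)            # hypothetical black-box model
importances = explain_locally(f, np.array([1.0, -2.0, 0.5]))
```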
C. Other Perspectives to Interpretability
There are many other concepts that can be related to interpretability. Reference [42] conducted experiments to test the improvement of human performance on a task after being given explanations (in the form of visualizations) produced by ML algorithms. We believe this might be an exemplary form of interpretability evaluation. For example, suppose we want to compare machine learning algorithm $ML_A$ with $ML_B$. Say human subjects are given difficult classification tasks and attain a baseline accuracy. The task is repeated with different sets of human subjects, but now they are given explanations churned out by $ML_A$ and $ML_B$ respectively. If the accuracy attained with the explanations from $ML_B$ is higher than that attained with $ML_A$, then $ML_B$ is more interpretable. Even then, if human subjects cannot really explain why they perform better with the given explanations, the interpretability may be questionable. This brings us to the question of what kind of interpretability is necessary for different tasks, and it certainly points to the possibility that there is no need for a unified version of interpretability.

C.1) Data-driven Interpretability

Data in catalogue. A large amount of data has been crucial to the functioning of many ML algorithms, mainly as the input data. In this section, we mention works that put a different emphasis on the treatment of these data arranged in catalogues. In essence, [10] suggests that we create a matrix whose rows are different real-world tasks (e.g. pneumonia detection), whose columns are different methods (e.g. decision trees with different depths) and whose entries are the performance of the methods on some end-task. How can we gather a large collection of entries for such a large matrix? Apart from competitions and challenges, crowd-sourcing efforts will aid the formation of such a database [147], [148]. A clear problem is how multi-dimensional and gigantic such a tabulation will become, not to mention that the collection of entries is very likely uncountably large. Formalizing interpretability here means we pick latent dimensions (common criteria) that humans can evaluate, e.g. time constraint or time spent, cognitive chunks (defined as the basic unit of explanation; also see the definition in [83]) etc. These dimensions are to be refined along an iterative process as more user inputs enter the repository.
Incompleteness. In [10], the problem of the incompleteness of problem formulation is first posed as the issue in interpretability. Incompleteness is present in many forms, from the impracticality of producing all test cases to the difficulty of justifying why a choice of proxy is the best for some scenarios. In the end, it suggests that interpretability criteria are to be born out of the collective agreement of the majority, through a cyclical process of discoveries, justifications and rebuttals. In our opinion, a disadvantage is the possibility that no unique convergence will emerge, and the situation may be aggravated if, say, two different conflicting factions are born, each with enough advocates. The advantage lies in the existence of strong roots for the advocacy of a certain choice of interpretability. This prevents malicious intent from tweaking interpretability criteria to suit ad hoc purposes.
C.2) Invariances

Implementation invariance. Reference [93] suggests implementation invariance as an axiomatic requirement for interpretability. In the paper, it is stated as follows. Define two functionally equivalent functions $f_1, f_2$ so that $f_1(x) = f_2(x)$ for any $x$, regardless of their implementation details. Given any two such networks and an attribution method, the attribution functional $A$ will map the importance of each component of an input to $f_1$ the same way it does to $f_2$. In other words, $(A[f_1](x))_j = (A[f_2](x))_j$ for any $j = 1, ..., d$, where $d$ is the dimension of the input. The statement can be easily extended to methods that do not use attribution as well.

Input invariance. To illustrate using an image classification problem, translating an image should correspondingly translate the super-pixels demarcating the area that provides an explanation for the choice of classification. Clearly, this property is desirable, and it has been proposed as an axiomatic invariance of a reliable saliency method. There has also been a study on the input invariance of some saliency methods with respect to the translation of the input $x \to x + c$ for some constant $c$ [70]. Of the methods studied, gradient/sensitivity-based methods [127] and signal methods [71], [74] are input invariant, while some attribution methods, such as integrated gradients [93], are not.

C.3) Interpretabilities by Utilities
C.3) Interpretabilities by Utilities

The following utilities-based categorization of interpretability is proposed by [10].
Application-based. First, an evaluation is application-grounded if human A gives an explanation X_A on a specific application, the so-called end-task (e.g. a doctor performing diagnosis), to human B, and B performs the same task. Then A has given B a useful explanation if B performs better in the task. Suppose A is now a machine learning model; the model is then highly interpretable if human B performs the same task with improved performance after being given X_A. Some medical segmentation works fall into this category as well, since the segmentation constitutes a visual explanation for further diagnosis/prognosis [143], [144] (also see other categories of the grand challenge). Such an evaluation is performed, for example, by [42]. They proposed Grad-CAM, applied together with guided backpropagation (proposed by [74]) on AlexNet and VGG CNNs. The produced visualizations are used to help human subjects on Amazon Mechanical Turk identify objects in VOC 2007 images; the subjects achieved higher accuracy with these visualizations than with those from guided backpropagation alone.

Human-based. This evaluation involves real humans and simplified tasks. It can be used when, for some reason or another, having human A give a good explanation X_A is challenging, possibly because the performance on the task cannot be evaluated easily or the explanation itself requires specialized knowledge. In this case, a simplified or partial problem may be posed and X_A is still demanded. Unlike the application-based approach, it is now necessary to look at X_A specifically for interpretability evaluation. A bigger pool of human subjects can then be hired to give a generic valuation to X_A, or to create a model answer X̂_A against which X_A is compared, from which a generic valuation is computed. Now, suppose A is a machine learning model; A is then more interpretable than another ML model if it scores better in this generic valuation. In [145], an ML model is given a document containing the conversation of humans making a plan.
The ML model produces a "report" containing relevant predicates (words) for the task of inferring what the final plan is. The metric used for interpretability evaluation is, for example, the percentage of the predicates that appear, compared to a human-made report. We believe the format of human-based evaluation need not be strictly like the above. For example, hybrid human and interactive ML classifiers require human users to nominate features for training [146]. Two different standard ML models can be compared to the hybrid, and one can be said to be more interpretable than another if it picks up features similar to the hybrid's, assuming they perform at a similarly acceptable level.

TABLE II (continued from Table I). List of journal papers arranged according to the interpretability methods used, how interpretability is presented, or the suggested means of interpretability. Each entry shows (HSI, ANN): ✓ = yes, ✗ = no, N.A. = not applicable.

Interpretability via Mathematical Structure
- Linear probe [100] (✗, ✓)
- Regression based on CNN [105] (✗, ✓)
- Backwards model for interpretability of linear models [106] (✗, ✗)
- GDM (Generative Discriminative Models): ridge regression + least squares [99] (✗, ✗)
- GAM, GA²M (Generalized Additive Models) [81], [101], [102] (✗, ✗)
- ProtoAttend [104] (✗, ✓)
- Pre-defined models: other content-subject-specific models (N.A., N.A.)
  + Kinetic model for CBF (cerebral blood flow) [130] (N.A., ✓)
  + CNN for PK (pharmacokinetic) modelling [131] (N.A., ✓)
  + CNN for brain midline shift detection [132] (N.A., ✓)
  + Group-driven RL (reinforcement learning) on personalized healthcare [133] (N.A., ✓)
  + Also see [107]–[111] (N.A., ✓)

Feature Extraction
- Correlation: PCA (Principal Components Analysis), SVD (Singular Value Decomposition) (N.A., N.A.)
- CCA (Canonical Correlation Analysis) [112] (✗, ✗)
- SVCCA (Singular Vector Canonical Correlation Analysis) [96] = CCA + SVD (✗, ✓)
- F-SVD (Frame Singular Value Decomposition) [113] on electromyography data (✗, ✗)
- DWT (Discrete Wavelet Transform) + Neural Network [134] (✗, ✓)
- MODWPT (Maximal Overlap Discrete Wavelet Package Transform) [135] (✗, ✗)
- GAN-based Multi-stage PCA [117] (✓, ✗)
- Estimating probability density with deep feature embedding [118] (✗, ✓)
- t-SNE (t-Distributed Stochastic Neighbour Embedding) [76] (✗, ✓)
- Clustering:
  + t-SNE on CNN [119] (✗, ✓)
  + t-SNE, activation atlas on GoogleNet [120] (✗, ✓)
  + t-SNE on latent space in meta-material design [121] (✗, ✓)
  + t-SNE on genetic data [136] (✗, ✓)
  + mm-t-SNE on phenotype grouping [137] (✗, ✓)
- Laplacian Eigenmaps visualization for Deep Generative Model [123] (✗, ✓)
- KNN (k-nearest neighbour) on multi-center low-rank representation learning (MCLRR) [124] (✗, ✓)
- KNN with triplet loss and query-result activation map pair [138] (✗, ✓)
- Group-based Interpretable NN with RW-based Graph Convolutional Layer [122] (✗, ✓)
- Sensitivity: TCAV (Testing with Concept Activation Vectors) [95] (✓, ✓)
  + RCV (Regression Concept Vectors) uses TCAV with Br score [139] (✗, ✓)
  + Concept Vectors with UBS [140] (✗, ✓)
  + ACE (Automatic Concept-based Explanations) [55] uses TCAV (✓, ✓)
- Influence function [128] helps understand adversarial training points (✗, ✓)
- Representer theorem [129] (✗, ✓)
- SocRat (Structured-output Causal Rationalizer) [126] (✗, ✓)
- Meta-predictors [125] (✗, ✓)
- Explanation vector [127] (✗, ✗)
- Others: information theoretic: Information Bottleneck [97], [98] (✗, ✓)

Other Perspectives
- Data Driven: Database of methods vs. interpretability [10] (N.A., N.A.)
- Case-Based Reasoning [142] (✓, ✗)
- Invariance: Integrated Gradients [68], [93] (✗, ✓)
- Input invariance [70] (✗, ✓)
- Utilities: Application-based [143], [144]; Human-based [145], [146] (N.A., N.A.); Function-based [2], [5], [41]–[43], [95], [96], [143], [144]
Functions-based. Third, an evaluation is functionally grounded if there exist proxies (which can be defined a priori) for evaluation, for example sparsity [10]. Papers that rely on this kind of evaluation [2], [5], [41]–[43], [95], [96], [143], [144] include many supervised learning models with clearly defined metrics, such as (1) Dice coefficients (related to visual interpretability) and (2) attribution values, components of canonically transformed variables (see for example CCA), or values obtained from dimensionality reduction methods (such as principal components from PCA and their corresponding eigenvalues), where interpretability is related to the degree to which an object relates to a feature; for example, the classification of a dog has high values in the feature space related to four limbs, the shape of the snout, paws, etc. Which metrics are suitable to use is highly dependent on the task at hand.
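As one concrete example of such a pre-defined proxy, the Dice coefficient used to score segmentation masks (and hence mask-shaped visual explanations) can be computed as in the sketch below; the two arrays are placeholder binary masks, not data from the cited works.

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

pred = np.array([[0, 1, 1], [0, 1, 0]])    # predicted mask
target = np.array([[0, 1, 1], [1, 1, 0]])  # ground-truth mask
print(round(dice_coefficient(pred, target), 3))  # 0.857
```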
III. XAI IN MEDICAL FIELD

ML has also gained traction recently in the medical field, with a large volume of works on automated diagnosis and prognosis [149]. From grand-challenge.org, we can see that many different challenges in the medical field have emerged and galvanized research using ML and AI methods. Amongst the successful deep learning models are [2], [5], which use U-Net for medical segmentation. However, being a deep learning neural network, U-Net is still a blackbox; it is not very interpretable. Other domain-specific methods and special transformations (denoising etc.) have been published as well; consider for example [130] and many other works in MICCAI publications. In the medical field the question of interpretability is far from mere intellectual curiosity. More specifically, it has been pointed out that interpretability in the medical field involves factors that other fields do not consider, including risk and responsibilities [21], [150], [151]. When medical responses are made, lives may be at stake. To leave such important decisions to machines that cannot provide accountability would be akin to shirking the responsibility altogether. Apart from ethical issues, this is a serious loophole that could turn catastrophic when exploited with malicious intent.
TABLE III. Categorization by the organs affected by the diseases. Neuro* refers to any neurological, neurodevelopmental, neurodegenerative etc. diseases. The rows are arranged according to the focus of the interpretability as follows: Appl. = application, Method. = methodology, Comp. = comparison.

- Appl.: brain, neuro* [47], [67], [130], [152], [131], [132], [135], [155]; breast [68]; lung [6], [81]; sleep [153]; skin [154]; others [105]
- Method.: brain, neuro* [65], [66], [82], [90], [99], [113], [122], [134], [156]; breast [64], [69], [139], [140]; skin [138]; heart [123]; others [43], [66], [137], [141]
- Comp.: brain, neuro* [106], [157]; lung [92]; sleep [158]; skin [159]; other [136]
Many more works have thus been dedicated to exploring explainability in the medical field [11], [20], [43]. They provide summaries of previous works [21], including subfield-specific reviews such as [25] for chest radiographs and [160] for sentiment analysis in medicine, or at least set aside a section to promote awareness of the importance of interpretability in the medical field [161]. In [162], it is stated directly that being a black-box is a "strong limitation" for AI in dermatology, as it is not capable of performing the customized assessment by a certified dermatologist that can be used to explain clinical evidence. On the other hand, the exposition [163] argues that a certain degree of opaqueness is acceptable, i.e. it might be more important to produce empirically verified accurate results than to focus too much on how to unravel the black-box. We recommend readers to consider these works first, at least for an overview of interpretability in the medical field. We apply the categorization from the previous section to ML and AI in the medical field. Table III shows the categorization obtained by tagging (1) how the interpretability method is incorporated: either through direct application of existing methods, methodology improvements, or comparison between interpretability methods, and (2) the organs targeted by the diseases, e.g. brain, skin etc. As there is not yet a substantial number of significant medical research works that address interpretability, we will refrain from presenting any conclusive trend. However, from a quick overview, we see that the XAI research community might benefit from more studies comparing different existing methods, especially those with more informative conclusions on how they contribute to interpretability.
A. Perceptive Interpretability
Medical data can come in the form of traditional 2D images or in more complex formats such as NIfTI or DICOM, which contain 3D images with multiple modalities and even 4D images, i.e. time-evolving 3D volumes. The difficulties in using ML for these data include the following. Medical images are sometimes far less available in quantity than common images: obtaining them requires consent from patients and clearing other administrative barriers. High dimensional data also add complexity to data processing, and the large memory requirement might prevent the data from being input without modification, random sampling or down-sizing, which may compromise the analysis. Other possible difficulties with data collection and management include left/right-censoring, patients' deaths due to unrelated causes or other complications, etc. Even when medical data are available, ground-truth images may not be correct. Not only do these data require specialized knowledge to understand, the lack of a comprehensive understanding of the underlying biological components complicates the analysis. For example, the ADC modality of MR images and the isotropic version of DWI are in some sense derivative, since both are computed from raw images collected by the scanner. Furthermore, many CT or MRI scans are presented with skull-stripping or other pre-processing. Without more complete knowledge of what fine details might have been accidentally removed, we cannot guarantee that an algorithm can capture the correct features.
A.1) Saliency
The following articles consist of direct applications of existing saliency methods. CheXpert [6] uses Grad-CAM for visualization of pleural effusion in a radiograph. CAM is also used for interpretability in brain tumour grading [152]. Reference [67] uses Guided Grad-CAM and feature occlusion, providing complementary heatmaps for the classification of Alzheimer's disease pathologies. The integrated gradients method and SmoothGrad are applied for the visualization of a CNN ensemble that classifies estrogen receptor status using breast MRI [68]. LRP on DeepLight [47] was applied to fMRI data from the Human Connectome Project to generate heatmap visualizations. Saliency maps have also been computed using the primitive gradient of the loss, providing interpretability to a neural network used for EEG (electroencephalogram) sleep stage scoring [153]. There has even been a direct comparison between the feature maps within a CNN and skin lesion images
[154], overlaying the scaled feature maps on top of the images as a means to interpretability. Some feature maps correspond to relevant features in the lesion, while others appear to explicitly capture artifacts that might lead to prediction bias.

The following articles focus more on comparisons between popular saliency methods, including their derivative/improved versions. Reference [158] trains an artificial neural network for the classification of insomnia using a physiological network (PN). The feature relevance scores are computed with several methods, including DeepLIFT [56]. A comparison between 4 different visualizations is performed in [157]. It shows different attributions between different methods and concludes that LRP and guided backpropagation provide the most coherent attribution maps in their Alzheimer's disease study. Basic tests of Grad-CAM and SHAP on dermoscopy images for melanoma classification have been conducted, concluding that heatmaps need significant improvements before practical deployment [159].

The following works place a slightly different focus on methodological improvements on top of the visualization. Respond-CAM [43] is derived from [41], [42], and provides a saliency map in the form of a heatmap on 3D images obtained from cellular electron cryo-tomography. High intensity in the heatmap marks the region where macromolecular complexes are present. Multi-layer class activation map (MLCAM) is introduced in [90] for glioma (a type of brain tumour) localization. A multi-instance (MI) aggregation method is used with a CNN to classify breast tumour tissue microarray (TMA) images for 5 different tasks [64], for example the classification of the histologic subtype. Super-pixel maps indicate the region in each TMA image where the tumour cells are; each label corresponds to a class of tumour. These maps are proposed as the means for visual interpretability. Also see the activation maps in [65], where interpretability is studied by corrupting the image and inspecting the region of interest (ROI). The autofocus module from [66] promises improvements in visual interpretability for segmentation of pelvic CT scans and of tumours in brain MRI using CNNs. It uses the attention mechanism (proposed by [91]) and improves it with adaptive selection of the scale with which the network "sees" an object within an image. With the correct scale adopted by the network while performing a single task, a human observer analysing the network can understand that the neural network is properly identifying the object, rather than mistaking the combination of the object plus its surroundings for the object itself.

There is also a different formulation for the generation of saliency maps [69]. It defines a different softmax-like formula to extract signals from a DNN for visual justification in the classification of breast masses (malignant/benign). Textual justification is generated as well.
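For readers who want to reproduce the simplest of the saliency maps above (the primitive gradient-of-the-loss style used in [153]), a minimal PyTorch sketch follows; the small CNN and random input stand in for a trained medical-imaging model and a real scan, and are not taken from any cited work.

```python
import torch
import torch.nn as nn

model = nn.Sequential(                          # stand-in for a trained classifier
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2))
model.eval()

image = torch.randn(1, 1, 64, 64, requires_grad=True)   # placeholder for a real scan
score = model(image)[0].max()                  # score of the highest-scoring class
score.backward()

saliency = image.grad.abs().squeeze()          # |d score / d pixel|, shape (64, 64)
saliency = saliency / (saliency.max() + 1e-8)  # normalize to [0, 1] for overlaying on the scan
print(saliency.shape)
```

Grad-CAM, Guided Grad-CAM, SmoothGrad and the other methods cited above replace this raw input gradient with class-activation or smoothed/guided variants, but the overlay-on-the-image workflow is the same.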
A.2) Verbal
In [81], a rule-based system could provide the statement has asthma → lower risk, where risk here refers to the risk of death due to pneumonia. Likewise, [82] creates a model called Bayesian Rule Lists that provides such statements for stroke prediction. Textual justification is also provided in the LSTM-based breast mass classifier system [69]. Argumentation theory is implemented in the machine learning training process in [155], extracting arguments or decision rules as the explanations for the prediction of stroke based on the Asymptomatic Carotid Stenosis and Risk of Stroke (ACSRS) dataset.

One should indeed look closer at the interpretability in [81]. Just as many ML models are able to extract humanly non-intuitive patterns, the rule-based system seems to have captured a strange link between asthma and pneumonia. The link becomes clear once the actual explanation based on the real situation is provided: a pneumonia patient who also suffers from asthma is often sent directly to the Intensive Care Unit (ICU) rather than a standard ward. Obviously, if there were a variable ICU = 0 or 1 indicating admission to the ICU, then a better model could provide the more coherent explanation "asthma → ICU → lower risk". In the paper, the model appears not to have access to such a variable. We can see that interpretability issues are not always clear-cut.

Several research efforts on Visual Question Answering in the medical field have also been developed. The initiative by ImageCLEF [164], [165] appears to be at its center, though VQA itself has yet to gain more traction and a successful practical demonstration in the medical sector before widespread adoption.

Challenges and Future Prospects. In many cases where saliency maps are provided, they come with insufficient evaluation of their utility within medical practice. For example, when providing importance attribution for a CT scan used for lesion detection, are radiologists interested in heatmaps highlighting just the lesion? Are they more interested in looking for reasons why a haemorrhage is epidural or subdural when the lesion is not very clear to the naked eye? There may be many such medically-related subtleties that interpretable-AI researchers need to know about.
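Returning to the asthma example above, the following sketch makes the confounding mechanism concrete: it fits a logistic regression on a simulated pneumonia-mortality dataset in which asthma patients are routed to the ICU and therefore die less often, and then reads off the sign of the asthma coefficient. All data here are synthetic placeholders; nothing is reproduced from [81].

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
asthma = rng.binomial(1, 0.2, n)                       # 20% of patients have asthma
severity = rng.normal(0, 1, n)                         # latent illness severity
icu = asthma | (severity > 1.5)                        # asthma patients are sent to the ICU
p_death = 1 / (1 + np.exp(-(severity - 2.0 * icu)))    # ICU care lowers death risk
death = rng.binomial(1, p_death)

# A model that only sees asthma and severity learns "asthma -> lower risk".
clf = LogisticRegression(max_iter=1000).fit(np.column_stack([asthma, severity]), death)
print("asthma coefficient:", clf.coef_[0][0])           # negative: lower apparent risk
```

The negative coefficient is a faithful summary of the data the model sees, yet a misleading explanation unless the unobserved ICU pathway is surfaced, which is exactly the clear-cut/not-clear-cut tension discussed above.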
B. Interpretability via Mathematical Structure

B.1) Pre-defined Model
Models help with interpretability by providing a generic sense of what a variable does to the output variable in question, whether in medical fields or not. A parametric model is usually designed with at least an estimate of the working mechanism of the system, with simplifications and based on empirically observed patterns. For example, [130] uses a kinetic model for the cerebral blood flow in ml/100 g/min, with
\[
\mathrm{CBF} = f(\Delta M) = \frac{6000\,\beta\,\Delta M\,\exp(\mathrm{PLD}/T_b)}{2\,\alpha\,T_b\,(SI_{PD})\bigl(1 - \exp(-\tau/T_b)\bigr)} \qquad (1)
\]
which depends on the perfusion-weighted image ΔM, obtained from the signal difference between the labelled image of arterial blood water treated with RF pulses and the control image. This function is incorporated into the loss function in the training pipeline of a fully convolutional neural network. At least a partial interpretation can be made: the neural network is designed to denoise a perfusion-weighted image (and thus improve its quality) by taking CBF into account. How the network understands the CBF is again an interpretability problem of a neural network which has yet to be resolved.
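A direct transcription of Eq. (1) into code is shown below so that the role of each quantity is explicit; the numerical values of β, α, T_b, PLD and τ are illustrative placeholders only and should be taken from the acquisition protocol described in [130], not from this sketch.

```python
import numpy as np

def cbf_from_asl(delta_m, si_pd, beta=0.9, alpha=0.85, t_b=1.65, pld=1.8, tau=1.8):
    """Eq. (1): kinetic model mapping the perfusion-weighted difference image
    delta_m (label - control) and the proton-density image si_pd to CBF.
    Constant values here are placeholders, not protocol values."""
    numerator = 6000.0 * beta * delta_m * np.exp(pld / t_b)
    denominator = 2.0 * alpha * t_b * si_pd * (1.0 - np.exp(-tau / t_b))
    return numerator / denominator

delta_m = np.random.rand(64, 64) * 5.0       # placeholder difference image
si_pd = np.random.rand(64, 64) * 100 + 50    # placeholder proton-density image
cbf = cbf_from_asl(delta_m, si_pd)
print(cbf.shape)
```

Embedding this mapping in the training loss is what ties the network's output back to a physically meaningful quantity, which is the partial interpretation described above.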
There is an inherent simplicity in the interpretability of models based on linearity, and thus they have been considered obviously interpretable as well; some examples include linear combinations of clinical variables [99], metabolite signals for MRS [105], etc. Linearity in different models used in the estimation of brain states is discussed in [106], including how it is misinterpreted. The paper compares what it refers to as forward and backward models and then suggests improvements to linear models. In [81], a logistic regression model picked up a relation between asthma and a lower risk of pneumonia death, i.e. asthma has a negative weight as a risk predictor in the regression model. The Generative Discriminative Machine (GDM) combines ordinary least squares regression and ridge regression to handle confounding variables in Alzheimer's disease and schizophrenia datasets [99]. GDM parameters are said to be interpretable, since they are linear combinations of the clinical variables. Deep learning has been used for PET pharmacokinetic (PK) modelling to quantify tracer target density [131]. A CNN supports PK modelling as part of a sequence of processes to reduce PET acquisition time, and the output is interpreted with respect to the gold-standard PK model, which is the linearized version of the Simplified Reference Tissue Model (SRTM). A deep learning method is also used to perform parameter fitting for Magnetic Resonance Spectroscopy (MRS) [105]. The parametric part of the MRS signal model specified, $x(t) = \sum_m a_m x_m(t)\, e^{\Delta\alpha_m t + 2\pi i \Delta f_m t}$, consists of a linear combination of metabolite signals $x_m(t)$. The paper shows that the error measured in SMAPE (symmetric mean absolute percentage error) is smallest for most metabolites when their CNN model is used. In cases like this, clinicians may find the model interpretable as long as the parameters are well fit, although the neural network itself may still not be interpretable.

The models above use linearity for studies related to the brain or neuro-related diseases. Beyond linear models, other brain and neuro-systems can be modelled with relevant subject-content knowledge for better interpretability as well. A segmentation task for the detection of brain midline shift is performed using a CNN with standard structural knowledge incorporated [132]. A template called the model-derived age norm is derived from the mean values of sleep EEG features of healthy subjects [156]. Interpretability is given as the deviation of the features of an unhealthy subject from the age norm.

On a different note, reinforcement learning (RL) has been applied to personalized healthcare. In particular, [133] introduces group-driven RL in personalized healthcare, taking into consideration different groups, each having similar agents. As usual, the Q-value is optimized w.r.t. the policy $\pi_\theta$, which can be qualitatively interpreted as the maximization of rewards over time over the choices of action selected by the many participating agents in the system.

Challenges and Future Prospects. Models may be simplifying an intractable system. As such, the full potential of machine learning, especially DNNs with huge numbers of parameters, may be under-used.
A possible research direction that taps into the hype around predictive science is the following: given a model, is it possible to augment the model with new, sophisticated components, such that parts of these components can be identified with (and thus interpreted as) new insights? Naturally, the augmented model needs to be comparable to previous models, with a clear interpretation of why the new components correspond to insights previously missed. Do note that there are critiques against the hype around the potential of AI, which we will leave to the readers.
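To make the parametric MRS model quoted above concrete, a small sketch of the signal $x(t) = \sum_m a_m x_m(t) e^{\Delta\alpha_m t + 2\pi i \Delta f_m t}$ is given below; the basis signals and parameter values are made-up placeholders standing in for real metabolite basis sets and CNN-fitted parameters.

```python
import numpy as np

def mrs_signal(t, basis, amps, dalpha, dfreq):
    """x(t) = sum_m a_m * x_m(t) * exp((dalpha_m + 2*pi*i*dfreq_m) * t):
    a linear combination of metabolite basis signals x_m with
    per-metabolite damping and frequency-shift corrections."""
    x = np.zeros_like(t, dtype=complex)
    for a_m, x_m, da_m, df_m in zip(amps, basis, dalpha, dfreq):
        x += a_m * x_m(t) * np.exp((da_m + 2j * np.pi * df_m) * t)
    return x

t = np.linspace(0, 0.5, 2048)                            # seconds
basis = [lambda t: np.exp(2j * np.pi * 120 * t),          # toy "metabolite" resonances
         lambda t: np.exp(2j * np.pi * 250 * t)]
x = mrs_signal(t, basis, amps=[1.0, 0.6], dalpha=[-8.0, -5.0], dfreq=[1.5, -2.0])
print(x.shape)
```

The interpretable part of such a pipeline is the fitted amplitudes a_m (one per metabolite); how the network arrives at them remains the unresolved interpretability question noted above.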
B.2) Feature extraction
A vanilla CNN is used in [141], but it is suggested that interpretability can be attained by using a separable model. The separability is achieved by polynomial-transforming scalar variables and further processing, giving rise to weights useful for interpretation. In [122], fMRI is analyzed using correlation-based functional graphs. These are then clustered into a super-graph consisting of subnetworks that are defined to be interpretable. A convolutional layer is then used on the super-graph. For more references on neural networks designed for graph-based problems, see that paper's citations. The following are further sub-categorizations for methods that revolve around feature extraction and the evaluations or measurements (such as correlations) used to obtain the features, similar to the previous section.
Correlation. A DWT-based (discrete wavelet transform) method is used to perform feature extraction before eventually feeding the EEG data (after a series of processing steps) into a neural network for epilepsy classification [134]. A fuzzy relation analogous to a correlation coefficient is then defined. Furthermore, as with other transform methods, the components (the wavelets) can be interpreted component-wise. As a simple illustration, the components of a Fourier transform can be taken as how much of a certain frequency is contained in a time series. Reference [135] mentions a host of wavelet-based feature extraction methods and introduces the maximal overlap discrete wavelet package transform (MODWPT), also applied on EEG data for epilepsy classification. Frame singular value decomposition (F-SVD) is introduced for the classification of electromyography (EMG) data [113]. It is a pipeline involving a number of processing steps that include DWT, CCA and SVD, and its classification accuracy is evaluated on distinguishing between amyotrophic lateral sclerosis, myopathy and healthy subjects. Consider also the CCA-based papers cited therein, in particular citations 18 to 21 for EMG and EEG signals.
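A minimal sketch of DWT-based feature extraction in the spirit of [134] is given below, using PyWavelets to decompose each (here synthetic) EEG epoch and summary statistics of the sub-band coefficients as features for a standard classifier; the wavelet choice, decomposition level and classifier are placeholders, not the cited pipeline.

```python
import numpy as np
import pywt
from sklearn.ensemble import RandomForestClassifier

def dwt_features(epoch, wavelet="db4", level=4):
    """Per-sub-band energy and standard deviation of the DWT coefficients."""
    coeffs = pywt.wavedec(epoch, wavelet, level=level)
    return np.array([f(c) for c in coeffs for f in (lambda c: np.sum(c**2), np.std)])

rng = np.random.default_rng(0)
epochs = rng.standard_normal((200, 1024))      # placeholder single-channel EEG epochs
labels = rng.integers(0, 2, 200)               # placeholder seizure / non-seizure labels

X = np.array([dwt_features(e) for e in epochs])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(X.shape, clf.score(X, labels))
```

The interpretability argument is that each feature is tied to a named frequency sub-band, so feature importances can be read back in physiological terms rather than as anonymous learned filters.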
Clustering. A VAE is used to obtain vectors in a 64-dimensional latent space in order to predict whether subjects suffer from hypertrophic cardiomyopathy (HCM) [123]. A non-linear transformation is used to create a two-dimensional Laplacian Eigenmap (LE), which is suggested as the means for interpretability. Skin images are clustered for melanoma classification using a k-nearest-neighbour method customized to include a CNN and a triplet loss [138]. A queried image is then compared with training images ranked according to a similarity measure, visually displayed as a query-result activation map pair. t-SNE has been applied to human genetic data and shown to provide more robust dimensionality reduction than PCA and other methods [136]. Multiple maps t-SNE (mm-t-SNE) is introduced in [137], performing clustering on phenotype similarity data.
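Most of the clustering-style visualizations above follow the same recipe: extract a feature vector per sample (e.g. a latent code or a penultimate-layer activation), then embed it into two dimensions and colour by diagnosis. A minimal scikit-learn sketch with random placeholder features is shown below.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.standard_normal((300, 64))     # placeholder 64-d embeddings (e.g. VAE latents)
labels = rng.integers(0, 2, 300)              # placeholder diagnosis labels

embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
print(embedding.shape)                        # (300, 2): coordinates to scatter-plot by label
```

Whether the resulting scatter plot constitutes an explanation, or merely a suggestive visualization, is exactly the kind of question the utilities-based evaluations of the previous section are meant to settle.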
Sensitivity. Regression Concept Vectors (RCV) are proposed along with the metric Br score as improvements to TCAV's concept separation [139]. The method is applied to a breast cancer histopathology classification problem. Furthermore, the Unit Ball Surface Sampling metric (UBS) is introduced [140] to address a shortcoming of the Br score; it uses neural networks for the classification of nodules in mammographic images. Guideline-based Additive eXplanation (GAX) is introduced in [92] for diagnosis using CT lung images. Its pipeline includes LIME-like perturbation analysis and SHAP. Comparisons are then made with LIME, Grad-CAM and the feature importance generated by SHAP.

Challenges and Future Prospects. We observe, on the one hand, popular uses of certain methods ingrained in specific sectors and, on the other, emerging applications of sophisticated ML algorithms. As medical ML (in particular the application of recently successful DNNs) is still a young field, we see fragmented and experimental uses of existing or customized interpretable methods. As medical ML research progresses, the trade-off between the many practical factors of ML methods (such as ease of use and ease of interpretation of mathematical structure possibly regarded as complex) and their contribution to the subject matter will become clearer. Future research and applications may benefit from a practice of consciously and consistently extracting interpretable information for further processing, and the process should be systematically documented for good dissemination. Currently, with feature selection and extraction focused on improving accuracy and performance, we may still have vast unexplored opportunities in interpretability research.
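A stripped-down sketch of the concept-vector idea behind TCAV and RCV is given below: fit a linear separator between activations of concept and non-concept examples, take its normal vector as the concept direction, and score sensitivity as the dot product between that direction and the gradient of the class score with respect to the activations. The activation and gradient arrays here are random placeholders; in practice they come from a chosen layer of the trained network, and this is only an illustration of the idea, not the published algorithms.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
acts_concept = rng.standard_normal((100, 128)) + 0.8   # activations of concept examples
acts_random = rng.standard_normal((100, 128))          # activations of random examples

# Concept activation vector = unit normal of the separating hyperplane.
clf = LogisticRegression(max_iter=1000).fit(
    np.vstack([acts_concept, acts_random]),
    np.r_[np.ones(100), np.zeros(100)])
cav = clf.coef_[0] / np.linalg.norm(clf.coef_[0])

# Sensitivity of a class score to the concept: grad(score w.r.t. activations) . cav
grads = rng.standard_normal((50, 128))                  # placeholder per-example gradients
sensitivities = grads @ cav
print("fraction of examples with positive concept sensitivity:",
      float((sensitivities > 0).mean()))
```

The fraction of examples with positive sensitivity is the TCAV-style score; RCV replaces the classifier with a regression against a continuous clinical measurement, and the Br and UBS metrics refine how well the concept direction is separated.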
C. Other Perspectives

Data-driven. Case-Based Reasoning (CBR) performs medical evaluations (classifications etc.) by comparing a query case (new data) with similar existing data from a database. Reference [142] combines CBR with an algorithm that presents the similarity between these cases by visually providing proxies and measures for users to interpret. By observing these proxies, the user can decide whether or not to take the decision suggested by the algorithm. The paper also asserts that medical experts appreciate such visual information with a clear decision-support system.
D. Risk of Machine Interpretation in Medical Field

Jumping to conclusions. According to [81], logical statements such as has asthma → lower risk are considered interpretable. However, in the example, the statement indicates that a patient with asthma has a lower risk of death from pneumonia, which might seem strange without any clarification of the intermediate thought process. While humans can infer that the lowered risk is due to the fact that pneumonia patients with an asthma history tend to be given more aggressive treatment, we cannot always assume there is a similarly humanly inferable reason behind each decision. Furthermore, interpretability methods such as LRP, deconvolution and guided backpropagation introduced earlier have been shown not to work for simple models, such as linear models, bringing their reliability into question [59].

Manipulation of explanations. Given an image, a similar image can be generated that is perceptibly indistinguishable from the original, yet produces a radically different output [94]. Naturally, its significance attribution and interpretable information become unreliable. Furthermore, explanations can even be manipulated arbitrarily [166]. For example, an explanation for the classification of a cat image (i.e. the particular significant values that contribute to the prediction of cat) can be implanted into the image of a dog, and the algorithm can be fooled into classifying the dog image as a cat image. The risk in the medical field is clear: even without malicious, intentional manipulation, noise can render explanations wrong. Manipulation of an algorithm that is designed to provide explanations is also explored in [167].
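A quick way to probe for this kind of fragility, without constructing a full adversarial attack, is to add a small perturbation to the input and measure how much the saliency ranking changes, as in the sketch below; the tiny model and inputs are placeholders. On this toy example the ranking barely moves, but [94] shows that for deep image classifiers carefully chosen perturbations of comparable size can change attributions drastically.

```python
import torch
import torch.nn as nn
from scipy.stats import spearmanr

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2)).eval()

def gradient_saliency(x):
    """Absolute input gradient of the top class score."""
    x = x.clone().requires_grad_(True)
    out = model(x)
    out[out.argmax()].backward()
    return x.grad.abs().detach().numpy()

x = torch.randn(32)
s_clean = gradient_saliency(x)
s_noisy = gradient_saliency(x + 0.01 * torch.randn(32))   # small perturbation of the input
rho, _ = spearmanr(s_clean, s_noisy)
print("saliency rank correlation after small noise:", round(float(rho), 3))
```

A routine robustness check of this form, reported alongside the heatmaps themselves, would make it easier for clinicians to judge how much weight an explanation deserves.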
Incomplete constraints. In [130], the loss function for the training of a fully convolutional network includes CBF as a constraint. However, many other constraints may play important roles in the mechanism of a living organ or tissue, not to mention that applying a kinetic model is itself a simplification. Giving an interpretation within limited constraints may place undue emphasis on the constraints themselves. Other works that use predefined models might suffer from similar problems [99], [105], [131].
Noisy training data. The so-called ground truths for medical tasks, provided by professionals, are not always absolutely correct. In fact, news regarding how AI beats human performance in medical imaging diagnosis [168] indicates that human judgment can be brittle, even that of trained medical personnel. This might give rise to the classic garbage-in-garbage-out situation.

The above risks are presented in large part as a reminder of the nature of automation. It is true that algorithms have been used to extract invisible patterns with some success. However, one ought to view scientific problems with the correct order of priority. Society should not risk over-allocating resources into building machine and deep learning models, especially since due improvements in understanding the underlying science might be the key to solving the root problem. For example, higher quality MRI scans might reveal key information not visible with current technology, and many models built nowadays might not be very successful simply because there is not enough detailed information contained in currently available MRI scans.
IV. CONCLUSION
We present a survey on the interpretability and explainability of ML algorithms in general, place the different interpretations suggested by different research works into distinct categories, and then apply the same categorization to the medical field. Some attempts are made to formalize interpretability mathematically, some provide visual explanations, while others focus on the improvement in task performance after explanations produced by algorithms are given.
Future directions for clinicians and practitioners . Visualand textual explanation supplied by an algorithm might seemlike the obvious choice; unfortunately, the details of decision-making by algorithms such as deep neural networks are still
not clearly exposed. When an otherwise reliable deep learning model provides a strangely wrong visual or textual explanation, systematic methods to probe into the wrong explanations do not seem to exist, let alone methods to correct them. A specialized education combining medical expertise, applied mathematics, data science etc. might be necessary to overcome this. For now, if "interpretable" algorithms are deployed in medical practice, human supervision is still necessary. Interpretability information should be considered nothing more than complementary support for medical practice until there is a robust way to handle interpretability.
Future directions for algorithm developers and researchers. Before the blackbox is un-blackboxed, machine decisions always carry some exploitable risks. It is also clear that a unified notion of interpretability is elusive. For medical ML interpretability, more comparative studies between the performance of methods will be useful. The interpretability outputs such as heatmaps should be displayed and compared clearly, including poor results. In the best-case scenario, clinicians and practitioners recognize the shortcomings of interpretable methods but have a general idea of how to handle them in ways that are suitable to medical practice. In the worst-case scenario, the inconsistencies between these methods can at least be exposed. The troubling trend of journal publications emphasizing only good results is precarious, and we should thus continue interpretability research with a mindset open to evaluation from all related parties. Clinicians and practitioners need to be given the opportunity for a fair judgment of the utilities of the proposed interpretability methods, not just flooded with performance metrics possibly irrelevant to the adoption of medical technology.

Also, there may be a need to shift interpretability study away from algorithm-centric studies. An authoritative body setting up the standard of requirements for the deployment of model building might stifle the progress of the research itself, though it might be the most efficient way to reach an agreement. This might be necessary to prevent damage, seeing that even corporate companies and other bodies non-academic in the traditional sense have joined the fray (consider health-tech start-ups and the implications). Acknowledging that machine and deep learning might not be fully mature for large-scale deployment, it might be wise to deploy the algorithms as a secondary support system for now and leave most decisions to the traditional methods. It might take a long time before humanity graduates from this stage, but it might be timely: we can collect more data to compare machine predictions with traditional predictions and sort out data ownership issues along the way.
ACKNOWLEDGMENT
This research was supported by Alibaba Group Holding Limited, DAMO Academy, Health-AI division under the Alibaba-NTU Talent Program. The program is a collaboration between Alibaba and Nanyang Technological University, Singapore.
REFERENCES

[1] Eun-Jae Lee, Yong-Hwan Kim, Namkug Kim, and Dong-Wha Kang. Deep into the brain: Artificial intelligence in stroke imaging.
Journalof Stroke , 19:277–285, 09 2017.[2] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Con-volutional networks for biomedical image segmentation.
CoRR ,abs/1505.04597, 2015.[3] Mary T. Dzindolet, Scott A. Peterson, Regina A. Pomranky, Linda G.Pierce, and Hall P. Beck. The role of trust in automation reliance.
Int.J. Hum.-Comput. Stud. , 58(6):697718, June 2003.[4] Liang Chen, Paul Bentley, and Daniel Rueckert. Fully automaticacute ischemic lesion segmentation in dwi using convolutional neuralnetworks.
NeuroImage: Clinical , 15:633 – 643, 2017.[5] ¨Ozg¨un C¸ ic¸ek, Ahmed Abdulkadir, Soeren S. Lienkamp, Thomas Brox,and Olaf Ronneberger. 3d u-net: Learning dense volumetric segmen-tation from sparse annotation.
CoRR , abs/1606.06650, 2016.[6] Jeremy Irvin, Pranav Rajpurkar, Michael Ko, Yifan Yu, Silviana Ciurea-Ilcus, Chris Chute, Henrik Marklund, Behzad Haghgoo, Robyn L. Ball,Katie S. Shpanskaya, Jayne Seekins, David A. Mong, Safwan S. Halabi,Jesse K. Sandberg, Ricky Jones, David B. Larson, Curtis P. Langlotz,Bhavik N. Patel, Matthew P. Lungren, and Andrew Y. Ng. Chexpert:A large chest radiograph dataset with uncertainty labels and expertcomparison.
CoRR , abs/1901.07031, 2019.[7] Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi. V-net:Fully convolutional neural networks for volumetric medical imagesegmentation.
CoRR , abs/1606.04797, 2016.[8] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Mur-phy, and Alan L. Yuille. Deeplab: Semantic image segmentation withdeep convolutional nets, atrous convolution, and fully connected crfs.
CoRR , abs/1606.00915, 2016.[9] Christopher J. Kelly, Alan Karthikesalingam, Mustafa Suleyman, GregCorrado, and Dominic King. Key challenges for delivering clinicalimpact with artificial intelligence.
BMC Medicine , 17(1):195, 2019.[10] Finale Doshi-Velez and Been Kim. Towards a rigorous science ofinterpretable machine learning, 2017. cite arxiv:1702.08608.[11] Sana Tonekaboni, Shalmali Joshi, Melissa D. McCradden, and AnnaGoldenberg. What clinicians want: Contextualizing explainable ma-chine learning for clinical end use.
CoRR , abs/1905.05134, 2019.[12] Jonathan L. Herlocker, Joseph A. Konstan, and John Riedl. Explainingcollaborative filtering recommendations. In
Proceedings of the 2000ACM Conference on Computer Supported Cooperative Work , CSCW00, page 241250, New York, NY, USA, 2000. Association for Com-puting Machinery.[13] Sebastian Lapuschkin, Stephan W¨aldchen, Alexander Binder, Gr´egoireMontavon, Wojciech Samek, and Klaus-Robert M¨uller. Unmaskingclever hans predictors and assessing what machines really learn.
NatureCommunications , 10(1):1096, 2019.[14] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. why should itrust you?: Explaining the predictions of any classifier. In
Proceedingsof the 22nd ACM SIGKDD International Conference on KnowledgeDiscovery and Data Mining , KDD 16, page 11351144, New York,NY, USA, 2016. Association for Computing Machinery.[15] Zachary Chase Lipton. The mythos of model interpretability.
CoRR ,abs/1606.03490, 2016.[16] F. K. Doilovi, M. Bri, and N. Hlupi. Explainable artificial intelligence:A survey. In , pages 0210–0215, 2018.[17] L. H. Gilpin, D. Bau, B. Z. Yuan, A. Bajwa, M. Specter, and L. Kagal.Explaining explanations: An overview of interpretability of machinelearning. In , pages 80–89, 2018.[18] Alejandro [Barredo Arrieta], Natalia Daz-Rodrguez, Javier [Del Ser],Adrien Bennetot, Siham Tabik, Alberto Barbado, Salvador Garcia,Sergio Gil-Lopez, Daniel Molina, Richard Benjamins, Raja Chatila,and Francisco Herrera. Explainable artificial intelligence (xai): Con-cepts, taxonomies, opportunities and challenges toward responsible ai.
Information Fusion , 58:82 – 115, 2020.[19] Surjo R. Soekadar, Niels Birbaumer, Marc W. Slutzky, and Leonardo G.Cohen. Brainmachine interfaces in neurorehabilitation of stroke.
Neurobiology of Disease , 83:172 – 179, 2015.[20] Andreas Holzinger, Georg Langs, Helmut Denk, Kurt Zatloukal, andHeimo Mller. Causability and explainability of artificial intelligence inmedicine.
WIREs Data Mining and Knowledge Discovery , 9(4):e1312,2019.
[21] Yao Xie, Ge Gao, and Xiang 'Anthony' Chen. Outlining the design space of explainable intelligent systems for medical diagnosis.
CoRR ,abs/1902.06019, 2019.[22] Alfredo Vellido. The importance of interpretability and visualization inmachine learning for applications in medicine and health care.
NeuralComputing and Applications , 2019.[23] Eric J. Topol. High-performance medicine: the convergence of humanand artificial intelligence.
Nature Medicine , 25(1):44–56, 2019.[24] A. Fernandez, F. Herrera, O. Cordon, M. Jose del Jesus, and F. Marcel-loni. Evolutionary fuzzy systems for explainable artificial intelligence:Why, when, what for, and where to?
IEEE Computational IntelligenceMagazine , 14(1):69–81, Feb 2019.[25] K. Kallianos, J. Mongan, S. Antani, T. Henry, A. Taylor, J. Abuya,and M. Kohli. How far have we come?: Artificial intelligence forchest radiograph interpretation.
Clinical Radiology , 74(5):338–345,May 2019.[26] Grgoire Montavon, Wojciech Samek, and Klaus-Robert Mller. Methodsfor interpreting and understanding deep neural networks.
Digital SignalProcessing , 73:1 – 15, 2018.[27] Wojciech Samek, Thomas Wiegand, and Klaus-Robert M¨uller. Explain-able artificial intelligence: Understanding, visualizing and interpretingdeep learning models.
CoRR , abs/1708.08296, 2017.[28] Laura Rieger, Pattarawat Chormai, Gr´egoire Montavon, Lars KaiHansen, and Klaus-Robert M¨uller.
Explainable and InterpretableModels in Computer Vision and Machine Learning , chapter StructuringNeural Networks for More Explainable Predictions, pages 115–131.Springer International Publishing, Cham, 2018.[29] Sofia Meacham, Georgia Isaac, Detlef Nauck, and Botond Virginas.Towards explainable ai: Design and development for explanationof machine learning predictions for a patient readmittance medicalapplication. In Kohei Arai, Rahul Bhatia, and Supriya Kapoor,editors,
Intelligent Computing , pages 939–955, Cham, 2019. SpringerInternational Publishing.[30] J. Townsend, T. Chaton, and J. M. Monteiro. Extracting relationalexplanations from deep neural networks: A survey from a neural-symbolic perspective.
IEEE Transactions on Neural Networks andLearning Systems
Human BrainMapping , 41(6):1435–1444, 2020.[33] Miles Brundage, Shahar Avin, Jasmine Wang, Haydn Belfield,Gretchen Krueger, Gillian Hadfield, Heidy Khlaaf, Jingying Yang,Helen Toner, Ruth Fong, Tegan Maharaj, Pang Wei Koh, Sara Hooker,Jade Leung, Andrew Trask, Emma Bluemke, Jonathan Lebensold,Cullen O’Keefe, Mark Koren, Tho Ryffel, JB Rubinovitz, Tamay Be-siroglu, Federica Carugati, Jack Clark, Peter Eckersley, Sarah de Haas,Maritza Johnson, Ben Laurie, Alex Ingerman, Igor Krawczuk, AmandaAskell, Rosario Cammarota, Andrew Lohn, David Krueger, CharlotteStix, Peter Henderson, Logan Graham, Carina Prunkl, Bianca Martin,Elizabeth Seger, Noa Zilberman, Sen higeartaigh, Frens Kroeger,Girish Sastry, Rebecca Kagan, Adrian Weller, Brian Tse, ElizabethBarnes, Allan Dafoe, Paul Scharre, Ariel Herbert-Voss, Martijn Rasser,Shagun Sodhani, Carrick Flynn, Thomas Krendl Gilbert, Lisa Dyer,Saif Khan, Yoshua Bengio, and Markus Anderljung. Toward trust-worthy ai development: Mechanisms for supporting verifiable claims,2020.[34] Nov 2019. https://ec.europa.eu/digital-single-market/en/news/ethics-guidelines-trustworthy-ai.[35] Danding Wang, Qian Yang, Ashraf Abdul, and Brian Y. Lim. Designingtheory-driven user-centric explainable ai. In
Proceedings of the 2019CHI Conference on Human Factors in Computing Systems , CHI 19,New York, NY, USA, 2019. Association for Computing Machinery.[36] D. Bau, B. Zhou, A. Khosla, A. Oliva, and A. Torralba. Networkdissection: Quantifying interpretability of deep visual representations.In , pages 3319–3327, 2017.[37] Chris Olah, Alexander Mordvintsev, and Ludwig Schubert. Featurevisualization.
Distill , 2(11), November 2017.[38] Chris Olah, Arvind Satyanarayan, Ian Johnson, Shan Carter, LudwigSchubert, Katherine Ye, and Alexander Mordvintsev. The buildingblocks of interpretability, Jan 2020.[39] Scott M Lundberg and Su-In Lee. A unified approach to interpretingmodel predictions. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach,R. Fergus, S. Vishwanathan, and R. Garnett, editors,
Advances in Neural Information Processing Systems 30 , pages 4765–4774. CurranAssociates, Inc., 2017.[40] Alon Jacovi, Oren Sar Shalom, and Yoav Goldberg. Understand-ing convolutional neural networks for text classification.
CoRR ,abs/1809.08037, 2018.[41] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Learningdeep features for discriminative localization. In , pages 2921–2929, June 2016.[42] Ramprasaath R. Selvaraju, Abhishek Das, Ramakrishna Vedantam,Michael Cogswell, Devi Parikh, and Dhruv Batra. Grad-cam: Whydid you say that? visual explanations from deep networks via gradient-based localization.
CoRR , abs/1610.02391, 2016.[43] Guannan Zhao, Bo Zhou, Kaiwen Wang, Rui Jiang, and Min Xu.Respond-cam: Analyzing deep models for 3d imaging data by visual-izations. In Alejandro F. Frangi, Julia A. Schnabel, Christos Davatzikos,Carlos Alberola-L´opez, and Gabor Fichtinger, editors,
Medical ImageComputing and Computer Assisted Intervention – MICCAI 2018 , pages485–492, Cham, 2018. Springer International Publishing.[44] Sebastian Bach, Alexander Binder, Grgoire Montavon, FrederickKlauschen, Klaus-Robert Mller, and Wojciech Samek. On pixel-wiseexplanations for non-linear classifier decisions by layer-wise relevancepropagation.
PLOS ONE , 10(7):1–46, 07 2015.[45] W. Samek, A. Binder, G. Montavon, S. Lapuschkin, and K. Mller.Evaluating the visualization of what a deep neural network haslearned.
IEEE Transactions on Neural Networks and Learning Systems ,28(11):2660–2673, 2017.[46] S¨oren Becker, Marcel Ackermann, Sebastian Lapuschkin, Klaus-RobertM¨uller, and Wojciech Samek. Interpreting and explaining deep neuralnetworks for classification of audio signals.
CoRR , abs/1807.03418,2018.[47] A. W. Thomas, H. R. Heekeren, K. R. M¨uller, and W. Samek. Ana-lyzing Neuroimaging Data Through Recurrent Deep Learning Models.
Front Neurosci , 13:1321, 2019.[48] Leila Arras, Franziska Horn, Gr´egoire Montavon, Klaus-Robert M¨uller,and Wojciech Samek. ”what is relevant in a text document?”: Aninterpretable machine learning approach.
CoRR , abs/1612.07843, 2016.[49] V. Srinivasan, S. Lapuschkin, C. Hellge, K. Mller, and W. Samek.Interpretable human action recognition in compressed domain. In
Advances in Neural Information Processing Systems 32 , pages9277–9286. Curran Associates, Inc., 2019.[56] Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. Learningimportant features through propagating activation differences.
CoRR ,abs/1704.02685, 2017.[57] Luisa M. Zintgraf, Taco S. Cohen, Tameem Adel, and Max Welling.Visualizing deep neural network decisions: Prediction difference anal-ysis.
CoRR , abs/1702.04595, 2017.[58] Yanzhao Zhou, Yi Zhu, Qixiang Ye, Qiang Qiu, and Jianbin Jiao.Weakly supervised instance segmentation using class peak response.
CoRR , abs/1804.00880, 2018.[59] Pieter-Jan Kindermans, Kristof T. Schtt, Maximilian Alber, Klaus-Robert Mller, Dumitru Erhan, Been Kim, and Sven Dhne. Learninghow to explain neural networks: Patternnet and patternattribution, 2017.[60] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Vigas, and MartinWattenberg. Smoothgrad: removing noise by adding noise, 2017. https://pair-code.github.io/saliency/.[61] Leila Arras, Gr´egoire Montavon, Klaus-Robert M¨uller, and WojciechSamek. Explaining recurrent neural network predictions in sentimentanalysis. In
Proceedings of the 8th Workshop on Computational
Approaches to Subjectivity, Sentiment and Social Media Analysis , pages159–168, Copenhagen, Denmark, September 2017. Association forComputational Linguistics.[62] Andrej Karpathy, Justin Johnson, and Li Fei-Fei. Visualizing andunderstanding recurrent networks, 2015.[63] Magdalini Paschali, Sailesh Conjeti, Fernando Navarro, and NassirNavab. Generalizability vs. robustness: Investigating medical imagingnetworks using adversarial examples. In Alejandro F. Frangi, Julia A.Schnabel, Christos Davatzikos, Carlos Alberola-L´opez, and GaborFichtinger, editors,
Medical Image Computing and Computer AssistedIntervention – MICCAI 2018 , pages 493–501, Cham, 2018. SpringerInternational Publishing.[64] Heather D. Couture, J. S. Marron, Charles M. Perou, Melissa A.Troester, and Marc Niethammer. Multiple instance learning forhetero-geneous images: Training acnn for histopathology. In Alejandro F.Frangi, Julia A. Schnabel, Christos Davatzikos, Carlos Alberola-L´opez,and Gabor Fichtinger, editors,
Medical Image Computing and Com-puter Assisted Intervention – MICCAI 2018 , pages 254–262, Cham,2018. Springer International Publishing.[65] Xiaoxiao Li, Nicha C. Dvornek, Juntang Zhuang, Pamela Ventola,and James S. Duncan. Brain biomarker interpretation in asd usingdeep learning and fmri. In Alejandro F. Frangi, Julia A. Schnabel,Christos Davatzikos, Carlos Alberola-L´opez, and Gabor Fichtinger,editors,
Medical Image Computing and Computer Assisted Intervention– MICCAI 2018 , pages 206–214, Cham, 2018. Springer InternationalPublishing.[66] Yao Qin, Konstantinos Kamnitsas, Siddharth Ancha, Jay Nanavati,Garrison W. Cottrell, Antonio Criminisi, and Aditya V. Nori. Autofocuslayer for semantic segmentation.
CoRR , abs/1805.08403, 2018.[67] Ziqi Tang, Kangway V. Chuang, Charles DeCarli, Lee-Way Jin, LaurelBeckett, Michael J. Keiser, and Brittany N. Dugger. Interpretableclassification of alzheimer’s disease pathologies with a convolutionalneural network pipeline.
Nature Communications , 10(1):2173, 2019.[68] Zachary Papanastasopoulos, Ravi K. Samala, Heang-Ping Chan,Lubomir Hadjiiski, Chintana Paramagul, Mark A. Helvie M.D., andColleen H. Neal M.D. Explainable AI for medical imaging: deep-learning CNN ensemble for classification of estrogen receptor statusfrom breast MRI. In Horst K. Hahn and Maciej A. Mazurowski, editors,
Medical Imaging 2020: Computer-Aided Diagnosis , volume 11314,pages 228 – 235. International Society for Optics and Photonics, SPIE,2020.[69] Hyebin Lee, Seong Tae Kim, and Yong Man Ro. Generation ofmultimodal justification using visual word constraint model for ex-plainable computer-aided diagnosis. In Kenji Suzuki, Mauricio Reyes,Tanveer Syeda-Mahmood, Ben Glocker, Roland Wiest, Yaniv Gur,Hayit Greenspan, and Anant Madabhushi, editors,
Interpretability ofMachine Intelligence in Medical Image Computing and MultimodalLearning for Clinical Decision Support , pages 21–29, Cham, 2019.Springer International Publishing.[70] Pieter-Jan Kindermans, Sara Hooker, Julius Adebayo, MaximilianAlber, Kristof T. Sch¨utt, Sven D¨ahne, Dumitru Erhan, and Been Kim.
The (Un)reliability of Saliency Methods , pages 267–280. SpringerInternational Publishing, Cham, 2019.[71] Matthew D. Zeiler and Rob Fergus. Visualizing and understandingconvolutional networks.
CoRR , abs/1311.2901, 2013.[72] Aravindh Mahendran and Andrea Vedaldi. Understanding deep imagerepresentations by inverting them.
CoRR , abs/1412.0035, 2014.[73] Alexey Dosovitskiy and Thomas Brox. Inverting convolutional net-works with convolutional networks.
CoRR , abs/1506.02753, 2015.[74] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, andMartin Riedmiller. Striving for simplicity: The all convolutional net,2014.[75] Dumitru Erhan, Yoshua Bengio, Aaron Courville, and Pascal Vincent.Visualizing higher-layer features of a deep network. Technical Report1341, University of Montreal, June 2009. Also presented at the ICML2009 Workshop on Learning Feature Hierarchies, Montr´eal, Canada.[76] Anh Mai Nguyen, Jason Yosinski, and Jeff Clune. Multifaceted featurevisualization: Uncovering the different types of features learned by eachneuron in deep neural networks.
CoRR , abs/1602.03616, 2016.[77] Jason Yosinski, Jeff Clune, Anh Mai Nguyen, Thomas J. Fuchs, andHod Lipson. Understanding neural networks through deep visualiza-tion.
CoRR , abs/1506.06579, 2015.[78] C. Szegedy, Wei Liu, Yangqing Jia, P. Sermanet, S. Reed, D. Anguelov,D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper withconvolutions. In , pages 1–9, 2015. [79] Richard Meyes, Melanie Lu, Constantin Waubert de Puiseau, andTobias Meisen. Ablation studies in artificial neural networks.
CoRR ,abs/1901.08644, 2019.[80] Richard Meyes, Constantin Waubert de Puiseau, Andres Posada-Moreno, and Tobias Meisen. Under the hood of neural networks: Char-acterizing learned representations by functional neuron populations andnetwork ablations, 2020.[81] Rich Caruana, Yin Lou, Johannes Gehrke, Paul Koch, Marc Sturm,and Noemie Elhadad. Intelligible models for healthcare: Predictingpneumonia risk and hospital 30-day readmission. In
Proceedings of the21th ACM SIGKDD International Conference on Knowledge Discoveryand Data Mining , KDD 15, page 17211730, New York, NY, USA,2015. Association for Computing Machinery.[82] Benjamin Letham, Cynthia Rudin, Tyler H. McCormick, and DavidMadigan. Interpretable classifiers using rules and bayesian analysis:Building a better stroke prediction model.
The Annals of AppliedStatistics , 9(3):13501371, Sep 2015.[83] Isaac Lage, Emily Chen, Jeffrey He, Menaka Narayanan, Been Kim,Sam Gershman, and Finale Doshi-Velez. An evaluation of the human-interpretability of explanation.
CoRR , abs/1902.00006, 2019.[84] Himabindu Lakkaraju, Ece Kamar, Rich Caruana, and Jure Leskovec.Faithful and customizable explanations of black box models. In
Pro-ceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society ,AIES 19, page 131138, New York, NY, USA, 2019. Association forComputing Machinery.[85] Tao Lei, Regina Barzilay, and Tommi Jaakkola. Rationalizing neuralpredictions. In
Proceedings of the 2016 Conference on EmpiricalMethods in Natural Language Processing , pages 107–117, Austin,Texas, November 2016. Association for Computational Linguistics.[86] Pei Guo, Connor Anderson, Kolten Pearson, and Ryan Farrell. Neuralnetwork interpretation via fine grained textual summarization.
CoRR ,abs/1805.08969, 2018.[87] Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchell,C. Lawrence Zitnick, Devi Parikh, and Dhruv Batra. Vqa: Visualquestion answering.
Int. J. Comput. Vision , 123(1):431, May 2017.[88] Jiasen Lu, Jianwei Yang, Dhruv Batra, and Devi Parikh. Hierarchicalquestion-image co-attention for visual question answering. In
Pro-ceedings of the 30th International Conference on Neural InformationProcessing Systems , NIPS16, page 289297, Red Hook, NY, USA, 2016.Curran Associates Inc.[89] Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav,Jose M. F. Moura, Devi Parikh, and Dhruv Batra. Visual dialog. , Jul 2017.[90] Mohammadhassan Izadyyazdanabadi, Evgenii Belykh, Claudio Cav-allo, Xiaochun Zhao, Sirin Gandhi, Leandro Borba Moreira, JenniferEschbacher, Peter Nakaji, Mark C. Preul, and Yezhou Yang. Weakly-supervised learning-based feature localization for confocal laser en-domicroscopy glioma images. In Alejandro F. Frangi, Julia A.Schnabel, Christos Davatzikos, Carlos Alberola-L´opez, and GaborFichtinger, editors,
Medical Image Computing and Computer AssistedIntervention – MICCAI 2018 , pages 300–308, Cham, 2018. SpringerInternational Publishing.[91] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neuralmachine translation by jointly learning to align and translate. arXiv ,2014.[92] Peifei Zhu and Masahiro Ogino. Guideline-based additive explanationfor computer-aided diagnosis of lung nodules. In Kenji Suzuki,Mauricio Reyes, Tanveer Syeda-Mahmood, Ben Glocker, Roland Wiest,Yaniv Gur, Hayit Greenspan, and Anant Madabhushi, editors,
Inter-pretability of Machine Intelligence in Medical Image Computing andMultimodal Learning for Clinical Decision Support , pages 39–47,Cham, 2019. Springer International Publishing.[93] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attributionfor deep networks. In
Proceedings of the 34th International Confer-ence on Machine Learning - Volume 70 , ICML17, page 33193328.JMLR.org, 2017.[94] Amirata Ghorbani, Abubakar Abid, and James Zou. Interpretation ofneural networks is fragile, 2017.[95] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler,Fernanda B. Vigas, and Rory Sayres. Interpretability beyond featureattribution: Quantitative testing with concept activation vectors (tcav).In Jennifer G. Dy and Andreas Krause, editors,
ICML , volume 80of
JMLR Workshop and Conference Proceedings , pages 2673–2682.JMLR.org, 2018.[96] Maithra Raghu, Justin Gilmer, Jason Yosinski, and Jascha Sohl-Dickstein. Svcca: Singular vector canonical correlation analysis for
deep learning dynamics and interpretability. In
Proceedings of the 31stInternational Conference on Neural Information Processing Systems ,NIPS17, page 60786087, Red Hook, NY, USA, 2017. Curran Asso-ciates Inc.[97] N. Tishby and N. Zaslavsky. Deep learning and the informationbottleneck principle. In , pages 1–5, 2015.[98] Ravid Shwartz-Ziv and Naftali Tishby. Opening the black box of deepneural networks via information.
CoRR, abs/1703.00810, 2017.
[99] Erdem Varol, Aristeidis Sotiras, Ke Zeng, and Christos Davatzikos. Generative discriminative models for multivariate inference and statistical mapping in medical imaging. In Alejandro F. Frangi, Julia A. Schnabel, Christos Davatzikos, Carlos Alberola-López, and Gabor Fichtinger, editors, Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, pages 540–548, Cham, 2018. Springer International Publishing.
[100] Guillaume Alain and Yoshua Bengio. Understanding intermediate layers using linear classifier probes, 2016.
[101] Trevor Hastie and Robert Tibshirani. Generalized additive models. Statist. Sci., 1(3):297–310, 08 1986.
[102] Yin Lou, Rich Caruana, and Johannes Gehrke. Intelligible models for classification and regression. In
Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '12, pages 150–158, New York, NY, USA, 2012. Association for Computing Machinery.
[103] Yin Lou, Rich Caruana, Johannes Gehrke, and Giles Hooker. Accurate intelligible models with pairwise interactions. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '13, pages 623–631, New York, NY, USA, 2013. Association for Computing Machinery.
[104] Sercan Ömer Arik and Tomas Pfister. Attention-based prototypical learning towards interpretable, confident and robust deep neural networks.
CoRR, abs/1902.06292, 2019.
[105] Nima Hatami, Michaël Sdika, and Hélène Ratiney. Magnetic resonance spectroscopy quantification using deep learning. CoRR, abs/1806.07237, 2018.
[106] Stefan Haufe, Frank Meinecke, Kai Görgen, Sven Dähne, John-Dylan Haynes, Benjamin Blankertz, and Felix Bießmann. On the interpretation of weight vectors of linear models in multivariate neuroimaging. NeuroImage, 87:96–110, 2014.
[107] Kristof T. Schütt, Farhad Arbabzadah, Stefan Chmiela, Klaus R. Müller, and Alexandre Tkatchenko. Quantum-chemical insights from deep tensor neural networks. Nature Communications, 8(1):13890, 2017.
[108] Kristof T. Schütt, Michael Gastegger, Alexandre Tkatchenko, and Klaus-Robert Müller.
Quantum-Chemical Insights from Interpretable Atomistic Neural Networks, pages 311–330. Springer International Publishing, Cham, 2019.
[109] Christos Liaskos, Ageliki Tsioliaridou, Shuai Nie, Andreas Pitsillides, Sotiris Ioannidis, and Ian F. Akyildiz. An interpretable neural network for configuring programmable wireless environments. CoRR, abs/1905.02495, 2019.
[110] Barnabás Bede. Fuzzy systems with sigmoid-based membership functions as interpretable neural networks. In Ralph Baker Kearfott, Ildar Batyrshin, Marek Reformat, Martine Ceberio, and Vladik Kreinovich, editors, Fuzzy Techniques: Theory and Applications, pages 157–166, Cham, 2019. Springer International Publishing.
[111] Markus Kaiser, Clemens Otte, Thomas A. Runkler, and Carl Henrik Ek. Interpretable dynamics models for data-efficient reinforcement learning. CoRR, abs/1907.04902, 2019.
[112] D. R. Hardoon, S. Szedmak, and J. Shawe-Taylor. Canonical correlation analysis: An overview with application to learning methods. Neural Computation, 16(12):2639–2664, 2004.
[113] Anil Hazarika, Mausumi Barthakur, Lachit Dutta, and Manabendra Bhuyan. F-SVD based algorithm for variability and stability measurement of bio-signals, feature extraction and fusion for pattern recognition. Biomedical Signal Processing and Control, 47:26–40, 2019.
[114] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2672–2680. Curran Associates, Inc., 2014.
[115] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML'17, pages 214–223. JMLR.org, 2017.
[116] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In
The IEEE International Conference on Computer Vision (ICCV), Oct 2017.
[117] Y. Zhu, S. Suri, P. Kulkarni, Y. Chen, J. Duan, and C.-C. J. Kuo. An interpretable generative model for handwritten digits synthesis. In , pages 1910–1914, 2019.
[118] Ryen Krusinga, Sohil Shah, Matthias Zwicker, Tom Goldstein, and David W. Jacobs. Understanding the (un)interpretability of natural image distributions using generative models. CoRR, abs/1901.01499, 2019.
[119] A. Karpathy. t-SNE visualization of CNN codes, 2014. https://cs.stanford.edu/people/karpathy/cnnembed.
[120] Shan Carter, Zan Armstrong, Ludwig Schubert, Ian Johnson, and Chris Olah. Exploring neural networks with activation atlases, 2019. https://distill.pub/2019/activation-atlas.
[121] Wei Ma, Feng Cheng, Yihao Xu, Qinlong Wen, and Yongmin Liu. Probabilistic representation and inverse design of metamaterials based on a deep generative model with semi-supervised learning strategy. Advanced Materials, 31(35), 2019.
[122] Yujun Yan, Jiong Zhu, Marlena Duda, Eric Solarz, Chandra Sripada, and Danai Koutra. GroupINN: Grouping-based interpretable neural network for classification of limited, noisy brain data. In
Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '19, pages 772–782, New York, NY, USA, 2019. Association for Computing Machinery.
[123] Carlo Biffi, Ozan Oktay, Giacomo Tarroni, Wenjia Bai, Antonio M. Simoes Monteiro de Marvao, Georgia Doumou, Martin Rajchl, Reem Bedair, Sanjay K. Prasad, Stuart A. Cook, Declan P. O'Regan, and Daniel Rueckert. Learning interpretable anatomical features through deep generative models: Application to cardiac remodeling. CoRR, abs/1807.06843, 2018.
[124] Mingliang Wang, Daoqiang Zhang, Jiashuang Huang, Dinggang Shen, and Mingxia Liu. Low-rank representation for multi-center autism spectrum disorder identification. In Alejandro F. Frangi, Julia A. Schnabel, Christos Davatzikos, Carlos Alberola-López, and Gabor Fichtinger, editors, Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, pages 647–654, Cham, 2018. Springer International Publishing.
[125] Ruth Fong and Andrea Vedaldi. Interpretable explanations of black boxes by meaningful perturbation.
CoRR, abs/1704.03296, 2017.
[126] David Alvarez-Melis and Tommi Jaakkola. A causal framework for explaining the predictions of black-box sequence-to-sequence models. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 412–421, Copenhagen, Denmark, September 2017. Association for Computational Linguistics.
[127] David Baehrens, Timon Schroeter, Stefan Harmeling, Motoaki Kawanabe, Katja Hansen, and Klaus-Robert Müller. How to explain individual classification decisions. J. Mach. Learn. Res., 11:1803–1831, August 2010.
[128] Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML'17, pages 1885–1894. JMLR.org, 2017.
[129] Chih-Kuan Yeh, Joon Sik Kim, Ian E.H. Yen, and Pradeep Ravikumar. Representer point selection for explaining deep neural networks. In
Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS'18, pages 9311–9321, Red Hook, NY, USA, 2018. Curran Associates Inc.
[130] Cagdas Ulas, Giles Tetteh, Stephan Kaczmarz, Christine Preibisch, and Bjoern H. Menze. DeepASL: Kinetic model incorporated loss for denoising arterial spin labeled MRI via deep residual learning. In Alejandro F. Frangi, Julia A. Schnabel, Christos Davatzikos, Carlos Alberola-López, and Gabor Fichtinger, editors, Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, pages 30–38, Cham, 2018. Springer International Publishing.
[131] C. J. Scott, J. Jiao, A. Melbourne, N. Burgos, D. M. Cash, E. De Vita, P. J. Markiewicz, A. O'Connor, D. L. Thomas, P. S. Weston, J. M. Schott, B. F. Hutton, and S. Ourselin. Reduced acquisition time PET pharmacokinetic modelling using simultaneous ASL-MRI: proof of concept. J. Cereb. Blood Flow Metab., 39(12):2419–2432, Dec 2019.
[132] Maxim Pisov, Mikhail Goncharov, Nadezhda Kurochkina, Sergey Morozov, Victor Gombolevskiy, Valeria Chernina, Anton Vladzymyrskyy, Ksenia Zamyatina, Anna Chesnokova, Igor Pronin, Michael Shifrin, and Mikhail Belyaev. Incorporating task-specific structural knowledge into CNNs for brain midline shift detection. In Kenji Suzuki, Mauricio
Reyes, Tanveer Syeda-Mahmood, Ben Glocker, Roland Wiest, Yaniv Gur, Hayit Greenspan, and Anant Madabhushi, editors, Interpretability of Machine Intelligence in Medical Image Computing and Multimodal Learning for Clinical Decision Support, pages 30–38, Cham, 2019. Springer International Publishing.
[133] Feiyun Zhu, Jun Guo, Zheng Xu, Peng Liao, Liu Yang, and Junzhou Huang. Group-driven reinforcement learning for personalized mHealth intervention. In Alejandro F. Frangi, Julia A. Schnabel, Christos Davatzikos, Carlos Alberola-López, and Gabor Fichtinger, editors,
Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, pages 590–598, Cham, 2018. Springer International Publishing.
[134] Ozan Kocadagli and Reza Langari. Classification of EEG signals for epileptic seizures using hybrid artificial neural networks based wavelet transforms and fuzzy relations. Expert Systems with Applications, 88:419–434, 2017.
[135] Tao Zhang, Wanzhong Chen, and Mingyang Li. Classification of inter-ictal and ictal EEGs using multi-basis MODWPT, dimensionality reduction algorithms and LS-SVM: A comparative study. Biomedical Signal Processing and Control, 47:240–251, 2019.
[136] Wentian Li, Jane E. Cerise, Yaning Yang, and Henry Han. Application of t-SNE to human genetic data. Journal of Bioinformatics and Computational Biology, 15(04):1750017, 2017. PMID: 28718343.
[137] W. Xu, X. Jiang, X. Hu, and G. Li. Visualization of genetic disease-phenotype similarities by multiple maps t-SNE with Laplacian regularization. BMC Med Genomics, 7 Suppl 2:S1, 2014.
[138] Noel C. F. Codella, Chung-Ching Lin, Allan Halpern, Michael Hind, Rogerio Feris, and John R. Smith. Collaborative human-AI (CHAI): Evidence-based interpretable melanoma classification in dermoscopic images. In Danail Stoyanov, Zeike Taylor, Seyed Mostafa Kia, Ipek Oguz, Mauricio Reyes, Anne Martel, Lena Maier-Hein, Andre F. Marquand, Edouard Duchesnay, Tommy Löfstedt, Bennett Landman, M. Jorge Cardoso, Carlos A. Silva, Sergio Pereira, and Raphael Meier, editors,
Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pages 97–105, Cham, 2018. Springer International Publishing.
[139] Mara Graziani, Vincent Andrearczyk, and Henning Müller. Regression concept vectors for bidirectional explanations in histopathology. In Danail Stoyanov, Zeike Taylor, Seyed Mostafa Kia, Ipek Oguz, Mauricio Reyes, Anne Martel, Lena Maier-Hein, Andre F. Marquand, Edouard Duchesnay, Tommy Löfstedt, Bennett Landman, M. Jorge Cardoso, Carlos A. Silva, Sergio Pereira, and Raphael Meier, editors, Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pages 124–132, Cham, 2018. Springer International Publishing.
[140] Hugo Yeche, Justin Harrison, and Tess Berthier. UBS: A dimension-agnostic metric for concept vector interpretability applied to radiomics. In Kenji Suzuki, Mauricio Reyes, Tanveer Syeda-Mahmood, Ben Glocker, Roland Wiest, Yaniv Gur, Hayit Greenspan, and Anant Madabhushi, editors, Interpretability of Machine Intelligence in Medical Image Computing and Multimodal Learning for Clinical Decision Support, pages 12–20, Cham, 2019. Springer International Publishing.
[141] Alvaro E. Ulloa Cerna, Marios Pattichis, David P. vanMaanen, Linyuan Jing, Aalpen A. Patel, Joshua V. Stough, Christopher M. Haggerty, and Brandon K. Fornwalt. Interpretable neural networks for predicting mortality risk using multi-modal electronic health records. CoRR, abs/1901.08125, 2019.
[142] Jean-Baptiste Lamy, Boomadevi Sekar, Gilles Guezennec, Jacques Bouaud, and Brigitte Séroussi. Explainable artificial intelligence for breast cancer: A visual case-based reasoning approach. Artificial Intelligence in Medicine, 94:42–53, 2019.
[143] Youngwon Choi, Yongchan Kwon, Hanbyul Lee, Beom Joon Kim, Myunghee Cho Paik, and Joong-Ho Won. Ensemble of deep convolutional neural networks for prognosis of ischemic stroke. In Alessandro Crimi, Bjoern Menze, Oskar Maier, Mauricio Reyes, Stefan Winzeck, and Heinz Handels, editors,
Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, pages 231–243, Cham, 2016. Springer International Publishing.
[144] Oskar Maier and Heinz Handels. Predicting stroke lesion and clinical outcome with random forests. In Alessandro Crimi, Bjoern Menze, Oskar Maier, Mauricio Reyes, Stefan Winzeck, and Heinz Handels, editors, Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, pages 219–230, Cham, 2016. Springer International Publishing.
[145] Been Kim, Caleb M. Chacha, and Julie Shah. Inferring robot task plans from human team meetings: A generative modeling approach with logic-based prior. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, AAAI'13, pages 1394–1400. AAAI Press, 2013.
[146] Justin Cheng and Michael S. Bernstein. Flock: Hybrid crowd-machine learning classifiers. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work and Social Computing, CSCW '15, pages 600–611, New York, NY, USA, 2015. Association for Computing Machinery.
[147] L. Kuhlmann, P. Karoly, D. R. Freestone, B. H. Brinkmann, A. Temko, A. Barachant, F. Li, G. Titericz, B. W. Lang, D. Lavery, K. Roman, D. Broadhead, S. Dobson, G. Jones, Q. Tang, I. Ivanenko, O. Panichev, T. Proix, M. Náhlík, D. B. Grunberg, C. Reuben, G. Worrell, B. Litt, D. T. J. Liley, D. B. Grayden, and M. J. Cook. Epilepsyecosystem.org: crowd-sourcing reproducible seizure prediction with long-term human intracranial EEG.
Brain, 141(9):2619–2630, 09 2018.
[148] M. Wiener, F. T. Sommer, Z. G. Ives, R. A. Poldrack, and B. Litt. Enabling an Open Data Ecosystem for the Neurosciences. Neuron, 92(4):929, 11 2016.
[149] F. Jiang, Y. Jiang, H. Zhi, Y. Dong, H. Li, S. Ma, Y. Wang, Q. Dong, H. Shen, and Y. Wang. Artificial intelligence in healthcare: past, present and future. Stroke Vasc Neurol, 2(4):230–243, Dec 2017.
[150] C. K. Cassel and A. L. Jameton. Dementia in the elderly: an analysis of medical responsibility. Ann. Intern. Med., 94(6):802–807, Jun 1981.
[151] Pat Croskerry, Karen Cosby, Mark L. Graber, and Hardeep Singh. Diagnosis: Interpreting the Shadows, 2017.
[152] Sérgio Pereira, Raphael Meier, Victor Alves, Mauricio Reyes, and Carlos A. Silva. Automatic brain tumor grading from MRI data using convolutional neural networks and quality assessment. In Danail Stoyanov, Zeike Taylor, Seyed Mostafa Kia, Ipek Oguz, Mauricio Reyes, Anne Martel, Lena Maier-Hein, Andre F. Marquand, Edouard Duchesnay, Tommy Löfstedt, Bennett Landman, M. Jorge Cardoso, Carlos A. Silva, Sergio Pereira, and Raphael Meier, editors,
Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pages 106–114, Cham, 2018. Springer International Publishing.
[153] A. Vilamala, K. H. Madsen, and L. K. Hansen. Deep convolutional neural networks for interpretable analysis of EEG sleep stage scoring. In 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pages 1–6, 2017.
[154] Pieter Van Molle, Miguel De Strooper, Tim Verbelen, Bert Vankeirsbilck, Pieter Simoens, and Bart Dhoedt. Visualizing convolutional neural networks to improve decision support for skin lesion classification. In Danail Stoyanov, Zeike Taylor, Seyed Mostafa Kia, Ipek Oguz, Mauricio Reyes, Anne Martel, Lena Maier-Hein, Andre F. Marquand, Edouard Duchesnay, Tommy Löfstedt, Bennett Landman, M. Jorge Cardoso, Carlos A. Silva, Sergio Pereira, and Raphael Meier, editors,
Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pages 115–123, Cham, 2018. Springer International Publishing.
[155] N. Prentzas, A. Nicolaides, E. Kyriacou, A. Kakas, and C. Pattichis. Integrating machine learning with symbolic reasoning to build an explainable AI model for stroke prediction. In , pages 817–821, 2019.
[156] H. Sun, L. Paixao, J. T. Oliva, B. Goparaju, D. Z. Carvalho, K. G. van Leeuwen, O. Akeju, R. J. Thomas, S. S. Cash, M. T. Bianchi, and M. B. Westover. Brain age from the electroencephalogram of sleep. Neurobiol. Aging, 74:112–120, 02 2019.
[157] Fabian Eitel and Kerstin Ritter. Testing the robustness of attribution methods for convolutional neural networks in MRI-based Alzheimer's disease classification. In Kenji Suzuki, Mauricio Reyes, Tanveer Syeda-Mahmood, Ben Glocker, Roland Wiest, Yaniv Gur, Hayit Greenspan, and Anant Madabhushi, editors, Interpretability of Machine Intelligence in Medical Image Computing and Multimodal Learning for Clinical Decision Support, pages 3–11, Cham, 2019. Springer International Publishing.
[158] Christoph Jansen, Thomas Penzel, Stephan Hodel, Stefanie Breuer, Martin Spott, and Dagmar Krefting. Network physiology in insomnia patients: Assessment of relevant changes in network topology with interpretable machine learning models. Chaos: An Interdisciplinary Journal of Nonlinear Science, 29(12):123129, 2019.
[159] Kyle Young, Gareth Booth, Becks Simpson, Reuben Dutton, and Sally Shrapnel. Deep neural network or dermatologist? In Kenji Suzuki, Mauricio Reyes, Tanveer Syeda-Mahmood, Ben Glocker, Roland Wiest, Yaniv Gur, Hayit Greenspan, and Anant Madabhushi, editors, Interpretability of Machine Intelligence in Medical Image Computing and Multimodal Learning for Clinical Decision Support, pages 48–55, Cham, 2019. Springer International Publishing.
[160] C. Zucco, H. Liang, G. D. Fatta, and M. Cannataro. Explainable sentiment analysis with applications in medicine. In , pages 1740–1747, 2018.
[161] Curtis P. Langlotz, Bibb Allen, Bradley J. Erickson, Jayashree Kalpathy-Cramer, Keith Bigelow, Tessa S. Cook, Adam E. Flanders, Matthew P. Lungren, David S. Mendelson, Jeffrey D. Rudie, Ge Wang, and Krishna Kandarpa. A roadmap for foundational research on artificial intelligence in medical imaging: From the 2018 NIH/RSNA/ACR/The Academy workshop.
Radiology, 291(3):781–791, 2019. PMID: 30990384.
[162] Arieh Gomolin, Elena Netchiporouk, Robert Gniadecki, and Ivan V. Litvinov. Artificial intelligence applications in dermatology: Where do we stand? Frontiers in Medicine, 7:100, 2020.
[163] A. J. London. Artificial Intelligence and Black-Box Medical Decisions: Accuracy versus Explainability. Hastings Cent Rep, 49(1):15–21, Jan 2019.
[164] Sadid Hasan, Yuan Ling, Dimeji Farri, Joey Liu, Henning Müller, and Matthew Lungren. Overview of ImageCLEF 2018 medical domain visual question answering task. 09 2018.
[165] Asma Ben Abacha, Sadid Hasan, Vivek Datla, Joey Liu, Dina Demner-Fushman, and Henning Müller. VQA-Med: Overview of the medical visual question answering task at ImageCLEF 2019. Lecture Notes in Computer Science, 09 2019.
[166] Ann-Kathrin Dombrowski, Maximillian Alber, Christopher Anders, Marcel Ackermann, Klaus-Robert Müller, and Pan Kessel. Explanations can be manipulated and geometry is to blame. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors,
Advances in Neural Information Processing Systems 32, pages 13589–13600. Curran Associates, Inc., 2019.
[167] Himabindu Lakkaraju and Osbert Bastani. "How do I fool you?". Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, Feb 2020.
[168] Yun Liu, Krishna Gadepalli, Mohammad Norouzi, George E. Dahl, Timo Kohlberger, Aleksey Boyko, Subhashini Venugopalan, Aleksei Timofeev, Philip Q. Nelson, Gregory S. Corrado, Jason D. Hipp, Lily Peng, and Martin C. Stumpe. Detecting cancer metastases on gigapixel pathology images.