Explainable Artificial Intelligence Approaches: A Survey
Sheikh Rabiul Islam, William Eberle, Sheikh Khaled Ghafoor, Mohiuddin Ahmed
Sheikh Rabiul Islam, University of Hartford; William Eberle, Tennessee Tech University; Sheikh Khaled Ghafoor, Tennessee Tech University; Mohiuddin Ahmed, Edith Cowan University
Abstract—The lack of explainability of a decision from an Artificial Intelligence (AI) based "black box" system/model, despite its superiority in many real-world applications, is a key stumbling block for adopting AI in many high stakes applications of different domains or industries. While many popular Explainable Artificial Intelligence (XAI) methods or approaches are available to facilitate a human-friendly explanation of the decision, each has its own merits and demerits, with a plethora of open challenges. We demonstrate popular XAI methods with a mutual case study/task (i.e., credit default prediction), analyze their competitive advantages from multiple perspectives (e.g., local, global), provide meaningful insight on quantifying explainability, and recommend paths towards responsible or human-centered AI using XAI as a medium. Practitioners can use this work as a catalog to understand, compare, and correlate the competitive advantages of popular XAI methods. In addition, this survey elicits future research directions towards responsible or human-centric AI systems, which is crucial for adopting AI in high stakes applications.
Index Terms—Explainable Artificial Intelligence, Explainability Quantification, Human-centered Artificial Intelligence, Interpretability.
1 INTRODUCTION
Artificial Intelligence (AI) has become an integral part of many real-world applications. Factors fueling the proliferation of AI-based algorithmic decision making in many disciplines include: (1) the demand for processing a variety of voluminous data, (2) the availability of powerful computing resources (e.g., GPU computing, cloud computing), and (3) powerful and new algorithms. However, most of the successful AI-based models are "black box" in nature, making it a challenge to understand how the model or algorithm works and generates decisions. In addition, the decisions from AI systems affect human interests, rights, and lives; consequently, the decision is crucial for high stakes applications such as credit approval in finance, automated machines in defense, intrusion detection in cybersecurity, etc. Regulators are introducing new laws such as the European Union's General Data Protection Regulation (GDPR) [1], aka the "right to explanation" [2], the US government's "Algorithmic Accountability Act of 2019" [3], and the U.S. Department of Defense's Ethical Principles for Artificial Intelligence [4] to tackle primarily fairness, accountability, and transparency-related risks with automated decision making systems.

XAI is a re-emerging research trend, as the need to advocate these principles/laws, and promote explainable decision-making systems and research, continues to increase. Explanation systems were first introduced in the early '80s to explain the decisions of expert systems. Later, the focus of explanation systems shifted towards human-computer systems (e.g., intelligent tutoring systems) to provide better cognitive support to users. The primary reason for the renewed interest in XAI research has stemmed from recent advancements in AI and ML, and their application to a wide range of areas, as well as prevailing concerns over the unethical use, lack of transparency, and undesired biases in the models. Many real-world applications in the Industrial Control System (ICS) domain greatly increase the efficiency of industrial production from automated equipment and production processes [5]. However, in this setting, the use of "black box" models is still not in a favorable position due to the lack of explainability and transparency of the model and decisions.

According to [6] and [7], XAI encompasses Machine Learning (ML) or AI systems/tools for demystifying black box models' internals (e.g., what the models have learned) and/or for explaining individual predictions. In general, explainability of an AI model's prediction is the extent of transferable qualitative understanding of the relationship between model input and prediction (i.e., selective/suitable causes of the event) in a recipient friendly manner. The terms "explainability" and "interpretability" are used interchangeably throughout the literature. To this end, in the case of an intelligent system (i.e., an AI-based system), it is evident that explainability is more than interpretability in terms of importance, completeness, and fidelity of prediction. Based on that, we will use these terms accordingly where appropriate.

Due to the increasing number of XAI approaches, it has become challenging to understand the pros, cons, and competitive advantages associated with the different domains. In addition, there are many variations among different XAI methods, such as whether a method is global (i.e., explains the model's behavior on the entire data set), local (i.e., explains the prediction or decision of a particular instance), ante-hoc (i.e., involved in the pre-training stage), post-hoc (i.e.,
works on an already trained model), or surrogate (i.e., deploys a simple model to emulate the prediction of a "black box" model). However, despite many reviews on XAI methods, there is still a lack of comprehensive analysis of XAI when it comes to these methods and perspectives.

Some of the popular works/tools on XAI are LIME, DeepVis Toolbox, TreeInterpreter, Keras-vis, Microsoft InterpretML, MindsDB, SHAP, Tensorboard What-If, Tensorflow's Lucid, Tensorflow's Cleverhans, etc. However, a few of these works/tools are model specific. For instance, DeepVis, Keras-vis, and Lucid are for a neural network's explainability, and TreeInterpreter is for a tree-based model's explainability. At a high level, each of the proposed approaches involves similar concepts, such as feature importance, feature interactions, Shapley values, partial dependence, surrogate models, counterfactuals, adversarial examples, prototypes, and knowledge infusion. However, despite some visible progress in XAI methods, the quantification or evaluation of explainability is under-focused, in particular when it comes to human study-based evaluations.

In this paper, we (1) demonstrate popular methods/approaches towards XAI with a mutual task (i.e., credit default prediction) and explain the working mechanism in layman's terms, (2) compare the pros, cons, and competitive advantages of each approach with their associated challenges, and analyze those from multiple perspectives (e.g., global vs. local, post-hoc vs. ante-hoc, and inherent vs. emulated/approximated explainability), (3) provide meaningful insight on quantifying explainability, and (4) recommend a path towards responsible or human-centered AI using XAI as a medium. Our survey is the only one among the recent ones (see Table 1) that includes a mutual test case with useful insights on popular XAI methods (see Table 4).
TABLE 1
Comparison with other surveys

Survey Reference                 Mutual test case
Adadi et al., 2018 [8]           ×
Mueller et al., 2019 [9]         ×
Samek et al., 2017 [6]           ×
Molnar et al., 2019 [10]         ×
Staniak et al., 2018 [11]        ×
Gilpin et al., 2018 [12]         ×
Collaris et al., 2018 [13]       ×
Ras et al., 2018 [1]             ×
Dosilovic et al., 2018 [14]      ×
Tjoa et al., 2019 [15]           ×
Doshi-Velez et al., 2017 [16]    ×
Rudin et al., 2019 [17]          ×
Arrieta et al., 2020 [18]        ×
Miller et al., 2018 [19]         ×
Zhang et al., 2018 [20]          ×
This Survey                      ✓
We start with a background of related works (Section 2), followed by a description of the test case in Section 3, and then a review of XAI methods in Section 4. We conclude with an overview of quantifying explainability and a discussion addressing open questions and future research directions towards responsible or human-centered AI in Section 5.
2 BACKGROUND
Research interests in XAI are re-emerging. Earlier works such as [21], [22], and [23] focused primarily on explaining the decision process of knowledge-based systems and expert systems. The primary reason behind the renewed interest in XAI research has stemmed from the recent advancements in AI, its application to a wide range of areas, the concerns over unethical use, lack of transparency, and undesired biases in the models. In addition, recent laws by different governments are necessitating more research in XAI. According to [6] and [7], XAI encompasses Machine Learning (ML) or AI systems for demystifying black box models' internals (e.g., what the models have learned) and/or for explaining individual predictions.

In 2019, Mueller et al. presented a comprehensive review of the approaches taken by a number of types of "explanation systems" and characterized them into three generations: (1) first-generation systems—for instance, expert systems from the early '70s, (2) second-generation systems—for instance, intelligent tutoring systems, and (3) third-generation systems—tools and techniques from the recent renaissance starting from 2015 [9]. The first-generation systems attempt to clearly express the internal working process of the system by embedding expert knowledge in rules often elicited directly from experts (e.g., via transforming rules into natural language expressions). The second-generation systems can be regarded as human-computer systems designed around human knowledge and reasoning capacities to provide cognitive support; for instance, arranging the interface in such a way that it complements the knowledge that the user is lacking. Similar to the first-generation systems, the third-generation systems also attempt to clarify the inner workings of the systems. But this time, these systems are mostly "black box" (e.g., deep nets, ensemble approaches). In addition, nowadays, researchers are using advanced computer technologies in data visualization, animation, and video, which have a strong potential to drive XAI research further. Many new ideas have been proposed for generating explainable decisions from the need for primarily accountable, fair, and trustable systems and decisions.

There has been some previous work [10] that mentions three notions for quantification of explainability. Two out of the three notions involve experimental studies with humans (e.g., a domain expert or a layperson), which mainly investigate whether a human can predict the outcome of the model [24], [25], [26], [27], [28]. The third notion (proxy tasks) does not involve a human, and instead uses known truths as a metric (e.g., the less the depth of the decision tree, the more explainable the model).

Some mentionable reviews on XAI are listed in Table 1. However, while these works provide analysis from one or more of the mentioned perspectives, a comprehensive review considering all of the mentioned important perspectives, using a mutual test case, is still missing. Therefore, we attempt to provide an overview using a demonstration of a mutual test case or task, and then analyze the various approaches from multiple perspectives, with some future directions of research towards responsible or human-centered AI.
3 TEST CASE
The mutual test case or task that we use in this paper to demonstrate and evaluate the XAI methods is credit default prediction. This mutual test case enables a better understanding of the comparative advantages of different
XAI approaches. We predict whether a customer is going to default on a mortgage payment (i.e., is unable to pay the monthly payment) in the near future or not, and explain the decision using different XAI methods in a human-friendly way. We use the popular Freddie Mac [29] dataset for the experiments. Table 2 lists some important features and their descriptions. The descriptions of the features are taken from the data set's [29] user guide. We use the well-known programming language R's package "iml" [30] for producing the results for the XAI methods described in this review.
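To make the method descriptions that follow concrete, the sketch below shows one way such an R/iml pipeline could be wired up. It is a minimal, hedged illustration, not the authors' exact experimental code: the file name, the use of a random forest, and the column names (taken from Table 2) are assumptions.

# Minimal setup sketch (assumed file and column names; not the survey's exact code)
library(randomForest)   # the "black box" model used for illustration
library(iml)            # model-agnostic explanation toolbox

loans <- read.csv("freddie_mac_sample.csv")       # hypothetical pre-processed extract
loans$defaulted <- as.factor(loans$defaulted)

rf <- randomForest(defaulted ~ ., data = loans, ntree = 100)

# iml wraps the fitted model and the data into a Predictor object;
# the explanation methods discussed below all operate on this wrapper.
X <- loans[, setdiff(names(loans), "defaulted")]
pred <- Predictor$new(rf, data = X, y = loans$defaulted)

Later sketches reuse the objects rf, X, and pred defined here.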
4 EXPLAINABLE ARTIFICIAL INTELLIGENCE METHODS
This section summarizes different explainability methods with their pros, cons, challenges, and competitive advantages, primarily based on two recent comprehensive surveys: [31] and [16]. We then enhance the previous surveys with a multi-perspective analysis, recent research progress, and future research directions. [16] broadly categorizes methods for explanations into three kinds: Intrinsically Interpretable Methods, Model Agnostic Methods, and Example-Based Explanations.
A convenient way to achieve explainable results is to stick with intrinsically interpretable models such as Linear Regression, Logistic Regression, and Decision Trees, avoiding the use of "black box" models. However, usually, this natural explainability comes with a cost in performance.
In a Linear Regression, the predicted target consists of the weighted sum of input features. So the weight or coefficient of the linear equation can be used as a medium of explaining the prediction when the number of features is small.

y = b_0 + b_1 * x_1 + ... + b_n * x_n + ε     (1)

In Formula 1, y is the target (e.g., chances of credit default), b_0 is a constant value known as the intercept (e.g., .33), b_i is the learned feature's weight or coefficient (e.g., .33) for the corresponding feature x_i (e.g., credit score), and ε is a constant error term (e.g., .0001). Linear regression comes with an interpretable linear relationship among features. However, in cases where there are multiple correlated features, the distinct feature influence becomes indeterminable, as the individual influences on the prediction are not additive to the overall prediction anymore.

Logistic Regression is an extension of Linear Regression to classification problems. It models the probabilities for classification tasks. The interpretation of Logistic Regression is different from Linear Regression as it gives a probability between 0 and 1, where the weight might not exactly represent the linear relationship with the predicted probability. However, the weight provides an indication of the direction of influence (negative or positive) and a factor of influence between classes, although it is not additive to the overall prediction.
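As a hedged illustration of how such coefficients are read in practice, the sketch below fits a logistic regression on a few of the (assumed) feature names from Table 2 and inspects the signed weights; positive coefficients push the predicted log-odds of default up, negative ones push them down.

# Reading a logistic regression through its coefficients (illustrative sketch,
# assuming the `loans` data frame from the test-case setup).
logit <- glm(defaulted ~ creditScore + currentLoanDelinquencyStatus + currentActualUPB,
             data = loans, family = binomial())

coef(logit)        # signed weights on the log-odds scale
exp(coef(logit))   # odds ratios: multiplicative change in the odds per unit increase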
Decision Tree-based models split the data multiple times based on a cutoff threshold at each node until they reach a leaf node. Unlike Logistic and Linear Regression, this works even when the relationship between input and output is non-linear, and even when the features interact with one another (i.e., a correlation among features). In a Decision Tree, a path from the root node (i.e., the starting node, e.g., credit score in Figure 1) to a leaf node (e.g., default) tells how the decision (the leaf node) took place. Usually, the nodes in the upper levels of the tree have higher importance than lower-level nodes. Also, the fewer the number of levels (i.e., the height) a tree has, the higher the level of explainability the tree possesses. In addition, the cutoff point of a node in a Decision Tree provides counterfactual information—for instance, increasing the value of a feature to the cutoff point will reverse the decision/prediction. In Figure 1, if the credit score is greater than the cutoff point 748, then the customer is predicted as non-default. Also, tree-based explanations are contrastive, i.e., a "what if" analysis provides the relevant alternative path to reach a leaf node. According to the tree in Figure 1, there are two separate paths (credit score → delinquency → non-default; and credit score → non-default) that lead to a non-default classification.

However, tree-based explanations cannot express a linear relationship between input features and output. They also lack smoothness; slight changes in input can have a big impact on the predicted output. Also, there can be multiple different trees for the same problem. Usually, the more nodes or the greater the depth of the tree, the more challenging it is to interpret the tree.

Fig. 1. Decision Trees
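A shallow tree like the one in Figure 1 can be grown and printed with, for example, rpart; the depth limit below keeps the tree readable, possibly at some cost in accuracy. This is a sketch under the same assumed column names, not the exact tree behind Figure 1.

# A shallow, human-readable decision tree (sketch).
library(rpart)

tree <- rpart(defaulted ~ creditScore + currentLoanDelinquencyStatus,
              data = loans, method = "class",
              control = rpart.control(maxdepth = 2))

print(tree)                 # each line is a root-to-node path with its split threshold
# plot(tree); text(tree)    # draws the tree; a root-to-leaf path is one explanation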
Decision Rules (simple IF-THEN-ELSE conditions) are also an inherent explanation model. For instance, "IF the credit score is less than or equal to 748 AND the customer is delinquent on payment for more than zero days (condition), THEN the customer will default on payment (prediction)". Although IF-THEN rules are straightforward to interpret, they are mostly limited to classification problems (i.e., they do not support a regression problem) and are inadequate in describing linear relationships. In addition, the
RuleFit algorithm [32] has an inherent interpretation to some extent as it learns sparse linear models that can detect interaction effects in the form of decision rules. Decision rules consist of the combination of split decisions from each of the decision paths. However, besides the original features, it also learns some new features to capture the interaction effects of the original features. Usually, interpretability degrades with an increasing number of features.
TABLE 2
Dataset description

Feature                        Description
creditScore                    A number between 300 and 850 that indicates the creditworthiness of the borrower.
originalUPB                    Unpaid principal balance on the note date.
originalInterestRate           Original interest rate as indicated by the mortgage note.
currentLoanDelinquencyStatus   Indicates the number of days the borrower is delinquent.
numberOfBorrower               Number of borrowers who are obligated to repay the loan.
currentInterestRate            Active interest rate on the note.
originalCombinedLoanToValue    Ratio of all mortgage loans to the appraised price of the mortgaged property on the note date.
currentActualUPB               Unpaid principal balance as of the latest month of payment.
defaulted                      Whether the customer defaulted on payment (1) or not (0).

Other interpretable models include extensions of linear models such as
Generalized Linear Models (GLMs) and
Generalized Additive Models (GAMs); they help to deal with some of the assumptions of linear models (e.g., that the target outcome y and the given features follow a Gaussian distribution, and that there is no interaction among features). However, these extensions make models more complex (i.e., added interactions) as well as less interpretable. In addition, a Naïve Bayes Classifier, based on Bayes' Theorem, where the probability of classes for each of the features is calculated independently (assuming strong feature independence), and K-Nearest Neighbors, which uses the nearest neighbors of a data point for prediction (regression or classification), also fall under intrinsically interpretable models.
Model-agnostic methods separate the explanation from the machine learning model, allowing the explanation method to be compatible with a variety of models. This separation has some clear advantages: (1) the interpretation method can work with multiple ML models, (2) it provides different forms of explainability (e.g., visualization of feature importance, a linear formula) for a particular model, and (3) it allows for a flexible representation—a text classifier uses abstract word embeddings for classification but uses actual words for explanation. Some of the model-agnostic interpretation methods include the Partial Dependence Plot (PDP), Individual Conditional Expectation (ICE), Accumulated Local Effects (ALE) Plot, Feature Interaction, Feature Importance, Global Surrogate, Local Surrogate (LIME), and Shapley Values (SHAP).
The Partial Dependence Plot (PDP) or PD plot shows the marginal effect of one or two features (at best three features in 3-D) on the predicted outcome of an ML model [33]. It is a global method, as it shows overall model behavior, and is capable of showing the linear or complex relationships between the target and feature(s). It provides a function that depends only on the feature(s) being plotted by marginalizing over the other features in such a way that includes the interactions among them. PDP provides a clear and causal interpretation by providing the changes in prediction due to changes in particular features. However, PDP assumes the features under the plot are not correlated with the remaining features. In the real world, this is unusual. Furthermore, there is a practical limit of only two features that a PD plot can clearly explain at a time. Also, it is a global method, as it plots the average effect (from all instances) of a feature(s) on the prediction, and not the effect of all features on a specific instance. The PD plot in Figure 2 shows the effect of credit score on the prediction. Individual bar lines along the X axis represent the frequency of samples for different ranges of credit scores.
Fig. 2. Partial Dependence Plot (PDP)
Unlike PDP, ICE plots one line per instance showing how a feature influences the changes in prediction (see Figure 3). The average over all lines of an ICE plot gives a PD plot [34] (i.e., the single line shown in the PD plot in Figure 2). Figure 4 combines both PDP and ICE together for a better interpretation. Although ICE curves are more intuitive to understand than a PD plot, they can only display one feature meaningfully at a time. In addition, ICE also suffers from the problem of correlated features and overcrowded lines when there are many instances.
Fig. 3. Individual Conditional Expectation (ICE)

Fig. 4. PDP and ICE combined together in the same plot

Similar to PD plots (Figure 2), ALE plots (Figure 5) describe how features influence the prediction on average. However, unlike PDP, the ALE plot works reasonably well with correlated features and is comparatively faster. Although the ALE plot is not biased by correlated features, it is challenging to interpret the changes in prediction when features are strongly correlated and analyzed in isolation. In that case, only plots showing changes in both correlated features together make sense for understanding the changes in the prediction.
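In the iml package used for this review, PDP, ICE, and ALE are variants of the same FeatureEffect call; the sketch below, which assumes the pred wrapper from the test-case setup, shows how plots like those in Figures 2–5 can be produced.

# PDP, ICE, and ALE for one feature (sketch; `pred` is the iml Predictor from Section 3).
pdp <- FeatureEffect$new(pred, feature = "creditScore", method = "pdp")
ice <- FeatureEffect$new(pred, feature = "creditScore", method = "pdp+ice")  # PDP overlaid on ICE curves
ale <- FeatureEffect$new(pred, feature = "creditScore", method = "ale")      # ALE is iml's default method

plot(pdp)
plot(ice)
plot(ale)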
Fig. 5. Accumulated Local Effects (ALE) Plot

When the features interact with one another, individual feature effects do not sum up to the total feature effect from all features combined. The H-statistic (i.e., Friedman's H-statistic) helps to detect different types of interaction, even with three or more features. The interaction strength between two features is the difference between the partial dependence function for those two features together and the sum of the partial dependence functions for each feature separately. Figure 6 shows the interaction strength of each participating feature. For example, current Actual UPB has the highest level of interaction with other features, and credit score has the least interaction with other features. However, calculating feature interaction is computationally expensive. Furthermore, using sampling instead of the entire dataset usually shows variance from run to run.
Fig. 6. Feature interaction
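Friedman's H-statistic is exposed in iml as the Interaction class; the sketch below, again assuming the pred wrapper from the test-case setup, computes one overall interaction strength per feature, as visualized in Figure 6.

# Friedman's H-statistic (sketch): how much of each feature's effect comes from interactions.
ia <- Interaction$new(pred)      # one H value per feature; values near 0 mean little interaction
plot(ia)

# Two-way interaction strengths of a single feature with every other feature:
ia_upb <- Interaction$new(pred, feature = "currentActualUPB")
plot(ia_upb)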
Usually, the feature importance of a feature is the increase in the prediction error of the model when we permute the values of the feature to break the true relationship between the feature and the true outcome. After shuffling the values of the feature, if the error increases, then the feature is important. [35] introduced permutation-based feature importance for Random Forests; later, [36] extended the work to a model-agnostic version. Feature importance provides a compressed and global insight into the ML model's behavior. For example, Figure 7 shows the importance of each participating feature: current Actual UPB possesses the highest feature importance, and credit score possesses the lowest feature importance. Although feature importance takes into account both the main feature effect and interactions, this is a disadvantage, as feature interaction is included in the importance of correlated features. We can see that the feature current Actual UPB possesses the highest feature importance (Figure 7); at the same time, it also possesses the highest interaction strength (Figure 6). As a result, in the presence of interaction among features, the feature importance does not add up to the total drop in performance. Besides, it is unclear whether the test set or the training set should be used for feature importance, as it demonstrates variance from run to run on the shuffled dataset. It is necessary to mention that feature importance also falls under the global methods.
Fig. 7. Feature importance
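Permutation feature importance as in Figure 7 is a single call in iml; the loss function below (classification error) and the number of repetitions are illustrative choices, not necessarily those used for the figure.

# Permutation feature importance (sketch): shuffle one feature, measure the increase in loss.
fi <- FeatureImp$new(pred, loss = "ce", n.repetitions = 5)   # "ce" = classification error
plot(fi)
fi$results      # table of features with their importance values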
A global surrogate model tries to approximate the overall behavior of a "black box" model using an interpretable ML model. In other words, surrogate models try to approximate the prediction function of a black-box model using an interpretable model as correctly as possible, given that the prediction is interpretable. It is also known as a meta-model, approximate model, response surface model, or emulator. We approximate the behavior of a Random Forest using a CART decision tree (Figure 8). The original black box model could be avoided given that the surrogate model demonstrates a comparable performance. Although a surrogate model comes with interpretation and flexibility (i.e., such as model agnosticism), diverse explanations for the same "black box", such as multiple possible decision trees with different structures, are a drawback. Besides, some would argue that this is only an illusion of interpretability.
Fig. 8. Global surrogate
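iml ships a tree-based global surrogate (TreeSurrogate); a hedged sketch that approximates the random forest with a depth-two tree, as in Figure 8, looks as follows.

# Global surrogate (sketch): fit a shallow decision tree to the black box's predictions.
surrogate <- TreeSurrogate$new(pred, maxdepth = 2)
plot(surrogate)
surrogate$r.squared    # how faithfully the surrogate mimics the black-box predictions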
Unlike a global surrogate, a local surrogate explains individual predictions of black-box models. Local Interpretable Model-Agnostic Explanations (LIME) was proposed by [37]. LIME trains an inherently interpretable model (e.g., Decision Trees) on a new dataset made from the permutation of samples and the corresponding predictions of the black box. Although the learned model can be a good approximation of local behavior, it does not have a good global approximation. This trait is also known as local fidelity. Figure 9 is a visualization of the output from LIME. For a random sample, the black box predicts that a customer will default on payment with a probability of 1; the local surrogate model, LIME, also predicts that the customer will default on the payment, however, with a probability of 0.99, which is a little less than the black box model's prediction. LIME also shows which features contribute to the decision making and by how much. Furthermore, LIME allows replacing the underlying "black box" model while keeping the same local interpretable model for the explanation. In addition, LIME works for tabular data, text, and images. As LIME is an approximation model, and the local model might not cover the complete attribution due to generalization (e.g., using shorter trees, lasso optimization), it might be unfit for cases where we legally need complete explanations of a decision. Furthermore, there is no consensus on the boundary of the neighborhood for the local model; sometimes, it provides very different explanations for two nearby data points.
Fig. 9. Local Interpretable Model-Agnostic Explanations (LIME)
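A LIME-style local surrogate is available in iml as LocalModel; the sketch below explains a single customer, with k limiting how many features the sparse local linear model may use. The row index is an arbitrary illustrative choice.

# LIME-style local surrogate for one instance (sketch).
x_interest <- X[5, ]                                           # one customer to explain
lime <- LocalModel$new(pred, x.interest = x_interest, k = 3)   # k = features kept in the local model
plot(lime)
lime$results     # per-feature effects of the sparse local linear model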
Shapley is another local explanation method. In 1953, Shapley [38] coined the Shapley Value. It is based on coalitional game theory and helps to distribute feature importance among participating features fairly. Here the assumption is that each feature value of the instance is a player in a game, and the prediction is the overall payout that is distributed among players (i.e., features) according to their contribution to the total payout (i.e., the prediction). We use Shapley values (see Figure 10) to analyze the prediction of a random forest model for the credit default prediction problem. The actual prediction for a random sample is 1.00, the average prediction from all samples in the data set is 0.53, and their difference is 0.47 (1.00 − 0.53), of which Current Actual UPB contributes 0.36. The Shapley Value is the average contribution to the prediction over all possible coalitions of features, which makes it computationally expensive when there is a large number of features—for example, for k features, there will be 2^k coalitions. Unlike LIME, the Shapley Value is an explanation method with a solid theory that provides full explanations. However, it also suffers from the problem of correlated features. Furthermore, the Shapley value returns a single value per feature; there is no way to make a statement about the changes in output resulting from changes in input. One mentionable implementation of the Shapley value is in the work of [39], which they call SHAP.

The Break Down package provides local explanations and is loosely related to the partial dependence algorithm with an added step-wise procedure known as "Break Down" (proposed by [11]). It uses a greedy strategy to identify and remove features iteratively based on their influence on the overall average predicted response (baseline) [40]. For instance, from the game theory perspective, it starts with an empty team, then adds feature values one by one based on their decreasing contribution. In each iteration, the amount of contribution from each feature depends on the feature values of those already in the team, which is considered a drawback of this approach. However, it is faster than the Shapley value method due to the greedy approach, and for models without interactions, the results are the same [31]. Figure 11 is a visualization of Break Down for a random sample, showing the contribution (positive or negative) from each of the participating features towards the final prediction.
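Shapley values for an individual prediction are likewise one call in iml; sample.size controls the Monte Carlo approximation of the coalition average, trading accuracy against the exponential cost discussed above. The row index is again only illustrative.

# Shapley values for one prediction (sketch): fair attribution of (prediction - average prediction).
sh <- Shapley$new(pred, x.interest = X[5, ], sample.size = 100)
plot(sh)
sh$results       # one phi value per feature; the phis (approximately) sum to prediction minus average prediction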
Example-Based Explanation methods use particular instances from the dataset to explain the behavior of the model and the distribution of the data in a model-agnostic way. It can be expressed as "X is similar to Y and Y caused Z, so the prediction says X will cause Z". According to [31], a few explanation methods that fall under Example-Based Explanations are described as follows:
The counterfactual method indicates the required changes on the input side that will have significant changes (e.g., reverse the prediction) in the prediction/output. Counterfactual explanations can explain individual predictions. For instance, they can provide an explanation that describes causal situations such as "If A had not occurred, B would not have occurred". Although counterfactual explanations are human-friendly, they suffer from the "Rashomon effect", where each counterfactual explanation tells a different story to reach a prediction. In other words, there are multiple true explanations (counterfactuals) for each instance-level prediction, and the challenge is how to choose the best one. Counterfactual methods do not require access to the data or the model and could work with a system that does not use machine learning at all. In addition, this method does not work well for categorical variables with many values. For instance, if the credit score of customer 5 (from Table 3) can be increased from 748 to 749 (similar to the credit score of customer 6), given the other feature values remain unchanged, the customer will not default on a payment. In short, there can be multiple different ways to tune feature values to make customers move from non-default to default, or vice versa.

Traditional explanation methods are mostly based on explaining correlation rather than causation. Moraffah et al. [41] focus on a causal interpretable model that explains the possible decisions under different situations, such as being trained with different inputs or hyperparameters. This causal interpretable approach shares concepts with counterfactual analysis, as both work on causal inference. Their work also suggests possible use in the fairness evaluation of decisions.
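The idea of a counterfactual can be sketched with a brute-force search over a single feature: increase the credit score step by step, keep everything else fixed, and stop when the predicted class flips. This toy illustration is not one of the dedicated counterfactual packages; the function name and step size are assumptions.

# Toy one-feature counterfactual search (sketch): smallest credit-score increase that flips the prediction.
find_counterfactual <- function(model, x, feature = "creditScore", step = 1, max_steps = 200) {
  original <- predict(model, x)                  # predicted class for the unmodified instance
  for (i in seq_len(max_steps)) {
    x_cf <- x
    x_cf[[feature]] <- x[[feature]] + i * step   # perturb only the chosen feature
    if (predict(model, x_cf) != original) {
      return(x_cf)                               # first modified instance whose prediction flips
    }
  }
  NULL                                           # no counterfactual found within the search range
}

cf <- find_counterfactual(rf, X[5, ])            # e.g., the customer corresponding to row 5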
Fig. 10. Shapley values

TABLE 3
Example-Based Explanations

Customer   Delinquency   Credit score   Defaulted
1          162           680            yes
2          149           691            yes
3          6             728            yes
4          6             744            yes
5          0             748            yes
6          0             749            no
7          0             763            no
8          0             790            no
9          0             794            no
10         0             806            no

An adversarial technique is capable of flipping the decision using counterfactual examples to fool the machine learner (i.e., small intentional perturbations in input to make a false prediction). However, adversarial examples could help to discover hidden vulnerabilities as well as to improve the model. For instance, an attacker can intentionally design adversarial examples to cause the AI system to make a mistake (i.e., fooling the machine), which poses greater threats to cyber-security and autonomous vehicles. As an example, the credit default prediction system can be fooled for customer 5 just by increasing the credit score by 1 (see Table 3), leading to a reversed prediction.

Hartl et al. [42] emphasize understanding the implications of adversarial samples on Recurrent Neural Network (RNN) based IDSs, because RNNs are good for sequential data analysis, and network traffic exhibits some sequential patterns. They find that the adversarial training procedure can significantly reduce the attack surface. Furthermore, [43] apply an adversarial approach to find the minimum modification of the input features of an intrusion detection system needed to reverse the classification of a misclassified instance. Besides satisfactory explanations of the reason for misclassification, their approach provides further diagnosis capabilities.
Prototypes consist of a selected set of instances that represent the data very well. Conversely, the set of instances that do not represent the data well are called criticisms [44]. Determining the optimal number of prototypes and criticisms is challenging. For example, customers 1 and 10 from Table 3 can be treated as prototypes, as they are strong representatives of the corresponding target. On the other hand, customers 5 and 6 (from Table 3) can be treated as criticisms, as the distance between the data points is minimal, and they might be classified under either class from run to run of the same or different models.
Fig. 11. Break Down
Influential instances are data points from the training set that are influential for the prediction and parameter determination of the model. While this helps to debug the model and understand the behavior of the model better, determining the right cutoff point to separate influential from non-influential instances is challenging. For example, based on the values of the features credit score and delinquency, customers 1, 2, 9, and 10 from Table 3 can be treated as influential instances, as they are strong representatives of the corresponding target. On the other hand, customers 5 and 6 are not influential instances, as they would be in the margin of the classification decision boundary.
The prediction of a k-nearest neighbor model can be explained with the k neighboring data points (the neighbors that were averaged to make the prediction). A visualization of the individual cluster containing similar instances provides an interpretation of why an instance is a member of a particular group or cluster. For example, in Figure 12, the new sample (black circle) is classified according to the other three (3-nearest neighbor) nearby samples (one gray, two white). This visualization gives an interpretation of why a particular sample is part of a particular class.

Fig. 12. KNN

Table 4 summarizes the explainability methods from the perspective of (A) whether the method approximates the model behavior (i.e., creates an illusion of interpretability) or finds actual behavior, (B) whether the method alone is inherently interpretable or not, (C) whether the interpretation method is ante-hoc, that is, it incorporates explainability into a model from the beginning, or post-hoc, where explainability is incorporated after the regular training of the actual model (i.e., at testing time), (D) whether the method is model agnostic (i.e., works for any ML model) or specific to an algorithm, and (E) whether the method is local, providing instance-level explanations, or global, providing overall model behavior.

Our analysis says there is a lack of an explainability method (i.e., a gap in the literature) which is, at the same time, actual and direct (i.e., does not create an illusion of explainability by approximating the model), model agnostic, and local, such that it utilizes the full potential of the explainability method in different applications. There are some recent works that bring external knowledge and infuse it into the model for better interpretation. These XAI methods have the potential to fill the gap to some extent by incorporating domain knowledge into the model in a model-agnostic and transparent way (i.e., not by illusion).

Chen et al. [45] introduce instance-wise feature selection as a methodology for model interpretation where the model learns a function to extract a subset of the most informative features for a particular instance. The feature selector attempts to maximize the mutual information between selected features and response variables. However, their approach is mostly limited to post-hoc approaches.

In a more recent work, [46] study explainable ML using information theory, where they quantify the effect of an explanation by the conditional mutual information between the explanation and the prediction, considering the user's background. Their approach provides a personalized explanation based on the background of the recipient, for instance, a different explanation for those who know linear algebra and those who don't. However, this work is yet to be considered a comprehensive approach that considers a variety of users and their explanation needs.

To understand the flow of information in a Deep Neural Network (DNN), [47] analyzed different gradient-based attribution methods that assign an attribution value (i.e., contribution or relevance) to each input feature (i.e., neuron) of a network for each output neuron. They use a heatmap for better visualization, where one color represents features that contribute positively to the activation of the target output, and another color represents features that suppress the effect on it.

A survey on the visual representation of Convolutional Neural Networks (CNNs), by [20], categorizes works based on a) visualization of CNN representations in intermediate network layers, b) diagnosis of CNN representations for feature spaces of different feature categories or potential representation flaws, c) disentanglement of "the mixture of patterns" encoded in each filter of CNNs, d) interpretable CNNs, and e) semantic disentanglement of CNN representations.

In the industrial control system setting, an alarm from the intrusion/anomaly detection system has a very limited role unless the alarm can be explained with more information. [5] design a layer-wise relevance propagation method for DNNs to map the abnormalities between the calculation process and the features. This process helps to compare normal samples with abnormal samples for a better understanding with detailed information.
[48] propose a concept attribution-based approach (i.e., sensitivity to a concept) that provides an interpretation of a neural network's internal state in terms of human-friendly concepts. Their approach,
Testing with CAV (TCAV), quantifies the prediction's sensitivity to a high-dimensional concept. For example, given a user-defined set of examples that defines the concept 'striped', TCAV can quantify the influence of 'striped' on the prediction of 'zebra' as a single number. However, their work is only for image classification and falls under the post-modeling notion (i.e., post-hoc) of explanation.

[49] propose knowledge-infused learning that measures information loss in latent features learned by neural networks through Knowledge Graphs (KGs). This external knowledge incorporation (via KGs) aids in supervising the learning of features for the model. Although much work remains, they believe that KGs will play a crucial role in developing explainable AI systems.

[50] and [51] infuse popular domain principles into the model and represent the output in terms of the domain principle for explainable decisions. In [50], for a bankruptcy prediction problem, they use the 5 C's of credit as the domain principle, which is commonly used to analyze key factors: character (reputation of the borrower/firm), capital (leverage), capacity (volatility of the borrower's earnings), collateral (pledged asset), and cycle (macroeconomic conditions) [52], [53]. In [51], for an intrusion detection and response problem, they incorporate the CIA principles into the model; C stands for confidentiality—concealment of information or resources, I stands for integrity—trustworthiness of data or resources, and A stands for availability—ability to use the information or resource desired [54]. In both cases, the infusion of domain knowledge leads to better explainability of the prediction with negligible compromises in performance. It also comes with better execution time and a more generalized model that works better with unknown samples.

Although these works [50], [51] come with a unique combination of merits such as model agnosticism, the capability of both local and global explanation, and authenticity of explanation (simulation or emulation free), they are still not fully off-the-shelf systems due to some domain-specific configuration requirements. Much work still remains and needs further attention.

5 QUANTIFYING EXPLAINABILITY AND FUTURE RESEARCH DIRECTIONS
The quantification or evaluation of explainability is an open challenge. There are two primary directions of research towards the evaluation of explainability of an AI/ML model: (1) model complexity-based, and (2) human study-based.

TABLE 4
Comparison of different explainability methods from a set of key perspectives (approximation or actual; inherent or not; post-hoc or ante-hoc; model-agnostic or model-specific; and global or local)
Method                                      Approx.  Inherent  Post/Ante  Agnos./Spec.  Global/Local
Linear/Logistic Regression                  No       Yes       Ante       Specific      Both
Decision Trees                              No       Yes       Ante       Specific      Both
Decision Rules                              No       Yes       Ante       Specific      Both
k-Nearest Neighbors                         No       Yes       Ante       Specific      Both
Partial Dependence Plot (PDP)               Yes      No        Post       Agnostic      Global
Individual Conditional Expectation (ICE)    Yes      No        Post       Agnostic      Both
Accumulated Local Effects (ALE) Plot        Yes      No        Post       Agnostic      Global
Feature Interaction                         No       Yes       Both       Agnostic      Global
Feature Importance                          No       Yes       Both       Agnostic      Global
Global Surrogate                            Yes      No        Post       Agnostic      Global
Local Surrogate (LIME)                      Yes      No        Post       Agnostic      Local
Shapley Values (SHAP)                       Yes      No        Post       Agnostic      Local
Break Down                                  Yes      No        Post       Agnostic      Local
Counterfactual explanations                 Yes      No        Post       Agnostic      Local
Adversarial examples                        Yes      No        Post       Agnostic      Local
Prototypes                                  Yes      No        Post       Agnostic      Local
Influential instances                       Yes      No        Post       Agnostic      Local
In the literature, model complexity and (lack of) model interpretability are often treated as the same [10]. For instance, in [55], [56], model size is often used as a measure of interpretability (e.g., number of decision rules, depth of the tree, number of non-zero coefficients).

[56] propose a scalable Bayesian Rule List (i.e., a probabilistic rule list) consisting of a sequence of IF-THEN rules, identical to a decision list or a one-sided decision tree. Unlike a decision tree that uses greedy splitting and pruning, their approach produces a highly sparse and accurate rule list with a balance between interpretability, accuracy, and computation speed. Similarly, the work of [55] is also rule-based. They attempt to evaluate the quality of the rules produced by a rule learning algorithm using: the observed coverage, which is the number of positive examples covered by the rule, which should be maximized to explain the training data well; and consistency, which is the number of negative examples covered by the rule, which should be minimized to generalize well to unseen data.

According to [57], while the number of features and the size of the decision tree are directly related to interpretability, the optimization of the tree size or features (i.e., feature selection) is costly, as it requires the generation of a large set of models and their elimination in subsequent steps. However, reducing the tree size (i.e., reducing complexity) increases error, as they could not find a way to formulate the relation in a simple functional form. More recently, [10] attempts to quantify the complexity of an arbitrary machine learning model with a model-agnostic measure. In that work, the author demonstrates that when feature interaction (i.e., the correlation among features) increases, the quality of the representations of explainability tools degrades. For instance, the explainability tool ALE Plot (see Figure 5) starts to show harsh lines (i.e., zigzag lines) as feature interaction increases. In other words, with more interaction comes a more combined influence on the prediction, induced from different correlated subsets of features (at least two), which ultimately makes it hard to understand the causal relationship between input and output, compared to an individual feature's influence on the prediction. In fact, from our study of different explainability tools (e.g., LIME, SHAP, PDP), we have found that the correlation among features is a key stumbling block to representing feature contributions in a model-agnostic way. Keeping the issue of feature interactions in mind, [10] propose a technique that uses three measures: the number of features, the interaction strength among features, and the main effect (excluding the interaction part) of features, to measure the complexity of a post-hoc model for explanation.

Although [10] mainly focuses on model complexity for post-hoc models, their work was a foundation for the approach by [58] for the quantification of explainability. Their approach to quantify explainability is model agnostic and works for a model of any notion (e.g., pre-modeling, post-hoc) using proxy tasks that do not involve a human. Instead, they use known truths as a metric (e.g., the fewer the number of features, the more explainable the model). Their proposed formula for explainability gives a score between 0 and 1 based on the number of cognitive chunks (i.e., individual pieces of information) used on the input side and output side, and the extent of interaction among those cognitive chunks.
The following works deal with the application-level and human-level evaluation of explainability involving human studies.

[26] investigate the suitability of different alternative representation formats (e.g., decision tables, (binary) decision trees, propositional rules, and oblique rules) for classification tasks, primarily focusing on the explainability of results rather than accuracy or precision. They discover that decision tables are the best in terms of accuracy, response time, confidence in the answer, and ease of use.

[24] argue that interpretability is not an absolute concept; instead, it is relative to the target model, and may or may not be relative to the human. Their finding suggests that a model is readily interpretable to a human when it uses no more than seven pieces of information [59], although this might vary from task to task and person to person. For instance, a domain expert might consume a lot more detailed information depending on their experience.

The work of [27] is a human-centered approach, drawing on previous work on human trust in a model from the psychology, social science, machine learning, and human-computer interaction communities. In their experiment with human subjects, they vary factors (e.g., the number of features, whether the model internals are transparent or a black box) that make a model more or less interpretable, and measure how the variation impacts the prediction of human subjects. Their results suggest that participants who were shown a transparent model with a small number of features were more successful in simulating the model's predictions and trusted the model's predictions more.

[25] investigate the interpretability of a model based on two of its definitions: simulatability, which is a user's ability to predict the output of a model on a given input; and "what if" local explainability, which is a user's ability to predict changes in the prediction in response to changes in the input, given that the user has knowledge of the model's original prediction for the original input. They introduce a simple metric called runtime operation count that measures interpretability, that is, the number of operations (e.g., arithmetic operations for regression, boolean operations for trees) needed in a user's mind to interpret something. Their findings suggest that interpretability decreases with an increase in the number of operations.

Despite some progress, there are still some open challenges surrounding explainability, such as an agreement on what an explanation is and to whom; a formalism for the explanation; and quantifying the human comprehensibility of the explanation. Other challenges include addressing more comprehensive human study requirements and investigating the effectiveness of different approaches (e.g., supervised, unsupervised, semi-supervised) for various application areas (e.g., natural language processing, image recognition).

The long-term goal for current AI initiatives is to contribute to the design, development, and deployment of human-centered artificial intelligent systems, where the agents collaborate with the human in an interpretable and explainable manner, with the intent of ensuring fairness, transparency, and accountability. To accomplish that goal, we propose a set of research plans/directions towards achieving responsible or human-centered AI using XAI as a medium.
The work in [50] and [51] demonstrates a way to collect and leverage domain knowledge from two different domains, finance and cybersecurity, and further infuse that knowledge into black-box models for better explainability. In both of these works, competitive performance with enhanced explainability is achieved. However, there are some open challenges, such as (A) a lack of formalism of the explanation, (B) a customized explanation for different types of explanation recipients (e.g., layperson, domain expert, another machine), (C) a way to quantify the explanation, and (D) quantifying the level of comprehensibility with human studies. Therefore, leveraging the knowledge from multiple domains, a generic framework could be useful considering the mentioned challenges. As a result, mission-critical applications from different domains will be able to leverage black-box models with greater confidence and regulatory compliance.
Responsible use of AI is crucial for avoiding risks stemming from a lack of fairness, accountability, and transparency in the model. Remediation of data, algorithmic, and societal biases is vital to promote fairness; the AI system/adopter should be held accountable to affected parties for its decisions; and finally, an AI system should be analyzable, where the degree of transparency should be comprehensible enough to have trust in the model and its predictions for mission-critical applications. Interestingly, XAI enhances understanding directly, increasing trust as a side-effect. In addition, explanation techniques can help in uncovering potential risks (e.g., what the possible fairness risks are). So it is crucial to adhere to fairness, accountability, and transparency principles in the design and development of explainable models.
To ensure the responsible use of AI, the design, development, and deployment of human-centered AI, which collaborates with humans in an explainable manner, is essential. Therefore, the explanation from the model needs to be comprehensible by the user, and there might be some supplementary questions that need to be answered for a clear explanation. So, the interaction (e.g., follow-ups after the initial explanation) between humans and machines is important. The interaction is even more crucial for adaptive explainable models that provide context-aware explanations based on user profiles such as expertise, domain knowledge, interests, and cultural backgrounds. The social sciences and human behavioral studies have the potential to impact XAI and human-centered AI research. Unfortunately, the Human-Computer Interaction (HCI) community is somewhat isolated from this effort. The combination of HCI empirical studies and human science theories could be a compelling force for the design of human-centered AI models as well as for furthering XAI research. Therefore, efforts to bring a human into the loop, enabling the model to receive input (repeated feedback) from the provided visualizations/explanations to the human, and improving itself with the repeated interactions, have the potential to further human-centered AI. Besides adherence to fairness, accountability, and transparency, the effort will also help in developing models that adhere to our ethics, judgment, and social norms.
From the explanation perspective, there is plenty of research in philosophy, psychology, and cognitive science on how people generate, select, evaluate, and represent explanations, and on the associated cognitive biases and social expectations in the explanation process. In addition, from the interaction perspective, human-computer teaming involving social science, the HCI community, and social-behavioral studies could combine for further breakthroughs. Furthermore, from the application perspective, the collectively learned knowledge from different domains (e.g., Healthcare, Finance, Medicine, Security, Defense) can contribute to furthering human-centric AI and XAI research. Thus, there is a need for growing interest in multidisciplinary research to promote human-centric AI as well as XAI in mission-critical applications from different domains.

6 CONCLUSION
We demonstrate and analyze mutual XAI methods using a mutual test case to explain competitive advantages and elucidate the challenges and further research directions. Most of the available works on XAI are on the post-hoc notion of explainability. However, the post-hoc notion of explainability is not purely transparent and can be misleading, as it explains the decision after it has been made. The explanation algorithm can be optimized to placate subjective demand, primarily stemming from the emulation effort of the actual prediction, and the explanation can be misleading even when it seems plausible [60], [61]. Thus, many suggest not explaining black-box models using post-hoc notions; instead, they suggest adhering to simple and intrinsically explainable models for high stakes decisions [17]. Furthermore, from the literature review, we find that explainability in pre-modeling is a viable option to avoid transparency-related issues, albeit under-focused. In addition, knowledge infusion techniques have the potential to greatly enhance explainability, although this is also an under-focused challenge. Therefore, we need more focus on the explainability of "black box" models using domain knowledge. At the same time, we need to focus on the evaluation or quantification of explainability using both human and non-human studies. We believe this review provides good insight into the current progress on XAI approaches, the evaluation and quantification of explainability, open challenges, and a path towards responsible or human-centered AI using XAI as a medium.

ACKNOWLEDGMENTS
Our sincere thanks to Christoph Molnar for his open E-book on Interpretable Machine Learning and contribution to the open-source R package "iml". Both were very useful in conducting this survey.

REFERENCES

[1] G. Ras, M. van Gerven, and P. Haselager, "Explanation methods in deep learning: Users, values, concerns and challenges," in Explainable and Interpretable Models in Computer Vision and Machine Learning. Springer, 2018, pp. 19–36.
[2] B. Goodman and S. Flaxman, "EU regulations on algorithmic decision-making and a "right to explanation"," in ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY. http://arxiv.org/abs/1606.08813 v1
[5] Sensors, vol. 20, no. 14, p. 3817, 2020.
[6] W. Samek, T. Wiegand, and K.-R. Müller, "Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models," arXiv preprint arXiv:1708.08296, 2017.
[7] A. Fernandez, F. Herrera, O. Cordon, M. J. del Jesus, and F. Marcelloni, "Evolutionary fuzzy systems for explainable artificial intelligence: why, when, what for, and where to?" IEEE Computational Intelligence Magazine, vol. 14, no. 1, pp. 69–81, 2019.
[8] A. Adadi and M. Berrada, "Peeking inside the black-box: A survey on explainable artificial intelligence (XAI)," IEEE Access, vol. 6, pp. 52138–52160, 2018.
[9] S. T. Mueller, R. R. Hoffman, W. Clancey, A. Emrey, and G. Klein, "Explanation in human-AI systems: A literature meta-review, synopsis of key ideas and publications, and bibliography for explainable AI," arXiv preprint arXiv:1902.01876, 2019.
[10] C. Molnar, G. Casalicchio, and B. Bischl, "Quantifying model complexity via functional decomposition for better post-hoc interpretability," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2019, pp. 193–204.
[11] M. Staniak and P. Biecek, "Explanations of model predictions with live and breakDown packages," arXiv preprint arXiv:1804.01955, 2018.
[12] L. H. Gilpin, D. Bau, B. Z. Yuan, A. Bajwa, M. Specter, and L. Kagal, "Explaining explanations: An overview of interpretability of machine learning," IEEE, 2018, pp. 80–89.
[13] D. Collaris, L. M. Vink, and J. J. van Wijk, "Instance-level explanations for fraud detection: A case study," arXiv preprint arXiv:1806.07129, 2018.
[14] F. K. Došilović, M. Brčić, and N. Hlupić, "Explainable artificial intelligence: A survey," IEEE, 2018, pp. 0210–0215.
[15] E. Tjoa and C. Guan, "A survey on explainable artificial intelligence (XAI): towards medical XAI," arXiv preprint arXiv:1907.07374, 2019.
[16] F. Doshi-Velez and B. Kim, "Towards a rigorous science of interpretable machine learning," arXiv preprint arXiv:1702.08608, 2017.
[17] C. Rudin, "Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead," Nature Machine Intelligence, vol. 1, no. 5, pp. 206–215, 2019.
[18] A. B. Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. García, S. Gil-López, D. Molina, R. Benjamins et al., "Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI," Information Fusion, vol. 58, pp. 82–115, 2020.
[19] T. Miller, "Explanation in artificial intelligence: Insights from the social sciences," Artificial Intelligence, 2018.
[20] Q.-s. Zhang and S.-C. Zhu, "Visual interpretability for deep learning: a survey," Frontiers of Information Technology & Electronic Engineering, vol. 19, no. 1, pp. 27–39, 2018.
[21] B. Chandrasekaran, M. C. Tanner, and J. R. Josephson, "Explaining control strategies in problem solving," IEEE Intelligent Systems, no. 1, pp. 9–15, 1989.
[22] W. R. Swartout and J. D. Moore, "Explanation in second generation expert systems," in Second Generation Expert Systems. Springer, 1993, pp. 543–585.
[23] W. R. Swartout, "Rule-based expert systems: The MYCIN experiments of the Stanford Heuristic Programming Project: B. G. Buchanan and E. H. Shortliffe (Addison-Wesley, Reading, MA, 1984); 702 pages," 1985.
[24] A. Dhurandhar, V. Iyengar, R. Luss, and K. Shanmugam, "TIP: Typifying the interpretability of procedures," arXiv preprint arXiv:1706.02952, 2017.
[25] S. A. Friedler, C. D. Roy, C. Scheidegger, and D. Slack, "Assessing the local interpretability of machine learning models," arXiv preprint arXiv:1902.03501, 2019.
[26] J. Huysmans, K. Dejaeger, C. Mues, J. Vanthienen, and B. Baesens, "An empirical evaluation of the comprehensibility of decision table, tree and rule based predictive models," Decision Support Systems, vol. 51, no. 1, pp. 141–154, 2011.
[27] F. Poursabzi-Sangdeh, D. G. Goldstein, J. M. Hofman, J. W. Vaughan, and H. Wallach, "Manipulating and measuring model interpretability," arXiv preprint arXiv:1802.07810, 2018.
[28] Q. Zhou, F. Liao, C. Mou, and P. Wang, "Measuring interpretability for different types of machine learning models," in Pacific-Asia Conference on Knowledge Discovery and Data Mining.
[31] C. Molnar et al., "Interpretable machine learning: A guide for making black box models explainable," e-book at https://christophm.github.io/interpretable-ml-book/, vol. 10, 2018.
[32] J. H. Friedman, B. E. Popescu et al., "Predictive learning via rule ensembles," The Annals of Applied Statistics, vol. 2, no. 3, pp. 916–954, 2008.
[33] J. H. Friedman, "Greedy function approximation: a gradient boosting machine," Annals of Statistics, pp. 1189–1232, 2001.
[34] A. Goldstein, A. Kapelner, J. Bleich, and M. A. Kapelner, "Package 'ICEbox'," 2017.
[35] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[36] A. Fisher, C. Rudin, and F. Dominici, "Model class reliance: Variable importance measures for any machine learning model class, from the "Rashomon" perspective," arXiv preprint arXiv:1801.01489, 2018.
[37] M. T. Ribeiro, S. Singh, and C. Guestrin, "Why should I trust you?: Explaining the predictions of any classifier," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016, pp. 1135–1144.
[38] L. S. Shapley, "A value for n-person games," Contributions to the Theory of Games, vol. 2, no. 28, pp. 307–317, 1953.
[39] S. Lundberg and S.-I. Lee, "An unexpected unity among methods for interpreting model predictions," arXiv preprint arXiv:1611.07478, 2016.
[40] B. Boehmke and B. Greenwell, "Chapter 16 Interpretable machine learning — Hands-on machine learning with R," https://bradleyboehmke.github.io/HOML/iml.html (accessed on 11/28/2019).
[41] R. Moraffah, M. Karami, R. Guo, A. Raglin, and H. Liu, "Causal interpretability for machine learning - problems, methods and evaluation,"
ACM SIGKDD Explorations Newsletter , vol. 22, no. 1, pp.18–33, 2020.[42] A. Hartl, M. Bachl, J. Fabini, and T. Zseby, “Explainability andadversarial robustness for rnns,” arXiv preprint arXiv:1912.09855 ,2019.[43] D. L. Marino, C. S. Wickramasinghe, and M. Manic, “An adversar-ial approach for explainable ai in intrusion detection systems,” in
IECON 2018-44th Annual Conference of the IEEE Industrial ElectronicsSociety . IEEE, 2018, pp. 3237–3243.[44] B. Kim, R. Khanna, and O. O. Koyejo, “Examples are not enough,learn to criticize! criticism for interpretability,” in
Advances inNeural Information Processing Systems , 2016, pp. 2280–2288.[45] J. Chen, L. Song, M. J. Wainwright, and M. I. Jordan, “Learning toexplain: An information-theoretic perspective on model interpre-tation,” arXiv preprint arXiv:1802.07814 , 2018.[46] A. Jung and P. H. J. Nardelli, “An information-theoretic approachto personalized explainable machine learning,”
IEEE Signal Pro-cessing Letters , 2020.[47] M. Ancona, E. Ceolini, C. ¨Oztireli, and M. Gross, “Towards betterunderstanding of gradient-based attribution methods for deepneural networks,” arXiv preprint arXiv:1711.06104 , 2017.[48] B. Kim, M. Wattenberg, J. Gilmer, C. Cai, J. Wexler, F. Viegas, andR. Sayres, “Interpretability beyond feature attribution: Quantita-tive testing with concept activation vectors (tcav),” arXiv preprintarXiv:1711.11279 , 2017.[49] U. Kursuncu, M. Gaur, and A. Sheth, “Knowledge infused learn-ing (k-il): Towards deep incorporation of knowledge in deeplearning,” arXiv preprint arXiv:1912.00512 , 2019.[50] S. R. Islam, W. Eberle, S. Bundy, and S. K. Ghafoor, “Infusingdomain knowledge in ai-based” black box” models for betterexplainability with application in bankruptcy prediction,”
ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2019,Anomaly Detection in Finance Workshop , 2019.[51] S. R. Islam, W. Eberle, S. K. Ghafoor, A. Siraj, and M. Rogers,“Domain knowledge aided explainable artificial intelligence forintrusion detection and response,” arXiv preprint arXiv:1911.09853 ,2019.[52] E. Angelini, G. di Tollo, and A. Roli, “A neural network approachfor credit risk evaluation,”
The quarterly review of economics andfinance et al. , Introduction to computer security . Pearson EducationIndia, 2006.[55] J. F ¨urnkranz, D. Gamberger, and N. Lavraˇc, “Rule learning in anutshell,” in
Foundations of Rule Learning . Springer, 2012, pp. 19–55.[56] H. Yang, C. Rudin, and M. Seltzer, “Scalable bayesian rule lists,” in
Proceedings of the 34th International Conference on Machine Learning-Volume 70 . JMLR. org, 2017, pp. 3921–3930.[57] S. R ¨uping et al. , “Learning interpretable models,” 2006.[58] S. R. Islam, W. Eberle, and S. K. Ghafoor, “Towards quantificationof explainability in explainable artificial intelligence methods,” arXiv preprint arXiv:1911.10104 , 2019.[59] G. A. Miller, “The magical number seven, plus or minus two: Somelimits on our capacity for processing information.”
Psychologicalreview , vol. 63, no. 2, p. 81, 1956.[60] Z. C. Lipton, “The mythos of model interpretability,” arXiv preprintarXiv:1606.03490arXiv preprintarXiv:1606.03490