EXPLAINABLE AI WITHOUT INTERPRETABLE MODEL

A PREPRINT

Kary Främling
Department of Computing Science, Umeå University, MIT-huset, 901 87 Umeå, Sweden
[email protected]

September 30, 2020

ABSTRACT
Explainability has been a challenge in AI for as long as AI has existed. With the recently increased use of AI in society, it has become more important than ever that AI systems be able to explain the reasoning behind their results also to end-users, in situations such as being eliminated from a recruitment process or having a bank loan application refused by an AI system. Especially if the AI system has been trained using Machine Learning, it tends to contain too many parameters for them to be analysed and understood, which has caused such systems to be called ‘black-box’ systems. Most Explainable AI (XAI) methods are based on extracting an interpretable model that can be used for producing explanations. However, the interpretable model does not necessarily map accurately to the original black-box model. Furthermore, the understandability of interpretable models for an end-user remains questionable. The notions of Contextual Importance and Utility (CIU) presented in this paper make it possible to produce human-like explanations of black-box outcomes directly, without creating an interpretable model. Therefore, CIU explanations map accurately to the black-box model itself. CIU is completely model-agnostic and can be used with any black-box system. In addition to feature importance, the utility concept that is well known in Decision Theory provides a new dimension to explanations compared to most existing XAI methods. Finally, CIU can produce explanations at any level of abstraction and using different vocabularies and other means of interaction, which makes it possible to adjust explanations and interaction according to the context and to the target users.
1 Introduction

Explainability has been a challenge in AI for as long as AI has existed. Shortliffe et al. pointed out already in 1975 that ‘It is our belief, therefore, that a consultation program will gain acceptance only if it serves to augment rather than replace the physician’s own decision making processes.’ [1]. The system described in that paper was MYCIN, an expert system capable of advising physicians who request advice regarding the selection of appropriate antimicrobial therapy for hospital patients with bacterial infections. Great emphasis was put on the interaction with the end-user, in this case a skilled physician.

With the recently increased use of AI in society, it has become more important than ever that AI systems should be able to explain the reasoning behind their results also to end-users, in situations such as being eliminated from a recruitment process or having a bank loan application refused. Meanwhile, many XAI researchers have pointed out that current XAI research rarely takes ‘normal’ end-users truly into consideration. For instance, Miller et al. illustrate the phenomenon in their article entitled ‘Explainable AI: Beware of Inmates Running the Asylum’, expressing the tendency that current XAI methods mainly help AI researchers to understand their own results and models [2]. Many XAI researchers also point out that most XAI work uses only the researchers’ intuition of what constitutes a ‘good’ explanation, while ignoring the vast and valuable bodies of research in philosophy, psychology, and cognitive science on how people define, generate, select, evaluate, and present explanations [3][4]. Another domain that seems neglected in current XAI work is Decision Theory and related sub-domains such as Multiple Criteria Decision Making (MCDM) [5]. Decision Theory is tightly connected with the other mentioned domains because methods of Decision Theory are intended to produce Decision Support Systems (DSS) that are understood and used by humans when taking decisions. Decision Theory and MCDM provide clear definitions of what is meant by the importance of an input, as well as what is the utility of a given input value towards the outcome of the DSS. A simple linear DSS model is the weighted sum, where a numerical weight expresses the importance of an input and a numerical score expresses the utility of different possible input values for different outcomes of the DSS, i.e. how good or favorable a value is.
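As a concrete reference point for this linear case, such a weighted-sum DSS can be written as follows (a minimal formulation for illustration; the symbols w_i and u_i are chosen here and are not taken from the cited works):

y_j = \sum_{i=1}^{M} w_i \, u_i(x_i), \qquad \sum_{i=1}^{M} w_i = 1, \qquad u_i(x_i) \in [0, 1]

where w_i is the importance (weight) of input i and u_i(x_i) is the utility of the value x_i for the output y_j.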
Contextual Importance and Utility (CIU) extends this linear definition of importance and utility towards non-linear models such as those produced by typical ML methods. In many (or most) real-life situations, the importance of an input and the utility of different input values change depending on the values of other inputs. For instance, the outdoor temperature has a great importance for a person’s comfort level as long as the person is outdoors. When the person goes inside, the situation (context) changes and the outdoor temperature then only has an indirect (if any) importance for the person’s comfort level. Regarding utility, both a very cold and a very warm outdoor temperature might be good or bad depending on the context. For instance, a temperature well below zero tends to be uncomfortable when wearing a T-shirt, whereas a warm temperature is uncomfortable when wearing winter clothes. The utility of different temperature values changes when adding or removing clothes and, vice versa, the utility of different clothes changes when the temperature changes.

After this Introduction, Section 2 goes through the most relevant state-of-the-art of XAI methods. Section 3 presents the formal definition of CIU. Experimental results are shown in Section 4. Open questions and future research are presented in Section 5, followed by conclusions in Section 6.
2 State of the art

There does not seem to be a clear agreement in XAI literature on the meaning of the terms interpretable versus explainable. For the rest of this paper, interpretable model will be used to signify models whose behaviour humans can understand to some extent, such as rules or linear models. Explanation will be used to signify what is actually presented to a user for a specific prediction or outcome.

XAI methods can be classified into the categories model explanation, outcome explanation and model inspection according to [6]. Model explanation signifies providing a global explanation of the black-box model through an interpretable and transparent model. This model should be able to mimic the entire behavior of the black-box and it should also be understandable by humans. Rule extraction methods and estimation of global feature importance are examples of model explanation methods, as are decision trees, attention models, etc. Outcome explanation consists in providing an explanation of the outcome of the black-box for a specific instance (or context) and can therefore be considered local. It is not required to explain the underlying logic of the entire black-box but only the reason for the outcome on a specific input instance. Model inspection is not truly a XAI category; it mainly refers to how model or outcome explanations are presented to users (visually or textually, for instance) for understanding the black-box model or its outcome.

Most (or all) current outcome explanation methods are so-called post-hoc methods, i.e. they require creating an intermediate interpretable model to provide explanations. The Local Interpretable Model-agnostic Explanations (LIME) method presented in 2016 [7] might be considered a cornerstone regarding post-hoc outcome explanation. LIME belongs to the family of additive feature attribution methods [8] that are based on the assumption that a locally linear model representing the gradient around the current context is sufficient for outcome explanation purposes. Other methods that belong to the same family are for instance Shapley values, DeepLIFT and Layer-Wise Relevance Propagation [8].

A major challenge of all methods that use an intermediate interpretable model (the ‘explanation model’ in [8]) is to what extent the interpretable model actually corresponds to the black-box model. A rising concern among XAI researchers is that current XAI methods themselves tend to be black-boxes whose behaviour is as difficult to understand as that of the explained AI black-boxes, which makes it challenging to assess to what extent XAI explanations can be trusted. Furthermore, it is not evident whether a gradient-based, locally linear model is adequate or accurate for interpreting or explaining black-box behaviour. CIU differs radically from the existing state-of-the-art in XAI because CIU does not create or use an intermediate interpretable model.
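For reference, the additive feature attribution family [8] shares the generic explanation-model form

g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i

where z' \in \{0, 1\}^M indicates the presence of simplified input features and \phi_i is the attribution assigned to feature i (notation as in [8]).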
3 Contextual Importance and Utility

The underlying idea behind CIU is to use a similar approach to explanation as humans do when explaining or justifying a decision to other humans. In a XAI context, the explainer is a (X)AI system that justifies or explains its decisions or actions and the explainee is a human (one or many) that is the target of the explanation [3]. Human explainers tend to identify what were the most important aspects that influenced their decision and start their explanation with them. Human explainers also adapt the abstraction level and vocabulary used in the explanation to their expectations about what is best understood and accepted by the explainee. It is generally not enough to explain only the taken decision; it is also often necessary to justify why another decision was not taken instead.

CIU was initially developed in a MCDM context [9]. In MCDM, importance and utility concepts are clearly defined. The Analytic Hierarchy Process (AHP) [10], originally developed in the 1970’s, seems to have become the most popular MCDM method in research and practice [11]. AHP is essentially based on a weighted sum, where the global output can be broken into intermediate concepts in a hierarchical manner. The importance of different criteria (features, inputs of the model) is expressed by numeric weights. The utility expresses how good, favorable or typical a value is for the output of the model. For a car selection problem, importance and utility can be used for giving explanations such as ‘This car is good because it has a good size, decent performance and a reasonable price, which are very important features’, where words indicating utilities are underlined and only the most important features are presented. The use of a linear model makes the meaning of importance and utility quite understandable to humans, as illustrated in Figure 1a. Rule-based systems, as well as classification trees for instance, are a way of overcoming the linearity limitation but tend to lead to step-wise models as illustrated in Figure 1b. Non-linear models such as neural nets can learn smooth and non-linear functions as illustrated in Figure 1c. Even though CIU can deal with all three kinds of models, the focus here is on the kind of non-linear functions in Figure 1c. We will begin the formal definition of CIU by providing a set of definitions.
Definition 1 (Black-box model). A black-box model is a mathematical transformation f that maps inputs x to outputs y according to y = f(x).

Definition 2 (Context). A Context C defines the input values x that describe the current situation or instance to be explained.

Definition 3 (Pre-defined output range). The value range [absmin_j, absmax_j] that an output y_j can take by definition.

In classification tasks, the Pre-defined output range is typically [0, 1]. In regression tasks the minimum and maximum output values present in a training set used for Machine Learning can usually be used as an estimate of [absmin_j, absmax_j].

Definition 4 (Set of studied inputs for CIU). The index set {i} defines the indices of inputs x for which CIU is calculated.

Definition 5 (Estimated output range). [Cmin_j(C, {i}), Cmax_j(C, {i})] is the range of values that an output y_j can take in the Context C when modifying the values of inputs x_{i}.

The values used for the inputs x_{i} should be ‘representative’ or realistic within the Context C. The meaning of ‘representative’ is discussed further down in this paper. We are now ready to provide the first definition of Contextual Importance, using a Pre-defined output range, followed by the definition of Contextual Utility.
Definition 6 (Contextual Importance). Contextual Importance CI_j(C, {i}) is a numeric value that expresses to what extent variations in one or several inputs {i} affect the value of an output j of a black-box model f, according to

CI_j(C, \{i\}) = \frac{Cmax_j(C, \{i\}) - Cmin_j(C, \{i\})}{absmax_j - absmin_j}    (1)

Figure 1: Examples of (a) linear (weighted sum), (b) rule-based and (c) non-linear models, plotted as the output y over two inputs x1 and x2.
Figure 2: Illustration of the calculation of CI and CU for the non-linear model in Figure 1c. Left panel: y as a function of x1 with constant x2 = 0.2 (Cmin = 0.02, Cmax = 0.52, out = 0.178). Right panel: y as a function of x2 with constant x1 = 0.1 (Cmin = 0.158, Cmax = 0.658, out = 0.178). In both panels absmin = 0.0 and absmax = 1.0.
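As a worked example based on the values reported in Figure 2 and Equation (1):

CI(C, \{1\}) = \frac{0.52 - 0.02}{1.0 - 0.0} = 0.5, \qquad CI(C, \{2\}) = \frac{0.658 - 0.158}{1.0 - 0.0} = 0.5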
Definition 7 (Contextual Utility). Contextual Utility CU_j(C, {i}) is a numeric value that expresses to what extent the current input values C are favorable for the output y_j(C) of a black-box model, according to

CU_j(C, \{i\}) = \frac{y_j(C) - Cmin_j(C, \{i\})}{Cmax_j(C, \{i\}) - Cmin_j(C, \{i\})}    (2)

CI and CU are illustrated in Figure 2 for the non-linear function in Figure 1c. With C = (0.1, 0.2), CI(C, {1}) = 0.5 and CI(C, {2}) = 0.5, which signifies that both inputs are exactly as important for the output value. For the utilities, CU(C, {1}) = 0.316 and CU(C, {2}) = 0.04, so even though the x2 value is higher than the x1 value, the utility of the x1 value is higher than the utility of the x2 value for the result y.

The estimation of the range [Cmin_j(C, {i}), Cmax_j(C, {i})] is the only part of CIU that requires more than one y = f(x) calculation. It is also the most critical part of CIU for producing explanations that truly correspond to and explain the behaviour of the black-box. Even though it might be possible to calculate or estimate [Cmin_j(C, {i}), Cmax_j(C, {i})] directly for some models, that is not the case for generic black-box models. One possible approach is to generate a Set of representative input vectors.

Definition 8 (Set of representative input vectors). S(C, {i}) is an N × M matrix, where M is the length of x and N is a parameter that gives the number of input vectors to generate for obtaining an adequate estimate of the Estimated output range [Cmin_j(C, {i}), Cmax_j(C, {i})].

A simple way to construct S(C, {i}) is to set all input vectors in S(C, {i}) to C and then replace the values of inputs {i} with random values from a pre-defined value range that may be different for every input x_i (a minimal sketch of this approach is shown after the list below). N is the only adjustable parameter of CIU and needs to be determined based on the complexity of the function learned by the model. More efficient approaches than random values certainly exist, but that remains a topic of future research. Furthermore, random values do not guarantee that the generated input vectors are ‘representative’. They might even result in input vectors that are impossible in reality. There is also a risk of having input vectors that are not even close to the examples that were included in the training set of the black-box. This challenge can be addressed at least in the following ways:

1. Use a black-box model that has some guarantees that [Cmin_j(C, {i}), Cmax_j(C, {i})] does not go out-of-bounds even with ‘non-representative’ input vectors. In a classification task, for instance, ‘non-representative’ input vectors are not a problem for models whose outputs do not go under zero and do not go over one under any conditions.

2. Eliminate or correct input vectors that are impossible in reality or that are too far from those included in the training set. One way of doing this could be to remove all rows in S(C, {i}) that are too far from any example in the training set. One example of non-realistic input vectors that are straightforward to correct is if there are one-hot encoded inputs, where only one of the concerned inputs is allowed to be TRUE in every input vector.
3. Use ‘non-representative’ input vectors on purpose for potentially detecting inconsistencies in the learned model.
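The following is a minimal, hypothetical sketch of the sampling-based estimation of CI and CU (Equations 1 and 2) in Python; the function and variable names are illustrative rather than taken from any existing CIU implementation, and predict stands for any black-box f that returns one output value per input vector:

import numpy as np

def ciu(predict, C, i_indices, in_ranges, absmin=0.0, absmax=1.0, N=1000):
    """Estimate Contextual Importance (CI) and Contextual Utility (CU).

    predict   : callable mapping an (N, M) array to an (N,) array of one output.
    C         : 1-D array of length M, the context (instance) to explain.
    i_indices : list of input indices {i} whose influence is studied.
    in_ranges : sequence of (min, max) sampling ranges, one per input.
    absmin, absmax : pre-defined output range (Definition 3).
    N         : number of sampled input vectors (Definition 8).
    """
    C = np.asarray(C, dtype=float)
    # Set of representative input vectors S(C, {i}): copies of C where the
    # studied inputs are replaced by random values from their value ranges.
    S = np.tile(C, (N, 1))
    for i in i_indices:
        low, high = in_ranges[i]
        S[:, i] = np.random.uniform(low, high, size=N)
    y_samples = predict(S)
    cmin, cmax = y_samples.min(), y_samples.max()
    y_C = predict(C.reshape(1, -1))[0]
    ci = (cmax - cmin) / (absmax - absmin)                       # Equation (1)
    cu = (y_C - cmin) / (cmax - cmin) if cmax > cmin else 0.5    # Equation (2)
    return ci, cu

For instance, with a toy non-linear model similar in spirit to Figure 1c, ciu(f, C=[0.1, 0.2], i_indices=[0], in_ranges=[(0, 1), (0, 1)]) would return estimates of CI(C, {1}) and CU(C, {1}).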
Now that we have studied how to estimate CI and CU of one or more inputs {i} for any output y_j, we will introduce the notion of Intermediate Concept.

Definition 9 (Intermediate Concept). An Intermediate Concept names a given set of inputs {i}.

As defined by Equations (1) and (2), CIU can be estimated for any set of inputs {i}. Intermediate Concepts make it possible to specify vocabularies that can be used for producing explanations on any level of abstraction. Different names can be used for the same Intermediate Concept (as well as for input features) and the concept name used can be changed according to the current context and the target explainee(s). In addition to using Intermediate Concepts for explaining y values, Intermediate Concept values can be explained using more specific Intermediate Concepts or input features. The following defines Generalized Contextual Importance for explaining Intermediate Concepts.

Definition 10 (Generalized Contextual Importance).

CI_j(C, \{i\}, \{I\}) = \frac{Cmax_j(C, \{i\}) - Cmin_j(C, \{i\})}{Cmax_j(C, \{I\}) - Cmin_j(C, \{I\})}    (3)

where {I} is the set of input indices that correspond to the Intermediate Concept that we want to explain and {i} ⊆ {I}.

Equation (3) is similar to Equation (1) when {I} is the set of all inputs, i.e. the range [absmin_j, absmax_j] has been replaced by the range [Cmin_j(C, {I}), Cmax_j(C, {I})]. Equation (2) for CU does not change with the introduction of Intermediate Concepts. In other words, Equation (3) allows the explanation of the outputs y_j as well as the explanation of any Intermediate Concept that leads to y_j.
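Continuing the earlier hypothetical sketch, Generalized Contextual Importance can be obtained by normalising with the output range estimated for the concept’s input set {I} instead of the pre-defined output range (again illustrative code, not an existing API):

def generalized_ci(predict, C, i_indices, I_indices, in_ranges, N=1000):
    # CI of inputs {i} relative to the Intermediate Concept defined by {I},
    # following Equation (3). With the default pre-defined range [0, 1], the
    # CI values returned by ciu() equal the estimated output ranges, so their
    # ratio gives Equation (3).
    ci_i, _ = ciu(predict, C, i_indices, in_ranges, N=N)
    ci_I, _ = ciu(predict, C, I_indices, in_ranges, N=N)
    return ci_i / ci_I if ci_I > 0 else 0.0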
4 Experimental results

Experimental results are shown for two well-known benchmark data sets: Iris flowers and Boston Housing. Iris is a classification task, whereas Boston Housing is a regression task. This choice of simple but well-known data sets signifies that it is relatively easy to understand the learned models and also to assess what ‘correct’ explanations might look like.

Bar plot visualisations are used in this paper for illustrating CIU. The length of a bar corresponds to the CI value. A configurable threshold value of CU_neutral = 0.5 has been used for dividing the CU range [0, 1] into ‘unfavorable’ and ‘favorable’ ranges. A red-yellow-green color scale visualises the CU value, where CU in [0, CU_neutral] gives a continuous transition from red to yellow and CU in [CU_neutral, 1] gives a continuous transition from yellow to dark green. Future analysis and validation with more data sets and real-life applications will be needed in order to assess whether CU_neutral needs to be adjusted in practice. A set S(C, {i}) with N = 1000 has been used for all results reported here, which gives a negligible calculation time using RStudio Version 1.2 on a MacBook Pro from 2017 with a 2.8 GHz Quad-Core Intel Core i7 processor and 16 GB 2133 MHz LPDDR3 memory.

The neural net described in [12] was used for the Iris data set; it has the useful property of converging towards the mean output value when the input values go towards infinity. Therefore the range [Cmin_j(C, {i}), Cmax_j(C, {i})] should remain within reasonable bounds. A specific test instance Iris_test is studied that is quite a typical Virginica, with values C = (7, …) for the input features ‘Sepal Length’, ‘Sepal Width’, ‘Petal Length’, ‘Petal Width’. The trained neural network gives us y = (0.012, 0.158, 0.830) for the three output classes ‘Setosa’, ‘Versicolor’ and ‘Virginica’, so it is clearly a Virginica. Table 1 shows the corresponding CIU values for Iris_test.

Some questions that could be asked are ‘Why is it a Virginica?’ but also ‘Why is it not a Versicolor or a Setosa?’. Figure 3a shows bar plot explanations for the three Iris classes. It is clear that Iris_test is not a Setosa because none of the features is typical for a Setosa and modifying any of the values will not change the situation. On the other hand, all features are typical for a Virginica. Petal Length is clearly the most important feature for the classification of Iris_test. Figure 4 shows how the output value (estimated class probability) changes for Versicolor and Virginica as a function of the four input features. These graphs confirm that Petal Length is the feature that discriminates Versicolor and Virginica the most from each other.
(a) All four inputs versus Iris class. (b) Intermediate Concepts ‘Sepal size and shape’ and ‘Petal size and shape’ versus Iris class. (c) CIU of ‘Petal width’ and ‘Petal length’ versus Intermediate Concept ‘Petal size and shape’.
Figure 3: CIU bar plot visualisations for the Iris task. Bar lengths correspond to CI values. CU values are visualised using a continuous red-yellow-green color palette.

Table 1: CIU values for Iris classes versus input.

Iris class (y_j)         Setosa (0.012)      Versicolor (0.158)   Virginica (0.830)
Input feature            CI      CU          CI      CU           CI      CU
Sepal Length             0.067   0.000       0.242   0.015        0.309   0.990
Sepal Width              0.044   0.130       0.234   0.130        0.278   0.880
Petal Length             0.314   0.015       0.640   0.008        0.638   0.995
Petal Width              0.061   0.087       0.388   0.302        0.448   0.729
Sepal size and shape     0.087   0.030       0.320   0.104        0.399   0.910
Petal size and shape     0.408   0.023       0.895   0.189        0.903   0.809
All inputs               0.869   0.013       0.927   0.110        0.920   0.886

For showing the use of Intermediate Concepts, a small vocabulary was developed. The vocabulary specifies that ‘Sepal size and shape’ is the combination of the features ‘Sepal Length’ and ‘Sepal Width’, and that ‘Petal size and shape’ is the combination of the features ‘Petal Length’ and ‘Petal Width’. When studying the results using the Intermediate Concepts ‘Sepal size and shape’ and ‘Petal size and shape’, we get the bar plot explanation in Figure 3b.

Finally, Figure 3c answers questions such as ‘Why is Petal size and shape not so typical for Versicolor?’ and ‘Why is Petal size and shape typical for Virginica?’. These bar plots express what can also be observed in the 3D graphs of Figure 4, where we can see that the combination of ‘Petal Length’ and ‘Petal Width’ could be even more typical for Virginica than it is for Iris_test.
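A vocabulary of this kind is straightforward to represent programmatically. The sketch below (hypothetical names, reusing the generalized_ci function from the earlier sketch) shows how the two Intermediate Concepts could be declared and explained:

# Hypothetical Iris vocabulary: concept name -> indices of the concerned inputs.
iris_vocabulary = {
    "Sepal size and shape": [0, 1],   # Sepal Length, Sepal Width
    "Petal size and shape": [2, 3],   # Petal Length, Petal Width
}

# CI of 'Petal Length' within the Intermediate Concept 'Petal size and shape'
# for the Virginica output (Equation 3), assuming that predict_virginica,
# iris_test and iris_ranges are defined as in the earlier sketch:
# ci_petal_length = generalized_ci(predict_virginica, C=iris_test,
#                                  i_indices=[2],
#                                  I_indices=iris_vocabulary["Petal size and shape"],
#                                  in_ranges=iris_ranges)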
Figure 4: Output y_j as a function of the input values for Versicolor (left) and Virginica (right), including 3D surfaces over ‘Petal Length’ and ‘Petal Width’. The red dot shows the input and output values for Iris_test.

A gradient boosting model was used for the Boston Housing data set. It learned the mapping from the 13 input variables to the median value (medv) of owner-occupied homes in $1000’s. The resulting CIU bar plots are shown in Figure 5 for two instances.
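For a regression output such as medv, the Pre-defined output range of Definition 3 can be estimated from the training targets before calling the same sampling-based estimator. A hypothetical sketch, where boston_y stands in for the real training targets and gbm_predict, boston_instance and boston_ranges are assumed to exist:

import numpy as np

# Stand-in for the medv training targets; only the min/max are used below.
boston_y = np.random.uniform(5.0, 50.0, size=506)

# Pre-defined output range for the regression output (Definition 3),
# estimated from the minimum and maximum training targets.
absmin_medv, absmax_medv = float(boston_y.min()), float(boston_y.max())

# The same estimator as before can then be applied, e.g. for one feature:
# ci, cu = ciu(gbm_predict, boston_instance, i_indices=[12],
#              in_ranges=boston_ranges, absmin=absmin_medv, absmax=absmax_medv)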
5 Open questions and future research

Two simple benchmark data sets were used in this paper for reasons of illustration and to enable human readers to assess the validity of the results. Experience from more data sets and use cases might lead to further extensions of CIU. For instance, use cases involving one-hot coding will require using Intermediate Concepts for aggregating the concerned one-hot inputs into one single explainable feature.

CIU provides many topics for future research. For instance, is it always better to use CI values as such or to normalise them to one? What ways of visualising CIU are best understood by humans? Research is ongoing in these directions, but how to best interact with human explainees is a vast domain. Research has also been initiated for using CIU together with Reinforcement Learning and Unsupervised Learning.
Figure 5: CIU for two Boston Housing data set instances. Bar lengths correspond to CI values for the input features crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, black and lstat.
6 Conclusions

CIU is a model-agnostic method that allows producing explanations from any black-box model (no matter how ‘black’ it is or not), without producing an intermediate interpretable model. Therefore CIU does not have the same challenges of black-box model fidelity as most other XAI methods do. Compared to other outcome explanation methods, CIU allows for more flexibility in how explanations can be produced and presented to explainees, due to the possibility of applying CIU to sets of features and to Intermediate Concepts. Intermediate Concepts enable the use of different vocabularies depending on the context and on the explainee. The Contextual Utility concept also allows producing explanations in a more human-like way than other XAI methods.

By not using an intermediate interpretable model, CIU does not fit into any of the existing categories presented by major XAI survey articles. CIU has only one adjustable parameter, i.e. the number of samples in S(C, {i}), which it might be possible to eliminate or automate in the future. Therefore, CIU establishes a new category of XAI methods that will hopefully help to solve at least some of the many challenges that AI and XAI are currently facing.

References
[1] Edward H. Shortliffe, Randall Davis, Stanton G. Axline, Bruce G. Buchanan, C. Cordell Green, and Stanley N. Cohen. Computer-based consultations in clinical therapeutics: Explanation and rule acquisition capabilities of the MYCIN system. Computers and Biomedical Research, 8(4):303–320, 1975.

[2] Tim Miller, Piers Howe, and Liz Sonenberg. Explainable AI: Beware of inmates running the asylum. In IJCAI 2017 Workshop on Explainable Artificial Intelligence (XAI), 2017.

[3] Tim Miller. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267:1–38, February 2019.

[4] M. Westberg, A. Zelvelder, and A. Najjar. A historical perspective on cognitive science and its influence on XAI research. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 11763 LNAI:205–219, 2019.

[5] Ralph Keeney and Howard Raiffa. Decisions with Multiple Objectives: Preferences and Value Trade-Offs. Cambridge University Press, 1976.

[6] Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi. A survey of methods for explaining black box models. ACM Computing Surveys (CSUR), 51(5):93, 2018.

[7] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Why should I trust you?: Explaining the predictions of any classifier, 2016.

[8] Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 4765–4774. Curran Associates, Inc., 2017.

[9] Kary Främling. Modélisation et apprentissage des préférences par réseaux de neurones pour l’aide à la décision multicritère. PhD thesis, INSA de Lyon, March 1996.

[10] Thomas L. Saaty. Decision Making for Leaders: The Analytic Hierarchy Process for Decisions in a Complex World. RWS Publications, Pittsburgh, Pennsylvania, 1999.

[11] Sylvain Kubler, Jérémy Robert, William Derigent, Alexandre Voisin, and Yves Le Traon. A state-of-the-art survey & testbed of fuzzy AHP (FAHP) applications. Expert Systems with Applications, 65:398–422, 12 2016.

[12] Kary Främling and Didier Graillot. Extracting Explanations from Neural Networks. In