XAI-P-T: A Brief Review of Explainable Artificial Intelligence from Practice to Theory
Nazanin Fouladgar and Kary Främling, Department of Computing Science, Umeå University, Sweden; Aalto University, School of Science and Technology, Finland
Abstract.
In this work, we report the practical and theoretical aspects of Explainable AI (XAI) identified in some fundamental literature. Although there is a vast body of work presenting the background of XAI, most of these corpora pinpoint a single direction of thought. Providing insights into the literature on practice and theory concurrently is still a gap in this field. Bridging the two is important, as such a connection facilitates a learning process for early-stage XAI researchers and gives experienced XAI scholars a clear standpoint. Accordingly, we first focus on the categories of black-box explanation and give a practical example. Later, we discuss how explanation has been grounded theoretically in a body of multidisciplinary fields. Finally, some directions for future work are presented.
Keywords:
Explainable AI · Categorization · Black-Box · Multidisciplinary Explanation · Practice · Theory
Machine learning has been applied immensely in different applications and is progressing over time with the advent of new models [4]. Nevertheless, the complex behavior of these models impedes humans from understanding concisely how specific decisions were made. This limitation requires machines to provide transparency by means of explanation. Therefore, one could label the machine learning models that make decisions as "black-box" and, in contrast, the explained versions of these models as "white-box", under the topic of Explainable AI (XAI). The corpus of research in [12] recalls explanation as a reasoning and simulating process in response to "what happened", "how it happened" and "why it happened" questions. These questions indicate that explanation can be interpreted in the form of both causal and non-causal relationships. Although causality has been more popular among AI researchers for solving algorithmic decision problems, the non-causal relationship has recently appealed to scholars in the human-computer interaction field. In fact, XAI is not yet mature and there is much open room for providing explanation in practice and theory. While addressing these two aspects is crucial, there is still a lack of a brief and concurrent understanding of where the XAI field currently stands practically and theoretically.

In this work, we give an understanding of XAI in the literature along three directions. First, we provide a neat categorization of black-box explanation, incorporating connections between studies, in Section 2. Later, we introduce a practical example in Section 3. Finally, we take a look at explanation in theory from the social science perspective, retrieved from Miller's work [12], in Section 4. We also guide readers through some open research directions in Section 5.
Explaining machine learning models has recently become prevalent in the XAI domain. Most of the approaches fit into the general categorization proposed by Du et al. [2]. The authors discuss two major interpretability categories: intrinsic and post-hoc. While in the former the focus is on constructing self-explanatory models (e.g. decision trees, rule-based models, etc.), in the latter the effort lies in creating an alternative model to provide an explanation of an existing model. These two classes are rather broad; at a more detailed granularity, each of them is divided into a global and a local explanation view. The main concern of the global explanation is understanding the structure and parameters of the model, whereas the local view mainly unveils the causal relationship between a specific input and the predicted outcome. The schematic diagram of this categorization is illustrated in Figure 1 in a blue dotted block.

Different approaches have been proposed by researchers to incorporate explanation in each of the four discussed classes. To provide global interpretations in the intrinsic category, models are usually enforced to comprise fewer features as an explanation constraint [7], or the base models are approximated by readily interpretable models [16]. For local interpretation in the intrinsic category, attention mechanisms are widely applied to models such as Recurrent Neural Networks (RNNs) [10]. These mechanisms report the contribution of the attended part of the input to a particular decision. In general, there is a risk of sacrificing accuracy at the cost of higher explainability in the intrinsic category.

Moving toward post-hoc explanation to retain accuracy and fidelity, in the global view feature importance has been discussed extensively [1]. This approach captures the statistical contribution of each feature in a black-box model. In this sense, permuting features iteratively and then investigating how the accuracy deviates has proved highly efficient for interpreting classical machine learning models. In the case of deep learning models, the main concern is to find the preferred inputs of neurons in a specific layer that maximize their activation [15]. This technique is called activation maximization, and its performance highly depends on the optimization procedure. For local post-hoc explanation, a neighborhood of the input is approximated by an interpretable model (white-box), and the explanation is established on the new model's predictions [13]. Focusing on specific models of the latter category, back-propagating the gradients (or variants thereof) of outputs with respect to their inputs has been discussed for deep learning models [11]. This process follows a top-down strategy, exploring the features most relevant to the prediction.
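As a concrete illustration of the global post-hoc idea of permuting features and measuring the accuracy deviation [1], the following minimal Python sketch assumes a fitted classifier `model` with a scikit-learn-style `predict` method and a held-out set `X`, `y`; all names are illustrative and not taken from the cited work.

```python
import numpy as np

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Global post-hoc importance: shuffle one feature at a time and
    record how much the black-box accuracy drops."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(model.predict(X) == y)            # accuracy on intact data
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the feature-target link
            drops.append(baseline - np.mean(model.predict(X_perm) == y))
        importances[j] = np.mean(drops)                   # larger drop => more important feature
    return importances
```

A large accuracy drop for feature j signals that the black box relies on that feature, without requiring any access to the model internals.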
Apart from the generic categorization of machine learning explanations in [2] and the aforementioned solutions in the literature, formal formulations of the categories have been articulated in [9]. In that study, the global and local post-hoc interpretations are referred to by the terminologies model explanation and outcome explanation, respectively. Figure 1 shows these terminologies in a red dotted block, next to their equivalent categories in [2]; the red bidirectional arrows illustrate these equivalencies. According to [9], the two explanations are formulated as follows:

Fig. 1: Categorization of explainable machine learning models

-- Given a black-box predictor b and a set of instances X, the model explanation problem consists of finding an explanation E ∈ E, belonging to a human-interpretable domain E, through an interpretable global predictor c_g = f(b, X) derived from the black box b and the instances X using some process f(., .). An explanation E ∈ E is obtained through c_g, if E = ε_g(c_g, X) for some explanation logic ε_g(., .), which reasons over c_g and X.

-- Given a black-box predictor b and an instance x, the outcome explanation problem consists of finding an explanation e ∈ E, belonging to a human-interpretable domain E, through an interpretable local predictor c_l = f(b, x) derived from the black box b and the instance x using some process f(., .). An explanation e ∈ E is obtained through c_l, if e = ε_l(c_l, x) for some explanation logic ε_l(., .), which reasons over c_l and x.

Like the difference between the global and local post-hoc explanations, the difference between the above formulations lies in explaining the whole black-box logic versus the contribution of a specific input to the black-box decision.

Another class of black-box explanation introduced in [9] is called model inspection, designed to support domain-required analysis. This terminology sits somewhere between the two previous categories, model explanation and outcome explanation. In fact, model inspection provides a visual or textual representation to explain either a specific property of the black box or the decision made (e.g. one could vary the inputs and observe the prediction changes visually in a sensitivity analysis). This block is indicated with a red unidirectional arrow in Figure 1 and its formal definition is as follows [9]:

-- Given a black box b and a set of instances X, the model inspection problem consists of providing a (visual or textual) representation r = f(b, X) of some property of b, using some process f(., .).

All three terminologies in [9] amount to defining either an external model or a visual/textual representation. Recalling from [2], the intrinsic explanations correspond to the transparent box design terminology in [9]. The left bidirectional red arrow in Figure 1 indicates this correspondence, and the transparent box design is formulated as follows [9]:

-- Given a training dataset D = (X, Ŷ), the transparent box design problem consists of learning a locally or globally interpretable predictor c from D. For a locally interpretable predictor c, there exists a local explanator logic ε_l to derive an explanation ε_l(c, x) of the decision c(x) for an instance x.
For a globally interpretable predictor c, there exists a global explanator logic ε_g to derive an explanation ε_g(c, X).

As is clear from the formulation above, the explanation ε is given based on the model's own predictor c rather than on an external global model (c_g = f(b, X)) or an external local model (c_l = f(b, x)).

In addition to the formulations above, categorized explanation tools such as salient masks (SM) and partial dependence plots (PDP) are scrutinized in [9]. That study also investigates which machine learning models have been explained and on which data types. It turns out that neural networks, tree ensembles and support vector machines are widely explained with tabular data, targeting the model explanation category, whereas explanations of deep neural networks are mostly provided with image data for the purposes of outcome explanation and model inspection. For the transparent box design problem, decision rules with tabular data are emphasized in the majority of the literature.
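To make the outcome explanation formulation above more tangible, the hedged sketch below follows the general local-surrogate idea of [13]: perturb the instance x, label the neighbourhood with the black box b, fit an interpretable local predictor c_l, and read its coefficients as the explanation e. The function names, the Gaussian perturbation scheme and the proximity weighting are illustrative assumptions, not the exact procedure of [13]; `black_box` is assumed to return a score (e.g. the probability of the class of interest) per row.

```python
import numpy as np
from sklearn.linear_model import Ridge

def explain_outcome(black_box, x, n_samples=500, scale=0.1, seed=0):
    """Local post-hoc (outcome) explanation: fit an interpretable linear
    surrogate c_l around the instance x and return one weight per feature."""
    rng = np.random.default_rng(seed)
    # Sample a neighbourhood of x (simple Gaussian perturbations here).
    Z = x + rng.normal(scale=scale, size=(n_samples, x.shape[0]))
    # Query the black box b on the neighbourhood.
    y = black_box(Z)
    # Weight samples by proximity to x so the surrogate stays local.
    w = np.exp(-np.sum((Z - x) ** 2, axis=1) / (2 * scale ** 2))
    # Fit the interpretable local predictor c_l = f(b, x).
    c_l = Ridge(alpha=1.0).fit(Z, y, sample_weight=w)
    # The explanation e = eps_l(c_l, x): the learned feature weights.
    return dict(enumerate(c_l.coef_))
```

The surrogate is only trusted in the sampled neighbourhood; its coefficients answer how the black box behaves around x, not globally.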
As an example of outcome explanation in practice, we pinpoint our very recent work in [3]. This paper exploits two neural networks in the classification task of multimodal affect computation over two datasets. Tabular time-series data of these datasets are extracted from different wearable sensors, such as electrodermal activity (EDA), and learnt by the networks. Accordingly, the networks detect the state of mind of an individual in each dataset. To explain why a specific state is detected, the two concepts of Contextual Importance (CI) and Contextual Utility (CU), introduced by Främling [5], are employed.

CI explores how important each sensor (feature) is for a given outcome, i.e. how much the outcome can change depending on the feature value. CU indicates to what extent the feature value of the studied instance contributes to a high output value, i.e. in a classification task, how typical the current value is for the class in question. These concepts were originally initiated in the context of Multiple Criteria Decision Making (MCDM), where decisions are established by consensus between the preferences of different stakeholders. According to [6], CI and CU are theoretically correct from the Decision Theory point of view. The mathematical formulations of these concepts are as follows [5]:

CI = (Cmax_x(C_i) − Cmin_x(C_i)) / (absmax − absmin)   (1)

CU = (y_ij − Cmin_x(C_i)) / (Cmax_x(C_i) − Cmin_x(C_i))   (2)

where C_i is the i-th context (a specific input of the black box), y_ij is the value of the j-th output (class probability) with respect to the context C_i, Cmax_x(C_i) and Cmin_x(C_i) are the maximum and minimum values delimiting the range of output values observed by varying attribute x of context C_i, and absmax and absmin are the maximum and minimum values delimiting the range of the j-th output (the class probability value).

Representing the outcome explanation, Figure 2 shows the EDA variation for a specific instance (C_i) of WESAD [14], one of the datasets examined in [3]. For this instance, the affective state of the individual has been detected as "meditation". As can be inferred, absmin and absmax span the range [0, 1], while the Cmin and Cmax values are located somewhere within this range, indicating the share of importance of the EDA sensor in the process of detecting the "meditation" state. Considering the EDA sensor's utility, y_ij presents a high value within the range of Cmin and Cmax for the context C_i.
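A minimal sketch of how Equations (1) and (2) can be approximated in practice for a single feature is given below; it sweeps the studied feature over a user-supplied grid while keeping the rest of the context fixed. The callable `predict_proba`, the grid, and the default output range [0, 1] are assumptions for a probabilistic classifier and are not code from [3] or [5].

```python
import numpy as np

def contextual_importance_utility(predict_proba, instance, feat, grid,
                                  class_idx, absmin=0.0, absmax=1.0):
    """Approximate CI (Eq. 1) and CU (Eq. 2) of one feature for a context C_i.
    predict_proba: maps a batch of inputs to class probabilities.
    grid: candidate values of the studied feature over its admissible range."""
    # Vary only the studied feature, keep the rest of the context fixed.
    variants = np.tile(instance, (len(grid), 1))
    variants[:, feat] = grid
    outputs = predict_proba(variants)[:, class_idx]          # outputs over the sweep
    cmin, cmax = outputs.min(), outputs.max()                # Cmin_x(C_i), Cmax_x(C_i)
    y_ij = predict_proba(instance[None, :])[0, class_idx]    # output for the actual value
    ci = (cmax - cmin) / (absmax - absmin)                   # Eq. (1)
    cu = (y_ij - cmin) / (cmax - cmin)                       # Eq. (2)
    return ci, cu
```

If the sweep does not change the output at all (Cmax equals Cmin), CI is zero and CU is undefined, which already signals an unimportant feature for that context.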
Fig. 2: Cmin and Cmax values for input variation in EDA [3]

Looking at explanation from a multidisciplinary and theoretical perspective, we highlight Miller's work [12]. He scrutinizes explanation as two processes, cognitive and social. While the cognitive process determines a subset of identified causes, the social process mainly aims at transferring knowledge between the explainer and the explainee. Considering explanation as both a process and a product, the explanation resulting from the cognitive process is taken as the product.

In [12], one of the key findings from the cognitive science perspective on XAI is contrastive explanation. Such explanation is congruent with how people explain the causes of events in their daily life. In fact, the main goal is to answer: "Why P rather than Q?", where P refers to a fact that did occur and Q to a foil that did not occur, yet P and Q follow a similar history. With this why-question, contrastive explanation mainly focuses on "within object" differences. However, the question could also be designed to cover "between objects" or "within objects over time" differences; in those cases the questions are posed as "Why does object a have property P, while object b has property Q?" or "Why does object a have property P at one time but property Q at another?". In all cases, a contrastive explanation is more applicable than providing the whole chain of causes to the explainee [12]. Specifying the three steps of generating causes, selecting and evaluating, we illustrate the contrastive explanation process in Figure 3. In the first step, it has been argued that counterfactuals are generated by applying some heuristics; abnormality, intentionality, time and controllability of events/actions are among these heuristics. In the second step, some criteria clarify which causes should be selected; necessity and sufficiency of causes and robustness to changes are a few criteria people typically rely on in selecting explanations. Finally, evaluation of the provided explanations should be devised concisely. It has been highlighted that people are more likely to accept explanations consistent with their prior knowledge. Furthermore, people label explanations as good according to the truth of their causes. However, the latter measure does not necessarily yield the best explanation; other measures such as simplicity, relevance and the goal of the explanation can influence the evaluation as well.

Fig. 3: Contrastive explanation processes: cognitive science perspective
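To ground the "Why P rather than Q?" structure discussed above, the following hedged sketch, which is an illustration and not a method proposed in [12], searches for the smallest single-feature change that makes a classifier predict the foil Q instead of the fact P, so that the differing feature can be cited as the contrastive cause.

```python
def contrastive_cause(predict, x, foil, candidate_values):
    """Find the smallest single-feature change to instance x that makes the
    classifier predict the foil Q instead of the fact P.
    candidate_values: dict {feature_index: iterable of alternative values}."""
    fact = predict(x[None, :])[0]                    # P: the class that did occur
    best = None                                      # (size of change, feature, value)
    for j, values in candidate_values.items():
        for v in values:
            x_alt = x.copy()
            x_alt[j] = v
            if predict(x_alt[None, :])[0] == foil:   # the foil Q would now occur
                change = abs(v - x[j])
                if best is None or change < best[0]:
                    best = (change, j, v)
    if best is None:
        return f"No single-feature change yields {foil} instead of {fact}."
    _, j, v = best
    return (f"{fact} rather than {foil} because feature {j} "
            f"is {x[j]} rather than {v}.")
```

The returned sentence mirrors the contrastive form: it names the fact, the foil, and the single difference that separates their causal histories.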
Shifting to the social process, causal explanation takes the form of a conversation [12]. In this process, the communication problem between the two roles of "explainer" and "explainee" matters, and some rules govern the protocols of such interaction. Specifically, the basic rules follow Grice's maxims [8]: quality, quantity, relation and manner. Put plainly, the rules amount to saying what is believed (quality), as much as necessary (quantity), what is relevant (relation), and in a nice way (manner), so as to construct a conversational model. In an extension of this model, explanations are treated as arguments in a logical context: besides explaining causes, there should also be the possibility of defending claims. The idea comes from the main components of an argument: causes (support) and claims (conclusion). Following arguments as explanations, a formal explanation dialogue with an argumentation framework and rules for shifting between explainee and explainer is pointed out in [12]. One advantage of the conversational model of explanation and its extension is their generality, implying applicability to any algorithmic decision. Being language-independent, these models are also compatible with visual representations of explanation, for both explainee questions and explainer answers. To show how explanation has been discussed from the social perspective, we present the related schematic diagram in Figure 4.

Fig. 4: Social science perspective of explanation
Generally, the literature discusses concrete categories and formalizations for opening black boxes, different tools, the black boxes that have been explained, and the data types applied. However, there is no explicit categorization of explained models targeting different types of users, such as experts vs. non-experts. Therefore, we believe that a comprehensive review of such models would add value to the XAI research community.

In the exemplified work [3], the CI concept mainly addresses the output range obtained by varying a specific variable as a measure of feature importance. There are several cases in multimodal affect computing applications where a particular variable is influenced by other variables of the domain. In other words, the current version of the CI concept does not consider the causal relationships between endogenous variables. By addressing this gap, one could provide a more realistic and robust black-box explanation to end-users. Another gap is that the CI and CU concepts are not time-aware and not yet theoretically mature. Therefore, other future directions could be devoted to time-aware CI and CU, as well as to their intersection with cognitive and social science theories, respectively. As Miller [12] argues for socially interactive explanation, it would also be worthwhile to conduct a user study and investigate the impact of these concepts in a social interaction.

References
1. Altmann, A., Toloşi, L., Sander, O., Lengauer, T.: Permutation importance: a corrected feature importance measure. Bioinformatics (10), 1340–1347 (2010)
2. Du, M., Liu, N., Hu, X.: Techniques for interpretable machine learning. Communications of the ACM (1), 68–77 (2019)
3. Fouladgar, N., Alirezaie, M., Främling, K.: Decision explanation: Applying contextual importance and contextual utility in affect detection. In: Italian Workshop on Explainable Artificial Intelligence, XAI.it 2020. AI*IA Series, vol. 2742, pp. 1–13 (2020)
4. Fouladgar, N., Främling, K.: A novel LSTM for multivariate time series with massive missingness. Sensors, 2832 (2020)
5. Främling, K.: Explaining results of neural networks by contextual importance and utility. In: The AISB'96 Conference. Citeseer (1996)
6. Främling, K.: Decision theory meets explainable AI. In: Explainable, Transparent Autonomous Agents and Multi-Agent Systems. pp. 57–74. Springer (2020)
7. Freitas, A.A.: Comprehensible classification models: A position paper. ACM SIGKDD Explorations Newsletter (1), 1–10 (2014)
8. Grice, H.: Logic and conversation. Syntax and Semantics, pp. 41–58 (1975)
9. Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., Pedreschi, D.: A survey of methods for explaining black box models. ACM Computing Surveys (5) (2018). https://doi.org/10.1145/3236009
10. Kelvin, X., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R., Bengio, Y.: Show, attend and tell: Neural image caption generation with visual attention. In: Proceedings of the 32nd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 37, pp. 2048–2057 (2015)
11. Lapuschkin, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.R., Samek, W.: On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE, e0130140 (2015)
12. Miller, T.: Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 1–38 (2019)
13. Ribeiro, M., Singh, S., Guestrin, C.: "Why should I trust you?": Explaining the predictions of any classifier. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 97–101 (2016)
14. Schmidt, P., Reiss, A., Duerichen, R., Marberger, C., Van Laerhoven, K.: Introducing WESAD, a multimodal dataset for wearable stress and affect detection. In: International Conference on Multimodal Interaction (ICMI). pp. 400–408. ACM (2018)
15. Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: Visualising image classification models and saliency maps. In: Proceedings of the ICLR Workshop (2014)
16. Vandewiele, G., Janssens, O., Ongenae, F., Turck, F., Hoecke, S.V.: GENESIM: genetic extraction of a single, interpretable model. arXiv abs/1611.05722 (2016)