bLIMEy: Surrogate Prediction Explanations Beyond LIME
Kacper Sokol
Department of Computer Science, University of Bristol, Bristol, United Kingdom
Alexander Hepburn
Department of Engineering Mathematics, University of Bristol, Bristol, United Kingdom
[email protected]
Raul Santos-Rodriguez
Department of Engineering Mathematics, University of Bristol, Bristol, United Kingdom
[email protected]
Peter Flach
Department of Computer Science, University of Bristol, Bristol, United Kingdom
Abstract
Surrogate explainers of black-box machine learning predictions are of paramount importance in the field of eXplainable Artificial Intelligence since they can be applied to any type of data (images, text and tabular), are model-agnostic and are post-hoc (i.e., can be retrofitted). The Local Interpretable Model-agnostic Explanations (LIME) algorithm is often mistakenly unified with a more general framework of surrogate explainers, which may lead to a belief that it is the solution to surrogate explainability. In this paper we empower the community to "build LIME yourself" (bLIMEy) by proposing a principled algorithmic framework for building custom local surrogate explainers of black-box model predictions, including LIME itself. To this end, we demonstrate how to decompose the surrogate explainers family into algorithmically independent and interoperable modules and discuss the influence of these component choices on the functional capabilities of the resulting explainer, using the example of LIME.
Local Interpretable Model-agnostic Explanations (LIME) [10] is a popular technique for explaining predictions of black-box machine learning models. It greatly improves on surrogate explanations [1] by introducing interpretable data representations, hence making it applicable to image and text data in addition to tabular data. Images can be represented as a collection of super-pixels and translated into a binary on/off vector indicating which super-pixels stay the same and which are occluded (removed). Equivalently, text can be represented as a bag of words translated into a similar on/off binary vector. Furthermore, LIME can be used as a fast approximation of SHapley Additive exPlanations (SHAP) [8] as the latter is computationally expensive – the cost of providing various guarantees for the produced explanations. However, the adoption of LIME is limited since it is provided as a monolithic explainability tool with little room for customisation when, in reality, it is just one possible realisation of the highly modular surrogate explanations framework. We argue that allowing the user to make informed choices and build a custom surrogate explainer that is designed for a specific task can greatly improve the quality of produced explanations, therefore warranting a wider adoption of surrogate explanations.

Since surrogate explanations are model-agnostic, they can be applied to any predictive system, hence they can have a high impact in the field of eXplainable Artificial Intelligence if they become accessible, accountable and accurate. According to the "no free lunch" theorem, a single solution can never perform better than all the other approaches across the board. However, by allowing the user to take advantage of surrogate explanations' modularity, therefore customising them for a problem at hand, we can achieve the best possible explanation that the surrogate explainers family can offer. We can further improve the quality of the explanations by educating the users on the possible component choices, their properties, influence on the explanations, advantages and caveats.

To the best of our knowledge, LIME is the only available surrogate explainability tool and modifying its default behaviour often requires tinkering with LIME's source code, which may discourage some practitioners. We address these shortcomings by taking advantage of surrogate explanations' modularity to create a unified algorithmic framework for building this type of explainers, which we call bLIMEy – build LIME yourself. A range of possible algorithms can be used for each module – with the choices discussed in Section 3 – creating a suite of customisable surrogate explainers. Their varying capabilities and restrictions greatly influence the resulting surrogate explainer, therefore each of them should be accompanied by a critical discussion and usage suggestions. To this end, we have decomposed the surrogate explanation framework into independent algorithmic components. We implemented a choice of algorithms for each of them in Python under the BSD 3-Clause open source licence, therefore allowing for their commercial use. Our implementation is accompanied by a "how-to" guide (https://fat-forensics.org/how_to/transparency/tabular-surrogates.html) outlining how to compose custom surrogate explainers and discussing pros and cons of selected component choices.
It is also capable of recreating the LIME algorithm for tabular, image and text data in a way that mitigates most of its issues reported in the literature [2, 5, 7].

Our research on surrogate explanations [1] was inspired by manuscripts investigating the instability and sources of randomness [2, 5] in LIME explanations [10], which could not pinpoint the root cause of this undesired and detrimental behaviour. Laugel et al. [7] attempted to "fix" LIME for tabular data by replacing its sampling method with an explicitly local sampler; however, their experiments used LIME with disabled discretisation (responsible for generating the interpretable data representation), therefore unintentionally compromising the integrity of the algorithm, making the two methods incomparable and the improvements not applicable to more general cases beyond the specific ones presented in their research. Henin and Le Métayer [4] introduced a unified (theoretical) framework that allows for systematic comparison of black-box explainers by characterising them along two dimensions: data sampling and explanation generation. In contrast, our (practical) approach is focused on algorithmic and implementation aspects of the surrogate subset of the black-box explainers family and extends Henin and Le Métayer's decomposition with a third dimension – interpretable representation – therefore bridging the gap between LIME and surrogate explanations [1].

Before delving into the bLIMEy framework we encourage the reader to consult Appendix A1 for an overview of the LIME explainer architecture. The bLIMEy framework decomposes surrogate explanations of a black-box model prediction for a selected data point into three distinct steps:
Interpretable Data Representation
Transformation (possibly bidirectional) from the original data domain (i.e., feature space) into an interpretable domain (and back). This step is optional for tabular data but required for image and text data. Interpretable domains tend to be binary vectors encoding the presence or absence of human-comprehensible characteristics in the data.
Data Sampling
Data augmentation (sampling) in the neighbourhood of the data point selected to be explained. For image and text data, sampling must be performed in the interpretable domain, while for tabular data it can be done in either of the domains. Next, the sampled data must be predicted with the black-box model. If the data were sampled in the interpretable domain, they need to be reverted back to the original representation to complete this task.
Explanation Generation
An inherently interpretable model is trained on the locally sampled data and is used to explain the selected data point. If an interpretable (binary) representation is used, XNOR can be applied between the selected data point and the sampled data – 1 if the same and 0 if different – to focus the local model on the presence and absence of these interpretable characteristics. The interpretable features for which the value of the selected data point is 0 can be safely removed to reduce the dimensionality. This task enforces the locality of the sampled data and introduces sparsity to the explanation. Sparsity can be enforced even further by applying a dimensionality reduction technique. To further enforce the locality of the explanation, when training the local model the sampled data can be weighted based on a kernelised distance between the chosen data point and the sampled data (in either representation).

When building a surrogate explainer every module choice may limit its overall functionality and the range of algorithms supported by other modules. Here, we discuss (often unintended) consequences of choosing a particular algorithm for each module and exemplify how such a choice has affected the explainer using the example of LIME. We support our discussion with an empirical comparison of various data samplers (Appendix A2) and an illustration of how a tree-based surrogate improves over a linear one, i.e., LIME, for tabular data without an interpretable representation (Appendix A3).

To avoid unnecessary randomness affecting the explanations produced by surrogate explainers, the data transformation from their original domain X into an interpretable representation X' must be bijective – the mapping from X to X' has to be a one-to-one correspondence – and it must have a corresponding and uniquely defined inverse function – a data point in X' can be translated into a unique data point in X. In LIME, the interpretable representation for both image and text data satisfies these two requirements. A sentence can be easily represented as a binary vector indicating the presence or absence of unique words in that sentence, and such a binary vector can be transformed into (a bag-of-words representation of) a sentence. Similar reasoning applies to images, where a binary vector indicates whether a super-pixel (a large, non-overlapping chunk of an image) should have the same pixel values as the original image or be occluded with an arbitrary patch or a solid colour. For tabular data, an interpretable representation is achieved by discretisation and binarisation (one-hot encoding), in which case bijection is preserved but the inverse function is ill-defined. While the binarisation of categorical features (there is no need for discretisation) is invertible, numerical features that have been discretised by binning (and binarised) cannot be uniquely reconstructed into their original representation X. LIME resolves this by sampling from a normal distribution (with clipping at bin boundaries) fitted to each numerical bin for each sampled data point to reconstruct it in the original domain (X) – the unidentified source of randomness reported by Fen et al. [2].

This undesired behaviour for tabular data is a consequence of transforming the data into their interpretable representation X' first and then sampling. While such an order is required for image and text data – it would be meaningless to sample from a grid of pixels or a sequence of characters respectively – it is not compulsory for tabular data, therefore providing an opportunity to avoid the "reverse sampling" step. By sampling first (in X) and transforming the tabular data into an interpretable representation later (X'), the inversion of the latter step is no longer required since both of the representations are available.
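To make this ordering concrete, the following minimal sketch samples around a tabular instance in the original domain and only then discretises the sample into quartile bins; the data point, noise scale and binning scheme are illustrative assumptions, not the bLIMEy or LIME defaults.

```python
import numpy as np

rng = np.random.default_rng(42)                      # fixed seed for reproducibility
explained_point = np.array([5.1, 3.5])               # hypothetical tabular instance
background = rng.normal(size=(500, 2)) + [5.0, 3.0]  # stand-in for the training data

# Sample in the original domain X around the explained instance.
samples = explained_point + rng.normal(scale=0.5, size=(150, 2))

# Transform both the samples and the explained point into a discretised
# representation (quartile bins per feature, estimated from the data); no
# inverse transformation is needed since the samples already live in X.
bin_edges = [np.quantile(background[:, j], [0.25, 0.5, 0.75])
             for j in range(background.shape[1])]
samples_binned = np.column_stack(
    [np.digitize(samples[:, j], bin_edges[j]) for j in range(samples.shape[1])])
explained_binned = np.array(
    [np.digitize(explained_point[j], bin_edges[j])
     for j in range(explained_point.size)])
```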
This order of operations (sampling first and transforming into the interpretable representation later), however, requires the sampling procedure to be as local as possible, since sampling from the interpretable domain implicitly introduces the locality. While any sort of sampling should suffice in the interpretable domain (as long as the XNOR filtering is performed – see the next paragraph), a local sampling method, e.g., MixUp [12] or Growing Spheres [6], is required when sampling is performed in the original data domain (for tabular data) – see the results shown in Appendix A2. Furthermore, sampling in this domain requires the sampling method to produce data points that are assigned more than one class (or significantly different class probabilities for probabilistic models) by the black-box model, otherwise a meaningful local surrogate model cannot be fitted. Given the random nature of the sampling procedure, the only way to ensure reproducible explanations is to always have the same local sample, which can only be achieved by fixing the random seed.

While reducing the dimensionality of the interpretable domain for image and text data is detrimental for the explanation – e.g., "black holes" in images and missing words in sentences – it is recommended for tabular data. (The XNOR operation and dropping 0-valued interpretable features do not affect image and text data as the selected data point is always represented as an all-1 binary vector in the interpretable domain; for text data locality amounts to sampling from within the same sentence by leaving words intact or removing them, and for images to modifying the original image by occluding its parts. For tabular data LIME achieves the sample locality in this step by applying the XNOR operation explained here.) This operation is equivalent to keeping only the categorical feature values and numerical feature bins in which the explained data point resides (the aforementioned implicit locality for tabular data). For example, consider two numerical features (x1, x2) discretised into the bins (x1 < a, a ≤ x1 < b, b ≤ x1) and (x2 < c, c ≤ x2); for an explained data point that falls into the a ≤ x1 < b and c ≤ x2 bins – (0, 1, 0, 0, 1) in the interpretable representation – only the a ≤ x1 < b and c ≤ x2 dimensions would be preserved. This step, combined with transforming the interpretable feature space by applying the XNOR operation between the explained data point and the sampled data, is required for the proper functioning of LIME since its goal is to explain a prediction by quantifying the (positive or negative) effect of changing any of the interpretable feature ranges within which the explained data point resides, hence the choice of a linear model for the local surrogate and the use of its coefficients as the explanation. Therefore, the question that the LIME explanation tries to answer is whether, for the current black-box classification of the selected data point, keeping x1 within the a ≤ x1 < b range has a positive or a negative effect when compared to x1 being outside of this range. Similarly, for a particular classification of an image or a sentence, does a given super-pixel or a word have a positive or a negative effect, i.e., would removing this super-pixel or word change the black-box classification outcome.
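The sketch below continues the previous one (reusing the hypothetical samples_binned and explained_binned arrays) to show how the binary interpretable representation, the XNOR transformation and the dropping of 0-valued dimensions fit together; it is an illustrative reconstruction of the steps described above, not the LIME or FAT Forensics code.

```python
import numpy as np

# Binary interpretable representation X': one-hot encoding of the per-feature
# bins computed in the previous snippet (samples_binned, explained_binned).
n_bins = max(int(samples_binned.max()), int(explained_binned.max())) + 1

def one_hot(binned):
    binned = np.atleast_2d(binned)
    encoded = np.zeros((binned.shape[0], binned.shape[1] * n_bins), dtype=int)
    for j in range(binned.shape[1]):
        encoded[np.arange(binned.shape[0]), j * n_bins + binned[:, j]] = 1
    return encoded

samples_binary = one_hot(samples_binned)
explained_binary = one_hot(explained_binned)

# XNOR with the explained instance: 1 where a sample agrees with it, 0 otherwise.
xnor = (samples_binary == explained_binary).astype(int)

# Dropping the dimensions where the explained point is 0 keeps only the bins it
# resides in -- equivalent to a per-feature "same bin" indicator -- which is
# what introduces sparsity and locality into the explanation.
interpretable = xnor[:, explained_binary.ravel() == 1]
```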
Using LIME for tabular data without an interpretable representation therefore forfeits the locality of the sample introduced by applying the XNOR feature transformation, in which case the explainer relies purely on kernelised distance weighting and normal data sampling around the explained data point (albeit with the sampling variance for each feature calculated based on the whole data set, which decreases the locality effect) to induce the explanation locality. An interpretable representation for tabular data may be skipped altogether, in which case other modules of the surrogate explainer, like the data sampler (cf. Appendix A2), should guarantee the locality of an explanation.

The surrogate model can be trained in a number of different ways: as a regressor of the probabilities outputted by the black-box model (as in LIME), in which case it has to model (explain) a single class selected by the user; or as a classifier when the black-box model is a thresholded probabilistic model or a classifier. Another choice is the training scheme: the surrogate model can either be trained as a multi-class or one-vs-rest predictor, with the latter approach being required for surrogate regressors of black-box probabilistic predictors. The choice of a surrogate model is also important; if local feature importance (or interpretable feature influence) is desired, a linear model is a good pick as long as all of the features are normalised to the same range (LIME satisfies this by using the interpretable binary representation or otherwise explicitly normalising the features) and these features are "reasonably" independent. A lack of normalisation causes the feature weights to be incomparable, therefore rendering the explanation uninformative. While the explainability of linear models is limited to feature importance, a different type of explanation – logical conditions outlining the behaviour of a black-box model in the neighbourhood of the selected data point – can be generated with a surrogate decision tree (cf. Appendix A3). The selection here should be motivated by the desired type of the explanation – e.g., "Why class A?" vs "Why class A and not B?" – and its format – a feature importance bar plot vs a conjunction of logical conditions – which are problem-dependent.

In this paper we introduced bLIMEy: a modular algorithmic framework for building custom surrogate explainers of black-box predictions. We discussed dangers associated with algorithmic choices for each of its modules and showed how to avoid common pitfalls. bLIMEy is accompanied by an open source implementation that includes a selection of algorithms for every module of the framework, therefore empowering the community to build surrogate explainers customised to the task at hand. In the future we will investigate the behaviour of various surrogate models in high-dimensional spaces and design a range of metrics to measure the quality and stability of surrogate explanations. We will provide one measure for each of the following three competing objectives: (1) local approximation of the closest decision boundary; (2) the ability to mimic the black-box model locally; and (3) the global faithfulness of the local surrogate model. These metrics will engender trust in the explanations and mitigate the need for user studies, which lack universally agreed objectives and often entail confirmation bias.

References
[1] Mark Craven and Jude W Shavlik. Extracting tree-structured representations of trained networks. In Advances in Neural Information Processing Systems, pages 24–30, 1996.
[2] Hui Fen, Kuangyan Song, Madeilene Udell, Yiming Sun, Yujia Zhang, et al. Why should you trust my interpretation? Understanding uncertainty in LIME predictions. arXiv preprint arXiv:1904.12991, 2019.
[3] Ronald A Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2):179–188, 1936.
[4] Clement Henin and Daniel Le Métayer. Towards a generic framework for black-box explanations of algorithmic decision systems. IJCAI 2019 Workshop on Explainable Artificial Intelligence (XAI), 2019.
[5] Himabindu Lakkaraju, Ece Kamar, Rich Caruana, and Jure Leskovec. Faithful and customizable explanations of black box models. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society. ACM, 2019.
[6] Thibault Laugel, Marie-Jeanne Lesot, Christophe Marsala, Xavier Renard, and Marcin Detyniecki. Comparison-based inverse classification for interpretability in machine learning. In Information Processing and Management of Uncertainty in Knowledge-Based Systems. Theory and Foundations, pages 100–111. Springer International Publishing, 2018. ISBN 978-3-319-91473-2.
[7] Thibault Laugel, Xavier Renard, Marie-Jeanne Lesot, Christophe Marsala, and Marcin Detyniecki. Defining locality for surrogates in post-hoc interpretablity. 2018.
[8] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 4765–4774. Curran Associates, Inc., 2017. URL http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf.
[9] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[10] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, pages 1135–1144, 2016.
[11] Kacper Sokol, Raul Santos-Rodriguez, and Peter Flach. FAT Forensics: A Python Toolbox for Algorithmic Fairness, Accountability and Transparency. arXiv preprint arXiv:1909.05167, 2019.
[12] Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=r1Ddp1-Rb.

Appendix

All of the results presented in this appendix can be reproduced with a Jupyter Notebook hosted in bLIMEy's GitHub repository (https://github.com/So-Cool/bLIMEy/tree/master/HCML_2019). This notebook can be executed online using Binder (https://mybinder.org) by following the URL placed in its top cell. The experiments were done using FAT Forensics [11] – an open source package implementing various fairness, accountability and transparency algorithms. The description of each function used for these experiments can be found in the API documentation of FAT Forensics (https://fat-forensics.org/api.html).

A1: LIME Overview
Before discussing the LIME algorithm we examine the concept of LIME's interpretable data representations. For text data the interpretable representation is achieved by encoding a sentence as a bag of words and representing it as a binary vector, which indicates the presence and absence of unique words. Such interpretable word vectors can then be represented as sentences by removing the words for which the value in this binary vector has been changed to 0. The interpretable representation for images is generated by dividing images into super-pixels – non-overlapping segments – each one encompassing a part of the image that represents a concept (e.g., an object) meaningful to humans. Such interpretable image vectors can then be represented as images by occluding the super-pixels for which the value in this binary vector has been set to 0. Finally, tabular data can be transformed into the interpretable representation by one-hot encoding the categorical features and binning the numerical ones. Such interpretable tabular data vectors can then be represented in the original feature domain by altering the feature values according to the changes made in the binary vector. This important concept of interpretable data representations enables LIME to explain data (i.e., raw features such as pixel values for images) or their representations (internally used by black-box predictive models, such as high-dimensional word embeddings) that are inherently human-incomprehensible.

Therefore, for a data point x ∈ X to be explained and an arbitrary black-box probabilistic predictive model f : X → Y, the default LIME algorithm proceeds as follows:

1. Find the human-interpretable representation x' ∈ X' of the data point x chosen to be explained, where X' denotes the interpretable domain.

2. Sample data from the interpretable domain X'. For image and text data this is done by uniformly replacing 1's in x' with values from {0, 1} to get new data points in the "neighbourhood" of x, e.g., by randomly occluding super-pixels in an image or removing words from a sentence. Tabular data is first discretised into a representation where categorical features are left unchanged and each numerical feature is transformed into a categorical feature that indicates the numerical bin (interpretable representation) to which this feature belongs, e.g., for the bins {(−∞, a), [a, b), [b, ∞)} the value x̂1 = 1 indicates x1 ∈ [a, b). This representation (X̂) is used for tabular data sampling to avoid assigning a sample to two different bins for a single feature, which could have happened had the sampling been performed in the binary representation (X') – where each bin of each feature is represented as a separate binary feature, e.g., (x'1-1, x'1-2, x'1-3) sampled in a binary domain and resulting in a (0, 1, 1) vector would indicate both x1 ∈ [a, b) and x1 ∈ [b, ∞). After the sampling step the tabular data is transformed into the binary interpretable representation X'.

3. Invert the representation of the sampled data from X' to X and predict their probability for a selected class c with the black-box model f. Usually, c is selected to be the class assigned to the explained data point x by the black-box model f.
4. Drop all of the interpretable features for which the value of the explained data point (in the interpretable domain) is 0, therefore creating a new representation X̃ – this introduces sparsity of the explanation and enforces its locality (see Section 3 for more details).

5. Calculate the distances between the sampled data and the explained data point in X̃ and kernelise them (using the exponential kernel) to serve as proximity/similarity scores used to weight the sampled data when training the local model.

6. Apply the XNOR operation between the interpretable representation X̃ of the explained instance x̃ and all of the sampled data points to create a data set X̃_XNOR, which prescribes the effect of a change of a data point in the interpretable domain on its classification outcome (see Section 3 for more details).

7. Use K-LASSO to further limit the number of features used in the explanation and train a linear regression on X̃_XNOR using the black-box predictions computed before as the target (i.e., probabilities of the previously selected class c). The coefficients of this model (feature weights) are used to interpret the (positive or negative) importance of each human-comprehensible feature.

For more details please consult the LIME manuscript [10] and its official open source implementation (https://github.com/marcotcr/lime).
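For orientation, the snippet below sketches the overall flow of such a surrogate pipeline on tabular data using plain scikit-learn components. It is a simplified, illustrative reconstruction rather than the official LIME or FAT Forensics code: the kernel width, the sampling scheme and the omission of the interpretable representation and K-LASSO steps are assumptions made for brevity.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

# Black-box model standing in for f: X -> Y.
X, y = load_iris(return_X_y=True)
black_box = RandomForestClassifier(random_state=42).fit(X, y)

explained_point = X[0]
explained_class = black_box.predict([explained_point])[0]

# Steps 2-3: sample around the explained point (here in the original domain)
# and query the black box for the probability of the explained class.
rng = np.random.default_rng(42)
samples = explained_point + rng.normal(scale=X.std(axis=0),
                                       size=(1000, X.shape[1]))
targets = black_box.predict_proba(samples)[:, explained_class]

# Step 5: kernelised distances serve as sample weights (exponential kernel).
distances = np.linalg.norm(samples - explained_point, axis=1)
kernel_width = 0.75 * np.sqrt(X.shape[1])          # illustrative choice
weights = np.exp(-(distances ** 2) / (kernel_width ** 2))

# Step 7: weighted linear (ridge) surrogate; its coefficients act as the
# feature importance explanation.
surrogate = Ridge(alpha=1.0).fit(samples, targets, sample_weight=weights)
print(dict(zip(load_iris().feature_names, surrogate.coef_.round(3))))
```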
A2: Comparison of Data Samplers for Tabular Data

To show the behaviour of different data sampling methods we use the Iris data set [3]. We plot the data along two dimensions – sepal length (cm) on the x-axis and sepal width (cm) on the y-axis – to facilitate easy visual comparison. The three colours visible in the plots represent the three classes of the Iris data set: setosa, virginica and versicolor. The data set plotted along these two features, with the markers colour-coded based on the ground truth annotation, is shown in Figure 1. The background shading in this plot indicates the decision boundary of the underlying black-box model (a Random Forest Classifier trained with scikit-learn [9]).

Figure 1: The Iris data set plotted with sepal length (cm) on the x-axis and sepal width (cm) on the y-axis, with the markers colour-coded based on the ground truth annotation. The background shading represents the decision boundary of the underlying black-box model (a random forest classifier).

We initialise each of the samplers with the full Iris data set and generate 150 data points around the selected instance: the black dot. Figure 2 shows the importance of choosing an appropriate sampling method when building a surrogate explainer. Some of the data samplers may have difficulties locating the closest decision boundary (see, for example, Figures 2a and 2b) when the selected data point is far from any decision boundary of the black-box model (which may be common in high-dimensional spaces due to the curse of dimensionality), therefore generating data for which fitting a local surrogate model may not be possible. Another important aspect of the sampled data is their clear class imbalance, which needs to be accounted for during the training procedure of the surrogate model.

Figure 2: The effect of different sampling algorithms – (a) normal, (b) truncated normal, (c) MixUp and (d) normal class discovery – on the locality of the sample for the Iris data set plotted with sepal length (cm) on the x-axis and sepal width (cm) on the y-axis. The black dot is the explained data point for which the sample is generated. Red, blue and green dots are the predictions (the three classes of the Iris data set) assigned by the underlying black-box model (cf. Figure 1).
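As an illustration of what a local sampler in the spirit of MixUp [12] (cf. Figure 2c) might look like, the sketch below draws new points as convex combinations of the explained instance and randomly selected data points. The function name, the Beta-distribution parameter and the sample size are hypothetical choices and this is not the FAT Forensics implementation.

```python
import numpy as np
from sklearn.datasets import load_iris

def mixup_sample(explained_point, data, n_samples=150, alpha=2.0, seed=None):
    """Draw local samples as convex combinations of the explained instance and
    randomly selected data points (a MixUp-inspired sampler; illustrative only)."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, data.shape[0], size=n_samples)
    # Mixing coefficients close to 1 keep samples near the explained instance.
    lam = rng.beta(alpha, alpha, size=(n_samples, 1))
    return lam * explained_point + (1 - lam) * data[idx]

# Example: sample around the first Iris instance.
X, _ = load_iris(return_X_y=True)
local_sample = mixup_sample(X[0], X, seed=42)
```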
A3: Decision Tree-based Surrogate Explainer for Tabular Data
To show the importance of selecting a good surrogate model, and the difference in the explanations that it can produce, we explain a carefully selected data point from the two moons data set. The two moons data set – shown in Figure 3 and generated with scikit-learn (https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_moons.html) – is a synthetic 2-dimensional, binary classification data set with a complex decision boundary. It is suitable for this type of experiment as, depending on which data point is chosen, the resulting explanations can be quite diverse.

Figures 4 and 5 show two variants of surrogate explainers – based on a linear and a decision tree model respectively – built for the data point marked with a black dot. It is clear that for complex decision boundaries a tree-based approach is superior. In addition to better approximating the local decision boundary (of the underlying black-box random forest classifier), a tree-based local surrogate generates a locally-faithful interpretable representation from the feature splits learnt by the tree, which can be used to convey the explanation as a conjunction of logical conditions in high-dimensional spaces where visualisations are impossible.

Figure 3: The two moons data set with 1000 samples. The colour of the marker indicates the ground truth label of each data point and the background shading depicts the decision boundary of the underlying black-box model (a random forest classifier). Since the model is probabilistic, the colour-bar (placed to the right of the plot) provides a legend for the predicted probabilities of the blue class. The black dot represents the data point that will be explained with local surrogates (see Figures 4 and 5).
Figure 4: Linear surrogate explainer for the selected data point (black dot) – equivalent to LIME without discretisation. The background shading represents the predicted value from the local ridge regression model (the value encoding is given in the colour-bar). The local regression model is trained to predict the probability of belonging to the blue class (outputted by the black-box model), hence the predicted values may lie outside of the expected [0, 1] range. If the surrogate's threshold is set at 0.5, the yellow bar would be partially predicted as blue, therefore incorrectly classifying the upper left part of the red cloud of points. The surrogate assigns an importance (coefficient) to each of the two features; since an interpretable representation was not used, these values do not carry any particular meaning.

Figure 5: Decision tree-based surrogate explainer for the selected data point (black dot). The background shading represents the predicted value from the local decision tree regression model (the value encoding is given in the colour-bar). The local decision tree model is trained to predict the probability of belonging to the blue class (outputted by the black-box model). The green and light green areas have a high probability of the blue class, therefore giving a good approximation of the local decision boundary, which is fairly complex for the selected data point. The orange and yellow blocks have a low probability of the blue class, therefore providing a precise approximation of the red class. A possible explanation derived from the local decision tree is a conjunction of threshold conditions on the two features: the blue class is predicted when the x-axis feature is below a learnt threshold or the y-axis feature is above another, and the red class when the y-axis feature is below its threshold and the x-axis feature lies within a learnt interval.
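To make the tree-based explanation concrete, the following minimal sketch (not the code used to produce Figures 4 and 5) fits a shallow regression tree to black-box probabilities sampled around a two moons instance and reads the explained point's decision path off as a conjunction of logical conditions; the sampling scale, tree depth and chosen instance are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeRegressor

# Black-box model on the two moons data and a local tree surrogate around a
# chosen point; the thresholds below are learnt from the sampled data.
X, y = make_moons(n_samples=1000, noise=0.25, random_state=42)
black_box = RandomForestClassifier(random_state=42).fit(X, y)

explained_point = X[0]
rng = np.random.default_rng(42)
samples = explained_point + rng.normal(scale=0.3, size=(500, 2))
targets = black_box.predict_proba(samples)[:, 1]   # probability of class 1

surrogate = DecisionTreeRegressor(max_depth=3, random_state=42).fit(samples, targets)

# Turn the decision path of the explained point into a conjunction of logical
# conditions -- the tree-based explanation discussed above.
tree, node, conditions = surrogate.tree_, 0, []
while tree.children_left[node] != -1:              # descend until a leaf is reached
    feature, threshold = tree.feature[node], tree.threshold[node]
    if explained_point[feature] <= threshold:
        conditions.append(f"x{feature} <= {threshold:.3f}")
        node = tree.children_left[node]
    else:
        conditions.append(f"x{feature} > {threshold:.3f}")
        node = tree.children_right[node]
print(" AND ".join(conditions), "->", round(float(tree.value[node][0][0]), 3))
```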