The Intriguing Properties of Model Explanations
Maruan Al-Shedivat
Carnegie Mellon University [email protected]
Avinava Dubey
Carnegie Mellon University [email protected]
Eric P. Xing
Carnegie Mellon University [email protected]
Abstract
Linear approximations to the decision boundary of a complex model have become one of the most popular tools for interpreting predictions. In this paper, we study such linear explanations produced either post-hoc by a few recent methods or generated along with predictions with contextual explanation networks (CENs). We focus on two questions: (i) whether linear explanations are always consistent or can be misleading, and (ii) when integrated into the prediction process, whether and how explanations affect the performance of the model. Our analysis sheds more light on certain properties of explanations produced by different methods and suggests that learning models that explain and predict jointly is often advantageous.
1 Introduction

Model interpretability is a long-standing problem in machine learning that has become quite acute with the accelerating pace of widespread adoption of complex predictive algorithms. There are multiple approaches to interpreting models and their predictions, ranging from a variety of visualization techniques [1–3] to explanations by example [4, 5]. The approach that we consider in this paper treats explanations as models themselves: they approximate the decision boundary of the original predictor but belong to a class that is significantly simpler (e.g., local linear approximations).

Explanations can be generated either post-hoc or alongside predictions. A popular method, called LIME [6], takes the first approach and attempts to explain predictions of an arbitrary model by searching for linear local approximations of the decision boundary. On the other hand, recently proposed contextual explanation networks (CENs) [7] incorporate a similar mechanism directly into deep neural networks of arbitrary architecture and learn to predict and to explain jointly. Here, we focus on analyzing a few properties of the explanations generated by LIME, its variations, and CEN. In particular, we seek answers to the following questions:

1. Explanations are only as good as the features they use to explain predictions. We ask whether and how feature selection and feature noise affect the consistency of explanations.
2. When explanation is a part of the learning and prediction process, how does that affect the performance of the predictive model?
3. Finally, what kind of insight can we gain by visualizing and inspecting explanations?
2 Methods

We start with a brief overview of the methods compared in this paper: LIME [6] and CENs [7]. Given a dataset of inputs, x ∈ X, and targets, y ∈ Y, our goal is to learn a predictive model, f : X → Y. To explain each prediction, we have access to another set of features, z ∈ Z, and construct explanations, g_x : Z → Y, such that they are consistent with the original model, g_x(z) = f(x). These additional features, z, are assumed to be more interpretable than x, and are called the interpretable representation in [6] and attributes in [7].

2.1 LIME and Variations

Given a trained model, f, and an instance with features (x, z), LIME constructs an explanation, g_x, as follows:

$$g_x = \operatorname*{argmin}_{g \in \mathcal{G}} \; L(f, g, \pi_x) + \Omega(g) \tag{1}$$

where L(f, g, π_x) is the loss that measures how well g approximates f in the neighborhood defined by the similarity kernel, π_x : Z → R₊, in the space of additional features, Z, and Ω(g) is the penalty on the complexity of the explanation. More specifically, Ribeiro et al. [6] assume that G is the class of linear models:

$$g_x(z) := b_x + w_x \cdot z \tag{2}$$

and define the loss and the similarity kernel as follows:

$$L(f, g, \pi_x) := \sum_{z' \in \mathcal{Z}} \pi_x(z') \, \left(f(x') - g(z')\right)^2, \qquad \pi_x(z') := \exp\left\{-D(z, z')^2 / \sigma^2\right\} \tag{3}$$

where the data instance is represented by (x, z), z' and the corresponding x' are the perturbed features, D(z, z') is some distance function, and σ is the scale parameter of the kernel. Ω(g) is further chosen to favor sparsity of explanations.

2.2 Contextual Explanation Networks

LIME is a post-hoc model explanation method: it justifies model predictions by producing explanations which, while locally correct, are never used to make the predictions in the first place. Contrary to that, CENs use explanations as an integral part of the learning process and make predictions by applying the generated explanations. More formally, CENs construct the predictive model f : X × Z → Y via a composition: given x, an encoder, e_θ : X → G, produces an explanation g_x, which is then applied to z to make a prediction. In other words:

$$f(x, z) := g_x(z), \quad \text{where} \quad g_x := e_\theta(x) \tag{4}$$

In [7], we introduced a more general probabilistic framework that allows combining different deterministic and probabilistic encoders with explanations represented by arbitrary graphical models. To keep our discussion simple and concrete, here we assume that explanations take the same linear form (2) as for LIME and that the encoder maps x to (b_x, w_x) as follows:

$$b_x := \alpha_\theta(x)^\top B, \quad w_x := \alpha_\theta(x)^\top W, \quad \text{where} \quad \sum_{k=1}^{K} \alpha_\theta^{(k)}(x) = 1, \quad \alpha_\theta^{(k)}(x) \ge 0 \;\; \forall k \tag{5}$$

In other words, the explanation (b_x, w_x) is constrained to be a convex combination of K components from a global learnable dictionary, D := (B, W), where the combination weights, α_θ(x), also called attention, are produced by a deep network. An encoder of this form is called a constrained deterministic map in [7], and the model is trained jointly w.r.t. (θ, B, W) to minimize the prediction error.

Both LIME and CEN produce explanations in the form of linear models that can be further used for prediction diagnostics.
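To make the two mechanisms concrete, the following sketch shows a LIME-style local fit. It is a minimal illustration rather than the reference implementation: the hooks `model_predict`, `perturb`, and `distance`, the default values, and the Ridge penalty standing in for a sparsity-inducing Ω(g) are all our own assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

def explain_lime(model_predict, z, perturb, distance, sigma=0.75, n_samples=500):
    """Fit a local linear explanation g_x(z') = b_x + w_x . z' around one instance.

    model_predict maps a perturbed z' to the original model's output f(x')
    (in practice z' must first be converted back to an input x'); perturb
    draws samples around z; distance implements D(z, z') from Eq. (3).
    """
    Z_pert = np.stack([perturb(z) for _ in range(n_samples)])
    targets = np.array([model_predict(zp) for zp in Z_pert])
    # Similarity kernel pi_x(z') = exp(-D(z, z')^2 / sigma^2), Eq. (3).
    dists = np.array([distance(z, zp) for zp in Z_pert])
    weights = np.exp(-dists ** 2 / sigma ** 2)
    # Weighted penalized least squares in place of Eq. (1); LIME favors a
    # sparsity-inducing Omega(g), which we approximate with an L2 penalty here.
    g = Ridge(alpha=1.0).fit(Z_pert, targets, sample_weight=weights)
    return g.intercept_, g.coef_  # (b_x, w_x) as in Eq. (2)
```

A CEN with a constrained deterministic encoder, Eqs. (4)-(5), could look as follows in PyTorch. Again, this is a sketch under our own naming and initialization choices, with a softmax producing the convex attention weights; the framework in [7] admits other encoders.

```python
import torch
import torch.nn as nn

class CEN(nn.Module):
    """Contextual explanation network with a constrained deterministic encoder."""

    def __init__(self, context_encoder, n_components, n_attributes, n_classes):
        super().__init__()
        self.context_encoder = context_encoder  # deep net: x -> R^K scores
        # Global learnable dictionary D = (B, W) of K linear explanations.
        self.B = nn.Parameter(torch.zeros(n_components, n_classes))
        self.W = nn.Parameter(0.01 * torch.randn(n_components, n_attributes, n_classes))

    def forward(self, x, z):
        # Attention alpha_theta(x): non-negative and sums to one (Eq. 5).
        alpha = torch.softmax(self.context_encoder(x), dim=-1)   # (batch, K)
        b_x = alpha @ self.B                                     # (batch, classes)
        w_x = torch.einsum('bk,kac->bac', alpha, self.W)         # (batch, attrs, classes)
        # Apply the generated linear explanation to z (Eqs. 2 and 4).
        logits = b_x + torch.einsum('ba,bac->bc', z, w_x)
        return logits, (b_x, w_x), alpha
```

Training then minimizes a standard prediction loss (e.g., cross-entropy on the logits) jointly w.r.t. the encoder parameters and the dictionary, as stated above.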
3 Analysis

Our goal is to understand how different conditions affect explanations generated by both methods, to see whether this may lead to erroneous conclusions, and finally to understand how jointly learning to predict and to explain affects performance. We use the following three tasks in our analysis: MNIST image classification (http://yann.lecun.com/exdb/mnist/), sentiment classification of the IMDB reviews [8], and poverty prediction for households in Uganda from satellite imagery and survey data [9]. The details of the setup are omitted in the interest of space but can be found in [7], as we follow exactly the same setup.

3.1 Consistency of Explanations

Linear explanations assign weights to the interpretable features, z, and hence strongly depend on their quality and on the way we select them. We consider two cases: (a) the features are corrupted with additive noise, and (b) the selected features are incomplete. For this analysis, we use the MNIST and IMDB data.
[Fig. 1 shows explanation test error (%) for MNIST (CNN, LIME-pxl, CEN-pxl, CEN-hog) and IMDB (LSTM, LIME-bow, CEN-bow, CEN-tpc): (a) vs. SNR (dB) of the noise added to the interpretable features; (b) vs. feature subset size (%).]

Fig. 1: The effect of feature quality on explanations. (a) Explanation test error vs. the level of the noise added to the interpretable features. (b) Explanation test error vs. the total number of interpretable features.
We train baseline deep architectures (a CNN on MNIST and an LSTM on IMDB) and their CEN variants. For MNIST, z is either the pixels of a scaled-down image (pxl) or HOG features (hog). For IMDB, z is either a bag of words (bow) or a topic vector (tpc) produced by a pre-trained topic model.
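For reference, here is a sketch of how two of these interpretable representations could be constructed. The exact resolutions, HOG parameters, vocabulary size, and count vs. binary encoding used in [7] are not restated in this paper, so the values below are illustrative assumptions.

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.feature_extraction.text import CountVectorizer

def mnist_interpretable_features(images, kind="pxl"):
    """Build z for MNIST: down-scaled pixels ('pxl') or HOG descriptors ('hog').
    The 14x14 target size and 7x7 HOG cells are illustrative guesses."""
    if kind == "pxl":
        return np.stack([resize(im, (14, 14)).ravel() for im in images])
    return np.stack([hog(im, pixels_per_cell=(7, 7), cells_per_block=(1, 1))
                     for im in images])

def imdb_bow_features(texts, vocab_size=2000):
    """Build z for IMDB as a bag of words over the most frequent terms
    (binary presence indicators; the original may use counts instead)."""
    vectorizer = CountVectorizer(max_features=vocab_size, binary=True)
    return vectorizer.fit_transform(texts), vectorizer
```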
The effect of noisy features. In this experiment, we inject noise into the features z (Gaussian noise with zero mean, with the variance selected appropriately for each signal-to-noise ratio level) and ask LIME and CEN to fit explanations to the noisy features. The predictive performance of the produced explanations on noisy features is shown in Fig. 1a. Note that after injecting noise, each data point has a noiseless representation x and a noisy z̃. Since the baselines take only x as inputs, their performance stays the same and, regardless of the noise level, LIME "successfully" overfits explanations: it is able to almost perfectly approximate the decision boundary of the baselines using very noisy features. On the other hand, the performance of CEN gets worse with the increasing noise level, indicating that the model fails to learn when the selected interpretable representation is of low quality.
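A sketch of the two corruption schemes used in this subsection follows (the second helper is used in the feature-selection experiment below). The per-feature-matrix SNR convention is our assumption, as the paper defers such details to [7].

```python
import numpy as np

def add_noise_at_snr(Z, snr_db, rng=None):
    """Corrupt features with zero-mean Gaussian noise at a target SNR in dB.
    Noise variance follows the usual definition SNR = P_signal / P_noise;
    whether the original experiments used this exact convention is assumed."""
    rng = rng or np.random.default_rng(0)
    signal_power = np.mean(Z ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return Z + rng.normal(0.0, np.sqrt(noise_power), size=Z.shape)

def subsample_features(Z, fraction, rng=None):
    """Keep a random subset of feature dimensions (second experiment of Sec. 3.1)."""
    rng = rng or np.random.default_rng(0)
    keep = rng.choice(Z.shape[1], size=max(1, int(fraction * Z.shape[1])), replace=False)
    return Z[:, np.sort(keep)], keep
```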
The effect of feature selection. Here, we use the same setup, but instead of injecting noise into z, we construct z̃ by randomly subsampling a set of dimensions. Fig. 1b demonstrates the result. While the performance of CENs degrades proportionally to the size of z̃, we see that, again, LIME is able to fit explanations to the decision boundary of the original models despite the loss of information.

These two experiments indicate a major drawback of explaining predictions post-hoc: when constructed on poor, noisy, or incomplete features, such explanations can overfit the decision boundary of a predictor and are likely to be misleading. For example, predictions of a perfectly valid model might end up getting absurd explanations, which is unacceptable from the decision-support point of view.
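For concreteness, the evaluation behind Fig. 1 can be read as the following loop: each explanation is produced from degraded features and then scored as a predictor in its own right. The `explain` hook is assumed to wrap either `explain_lime` above or a per-instance CEN forward pass, and the binary thresholding is our simplification of the multi-class setting.

```python
import numpy as np

def explanation_test_error(explain, Z_test, y_test):
    """Score explanations as predictors (our reading of Fig. 1's y-axis):
    produce a linear explanation per instance, apply it to the (possibly
    corrupted) features, and measure the resulting classification error."""
    errors = 0
    for z, y in zip(Z_test, y_test):
        b, w = explain(z)                 # (b_x, w_x) for this instance
        y_hat = int(b + w @ z > 0)        # binary decision from g_x(z)
        errors += int(y_hat != y)
    return errors / len(y_test)
```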
3.2 Performance

In this part, we compare CENs with the baselines in terms of performance. In each task, CENs are trained to simultaneously generate predictions and construct explanations. Overall, CENs show very competitive performance and are able to approach or surpass the baselines in a number of cases, especially on the IMDB data (see Table 1). This suggests that forcing the model to produce explanations along with predictions does not limit its capacity.

[Fig. 2 plots training error (%) vs. epoch/batch number and validation error (%) vs. training set size (%) for MNIST (CNN, CEN-pxl, CEN-hog) and IMDB (LSTM, CEN-bow, CEN-tpc).]

Fig. 2: (a) Training error vs. iteration (epoch or batch) for baselines and CENs. (b) Validation error for models trained on random subsets of data of different sizes.
Additionally, the "explanation layer" in CENs affects the geometry of the optimization problem and leads to faster and better convergence (Fig. 2a). Finally, we train the models on subsets of the data (with the size varied from 1% to 20% for MNIST and from 2% to 40% for IMDB) and notice that explanations play the role of a regularizer which strongly improves the sample complexity (Fig. 2b).

Table 1: Performance of the models on classification tasks (averaged over 5 runs; the std. are on the order of the least significant digit). The subscripts denote the features on which the linear models are built: pixels (pxl), HOG (hog), bag-of-words (bow), topics (tpc), embeddings (emb), discrete attributes (att).

MNIST               IMDB                Satellite
Model     Err (%)   Model     Err (%)   Model   Acc (%)  AUC (%)
LR-pxl    .         LR-bow    .         LR-emb  .        .
LR-hog    .         LR-tpc    .         LR-att  .        .
CNN       .         LSTM      .         MLP     .        .
MoE-pxl   .         MoE-bow   .         MoE     .        .
MoE-hog   .         MoE-tpc   .         CEN     .        .
CEN-pxl   .         CEN-bow*  .         VCEN    .        .
CEN-hog   .         CEN-tpc*  .

* Best previous results for similar LSTMs: . (supervised) and . (semi-supervised) [10].

3.3 Qualitative Analysis

Finally, we showcase the insights one can get from explanations produced along with predictions. In particular, we consider the problem of poverty prediction for household clusters in Uganda from satellite imagery and survey data. The x representation of each household cluster is a collection of satellite images; z is represented by a vector of 65 categorical features from the living standards measurement survey (LSMS). The goal is binary classification of households in Uganda into poor and not poor. In our methodology, we closely follow the original study of Jean et al. [9] and use a pretrained VGG-F network for embedding the images into a 4096-dimensional space, on top of which we build our contextual models. Note that this dataset is fairly small (642 points), and hence we keep the
VGG-F frozen to avoid overfitting. Quantitatively, by conditioning on the VGG features of the satellite imagery, CENs are able to significantly improve upon sparse linear models that use the survey features only (known as the gold standard in remote sensing techniques).

After training CEN with a dictionary of size 32, we discover that the encoder tends to sharply select one of two explanations (M1 and M2) for different household clusters in Uganda (see Fig. 3a and also Fig. 4a in the appendix). In the survey data, each household cluster is marked as either urban or rural; we notice that, conditional on a satellite image, CEN tends to pick M1 for urban areas and M2 for rural ones (Fig. 3b). Notice that the explanations weigh different categorical features, such as the reliability of the water source or the proportion of houses with walls made of unburnt brick, quite differently. When visualized on the map, we see that CEN selects M1 more frequently around the major city areas, which also correlates with high nightlight intensity in those areas (Fig. 3c,d). The high performance of the model makes us confident in the produced explanations (contrary to LIME, as discussed in Sec. 3.1) and allows us to draw conclusions about what causes the model to classify certain households in different neighborhoods as poor.
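A sketch of how this sharp selection behavior can be surfaced from a trained model, assuming the `CEN` class from Sec. 2.2 and a data loader that also yields the urban/rural flag; all names here are ours, not from [7].

```python
import torch

@torch.no_grad()
def dominant_explanations(cen, loader, threshold=0.9):
    """For each instance, record which dictionary entry receives most attention.
    `cen` returns (logits, explanation, alpha) as in the Sec. 2.2 sketch;
    `loader` is assumed to yield (x, z, is_urban) batches."""
    picks, urban_flags = [], []
    for x, z, is_urban in loader:
        _, _, alpha = cen(x, z)                # attention over K dictionary entries
        top_weight, top_idx = alpha.max(dim=-1)
        # "Sharp" selection: attention is nearly one-hot on a single explanation.
        picks.append(top_idx[top_weight > threshold])
        urban_flags.append(is_urban[top_weight > threshold])
    picks, urban_flags = torch.cat(picks), torch.cat(urban_flags)
    # Cross-tabulate selected explanation vs. urban/rural, in the spirit of Fig. 3b.
    for k in picks.unique():
        frac = (picks == k).float().mean().item()
        frac_urban = urban_flags[picks == k].float().mean().item()
        print(f"explanation M{k.item() + 1}: selected {frac:.1%} of the time; "
              f"{frac_urban:.1%} of those areas are urban")
```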
[Fig. 3 panels show: (a) weights given by M1 and M2 to features such as Water: Unreliable, Water src: Public tap, Walls: Unburnt bricks, Roof: Thatch/Straw, Is water paid, Vegetation, Has electricity, Nightlight intensity; (b) how often M1/M2 are selected for rural vs. urban areas and the proportion of tenement-type households; (c) a map of contextual model selections across Uganda (Arua, Gulu, Kampala, Iganga, Masaka, Kasese); (d) a map of nightlight intensity.]

Fig. 3: Qualitative results for the Satellite dataset: (a) Weights given to a subset of features by the two models (M1 and M2) discovered by CEN. (b) How frequently M1 and M2 are selected for areas marked rural or urban (top) and the average proportion of tenement-type households in an urban/rural area for which M1 or M2 was selected. (c) M1 and M2 models selected for different areas on the Uganda map. M1 tends to be selected for more urbanized areas while M2 is picked for the rest. (d) Nightlight intensity in different areas of Uganda.

References

[1] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
[2] Jason Yosinski, Jeff Clune, Anh Nguyen, Thomas Fuchs, and Hod Lipson. Understanding neural networks through deep visualization. arXiv preprint arXiv:1506.06579, 2015.
[3] Aravindh Mahendran and Andrea Vedaldi. Understanding deep image representations by inverting them. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5188–5196, 2015.
[4] Rich Caruana, Hooshang Kangarloo, JD Dionisio, Usha Sinha, and David Johnson. Case-based explanation of non-case-based learning methods. In Proceedings of the AMIA Symposium, page 212, 1999.
[5] Been Kim, Cynthia Rudin, and Julie A Shah. The Bayesian case model: A generative approach for case-based reasoning and prototype classification. In Advances in Neural Information Processing Systems, pages 1952–1960, 2014.
[6] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why Should I Trust You?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016.
[7] Maruan Al-Shedivat, Avinava Dubey, and Eric P Xing. Contextual explanation networks. arXiv preprint arXiv:1705.10301, 2017.
[8] Andrew L Maas, Raymond E Daly, Peter T Pham, Dan Huang, Andrew Y Ng, and Christopher Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1, pages 142–150. Association for Computational Linguistics, 2011.
[9] Neal Jean, Marshall Burke, Michael Xie, W Matthew Davis, David B Lobell, and Stefano Ermon. Combining satellite imagery and machine learning to predict poverty. Science, 353(6301):790–794, 2016.
[10] Rie Johnson and Tong Zhang. Supervised and semi-supervised text categorization using LSTM for region embeddings. In Proceedings of The 33rd International Conference on Machine Learning, pages 526–534, 2016.
Appendix
[Fig. 4: (a) Full visualization of explanations M1 and M2 learned by CEN on the poverty prediction task: the weights M1 and M2 assign to all 64 survey features, covering nightlight intensity, urbanization, utilities, vegetation, temperature, and distances (features 01–16); household and roof types (17–32); wall and floor materials (33–48); and water sources and water reliability (49–64). (b) A panel labeled "Correlation" over features 32–64.]