Explaining Data-Driven Decisions made by AI Systems: The Counterfactual Approach
Carlos Fernández-Loría [email protected]
New York University
Foster Provost [email protected]
New York University
Xintian Han [email protected]
New York University
Abstract
Lack of understanding of the decisions made by model-based AI systems is one of the main barriers for their adoption. We examine counterfactual explanations, which are becoming an increasingly accepted alternative for explaining AI decisions. The counterfactual approach defines an explanation as a set of the system's data inputs that causally drives the decision (meaning that removing them changes the decision) and is irreducible (meaning that removing any subset of the inputs in the explanation does not change the decision). We generalize previous work on counterfactual explanations, resulting in a framework that (a) is model-agnostic, (b) can address features with arbitrary data types, (c) can explain decisions made by complex AI systems that incorporate multiple models, and (d) is scalable to very large numbers of features. We also propose a heuristic procedure to find the most useful explanations depending on the context. We contrast counterfactual explanations with another alternative that has become popular: methods that explain model predictions by weighting features according to their importance (e.g., SHAP, LIME). This paper presents two fundamental reasons why explaining model predictions is not the same as explaining the decisions made using those predictions, suggesting that we should carefully consider whether importance-weight explanations are well-suited to explain decisions made by AI systems. Specifically, we show through several examples that (1) features that have a large importance weight for a model prediction may not actually affect the corresponding decision, and (2) importance weights are insufficient to communicate whether and how the features actually influence system decisions. We demonstrate this first using three simple examples. Then we present three detailed studies using real-world data to compare and contrast the counterfactual approach with SHAP, a popular importance weighting method. The examples and case studies illustrate various conditions under which counterfactual explanations explain data-driven decisions better than feature importance weights.
Keywords:
Explanations, System Decisions, Predictive Modeling
1. Introduction
Data and predictive models are used by artificial intelligence (AI) systems to make decisions across many applications and industries. Yet, many data-rich organizations struggle when adopting AI decision-making systems because of managerial and cultural challenges, rather than issues related to data and technology (LaValle et al., 2011). In fact, as predictive models become more complex and difficult to understand, stakeholders often become more skeptical and reluctant to adopt or use them, even if the models have been shown to improve decision-making performance (Arnold et al., 2006; Kayande et al., 2009).

Explanations are also useful for other reasons beyond increasing adoption (Martens and Provost, 2014). For example, explanations may help customers understand the reasoning behind automated decisions that affect them. Users of the model, such as managers or analysts, may use explanations to obtain insights about the domain in which the system is being used. Data scientists and machine learning engineers may also use the explanations to identify, debug, and address potential flaws in the system. Many researchers have tried to reduce the gap in stakeholders' understanding of AI systems in recent years, most notably by proposing methods for explaining predictive models and their predictions.

Methods for explaining AI models and their predictions include extracting rules that represent the inner workings (e.g., Craven and Shavlik, 1996; Jacobsson, 2005; Martens et al., 2007) and associating weights to each feature according to their importance for model predictions (e.g., Lundberg and Lee, 2017; Ribeiro et al., 2016). Importance weights, in particular, have become increasingly popular because "model-agnostic" methods that produce importance weights have been introduced: the weights explain predictions in terms of features, so users can understand any specific prediction without any knowledge of the underlying model or the modeling method(s) used to produce the model. For example, two of the most popular methods for explaining model predictions, LIME (Ribeiro et al., 2016) and SHAP (Lundberg and Lee, 2017), are model-agnostic and produce importance-weight explanations.

This paper points to two fundamental reasons why importance-weight explanations may not be well-suited to explain data-driven decisions made by AI systems. First, importance weights are designed to explain model predictions, but explaining model predictions is not the same as explaining the decisions made using those predictions. Notably, and perhaps counter-intuitively, features that have a large impact on a prediction may not necessarily have an impact on the decision that was made using that prediction. The examples in this paper illustrate this in detail. Therefore, importance weights that are obtained with respect to model predictions may portray an inaccurate picture of how features influence system decisions.

Second, identifying (and quantifying) important features is not sufficient to explain system decisions, even when importance is assessed with respect to the decisions being explained. As an example, suppose that a credit scoring system denies credit to a loan applicant, and that feature importance weights reveal that the two most important features in the credit denial decision were annual income and loan amount. While informative, this "explanation" does not in fact explain what it was that made the system decide to deny credit. Would changing either the annual income or the loan amount be enough for the system to approve credit?
Would it be necessary to change both? Or perhaps even changing both would not be enough. From the weights alone, it is not clear how the important features may influence the decision. To be fair, this is not an indictment of methods that calculate feature importance; they were not designed to explain system decisions. However, we are not aware of papers or posts that clarify this in research or in practice.

An alternative to importance-weight explanations is counterfactual explanations: explanations explicitly designed to explain system decisions, proposed by Martens and Provost (2014); Provost (2014). For the question "why did the model-based system make a specific decision?", the counterfactual approach asks specifically, "which data inputs caused the system to make its decision?". This approach is advantageous because (i) it explains decisions rather than the outputs of the model(s) on which the decisions are based; (ii) it standardizes the form that an explanation can take; (iii) it does not require all features to be part of the explanation; and (iv) the explanations can be separated from the specifics of the model.

Martens and Provost (2014) originally applied this framework to explain document classifications, and although it has been applied to other contexts beyond document classification (Moeyersoms et al., 2016; Chen et al., 2017; Ramon et al., 2019), researchers do not all see how the framework can be generalized to settings beyond text (see, e.g., Molnar, 2019; Wachter et al., 2017; Biran and Cotton, 2017). To our knowledge, this approach has not been extended beyond classification models using sparse features in high-dimensional settings. Therefore, we introduce a multi-faceted generalization that focuses on providing explanations for general data-driven system decisions, resulting in a framework that (a) may explain decisions made by systems that incorporate multiple models, (b) is model-agnostic, (c) can address features with arbitrary data types, and (d) is scalable to very large numbers of features. We also propose and showcase a heuristic procedure that may be used to search and sort counterfactual explanations according to their context-specific relevance.

Finally, we illustrate the advantages of our proposed counterfactual approach by comparing it to SHAP (Lundberg and Lee, 2017), an increasingly popular method to explain model predictions that unites several feature importance weighting methods. Via three business case studies that use real-world data, we detail the ways in which counterfactual explanations explain data-driven decisions better than the popular alternative of feature importance weights.
2. AI Systems and Explanations
In this paper, we focus specifically on explaining decisions made by systems that use predictive statistical models to support or automate decision-making (Shmueli and Koppius, 2011), and in particular on systems that make or recommend discrete decisions. We refer to these as artificial intelligence (AI) systems.
Discrete decision making is closely related to classification, and indeed the subtle distinction often can be overlooked safely, but for explaining system decisions it is important to be clear. First there is a definitional difference: a classification model might classify someone as defaulting on credit or not; a corresponding decision-making system would use this model to make a decision on whether or not to grant credit. Deciding not to grant credit is not the same (at all) as saying that the individual will default, which brings us to the technical difference.

Classification tasks usually are modeled as scoring problems, where we want our predictive models to score the observations such that those more likely to have the "correct" class will have higher scores. These scores may then be used by a system to make a decision that is related to (but usually not the same as) the classification. For example, for binary decisions (and corresponding classifications), typically the scores rank observations, and decisions are made using a chosen threshold appropriate for the problem at hand (Provost and Fawcett, 2013). In many cases, estimated probabilities of class membership are computed from the models, which allows the use of decision theory to combine them with application-specific information on costs and benefits (Provost and Fawcett, 2013) to produce a next stage of more nuanced scores. Thus, decision-making problems are often modeled as "classification tasks" by associating a class with each decision.

However, it is important to emphasize that the final output of the system (i.e., the decision) may not correspond to the labels in the training data. As another example, for a system deciding whether to target a customer with a promotion, scores could consist of expected profits. In this case, we could estimate a classification model to predict the probability that the customer will make a purchase and a regression model to estimate the size of the purchase (conditioned on the customer making a purchase); the expected profit would be the product of these two predictions (Provost and Fawcett, 2013), and the ranking of the customers by expected profit could be different from the ranking based simply on the classification model score. The final output of the decision-making system would be whether the customer should be targeted with a promotion (and because of selection bias and other complications, we often patently would not want to learn models based on training data about who was targeted with a promotion).

Explaining the decisions made by intelligent systems has received both practical and research attention for decades (Gregor and Benbasat, 1999). Prior work has shown that the ability for intelligent systems to explain their decisions is necessary for their effective use: when users do not understand the workings of an intelligent system, they become skeptical and reluctant to use it, even if the system is known to improve decision-making performance (Arnold et al., 2006; Kayande et al., 2009). More recently, for example, a field study in a Department of Radiology showed that the use of AI systems slowed down, rather than sped up, the radiologists' decision-making process because the AI systems often provided recommendations that conflicted with the doctors' judgment (Lebovitz et al., 2019). Lacking critical understanding of the opaque AI systems, the doctors often relied on their own diagnoses, which did not concur with the system's.
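To make the expected-profit example above concrete, the following is a minimal Python sketch of a decision-making system that combines a classifier and a regression model. It is our own illustration with hypothetical variable and model names, not the paper's implementation:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical training data: X are customer features, purchased is a 0/1
# outcome, and amount is the purchase size for customers who purchased.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
purchased = (rng.random(1000) < 0.3).astype(int)
amount = np.exp(rng.normal(4, 1, size=1000))

# Model 1: probability that the customer makes a purchase.
clf = LogisticRegression().fit(X, purchased)
# Model 2: expected purchase size, conditioned on purchasing.
reg = LinearRegression().fit(X[purchased == 1], amount[purchased == 1])

def decide_to_target(x, cost_of_promotion=10.0):
    """The system decision: target iff expected profit exceeds the cost.

    The decision combines two model outputs, so it is not the same as
    either model's prediction (or classification) taken alone.
    """
    p_purchase = clf.predict_proba(x.reshape(1, -1))[0, 1]
    expected_amount = reg.predict(x.reshape(1, -1))[0]
    return p_purchase * expected_amount > cost_of_promotion

print(decide_to_target(X[0]))
```

Note that explaining `decide_to_target` is a different task from explaining either `clf` or `reg` in isolation, which is the distinction this section draws.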
Our paper provides a methodological framework to make the decisions of such AI systems more transparent.

Over the past several decades, many researchers have worked on explaining predictive models, in contrast to explaining their predictions or decisions made using them. Because symbolic models, such as decision trees, are often considered straightforward to explain when they are small, most research has focused on explaining non-symbolic (black box) models or large models.

1. Recent work has been revisiting this assumption, working to produce models explicitly designed to be both accurate and comprehensible; see Wang and Rudin (2015) for an illustrative example.

Rule-based explanations have been a popular approach to explain black-box models. For example, in many credit scoring applications, banking regulatory entities require banks to implement globally comprehensible predictive models (Martens et al., 2007). Typical techniques to provide rule-based explanations consist of approximating the black box model with a symbolic model (Craven and Shavlik, 1996), or extracting explicit if-then rules (Andrews et al., 1995). Proposed methods are often tailored to the specifics of the models being explained, and researchers have invested significant effort attempting to make state-of-the-art black box models more transparent. For example, Jacobsson (2005) offers a review of explanation techniques for deep learning models, and Martens et al. (2007) propose a rule extraction method for SVMs. Importantly, these "global" explanations (Martens and Provost, 2014) attempt to explain the model as a whole, rather than explaining particular decisions made. As Martens and Provost point out, this can be viewed as explaining every possible decision the model might make, but the methods are not tailored to explain individual decisions.

A different approach, which has become quite popular recently, is to explain the predictions of complex models, framing the explanations in terms of feature importance by associating a weight with each feature in the model. Each weight can be interpreted as the proportion of the information contributed by the corresponding feature to the model prediction. The main strength of this approach is that the explanations are defined in terms of the domain (i.e., the features), separating them from the specifics of the model being explained. As a result, models can be replaced without replacing the explanation method; end users (such as customers or managers) do not need any knowledge of the underlying modeling methods to understand the explanations; and different models may be compared in terms of their explanations in settings where transparency is critical.

A common way of assessing feature importance is based on simulating lack of knowledge about features (Robnik-Šikonja and Kononenko, 2008; Lemaire et al., 2008). For example, one could compare the original model's output with the output obtained when removing a specific feature from the data and the model (e.g., by imputing a default value for the feature). If the output changes, it means that the feature was important for the model prediction. Methods that use this approach often decompose each prediction into the individual contributions of each feature and use the decompositions as explanations, allowing one to visualize explanations at the instance level.

Continuing with the earlier credit scoring example, Figure 1 shows an importance-weight explanation for an individual who has an above-average probability of default. These importance weights were generated using SHAP (Lundberg and Lee, 2017), which we will discuss in more detail in the following sections.
Figure 1: Example of an importance-weight explanation for a model prediction.

Each weight in the explanation represents the impact that its respective feature had on the prediction. Thus, the weight of (roughly) 2.5% that is attributed to the loan amount feature ('loan amnt') implies that the feature increased the probability of default of that particular individual by 2.5%.

A notable challenge, however, is that interactions between features may lead to ambiguous explanations, because the order in which features are removed may affect the importance attributed to each feature. As a result, subsequent work proposed assessing feature importance by removing all possible subsets of features (rather than only one feature at a time), retraining models without the removed features, and comparing how predictions change (Štrumbelj et al., 2009). However, such approaches may take hours of computation time even for a single prediction and have been reported to handle only up to about 200 features. Alternative formulations (such as SHAP) have attempted to reduce computation time by sampling the space of feature combinations and by using imputation to deal with removed features, resulting in sampling-based approximations of the influence of each feature on the prediction (Štrumbelj and Kononenko, 2010; Ribeiro et al., 2016; Lundberg and Lee, 2017; Datta et al., 2016).

Nevertheless, importance weights are tailored to explain model predictions and may not be adequate to explain system decisions, namely because they do not communicate how the features actually influence decisions. We will illustrate this with several examples below. Moreover, complex systems may incorporate many features in their decision making. In these settings, hundreds of features may have non-zero importance weights for any given instance, yet only a handful of the features may be critical for understanding the system's decisions (Martens and Provost, 2014; Chen et al., 2017).
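As an aside, the removal-based assessment of importance described above is straightforward to sketch in code. The following minimal Python example is our own illustration (hypothetical names, not any particular library's implementation): it scores each feature by imputing a default value and measuring the change in the model's output:

```python
import numpy as np

def removal_importance(predict, x, defaults):
    """Importance of each feature as the change in model output when the
    feature is 'removed', i.e., replaced by a default (e.g., the mean)."""
    base = predict(x)
    importances = np.empty(len(x))
    for i in range(len(x)):
        x_removed = x.copy()
        x_removed[i] = defaults[i]  # simulate lack of knowledge of feature i
        importances[i] = base - predict(x_removed)
    return importances

# Toy model and instance for illustration.
predict = lambda x: 1 / (1 + np.exp(-(0.8 * x[0] + 0.2 * x[1] - 0.5 * x[2])))
x = np.array([2.0, -1.0, 0.5])
defaults = np.zeros(3)  # e.g., feature means in standardized data
print(removal_importance(predict, x, defaults))
```

As the text notes, the order in which multiple features are removed matters when features interact, which is what motivates the subset-based (Shapley) formulations.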
3. Counterfactual explanations
The idea of using a causal perspective to explain model predictions with counterfactuals was first proposed (to our knowledge) by Martens and Provost (2014) (see also Provost (2014)). Other researchers followed with similar causal, counterfactual explanation approaches (see Molnar, 2019, for examples). In this paper, we generalize the counterfactual explanations originally proposed for document classification (Martens and Provost, 2014) and used subsequently to explain ad-targeting decisions (Moeyersoms et al., 2016), targeting decisions based on Facebook Likes (Chen et al., 2017), and classifications based on other high-dimensional, sparse data (Ramon et al., 2019). We provide a more precise definition of counterfactual explanations below, but as with the prior work, we define explanations in terms of input data, or evidence, that would change the decision if it were not present.
3.1 Intuition

For illustration, suppose a credit card transaction was flagged for action by a data-driven AI system after it was registered as occurring outside the country where the cardholder lives, and suppose the system would not have flagged the transaction absent this location. In this case, it is intuitive to consider the location of the transaction as an explanation for the system decision. Of course, there could be other explanations. Perhaps the transaction also involved a consumption category outside the profile of the cardholder (e.g., a purchase at a casino), and excluding this information from the system would also change the decision to "do not flag". Both are counterfactual explanations: they comprise evidence without which the system would have made a different decision.

2. We should keep in mind the decision-rather-than-classification perspective. The decision is to flag the transaction for one or more actions, such as sending a message to the account holder to verify. Flagging may be based on a threshold on the estimated likelihood of fraud, but may also consider the existence of evidence from other transactions and the potential loss if the transaction were indeed fraudulent.

A subtle implication of this perspective is that counterfactual explanations are generally applied to "non-default" decisions, because data-driven systems usually make default decisions in the absence of evidence suggesting that a different decision should be made. In our example, a transaction would be considered legitimate unless there is enough evidence suggesting fraud. As a result, explaining default decisions often corresponds to saying, "because there was not enough evidence of a non-default class". Thus, as with prior work, in this paper we focus primarily on explaining non-default decisions.

3. However, this is not always the case. For example, if a credit card transaction was made in a foreign country, but the cardholder recently reported a trip abroad, the trip report could be a reasonable explanation for the transaction being classified as legitimate. So, the evidence in favor of a non-default classification may be cancelled out by other evidence in favor of a default classification.
3.2 Definition

Following Martens and Provost (2014) and Provost (2014), we define a counterfactual explanation for a system decision as a set of features that is causal and irreducible. Being causal means that removing the set of features from the instance causes the system decision to change. Irreducible means that removing any proper subset of the explanation would not change the system decision. The importance of an explanation being causal is straightforward: the decision would have been different if not for the presence of this set of features. The irreducibility condition serves to avoid including features that are superfluous, which relates to the fact that some of the features in a causal set may not be necessary for the decision to change.

More formally, consider an instance I consisting of a set of m features, I = {f_1, f_2, ..., f_m}, for which the decision-making system C : I → {c_1, c_2, ..., c_k} gives decision c. A feature f_i is an attribute taking on a particular value, like income=$50,000 or country=FRANCE. Then, a set of features E is a counterfactual explanation for C(I) = c if and only if:

E ⊆ I (the features are present in the instance),   (1)
C(I − E) ≠ c (the explanation is causal),   (2)
∀ E′ ⊂ E : C(I − E′) = c (the explanation is irreducible).   (3)

4. It is critical to differentiate what is causing the data-driven system to make its decisions from causal influences in the actual data-generating processes in the "real" world. Our definition of counterfactual explanations relates to the former.

As mentioned, our approach builds on the explanations proposed by Martens and Provost (2014), who developed and applied counterfactual explanations for document classifications, defining an explanation as an irreducible set of words such that removing them from a document changes its classification. Our definition generalizes their counterfactual explanations in three important ways. First, it makes explicit how the explanations may be used for broader system decisions, which may incorporate predictions from multiple predictive models. Second, their practical implementation of explanations (and subsequent work) consists of removing features by setting them to zero, whereas we generalize to arbitrary methods for removing features (and note the important relationship to methods for dealing with missing data). Third, while their approach has been applied in other contexts beyond document classification (Chen et al., 2017; Moeyersoms et al., 2016; Ramon et al., 2019), these applications all have the same data structure: high-dimensional, sparse features. Our generalization applies to features with arbitrary data types.

Going back to our credit scoring example, suppose a decision-making system using the model prediction explained in Figure 1 decides not to grant credit to that individual. Table 1 shows some possible counterfactual explanations for the credit denial decision. Each explanation represents a counterfactual world in which specific evidence is not considered when making the decision, resulting in a default decision (approving credit in this case).

Explanation 1: Credit approved if { 'loan amnt' } is removed.
Explanation 2: Credit approved if { 'annual inc' } is removed.
Explanation 3: Credit approved if { 'fico range high', 'fico range low' } are removed.

Table 1: Examples of counterfactual explanations for a system decision.
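Conditions (1)-(3) translate directly into code. The following is a minimal Python sketch, our own illustration with hypothetical function names, of how to verify that a set of features is a counterfactual explanation for a decision, given some method for removing features:

```python
from itertools import combinations

def is_counterfactual_explanation(instance, E, decide, remove):
    """Check conditions (1)-(3) for explanation E of decision decide(instance).

    instance: dict mapping feature names to values.
    E: set of feature names proposed as an explanation.
    decide: function mapping an instance to a discrete decision.
    remove: function returning a copy of the instance with the given
            features 'removed' (e.g., replaced by mean/mode imputations).
    """
    c = decide(instance)
    if not E <= instance.keys():                      # condition (1)
        return False
    if decide(remove(instance, E)) == c:              # condition (2): causal
        return False
    for k in range(1, len(E)):                        # condition (3): irreducible
        for subset in combinations(E, k):
            if decide(remove(instance, set(subset))) != c:
                return False
    return True

# Tiny demo with a toy decision system: flag iff both signals are present.
decide = lambda inst: int(inst["foreign"] == 1 and inst["casino"] == 1)
remove = lambda inst, E: {k: (0 if k in E else v) for k, v in inst.items()}
print(is_counterfactual_explanation({"foreign": 1, "casino": 1},
                                    {"foreign"}, decide, remove))  # True
```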
3.3 Removing Features

A vital practical question that is raised by the counterfactual approach discussed here is: what does it mean to "remove" evidence (i.e., features) from a data instance that will be input to a model-based decision-making procedure? Prior methods for counterfactual explanations and model sensitivity analyses have replaced input feature values with some other specified value. For example, Martens and Provost (2014) replace the presence (binary indicator, count, TFIDF value, etc.) of a word in a document with a zero. This makes sense in the context of their application, because if we consider the presence of a word as evidence for a document classification, removing that evidence (that word) would be represented by a zero for that feature.

5. They discuss the case where absence of a word would be evidence as well; see the original paper.

More generally, we should consider carefully the notion of removing features from the input to a data-driven model. If we step away for a moment from explaining AI systems, we can think of explaining other sorts of evidence-driven decisions within the same framework. For instance, in a murder case, we might explain our decision to bring in the suspect based on the fact that the murder weapon was found in her apartment; if there were no murder weapon, we would not have brought her in. If we would have brought her in anyway, then the presence of the weapon does not suffice as an explanation for our decision. So, in this case, we are imagining our collection of evidence with the focal piece of evidence missing. We can do the same in principle with data-driven decisions: we can make the feature in question be missing and ask if we would still make the same decision. Thus, we can generalize to data inputs of any kind: removing the feature means "making it missing" in the data instance.

We emphasize that we can do this "in principle" because in practice it may or may not be practicable to simply make a feature be missing. Some AI models and systems deal with missing features naturally and some do not. Importantly, note that here we are talking about dealing with missing values at the time of use of the model, not dealing with missing values during machine learning. There are different ways for dealing with missing features when applying (as opposed to learning) a predictive model (Saar-Tsechansky and Provost, 2007), such as imputing default values for the missing features, using an alternative model trained with only the available features, etc.

Therefore, the generalized explanation framework we present is agnostic to which method is used to deal with the removed features, taking the position that this decision is domain and problem dependent. Within a particular domain and explanation context, the user should choose the method for dealing with missing values. For example, in settings where features are often missing at prediction time, replacing the value of a feature with a "missing" categorical value might make the most sense to simulate missingness, whereas in cases where all attributes must have values specified in order to make the decision, replacing the value with the mean or the mode might make more sense.
What matters is that the decision may change when some of the features are not present at the time of decision making, and that the method for dealing with missing values allows the change in the decision to be attributed to the absence of these features.

This framework naturally incorporates other techniques used in prior counterfactual approaches: the common case of replacing a feature in a sparse setting with a zero corresponds
to mode imputation; replacing a numeric feature with the mean value for that attribute corresponds to mean imputation. In the empirical examples presented below, we use mean imputation for continuous variables and mode imputation for sparse numeric, binary, and categorical variables. Saar-Tsechansky and Provost (2007) discuss other alternatives for dealing with missing values when applying predictive models; any of them could be used in conjunction with this counterfactual explanation framework.

3.4 Searching for Explanations

This definition of counterfactual explanations for system decisions allows any procedure for finding such explanations. For example, fast solvers for combinatorial problems may be used to find counterfactual explanations (Schreiber et al., 2018). For this paper, and for the examples that follow, we adopt a heuristic procedure to find the most useful explanations depending on the context.

The algorithm proposed by Martens and Provost (2014) finds counterfactual explanations by using a heuristic search that requires the decision to be based on a scoring function, such as a probability estimate from a predictive model. We also will presume that the decision making is based on comparing some score to a threshold. This scoring function is used by the search algorithm to first consider features that, when removed, reduce the score of the predicted class the most. This heuristic may be desirable when the goal is to find the smallest explanations, such as when explaining the decisions of models that use thousands of features. Another possible heuristic is to remove features according to their overall importance for the prediction, where the importance may be computed by a feature importance explanation technique (Ramon et al., 2019).

However, the shortest explanations are not necessarily the best explanations. For instance, users may want to use the explanations as guidelines for what to change in order to affect the system decision. As an example, suppose that a system decides to warn a man that he is at high risk of having a heart attack. An explanation that the system would not have made the warning if the patient were not male is of very little use as a guide for what to do about it. In practice, some features are easier to change than others, and some may be practically impossible to change.

Therefore, we allow the incorporation of a cost function as part of the heuristic procedure in order to search first for the most relevant explanations. The underlying idea is that the cost function may be used to associate costs with the removal (or adjustment) of features, so that sets of features that satisfy desirable characteristics are searched first. Importantly, the cost function is meant to be used as a mechanism to capture the relevance of explanations, so the cost of changing or removing the features might not represent an actual cost (we will show an example of this in one of the case studies below). For example, the cost may be fixed (e.g., when removing a word from a document), may be contingent on the value of the variable (e.g., when adjusting a continuous variable), contingent on the value of other features, or may even be practically infinite.

Subsequently, instead of searching for the feature combinations that change the score of the predicted class the most, the heuristic could search for the feature combinations for which the output score changes the most per unit of cost.
The motivation behind this new heuristic is to find first the explanations with the lowest costs. Returning to the heart attack example, if we assign an infinite cost to changing the gender feature, the heuristic would not select feature combinations that include it, regardless of its high impact on the output score. Instead, the heuristic would prefer explanations with many modest but cheap changes, such as changing several daily habits. To the extent that the system also has a scoring function (which could be the result of combining several predictive models), the procedure proposed by Martens and Provost (2014) could easily be adjusted to find the most useful explanations for the problem at hand. A similar approach has been suggested for classifiers that have a known and differentiable scoring function (Lash et al., 2017). A sketch of this cost-aware search appears at the end of this section.

3.5 Other Benefits

Counterfactual explanations have other benefits as well. First, as with importance weights, they are defined in terms of domain knowledge (features) rather than in terms of modeling techniques. As mentioned above, this is of critical importance to explain individual decisions made by such models to users. More importantly, these explanations can be used to understand how features affect decisions, which (as we will show in the next sections) is not captured well by feature importance methods. Also, because only a fraction of the features will be present in any single explanation, the present approach may be used to explain decisions from models with thousands of features (or many more). Studies show cases where such explanations can be obtained in seconds for models with tens or hundreds of thousands of features and that the explanations typically consisted of a handful to a few dozen features at the most (Martens and Provost, 2014; Moeyersoms et al., 2016; Chen et al., 2017).
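To make the search procedure concrete, here is a minimal Python sketch of the cost-aware heuristic described in Section 3.4. It is our own illustrative rendering with hypothetical function names and a toy pruning step, not the reference implementation of Martens and Provost (2014): it greedily expands the candidate set whose removal yields the largest score drop per unit of cost, and stops when the decision changes.

```python
def counterfactual_search(instance, score, remove, cost, threshold, max_size=5):
    """Greedy, cost-aware search for one counterfactual explanation.

    score: function mapping an instance to the system's decision score.
    remove: returns a copy of the instance with the given features removed.
    cost: maps a feature name to the cost of removing/changing it
          (may be float('inf') for practically immutable features).
    The system decision is assumed to be score(x) >= threshold.
    """
    assert score(instance) >= threshold, "only non-default decisions are explained"
    E = set()
    candidates = {f for f in instance if cost(f) != float("inf")}
    while len(E) < max_size and candidates:
        base = score(remove(instance, E))
        # Pick the feature with the largest score reduction per unit of cost.
        best = max(candidates,
                   key=lambda f: (base - score(remove(instance, E | {f}))) / cost(f))
        E.add(best)
        candidates.discard(best)
        if score(remove(instance, E)) < threshold:    # decision changed: causal
            # Prune superfluous features so the explanation stays irreducible.
            for f in sorted(E):
                if score(remove(instance, E - {f})) < threshold:
                    E.remove(f)
            return E
    return None  # no explanation found within the size budget
```

Setting a feature's cost to infinity (as with the gender feature above) simply excludes it from the candidate set, regardless of its impact on the score.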
4. Limitations of importance weights
In this section, we use three simple, synthetic (but illustrative) examples to highlight two fundamental reasons why importance-weight explanations may not be well-suited to explain data-driven decisions made by AI systems. The first example (Example 1) is meant to illustrate that features that have a large impact on a prediction (and thus large importance weights) may not have any impact on the decision made using that prediction. The next two examples show that importance weights are insufficient to communicate how features actually affect decisions (even when importance is determined with respect to system decisions rather than model predictions). More specifically, we show cases in which importance weights remain the same despite substantial changes to decision making (Examples 1, 2, and 3) and in which features deemed unimportant by the weights actually affect the decision (Example 3). Similar examples to the ones discussed in this section will come up again in the case studies in Section 5, when comparing importance weights with counterfactual explanations using real-world data.

Throughout this section, the examples assume that we want to explain the binary decision made for the three-feature instance I and decision procedure C_i as defined here:

I = {F1 = 1, F2 = 1, F3 = 1},   (4)

C_i(I) = 1 if Ŷ_i(I) ≥ 1, and 0 otherwise,   (5)

where {F1, F2, F3} are binary features, and C_i is the decision-making procedure (an AI system) that uses the scoring (or prediction) function Ŷ_i to make decisions. The examples that follow will employ different Ŷ_i. We assume that domain knowledge has guided us to replace the values of missing features with a default value of zero.

We compute importance weights using SHAP (Lundberg and Lee, 2017), a popular approach to explain the output of machine learning models. Before we focus on the disadvantages of importance weights for explaining system decisions, let us point out that SHAP has several advantages for explaining data-driven model predictions: (i) it produces numeric "importance weights" for each feature at an instance level, (ii) it is model-agnostic, (iii) its importance weights tie instance-level explanations to cooperative game theory, providing a solid theoretical foundation, and (iv) SHAP unites several feature importance weighting methods, including the relatively well-known LIME (Ribeiro et al., 2016).

In the case of SHAP, importance weights consist of the (approximated) Shapley values of the features for a model prediction. Shapley values correspond to the impact each feature has on the prediction, averaged over all possible joining orders of the features. A major limitation of Shapley values is that computing them becomes intractable as the number of features grows. SHAP circumvents this limitation by sampling the space of feature combinations, resulting in a sampling-based approximation of the Shapley values. There are only 3 features in the examples that follow, so the approximations are not necessary here, but they will be for the case studies discussed in Section 5, where the number of features is much larger. We illustrate the computation of Shapley values in more detail in the examples below.

4.1 Example 1

All importance weighting methods (that we are aware of) are designed to explain the output of scoring functions, not system decisions. This is problematic because a large impact on the scoring function does not necessarily translate to an impact on the decision.
This example illustrates this by defining Ŷ1 as follows:

Ŷ1(I) = F1 + F2 + 10 F1 F3 + 10 F2 F3,   (6)

so the prediction and the decision for instance I are Ŷ1(I) = 22 and C1(I) = 1, respectively.

Table 2 shows how to compute the Shapley values of the features with respect to Ŷ1. Each row represents one of the six possible joining orders of the features, and each column corresponds to the impact of one of the three features across those joining orders. The last row shows the average impact of the features, which corresponds to the Shapley values.

Joining order    Impact of F1   Impact of F2   Impact of F3
F1, F2, F3       1              1              20
F1, F3, F2       1              11             10
F2, F1, F3       1              1              20
F2, F3, F1       11             1              10
F3, F1, F2       11             11             0
F3, F2, F1       11             11             0
Shapley values   6              6              10

Table 2: Shapley values with respect to Ŷ1 and all the joining orders used in their computation.

According to Table 2, SHAP gives F3 a larger weight than F1 or F2 due to its large impact on Ŷ1. However, if we take a closer look at C1 and Ŷ1 simultaneously, we can see that F3 does not affect the decision-making procedure at all! More specifically, F3 only affects Ŷ1 if F1 or F2 are already present, but if those features are present, then increasing the score does not affect the decision, because Ŷ1 ≥ 1 (and thus C1 = 1) regardless of F3. Therefore, the large "importance" of a feature for a model prediction may not imply an impact on a decision made with that prediction.

As we mentioned at the outset, SHAP was not designed to explain system decisions, so this is not an indictment of SHAP. It is an illustration that explaining model predictions and explaining system decisions are two different tasks. We might conclude then that we could adapt SHAP to compute feature importance weights for system decisions, for example, by transforming the output of the decision system into a "scoring function" that returns 1 if the decision is the same after removing features and returns 0 otherwise. This transformation, originally introduced by Moeyersoms et al. (2016) (also in the context of using Shapley values for instance-level explanations), would allow us to use SHAP to obtain importance weights for the system decision, even for decisions with multiple, unordered alternatives that cannot normally be represented as a single numeric score.

6. Note that Ramon et al. (2019) show a way to use importance weighting methods (such as LIME and SHAP) to search for counterfactual explanations; this is different from computing importance weights for system decisions.

Table 3 shows the Shapley values of the features with respect to the decision-making procedure C1 (when applying the suggested transformation). It illustrates that F3 indeed does not affect the decision at all. However, the next examples show that, even when importance weights are computed with respect to the decision-making procedure rather than the model predictions, the weights do not capture well how features affect decisions.

Joining order    Impact of F1   Impact of F2   Impact of F3
F1, F2, F3       1              0              0
F1, F3, F2       1              0              0
F2, F1, F3       0              1              0
F2, F3, F1       0              1              0
F3, F1, F2       1              0              0
F3, F2, F1       0              1              0
Shapley values   1/2            1/2            0

Counterfactual explanations: {F1, F2}

Table 3: Shapley values and joining orders for C1, as well as all counterfactual explanations for this decision.

4.2 Example 2

In Example 1, the decision changes when we remove (or change) F1 and F2 simultaneously, and removing any of the features individually does not change the decision. So, according to our definition in Section 3.2, there is a single counterfactual explanation, {F1, F2}. However, suppose we were to use the following scoring function to make decisions instead:

Ŷ2 = F1 F2.   (7)

Table 4 shows the Shapley values for C2, which are the same as for C1 (see Table 3) because features F1 and F2 are equally important in both cases. However, the decision-making procedure is different, because the new scoring function implies that removing either feature would change the decision. Therefore, with the new scoring function, there would be two counterfactual explanations, {F1} and {F2}, but the importance weights do not capture this. This implies that (in general) importance weights do not communicate how removing (or changing) the features may change the decision.

Joining order    Impact of F1   Impact of F2   Impact of F3
F1, F2, F3       0              1              0
F1, F3, F2       0              1              0
F2, F1, F3       1              0              0
F2, F3, F1       1              0              0
F3, F1, F2       0              1              0
F3, F2, F1       1              0              0
Shapley values   1/2            1/2            0

Counterfactual explanations: {F1} and {F2}

Table 4: Shapley values for C2, as well as all counterfactual explanations for this decision.

4.3 Example 3

In Example 1, we showed that even if a feature has a large, positive importance weight for a model's instance-level prediction, changing the feature may have no effect on the decision made for that instance. Importance weights can also be misleading if we use them to explain system decisions, because a feature with an importance weight of zero may have a positive
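The numbers in Tables 2-4 can be reproduced with a few lines of code. The following Python sketch, our own illustration, computes exact Shapley values by enumerating all joining orders, for both the scoring function Ŷ1 and the decision procedure C1:

```python
from itertools import permutations

FEATURES = ["F1", "F2", "F3"]

def shapley(value):
    """Exact Shapley values of the three features for value(present_set)."""
    totals = {f: 0.0 for f in FEATURES}
    for order in permutations(FEATURES):
        present = set()
        for f in order:
            before = value(present)
            present.add(f)
            totals[f] += value(present) - before   # marginal contribution
    return {f: t / 6 for f, t in totals.items()}   # 3! = 6 joining orders

def y1(present):
    """Scoring function of Example 1; absent features default to zero."""
    f1, f2, f3 = ("F1" in present), ("F2" in present), ("F3" in present)
    return f1 + f2 + 10 * f1 * f3 + 10 * f2 * f3

def c1(present):
    """Decision procedure: 1 iff the score reaches the threshold of 1."""
    return int(y1(present) >= 1)

print(shapley(y1))  # {'F1': 6.0, 'F2': 6.0, 'F3': 10.0}, as in Table 2
print(shapley(c1))  # {'F1': 0.5, 'F2': 0.5, 'F3': 0.0}, as in Table 3
```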
effect on the decision! We illustrate this with a third example, for which we use the following scoring function:

Ŷ3 = F1 + F2 − 2 F1 F2 − F1 F3 − F2 F3 + 3 F1 F2 F3.   (8)

Table 5 shows the Shapley values with respect to C3, and we can see that the values are the same as in the previous examples, but the decision-making process has changed once again. Notably, removing (or changing) F3 can change the decision from C3 = 1 to C3 = 0, as evidenced by the impact of F3 in the first and third joining orders, but the importance weight of F3 is 0. The counterfactual explanation framework, on the other hand, reveals that there are three counterfactual explanations in this example: {F1}, {F2}, and {F3}. Thus, a feature that we might mistakenly deem as irrelevant due to its non-positive weight is in fact as important as the other features with positive weights (at least for the purposes of explaining the decision C3(I) = 1).

Joining order    Impact of F1   Impact of F2   Impact of F3
F1, F2, F3       1              −1             1
F1, F3, F2       1              1              −1
F2, F1, F3       −1             1              1
F2, F3, F1       1              1              −1
F3, F1, F2       0              1              0
F3, F2, F1       1              0              0
Shapley values   1/2            1/2            0

Counterfactual explanations: {F1}, {F2}, and {F3}

Table 5: Shapley values for C3, as well as all counterfactual explanations for this decision.

4.4 Discussion

While the previous examples were deliberately constructed to illustrate the limitations of importance weights (and thus may seem contrived), they reveal an important insight: it is difficult to capture the impact of features on decisions with a single number, especially when features interact with each other. This is particularly relevant when explaining black-box models (such as neural networks), which are well-known for learning complex interactions between features. Moreover, we will show in Section 5 how the hypothetical examples we illustrated in this section also occur in real-world scenarios.

The main reason why importance weights are problematic for explaining system decisions is that they essentially aggregate across potential explanations (i.e., feature sets) to provide a single explanation per decision. Thus, each decision is explained using a single vector of weights. Typically, the importance weighting methods summarize the impact of features in a single vector by averaging across multiple feature orderings. The problem is that the average impact of a feature is not fine-grained enough to describe dynamics between features, and more importantly, it is difficult to interpret: why should the average across feature orderings be relevant to explain a decision? After all, it might not be representative of the potential impact that features have (as in the case of F3 in Example 3).

Counterfactual explanations circumvent the drawbacks of using averages because the explanations are defined at the counterfactual level, meaning that each explanation represents a counterfactual world in which the decision would be different. This allows a single decision to have multiple explanations, allowing a richer interpretation of how the features may influence the decision.
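A short self-contained check of Example 3, our own illustration reusing the enumeration idea from the earlier sketch, confirms that F3 has a Shapley value of zero for the decision even though {F3} is a counterfactual explanation:

```python
from itertools import combinations, permutations

FEATURES = ["F1", "F2", "F3"]

def y3(present):
    f1, f2, f3 = ("F1" in present), ("F2" in present), ("F3" in present)
    return f1 + f2 - 2 * f1 * f2 - f1 * f3 - f2 * f3 + 3 * f1 * f2 * f3

c3 = lambda present: int(y3(present) >= 1)

# Shapley value of F3 for the decision: average marginal contribution.
marginals = []
for order in permutations(FEATURES):
    present = set()
    for f in order:
        before = c3(present)
        present.add(f)
        if f == "F3":
            marginals.append(c3(present) - before)
print(sum(marginals) / 6)   # 0.0: F3 looks irrelevant

# All counterfactual explanations: causal and irreducible feature sets.
full = set(FEATURES)
causal = [set(E) for k in (1, 2, 3) for E in combinations(FEATURES, k)
          if c3(full - set(E)) != c3(full)]
explanations = [E for E in causal if not any(S < E for S in causal)]
print(explanations)         # [{'F1'}, {'F2'}, {'F3'}]: F3 matters after all
```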
5. Case Studies
We now present three case studies to illustrate the phenomena discussed above using real-world data. The first case study contrasts counterfactual explanations with explanations based on importance weights, showing fundamental differences. The second case study showcases the power of counterfactual explanations for very high-dimensional data and shows how the heuristic procedure that generates counterfactual explanations may be adjusted to search and sort explanations according to their relevance to the decision maker. The third case study shows the application of counterfactual explanations to AI systems that are more complex than just applying a threshold to the output of a single predictive model, specifically, to systems that integrate multiple models predicting different things. In all case studies, we use SHAP to compute importance weights with respect to the decision-making procedure rather than model predictions (as discussed above).
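Computing importance weights with respect to a decision-making procedure follows the transformation described in Section 4.1: wrap the decision system in a function that returns 1 when the decision is unchanged and 0 otherwise. Here is a minimal sketch under our own assumptions; decide, x, and background are hypothetical names, and the shap calls shown are the library's standard model-agnostic interface:

```python
import numpy as np
import shap  # pip install shap

def make_decision_scorer(decide, original_decision):
    """Wrap a discrete decision system as a 0/1 'scoring function' so that
    model-agnostic explainers like SHAP can be applied to decisions."""
    def f(X):
        return np.array([int(decide(x) == original_decision) for x in X])
    return f

# decide: the system's decision function; x: the instance to explain;
# background: data used by SHAP to simulate removed features (e.g., means).
# f = make_decision_scorer(decide, decide(x))
# explainer = shap.KernelExplainer(f, background)
# weights = explainer.shap_values(x)
```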
5.1 Case Study: Credit Decisions

To showcase the advantages of counterfactual explanations over feature importance weights when explaining data-driven decisions, we explain decisions made by a system that makes decisions to accept or deny credit, based on real data from Lending Club, a peer lending platform. The data is publicly available and contains comprehensive information on all loans issued starting in 2007. The data set includes hundreds of features for each loan, including the interest rate, the loan amount, the monthly installment, the loan status (e.g., fully paid, charged-off), and several other attributes related to the borrower, such as type of house ownership and annual income. To simplify the setting, we use a sample of the data used by Cohen et al. (2018) and focus on loans with a 13% annual interest rate and a duration of three years (the most common loans), resulting in 71,938 loans. The loan decision making is simulated but is in line with consumer credit decision making as described in the literature (see Baesens et al., 2003).

7. Note that the Lending Club data contains a substantial number of loans for which traditional models estimate moderately high likelihoods of default, despite these all being issued loans. This may be due to Lending Club's particular business model, where external parties choose to fund (invest in) the loans.

We use 70% of this data set to train a logistic regression model that predicts the probability of borrowers defaulting using the following features: loan amount (loan amnt), monthly installment (installment), annual income (annual inc), debt-to-income ratio (dti), revolving balance (revol bal), incidences of delinquency (delinq 2yrs), number of open credit lines (open acc), number of derogatory public records (pub rec), upper boundary range of FICO score (fico range high), lower boundary range of FICO score (fico range low), revolving line utilization rate (revol util), and months of credit history (cr hist). The model is used by a (simulated) system that denies credit to loan applicants with a probability of default above 20%. We use the system to decide which of the held-out 30% of loans should be approved.

By comparing counterfactual explanations to explanations based on feature importance weights, we can see that counterfactual explanations have several advantages. First, importance weights do not communicate which features would need to change in order for the decision to change, so their role as explanations for decisions is incomplete. Figure 2 shows the feature importance weights assigned by SHAP to four loans (shown in different colors) that are denied credit by the system.

Figure 2: Feature importance weights according to SHAP.

For instance, according to SHAP, loan amnt was the most important feature for the credit denial of all four loans. However, this information does not fully explain any of the decisions. The credit applicant of Loan 1, for example, cannot use the explanation to understand what would need to be different to obtain credit; the feature importance weights do not explain why he or she was denied credit. Was it the amount of the loan? The annual income? Both?

Table 6, in contrast, shows all counterfactual explanations for the credit denial decision of Loan 1 (recall that a counterfactual explanation is a set of features). Each row corresponds to a feature and shows whether its value is too large (↑) or too small (↓) to grant credit, the number of the six explanations that include the feature, and the difference between the original value of the feature and the value that was imputed to simulate missingness (the mean in our case), illustrating how our generalized counterfactual explanations may be applied to numeric features.

Feature           Direction   Explanations including it   Distance from mean
loan amnt         ↑           1                            +$16,122
installment       ↑           1                            +$540
annual inc        ↓           5                            −$9,065
revol bal         ↓           1                            −$4,825
fico range high   ↓           1                            −16
fico range low    ↓           1                            −16
revol util        ↑           1                            +12%
cr hist           ↓           1                            −92 months

↑ means feature is too large to grant credit. ↓ means feature is too small to grant credit.
Table 6: Counterfactual explanations for Loan 1.

For example, one possible explanation for the credit denial of Loan 1 is that the loan amount is too large (or more specifically, $16,122 larger than the average) given the other aspects of the application. The data indeed shows that the amount for Loan 1 is $28,000, but the average loan amount in our sample is $11,878. In this instance, one could explain the decision in several other ways. Another explanation suggests that the $28,000 credit would be approved if the applicant had a higher annual income and a longer credit history, which are below average in the case of the applicant.
Therefore, from these explanations, it is immediately apparent how the features influenced the decision. This highlights two additional advantages of counterfactual explanations: they give a deeper insight into why the credit was denied and provide various alternatives that could change the decision.

Table 7 shows the counterfactual explanations of Loan 4 to emphasize this last point. From Figure 2, we can see that Loan 1 and Loan 4 have similar importance weights. Thus, from this figure alone, one may conclude that these two credit denial decisions should have similar counterfactual explanations. Yet, comparing Table 6 and Table 7 reveals this in fact is not the case. Loan 4 has many more explanations (15 in total), and even though the explanations in both loans have similar features, the only explanation that the loans have in common is the first one (i.e., the loan amount is too large); there is no other match.

Feature           Direction   Explanations including it   Distance from mean
loan amnt         ↑           1                            +$16,122
installment       ↑           3                            +$540
annual inc        ↓           1                            −$9,065
dti               ↑           3                            +5
open acc          ↑           2                            +1
pub rec           ↑           1                            +1
fico range high   ↓           6                            −16
fico range low    ↓           7                            −16
revol util        ↑           3                            +12%
cr hist           ↓           3                            −92 months

↑ means feature is too large to grant credit. ↓ means feature is too small to grant credit.
Table 7: Counterfactual explanations for Loan 4.

Importantly, the number of potential counterfactual explanations grows exponentially with respect to the number of features, and we know of no algorithm with better than exponential worst-case time complexity for finding all explanations. Therefore, finding all counterfactual explanations may be intractable when the number of features is large. In the case of the loans discussed in this case study, we were able to conduct an exhaustive search because the number of features is relatively small; thus Tables 6-7 show all possible counterfactual explanations for the credit denials of Loan 1 and Loan 4. In other settings, we may need to be satisfied with an approximation to the set of all explanations.

8. Ramon et al. (2019) demonstrate the effectiveness of using the importance weights as a starting point to efficiently generate a counterfactual explanation, but this does not reduce the worst-case complexity for finding all explanations. Furthermore, as noted above, computing the importance weights itself is computationally expensive.

In cases where the number of explanations is large, additional steps to improve interpretability may be helpful, such as defining measures to rank explanations according to their usefulness. One such measure is the number of features present in the explanation (the fewer, the better). In fact, the heuristic we used to find explanations in this example, the same introduced by Martens and Provost (2014), tries to find the shortest explanations first. However, there could be other more relevant measures depending on the particular decision-making problem, such as the individual's ability to change the features in the explanation. As mentioned above, our generalized framework would allow incorporating the cost of changing features as part of the heuristic procedure, resulting in an algorithm designed to (try to) find the cheapest or most relevant explanations first. Because finding all possible explanations was tractable in this case, we did not incorporate costs in the heuristic we used to find explanations in this empirical example, but we do so in the next case study.
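For completeness, the exhaustive search used for Tables 6-7 can be sketched as follows. This is our own illustration, with a hypothetical decision function standing in for the trained credit model; it is exponential in the number of features, so it is only viable when that number is small:

```python
from itertools import combinations

def all_counterfactual_explanations(instance, decide, remove):
    """Enumerate every causal and irreducible feature set for decide(instance)."""
    c = decide(instance)
    features = list(instance)
    explanations = []
    for k in range(1, len(features) + 1):
        for E in map(set, combinations(features, k)):
            if decide(remove(instance, E)) == c:
                continue              # not causal
            if any(S <= E for S in explanations):
                continue              # reducible: a smaller explanation is inside
            explanations.append(E)
    return explanations

# Example with the toy system from the earlier sketch:
decide = lambda inst: int(inst["foreign"] == 1 and inst["casino"] == 1)
remove = lambda inst, E: {k: (0 if k in E else v) for k, v in inst.items()}
print(all_counterfactual_explanations({"foreign": 1, "casino": 1}, decide, remove))
# [{'foreign'}, {'casino'}]
```

Enumerating subsets in order of increasing size means that any causal set containing a previously found explanation can be discarded immediately, which is what keeps the returned sets irreducible.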
Nonetheless, one can see that not all features shown in Figure 2 and Tables 6-7 would be relevant for loan applicants looking for recommendations to get their credit approved. So, SHAP may be adjusted further to compute weights only for a subset of features. Since SHAP deals with missing features by imputing default values, we can easily extend SHAP to only consider certain (relevant) features by setting the default values of the irrelevant features equal to the current values of the instance. Then, SHAP will compute importance weights only for the features that have a value different from the default. We do this for Loan 4 and define loan amount and annual income as the only relevant features. This would make sense in our context if customers can only ask for less money or show additional sources of income to get their credit approved.

After doing this, SHAP computes an importance weight of 0.5 for both the loan amount and the annual income, and there are two counterfactual explanations: the applicant can either reduce the loan amount or increase the annual income to get the loan approved (the singleton explanations {loan amnt} and {annual inc} in Table 7). However, consider a different scenario. Suppose the bank were stricter with the loans it approves and used a decision threshold 2.5 percentage points lower. Now, in order to get credit approved, the applicant of Loan 4 would need both to reduce the loan amount and to increase her (or his) annual income.

This situation is directly analogous to Example 2 in Section 4.2. With this different decision system, there is a single counterfactual explanation (instead of two) consisting of both features, so the counterfactual framework captures the fact that the decision-making procedure changed. However, SHAP would still show an importance weight of 0.5 for each feature. Thus, the counterfactual explanations and the SHAP explanations exhibit different behavior. SHAP explanations suggest that the two decisions are essentially the same. The counterfactual explanations suggest that they are quite different. We argue that the latter is preferable in many settings. It may well be that the former is preferable in some settings, but we haven't found a credible and compelling example.
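The restriction trick just described can be sketched with SHAP's KernelExplainer. This is a sketch under our assumptions; decide_f, x, and feature_means are hypothetical names, not from the paper:

```python
import numpy as np
import shap  # pip install shap

# decide_f: wrapped 0/1 decision function (accepting a 2D array of instances);
# x: the instance to explain; feature_means: per-feature means for imputation.
def restricted_shap_weights(decide_f, x, feature_means, relevant_idx):
    """Compute SHAP weights only for the 'relevant' features by making the
    background equal to the instance everywhere else: irrelevant features
    then never differ from their 'removed' value and receive zero weight."""
    background = x.copy()
    background[relevant_idx] = feature_means[relevant_idx]
    explainer = shap.KernelExplainer(decide_f, background.reshape(1, -1))
    return explainer.shap_values(x)
```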
We use Facebook data to showcase the advantages of counterfactual explanations whenexplaining data-driven decisions in high-dimensional settings. The data, which was col-lected through a Facebook application called myPersonality, has also been used by otherresearchers to compare the performance of various counterfactual explanation methods (Ra-mon et al., 2019). We use a sample that contains information on 587,745 individuals fromthe United States, including their Facebook Likes and a subset of their Facebook profiles.In general, Facebook users do not necessarily reveal all their personal characteristics, buttheir Facebook Likes are available to the platform. For this case study, in order to simulatea decision-making system, we assume there is a (fictitious) firm that wants to launch a mar-keting campaign to promote a new product to users who are more than 50 years old. Giventhat not all users share their age in their Facebook profile, the firm could use a predictivemodel to predict who is over-50 (using Facebook Likes) and use the predictions to decidewhom to target with the campaign.The Facebook Likes of a user are the set of Facebook pages that the user chose to“Like” on the platform (we capitalize “Like”, as have prior authors, to distinguish the act
Importantly, while the system could generate a lot of value for the firm, we need to consider users' sense of privacy and how they might feel about being targeted with the promotional campaign. For example, some users may feel threatened by highly personalized offers ("How do they know this about me?") and thus may be interested in knowing why they were targeted (see Chen et al. (2017) for a more detailed discussion). Such users are unlikely to be interested in the intricacies of the model but rather in the data about their behavior that was used to target them with promotional content. If that is the case, framing explanations in terms of comprehensible input features (e.g., Facebook Likes) is critical.

One approach is to use importance weights to rank Facebook pages according to their feature importance (as computed by a technique such as SHAP) and then show the user the topmost predictive pages that she (or he) Liked. However, given the large number of features (Facebook pages), computing weights in a deterministic fashion is intractable. SHAP circumvents this issue by sampling the space of feature combinations, resulting in sampling-based approximations of the influence of each feature on the prediction. However, the downside is that the estimates may be far from the real values, which may lead to inconsistent results. For example, if we were to use the topmost important features to explain a decision, we should consider whether different runs of a non-deterministic method repeatedly rank the same pages as the most important ones. Unfortunately, as we will show, the set of the topmost important features becomes increasingly inconsistent (across different runs of SHAP) as the number of features increases.

For instance, in our holdout data set there is a 34-year-old user who would be targeted with an ad for older persons (the model predicts a 42% probability that this user is at least 50 years old). So, as an example, suppose this user wants to know why he or she is being targeted. Let's say that we have determined that showing the top-3 most important features makes sense for this application. Table 8 shows the top-3 most predictive pages according to their SHAP values (importance weights) for the system decision. The table shows the result of running SHAP five times to compute the importance weights, each time sampling 4,100 observations of the space of feature combinations. Because SHAP uses sampling-based approximations, SHAP values vary every time we compute them, resulting in different topmost predictive pages.
Importantly, while some pages appear recurrently, only Paul McCartney appears in all 5 approximations.
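As a rough sketch of how such a consistency check can be run (reusing the names from the earlier sketch; `nsamples` controls KernelExplainer's sampling budget):

```python
# Sketch of the consistency check: run KernelExplainer several times with a
# fixed sampling budget and compare the top-3 pages across runs. We wrap the
# decision system from the previous sketch as the function being explained.
import numpy as np
import shap  # https://github.com/slundberg/shap/

def decision_fn(X):
    # 1 if the system would target the user, 0 otherwise
    return (model.predict_proba(X)[:, 1] >= threshold).astype(float)

user = X_holdout[targeted][0]               # one targeted user
background = np.zeros((1, X.shape[1]))      # "not Liked" as the default value

explainer = shap.KernelExplainer(decision_fn, background)
tops = []
for _ in range(5):
    sv = explainer.shap_values(user, nsamples=4100)
    tops.append(set(np.argsort(sv)[-3:]))   # indices of the top-3 pages
matches = set.intersection(*tops)           # pages in the top 3 of all runs
```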
10. We use the SHAP implementation provided at https://github.com/slundberg/shap/. At the moment of writing, the default sample size is 2048 + 2m, where m is the number of features with a non-default value. Our choice of 4,100 is larger than the SHAP implementation's default sample size for all of the experiments we run.
Approximation 1: Elvis Presley (0.1446), Bruce Springsteen (0.1302), Paul McCartney (0.1268)
Approximation 2: Paul McCartney (0.1471), William Shakespeare (0.1321), Brain Pickings (0.1319)
Approximation 3: Paul McCartney (0.1823), Neil Young (0.1676), The Hobbit (0.1417)
Approximation 4: Paul McCartney (0.1541), Elvis Presley (0.1425), Leonard Cohen (0.1359)
Approximation 5: Elvis Presley (0.1582), Paul McCartney (0.1489), Bruce Springsteen (0.1303)

Importance weights (SHAP values) shown in parentheses.

Table 8: Topmost predictive pages and their SHAP values for a single decision to target our example user with the over-50 ad.

As we will show in more detail below, this inconsistency is the consequence of using SHAP to estimate importance weights for too many features. This specific user Liked 64 pages, which is not an unusually large number of Likes: more than a third of the targeted users in the holdout data set have at least that many Likes. There are (at most) 64 non-zero SHAP values to estimate, making the task significantly simpler than if we had to estimate importance weights for all 10,822 features. However, SHAP proves unreliable for finding the most predictive pages (let alone estimating the importance weights for each page). We increased the sample size for SHAP to observe when the estimates became stable for this particular task (note that we were already running SHAP with a larger sample size than the default). For this specific user, it took 8 times more samples from the feature space for the same topmost pages to show consistently across all approximations, increasing computation time substantially (from 3 to 21 seconds per approximation on a standard laptop). This time would increase dramatically for data settings with hundreds of non-zero features, which are not uncommon (e.g., see Chen et al., 2017; Perlich et al., 2014).

In contrast, counterfactual explanations were found in a tenth of a second (on the same laptop), five of which we show in Table 9. Each explanation consists of a subset of Facebook pages that would change the targeting decision if it were removed from the set of pages Liked by the user. In other words, each of the sets shown in Table 9 is an explanation in its own right, representing a minimum amount of evidence that (if removed) changes the decision. Importantly, these explanations are short, consistent (because they are generated in a deterministic fashion), and directly tied to the decision-making procedure.
Explanation 1: The user would not be targeted if { Paul McCartney } were removed.
Explanation 2: The user would not be targeted if { Elvis Presley } were removed.
Explanation 3: The user would not be targeted if { Neil Young } were removed.
Explanation 4: The user would not be targeted if { Leonard Cohen } were removed.
Explanation 5: The user would not be targeted if { Brain Pickings } were removed.

Table 9: Counterfactual explanations for a single decision to target our example user with the over-50 ad.
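The explanations above were produced by the smallest-first heuristic search of Martens and Provost (2014); the following is a minimal sketch of such a search for binary features (our own simplification, not the authors' implementation):

```python
# Smallest-first search in the spirit of Martens and Provost (2014), assuming
# binary features where "removing" a page sets its value to 0. Returning at
# the first subset size that flips the decision guarantees that every returned
# set is causal (removal changes the decision) and irreducible (no smaller
# subset flips the decision, since all smaller subsets were tested earlier).
def counterfactual_explanations(score_fn, x, threshold, max_size=4):
    liked = [i for i, v in enumerate(x) if v == 1]
    frontier = [[i] for i in liked]            # all single-page candidates
    for size in range(1, max_size + 1):
        explanations, survivors = [], []
        for subset in frontier:
            x_cf = x.copy()
            for i in subset:
                x_cf[i] = 0                    # remove the pages in the subset
            if score_fn(x_cf) < threshold:     # decision flips
                explanations.append(subset)
            else:
                survivors.append(subset)
        if explanations:
            return explanations
        # expand surviving subsets by one additional page, each subset once
        frontier = [s + [j] for s in survivors for j in liked if j > max(s)]
    return []
```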
As an additional systematic demonstration of the negative impact that an increasing number of features may have on the consistency of sampling-based feature-importance approximations, we show how the more pages a user has Liked, the more inconsistent the set of the top three most important pages becomes. The process we used is as follows. First, we picked a random sample of 500 users in the holdout data that would be targeted by the system (as described above). Then, we applied SHAP five times to approximate the importance weights of the features used for each of the 500 targeting decisions (sampling 4,100 observations of the feature space each time). Finally, for each targeting decision, we counted the number of pages that appeared consistently in the top three most important pages across all five approximations. We call this the number of matches. Thus, if the approximations were consistent, we would expect the same three pages to appear in the top three pages of all approximations, and there would be three matches. In contrast, if the approximations were completely inconsistent, no pages would appear in the top three pages of all five approximations and there would be no matches. It took about an hour to run this experiment on a standard laptop.

[Figure 3: Variations in explanations by number of Likes. (a) Average matches (SHAP) by quantile. (b) Average size of counterfactual explanations by quantile.]

The result of the experiment is in Figure 3a, which shows the average number of matches by quantile. As predicted, SHAP approximations are not consistent for users who have Liked many pages. For the largest instances, most cases have only one page that appears in all five SHAP runs. To contrast SHAP with counterfactual explanations, we ran our algorithm to find one counterfactual explanation for each of the 500 targeting decisions, which took 15 seconds on a standard laptop. The results are shown in Figure 3b, which shows the average size of counterfactual explanations by quantile. From the figure, we can see that explanations are larger for users who Liked many pages but remain relatively small considering the number of features present, which concurs with the findings of Chen et al. (2017).

11. Recall that targeting decisions may have several counterfactual explanations. The numbers we report here are the average sizes of the first explanation we found for each targeting decision.

Finally, in this case study we also adjust our method to incorporate domain-specific preferences ("costs") and showcase how they can lead to more comprehensible explanations. The explanations we have shown so far (in both case studies) were generated using the heuristic search procedure proposed by Martens and Provost (2014), which does not consider the relevance of the various possible explanations and was designed to find the smallest explanations first. Nonetheless, short explanations may include Likes of relatively uncommon pages, which may be unfamiliar to the person analyzing the explanation. To illustrate how domain preferences can be taken into account when generating explanations of decisions, let's say that, for our problem, explanations with highly specific Likes are problematic for a feature-based explanation. The recipient of the explanation is much less likely to know these pages, so he or she would be better served by explanations using popular pages. To this end, we can adjust the heuristic search (as discussed in Section 3.4) to find explanations that include more relevant (viz., more popular) pages by associating lower costs with their removal from an instance's input data. Specifically, we adjust the heuristic search so that it penalizes less-popular pages (those with fewer total Likes) by assigning them a higher cost. A sketch of this cost-adjusted search follows Table 10.

Table 10 shows some examples of how the first explanation found by the algorithm changes depending on whether the relevance heuristic is used. As expected, the explanations found when using the relevance heuristic can include more pages than the "shortest first" search; however, those pages are also more popular (as evidenced by their total number of Likes). Importantly, these examples show how the search procedure can be easily adapted to find context-specific explanations. In this case, the user may be interested in finding explanations with popular pages, but the search could also be adjusted to show first the explanations with pages that were recently Liked by the user or pages more closely related to the advertised product.

User 11
  Without the relevance heuristic: 'It's a Wonderful Life' (1,181 Likes); 'JESUS IS LORD!!!!!!!!!!!!!!!!!!!!!!!!!!! if you know this is true press like. :)' (1,291 Likes)
  With the relevance heuristic: 'Reading' (47,288 Likes); 'American Idol' (15,792 Likes); 'Classical' (8,632 Likes)

User 38
  Without the relevance heuristic: 'The Hollywood Gossip' (1,353 Likes); 'Remember those who have passed. Press Like if you've lost a loved one' (2,248 Likes)
  With the relevance heuristic: 'Pink Floyd' (43,045 Likes); 'Dancing With The Stars' (5,379 Likes); 'The Ellen DeGeneres Show' (16,944 Likes); 'American Idol' (15,792 Likes)

User 108
  Without the relevance heuristic: 'Six Degrees Of Separation - The Experiment' (3,373 Likes); 'They're, Their, and There have 3 distinct meanings. Learn Them.' (3,842 Likes)
  With the relevance heuristic: 'Star Trek' (11,683 Likes); 'Turn Facebook Pink For 1 Week For Breast Cancer Awareness' (12,942 Likes)

User 413
  Without the relevance heuristic: 'Sarcasm as a second language' (1,540 Likes); 'RightChange' (3,842 Likes)
  With the relevance heuristic: 'Reading' (47,288 Likes); 'Pink Floyd' (43,045 Likes); 'Where the Wild Things Are' (13,781 Likes); 'Proud to be an American' (3,938 Likes)

Table 10: First counterfactual explanations found, without and with the relevance heuristic.
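A minimal sketch of the cost-adjusted search discussed above (the inverse-popularity cost is one simple choice; the function and variable names are ours):

```python
# Best-first variant: order candidate removals by total cost rather than
# size, penalizing pages with few total Likes so that popular pages appear
# in explanations first. Costs are positive and grow with the subset, so the
# first subset that flips the decision is the lowest-cost explanation.
import heapq

def first_explanation(score_fn, x, threshold, total_likes):
    liked = [i for i, v in enumerate(x) if v == 1]
    cost = lambda s: sum(1.0 / total_likes[i] for i in s)  # unpopular = costly
    heap = [(cost([i]), [i]) for i in liked]
    heapq.heapify(heap)
    while heap:
        _, subset = heapq.heappop(heap)        # cheapest candidate first
        x_cf = x.copy()
        for i in subset:
            x_cf[i] = 0                        # remove the pages in the subset
        if score_fn(x_cf) < threshold:
            return subset                      # lowest-cost explanation found
        for j in liked:
            if j > max(subset):                # generate each subset once
                heapq.heappush(heap, (cost(subset + [j]), subset + [j]))
    return None
```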
5.3 Case Study: Targeting with Multiple Models (KDD Cup 1998)

For our third case study, we illustrate the advantages of our proposed approach when applied to complex systems, including ones that use multiple models to make decisions. We use the data set from the KDD Cup 1998, which is available at the UCI Machine Learning Repository. The data set was originally provided by a national veterans organization that wanted to maximize the profits of a direct-mail campaign requesting donations. Therefore, the business problem consisted of deciding which households to target with direct mail. Importantly, one could approach this problem in several ways, such as:

1. Using a regression model to predict the amount that a potential target will donate, so that we can target her if that amount is larger than the break-even point.
2. Using a classification model to predict whether a potential target will donate more than the break-even point, so that we can target her if this is the case.

3. Using a classification model to predict the probability that a potential target will donate, and a regression model to predict the amount if the potential target were to donate. By multiplying together the results of these two models, one can obtain the expected donation amount and send a direct mail if the expected donation is larger than the break-even point.

To showcase system decisions that incorporate multiple models, we illustrate our generalized framework using the third approach, which is also the one that was used by the winners of the KDD Cup 1998.

We use XGBoost for both regression and classification, using 70% of the data and the following subset of features: Age of Household Head (AGE), Wealth Rating (WEALTH2), Mail Order Response (HIT), Male Active in the Military (MALEMILI), Male Veteran (MALEVET), Vietnam Veteran (VIETVETS), World War Two Veteran (WWIIVETS), Employed by Local Government (LOCALGOV), Employed by State Government (STATEGOV), Employed by Federal Government (FEDGOV), Percent Japanese (ETH7), Percent Korean (ETH10), Percent Vietnamese (ETH11), Percent Adult in Active Military Service (AFC1), Percent Male in Active Military Service (AFC2), Percent Female in Active Military Service (AFC3), Percent Adult Veteran Age 16+ (AFC4), Percent Male Veteran Age 16+ (AFC5), Percent Female Veteran Age 16+ (AFC6), Percent Vietnam Veteran Age 16+ (VC1), Percent Korean Veteran Age 16+ (VC2), Percent WW2 Veteran Age 16+ (VC3), Percent Veteran Serving After May 1975 Only (VC4), Number of promotions received in the last 12 months (NUMPRM12), Number of lifetime gifts to card promotions to date (CARDGIFT), Number of months between first and second gift (TIMELAG), Average dollar amount of gifts to date (AVGGIFT), and Dollar amount of most recent gift (LASTGIFT).

To motivate the problem, suppose that a system uses the classification and regression models on the holdout 30% of the data to target the 5% of households with the largest (estimated) expected donations, essentially targeting the most profitable households within a limited budget. In this case, both the targeters and the targeted may be interested in explanations for why the system decided to send any particular direct mail. This is a particularly challenging problem for methods designed to explain model predictions (not decisions), since the system makes decisions using more than one model. It is possible that the most important features for predicting the probability of donation are not the same as the most important features for predicting the donation amount, so determining which features led to the targeting decision is not straightforward.
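As a rough illustration, the two-model decision system might be simulated as follows (a minimal sketch with synthetic stand-ins for the data; variable names are ours):

```python
# Sketch of the third approach: multiply the two model outputs to estimate
# the expected donation, and target the top 5% of expected donations.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)                   # synthetic stand-ins for the
X = rng.normal(size=(2000, 28))                  # 28 features listed above
donated = rng.binomial(1, 0.05, size=2000)       # whether the household donated
amount = np.abs(rng.normal(15.0, 10.0, size=2000))  # donation amount (dollars)

X_train, X_holdout = X[:1400], X[1400:]
d_train, a_train = donated[:1400], amount[:1400]

clf = xgb.XGBClassifier().fit(X_train, d_train)           # P(donate | x)
reg = xgb.XGBRegressor().fit(X_train[d_train == 1],       # E[amount | donate, x]
                             a_train[d_train == 1])

def expected_donation(X):
    return clf.predict_proba(X)[:, 1] * reg.predict(X)

scores = expected_donation(X_holdout)
cutoff = np.quantile(scores, 0.95)               # budget allows the top 5%
targeted = scores >= cutoff
```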
To illustrate this better, consider one targeted household in the holdout data, for which we computed SHAP values for its predicted probability of donating (given by the classification model) and its predicted donation amount (given by the regression model). We normalized the SHAP values for each model prediction so that the values sum to 1. The top 5 most important features for the probability prediction and the regression prediction are shown in Figure 4a and Figure 4b, respectively. Interestingly, only VC3 (percent of 16+ WW2 veterans in the household) is among the most important features for both the classification model and the regression model.

[Figure 4: Features with largest importance weights. (a) Top features for probability. (b) Top features for amount.]

Importantly, we cannot explain the targeting decision from these figures alone: even though we know the most important features for each prediction, there is no way of telling what was actually vital for the system to make the targeting decision. Was the household targeted because of the size of the last gift (LASTGIFT)? Or would the household's high probability of donating justify the targeting decision even if LASTGIFT had a smaller value?

As per our earlier discussions, SHAP may be repurposed to compute feature importance weights for system decisions that incorporate multiple models by transforming the output of the system into a scoring function that returns 1 if the household is targeted and 0 otherwise. However, as we have shown for the other problems above, acquiring feature importance weights for decisions made based on expected donations (rather than amounts or probabilities) would still not explain the system decisions. In contrast, counterfactual explanations can be applied transparently to system decisions that involve more than one model. Specifically, by defining the predicted expected donation as a scoring function (which is the result of multiplying the predictions of the two models), we can use the same procedures showcased in the previous examples to find explanations for targeting decisions. Table 11 shows the explanations found for the targeted household discussed above.

Interestingly, some of the highest-scoring SHAP features, shown in Figure 4, are not present in any of the explanations (e.g., MALEVET), whereas some features that are present in some explanations do not have large SHAP values (e.g., AVGGIFT). In fact, AVGGIFT had a negative SHAP value in the regression model (meaning we would expect its impact on the non-default decision to be negative), but it appears in all explanations! This example illustrates the importance of defining explanations in terms of decisions and not predictions, particularly when dealing with complex, non-linear models such as XGBoost.
Feature      Direction   Appears in
AGE          ↓           1 of 6 explanations
WWIIVETS     ↑           1 of 6 explanations
VC1          ↓           1 of 6 explanations
VC2          ↑           1 of 6 explanations
VC3          ↑           1 of 6 explanations
NUMPRM12     ↑           5 of 6 explanations
CARDGIFT     ↑           1 of 6 explanations
AVGGIFT      ↑           all 6 explanations
LASTGIFT     ↑           all 6 explanations

↑ means the household was targeted because the feature is above average; ↓ means the household was targeted because the feature is below average.

Table 11: Explanations for the targeting decision, summarized by the direction of the evidence and the number of the six explanations in which each feature appears.

More specifically, because SHAP attempts to evaluate the overall impact of features on the model prediction, it averages out the negative and positive impacts that features have on the prediction when removed alongside all other feature combinations. Hence, if a feature has a large negative impact in one case and several small positive impacts in other cases, that feature may have a negative SHAP value (if the single negative impact is greater than the sum of the small positive impacts). This is the same behavior that we illustrated in Section 4.3 (Example 3), which of course would be counterproductive when trying to understand the influence of features on the decision making. Averaging out the impact of features over all feature combinations hides the fact that (in non-linear models) features may provide evidence in favor of or against a decision depending on what other features are removed, which explains why AVGGIFT had a negative SHAP value but is present in the explanations shown in Table 11.
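For completeness, the repurposing of SHAP mentioned above (weighting features with respect to the decision itself rather than either model's prediction) can be sketched by wrapping the whole system, both models plus the threshold, as a single 0/1 scoring function (building on the earlier sketch; names are ours):

```python
# Sketch: compute decision-level SHAP weights by explaining the system's 0/1
# output instead of either model's prediction.
import shap

def decision_fn(X):
    return (expected_donation(X) >= cutoff).astype(float)  # 1 if targeted

explainer = shap.KernelExplainer(decision_fn, X_train[:100])  # background sample
household = X_holdout[targeted][:1]                           # one targeted household
decision_shap = explainer.shap_values(household)
```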
6. Discussion
The previous studies illustrate various advantages of counterfactual explanations over importance-weighting methods. The first study shows that knowing the importance weights of features is not enough to determine how the features affect system decisions. The second study demonstrates the strengths of counterfactual explanations in the presence of high-dimensional data. In particular, the study shows that sampling-based approximations of importance weights get worse as the number of features increases. Counterfactual explanations sidestep this issue because small subsets of features are usually enough to explain decisions. Moreover, the study showcased a heuristic procedure to search for and sort counterfactual explanations according to their relevance. Finally, the third study shows that importance weights may be misleading when decisions are made using multiple (and complex) models. More specifically, we see a real instance of the phenomenon we showed in Section 4.3, in which features with negative SHAP weights may in fact have a positive effect on system decisions.

It has been argued that a disadvantage of counterfactual explanations is that each instance (decision) usually has multiple explanations (Molnar, 2019); this is also referred to as the Rashomon effect. The argument is that this is inconvenient because people may prefer simple explanations over the complexity of the real world. This issue may be exacerbated as the number of features increases because the number of counterfactual explanations may grow exponentially. In contrast, most importance-weighting methods converge to a unique solution (e.g., Shapley values in the case of SHAP), regardless of the number of features.

However, our second case study suggests that importance-weighting methods may actually not scale well as the number of features increases, because their approximations may become inconsistent. Moreover, objective measures of relevance (e.g., number of Likes in our Facebook case study) may be incorporated as part of the heuristic procedures used to find counterfactual explanations. Thus, the fact that the number of counterfactual explanations may grow exponentially is not necessarily problematic. Our study shows that short, consistent, and relevant explanations are significantly faster to find than importance weights are to compute, even when the number of features is large.

Something that was not explored in the case studies was the sensitivity of the counterfactual explanations to the method used to deal with missing values. This is an interesting direction for future research, as we would expect distinct alternatives for dealing with missing features to affect explanations differently. For example, if features are correlated, mean imputation and retraining the model without the removed feature may produce different results: a decision may change when imputing the mean for a removed feature, but if instead the missing feature is dealt with by using a model trained without that feature (Saar-Tsechansky and Provost, 2007), the decision may not change, because other features may capture most of the information given by the removed feature. Therefore, while our proposed framework would work with either approach, future research should assess the advantages of each approach in different settings.

Moreover, this study compared importance weights with a specific type of counterfactual explanation (formally defined in Section 3.2).
Specifically, our explanations are defined in terms of counterfactual worlds in which some of the features are absent when making decisions. Nonetheless, there are other types of counterfactual worlds that may be of interest when explaining decisions. For example, in our first case study, we showed that some loan applicants were denied credit because the amount they requested was too large (i.e., the decision changed when we removed the loan amount feature). While this explains the credit denial decision, these applicants may instead be interested in the maximum amount they could ask for so that they are no longer denied credit. Such a counterfactual explanation could be defined as a set of "minimal" feature adjustments that changes the decision.

Other researchers have proposed various methods to obtain such counterfactual explanations. For example, in the context of explaining predictions (not decisions), Wachter et al. (2017) define counterfactual explanations as the smallest change to feature values that changes the prediction to a predefined output. Thus, they address explanations as a minimization problem in which larger (user-defined) distances between counterfactual instances and the original instance are penalized more. Their method, however, focuses on gradient-based models, does not work with categorical features, and may require access to the machine learning method used to learn the model (which usually is not available for deployed systems). Tolomei et al. (2017) define counterfactual explanations in a similar way, but instead propose how to find such explanations when using tree-based methods. Other counterfactual methods have also been implemented in the Python package Alibi. The package includes a simple counterfactual method loosely based on Wachter et al. (2017), as well as an extended method that uses class prototypes to improve the interpretability and convergence of the algorithm (Van Looveren and Klaise, 2019).

12. See https://github.com/SeldonIO/alibi

Another key assumption behind all the instance-level explanation methods discussed in this paper (feature importance as well as counterfactual) is that examining an instance's features will make sense to the user. This presumes at least that the features themselves are comprehensible. This would not be the case, for example, if the features are too low-level or if the features have been obfuscated, for example to address privacy concerns (see, e.g., the discussion of "doubly deidentified data" by Provost et al. (2009)).

Relatedly, another promising direction for future research is to study how users actually perceive these different sorts of explanations in practice. In particular, it would be interesting to analyze the impact that various types of explanations have on users' adoption of AI systems and their decision-making performance. Settings where the decisions made by deployed AI systems are closely monitored by users (see Lebovitz et al. (2019) for a clear example) would be ideal for such a study.
7. Conclusion
This paper examines the problem of explaining data-driven decisions made by AI decision-making systems from a causal perspective: if the question we seek to answer is why the system made a specific decision, we can ask which inputs caused the system to make that decision. This approach is advantageous because (a) it standardizes the form that an explanation can take, (b) it does not require all features to be part of the explanation, and (c) the explanations can be separated from the specifics of the model. Thus, we define a (counterfactual) explanation as a set of features that is causal (meaning that removing the set from the instance changes the decision) and irreducible (meaning that removing any subset of the features in the explanation would not change the decision).

Importantly, this paper shows that explaining model predictions is not the same as explaining system decisions, because features that have a large impact on predictions may not have an important influence on decisions. Moreover, we show through various examples and case studies that the increasingly popular approach of explaining model predictions using importance weights has significant drawbacks when repurposed to explain system decisions. In particular, we demonstrate that importance weights may be ambiguous or even misleading when the goal is to understand how features affect a specific decision.

Our work generalizes previous work on counterfactual explanations in at least three important ways: (i) we explain system decisions (which may incorporate predictions from
several predictive models) rather than model predictions, (ii) we do not enforce any specific method to remove features, and (iii) our explanations can deal with feature sets with arbitrary dimensionality and data types. Finally, we also propose a heuristic procedure that allows the tailoring of explanations to domain needs by introducing costs (for example, the costs of changing the features responsible for the decision).

References
Robert Andrews, Joachim Diederich, and Alan B Tickle. Survey and critique of techniques for extracting rules from trained artificial neural networks. Knowledge-Based Systems, 8(6):373-389, 1995.

Vicky Arnold, Nicole Clark, Philip A Collier, Stewart A Leech, and Steve G Sutton. The differential use and effect of knowledge-based system explanations in novice and expert judgment decisions. MIS Quarterly, pages 79-97, 2006.

Bart Baesens, Tony Van Gestel, Stijn Viaene, Maria Stepanova, Johan Suykens, and Jan Vanthienen. Benchmarking state-of-the-art classification algorithms for credit scoring. Journal of the Operational Research Society, 54(6):627-635, 2003.

Or Biran and Courtenay Cotton. Explanation and justification in machine learning: A survey. In IJCAI-17 Workshop on Explainable AI (XAI), volume 8, page 1, 2017.

Daizhuo Chen, Samuel P Fraiberger, Robert Moakler, and Foster Provost. Enhancing transparency and control when drawing data-driven inferences about individuals. Big Data, 5(3):197-212, 2017.

Maxime C Cohen, C Daniel Guetta, Kevin Jiao, and Foster Provost. Data-driven investment strategies for peer-to-peer lending: A case study for teaching data science. Big Data, 6(3):191-213, 2018.

Mark Craven and Jude W Shavlik. Extracting tree-structured representations of trained networks. In Advances in Neural Information Processing Systems, pages 24-30, 1996.

Anupam Datta, Shayak Sen, and Yair Zick. Algorithmic transparency via quantitative input influence: Theory and experiments with learning systems. In 2016 IEEE Symposium on Security and Privacy (SP), pages 598-617. IEEE, 2016.

Shirley Gregor and Izak Benbasat. Explanations from intelligent systems: Theoretical foundations and implications for practice. MIS Quarterly, pages 497-530, 1999.

Henrik Jacobsson. Rule extraction from recurrent neural networks: A taxonomy and review. Neural Computation, 17(6):1223-1263, 2005.

Ujwal Kayande, Arnaud De Bruyn, Gary L Lilien, Arvind Rangaswamy, and Gerrit H Van Bruggen. How incorporating feedback mechanisms in a DSS affects DSS evaluations. Information Systems Research, 20(4):527-546, 2009.

Michal Kosinski, David Stillwell, and Thore Graepel. Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 110(15):5802-5805, 2013.

Michael T Lash, Qihang Lin, W Nick Street, and Jennifer G Robinson. A budget-constrained inverse classification framework for smooth classifiers. Pages 1184-1193. IEEE, 2017.

Steve LaValle, Eric Lesser, Rebecca Shockley, Michael S Hopkins, and Nina Kruschwitz. Big data, analytics and the path from insights to value. MIT Sloan Management Review, 52(2):21-32, 2011.

Sarah Lebovitz, Natalia Levina, and Hila Lifshitz-Assaf. Doubting the diagnosis: How artificial intelligence increases ambiguity during professional decision making. Available at SSRN 3480593, 2019.

Vincent Lemaire, Raphael Féraud, and Nicolas Voisine. Contact personalization using a score understanding method. Pages 649-654. IEEE, 2008.

Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pages 4765-4774, 2017.

David Martens and Foster Provost. Explaining data-driven document classifications. MIS Quarterly, 38(1):73-100, 2014.

David Martens, Bart Baesens, Tony Van Gestel, and Jan Vanthienen. Comprehensible credit scoring models using rule extraction from support vector machines. European Journal of Operational Research, 183(3):1466-1476, 2007.

Julie Moeyersoms, Brian d'Alessandro, Foster Provost, and David Martens. Explaining classification models built on high-dimensional sparse data. In Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, USA, pages 36-40, 2016.

Christoph Molnar. Interpretable machine learning, see 18.1 counterfactual explanations. https://christophm.github.io/interpretable-ml-book/counterfactual.html, 2019. Accessed: 2019-12-11.

Claudia Perlich, Brian Dalessandro, Troy Raeder, Ori Stitelman, and Foster Provost. Machine learning for targeted display advertising: Transfer learning in action. Machine Learning, 95(1):103-127, 2014.

Foster Provost. Understanding decisions driven by big data, 2014.

Foster Provost and Tom Fawcett. Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking. O'Reilly Media, Inc., 2013.

Foster Provost, Brian Dalessandro, Rod Hook, Xiaohan Zhang, and Alan Murray. Audience selection for on-line brand advertising: Privacy-friendly social network targeting. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 707-716. ACM, 2009.

Yanou Ramon, David Martens, Foster Provost, and Theodoros Evgeniou. Counterfactual explanation algorithms for behavioral and textual data. arXiv preprint arXiv:1912.01819, 2019.

Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135-1144. ACM, 2016.

Marko Robnik-Šikonja and Igor Kononenko. Explaining classifications for individual instances. IEEE Transactions on Knowledge and Data Engineering, 20(5):589-600, 2008.

Maytal Saar-Tsechansky and Foster Provost. Handling missing values when applying classification models. Journal of Machine Learning Research, 8(Jul):1623-1657, 2007.

Ethan L Schreiber, Richard E Korf, and Michael D Moffitt. Optimal multi-way number partitioning. Journal of the ACM (JACM), 65(4):24, 2018.

Galit Shmueli and Otto R Koppius. Predictive analytics in Information Systems research. MIS Quarterly, pages 553-572, 2011.

Erik Štrumbelj and Igor Kononenko. An efficient explanation of individual classifications using game theory. Journal of Machine Learning Research, 11:1-18, 2010.

Erik Štrumbelj, Igor Kononenko, and M Robnik Šikonja. Explaining instance classifications with interactions of subsets of feature values. Data & Knowledge Engineering, 68(10):886-904, 2009.

Gabriele Tolomei, Fabrizio Silvestri, Andrew Haines, and Mounia Lalmas. Interpretable predictions of tree-based ensembles via actionable feature tweaking. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 465-474. ACM, 2017.

Arnaud Van Looveren and Janis Klaise. Interpretable counterfactual explanations guided by prototypes. arXiv preprint arXiv:1907.02584, 2019.

Sandra Wachter, Brent Mittelstadt, and Chris Russell. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harvard Journal of Law & Technology, 31:841, 2017.

Fulton Wang and Cynthia Rudin. Falling rule lists. In Artificial Intelligence and Statistics, pages 1013-1022, 2015.