Cognitive Perspectives on Context-based Decisions and Explanations
Marcus Westberg, Kary Främling∗
Computer Science Department, Umeå University, [email protected], [email protected]

∗The work is partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation.
Abstract
When human cognition is modeled in Philosophy and Cognitive Science, there is a pervasive idea that humans employ mental representations in order to navigate the world and make predictions about outcomes of future actions. By understanding how these representational structures work, we not only understand more about human cognition but also gain a better understanding of how humans rationalise and explain decisions. This has an influencing effect on explainable AI, where the goal is to provide explanations of computer decision-making for a human audience. We show that the Contextual Importance and Utility (CIU) method for XAI overlaps with the current new wave of action-oriented predictive representational structures, in ways that make CIU a reliable tool for creating explanations that humans can relate to and trust.
Introduction

Both making a decision and explaining a decision involve the structuring of concepts to provide a model for the context within which the decision took place and for how the decision changed the factors present in this context. For example, searching for a pair of scissors in a kitchen can involve the decision to open a drawer. The decision involves the application of the concepts of 'scissors' and 'drawer', as well as their relationship to each other. The localised understanding that scissors tend to be in kitchen drawers and that drawers are containers that can be pulled to have their contents revealed establishes affordances which aid the world navigation process. These applications of concepts need not be part of deliberate reasoning but could rather be an intuitive reaction resulting from ingrained patterns of knowledge and action, as suggested by [Kahneman, 2011]. Explanations of decisions in turn attempt to retread these deliberations, or create post-hoc narratives to explain deliberations hidden to us. Similar approaches in relation to eXplainable AI (XAI) have been explored by [Miller, 2019], based on the idea of explanation as conversation [Hilton, 1990], where the task is to "resolve a puzzle in the explainee's mind about why the event happened by closing a gap in his or her knowledge." In this paper we argue that explanations, while not always perfectly accurate with regard to reality (due to the existence of hidden factors), are best structured around the same conceptual basis as decisions, and that when trying to understand an explanation we do so by simulating the decision-making process through the explanation provided. In other words, what makes a good explanation of an agent-based action is that it presents us with a reasoning structure that we can follow and relate to our own decision-making processes. It is thus imperative that the explanations provided by artificial agents, in the context of XAI, not only provide deliberations that we can follow, but more importantly provide them in a conceptual framework which facilitates retreading the deliberation and is context-sensitive.

When looking for a method to provide explanations of AI decisions, it is therefore important that the method employed produces explanations that are meaningful. Contextual Importance and Utility (CIU), as developed by [Främling and Graillot, 1995; Främling, 1996], provides a method for explaining decisions that utilises context-sensitivity. The benefit of CIU is that it provides a model-neutral approach to XAI and, as we will argue in this paper, complements current trends in cognitive philosophy which utilise both predictive models and action-based processes, in the form of embodiment or action-oriented representations, to explain cognitive processes. In HCI and XAI, parity between artificial processes and human processes can be beneficial for engendering trust and encouraging interaction, as well as for opening up the possibility of collaborative or mutualistic developments [Westberg et al., 2019]. In the following sections we will look at the evolution of (mental) representations in both cognition and computation, and show how CIU can be linked to current theories of world navigation and decision-making.
CIU employs the concepts of Importance and Utility as a method of explaining AI decisions. Importance here refers to the importance of an input, while Utility refers to the value of an input in terms of how well it approximates the desired criteria. Both these concepts are employed on a contextual basis as Contextual Importance (CI) and Contextual Utility (CU), together forming CIU. Formally, CI is defined as follows:

CI_j(\vec{C}, \{i\}) = \frac{Cmax_j(\vec{C}, \{i\}) - Cmin_j(\vec{C}, \{i\})}{absmax_j - absmin_j} \quad (1)

where \{i\} is a set of inputs and j is a specific output in the context \vec{C}. Consequently, CU is formally defined as:

CU_j(\vec{C}, \{i\}) = \frac{out_j(\vec{C}) - Cmin_j(\vec{C}, \{i\})}{Cmax_j(\vec{C}, \{i\}) - Cmin_j(\vec{C}, \{i\})} \quad (2)

When answering questions such as "why did you do X?" or "why did you not choose Y?", CIU evaluates decisions in explanation by looking at the importance and utility of factors in the decision-making process. Both importance and utility are context-sensitive and may change depending on the situation, but for any given context they work as follows.

Importance highlights what factors are important for a given decision. When deciding which pair of trousers to buy in a store, size will definitely be an important factor, as may design and price. After all, we want trousers that fit us, a design that we like or feel comfortable in, and a price that we find affordable. The utility values of a given pair of trousers' size, design and price determine how well these individual factors fulfil the criteria of what we want out of our decision (so for a person with a 30 inch waist, trousers of waist size 30 would carry a high utility value). Utility in this sense is not linear, because trousers that are too large and trousers that are too small both have low utility values, while a pair that fits perfectly has a very high utility value. This is of course also context-sensitive: oversized trousers would gain importance in the context of a clown buying trousers for a show. (A numerical sketch of these definitions is given below.)

Predictive vision in philosophy [Clark, 2015; Clark, 2016; Hohwy, 2013] and cognitive science [Hinton, 2007; Friston, 2008; Knill and Pouget, 2005; Moreno-Bote et al., 2011] promotes a pro-active account of perception where mind and world meet halfway in order to form a more efficient way of gathering and sorting input signals, i.e. information about the world. This is done by creating an internal representational model of the world through which an individual forms predictions about future input given a certain planned or expected adjustment to the world (either through movement of the body through the world, thus adjusting perceptual perspective, or through modifying aspects of the world by moving objects or making other alterations). These predictions are then matched against the incoming input, comparing the expectations (seeing a pair of scissors when opening a drawer) to the result. Error signals are generated where expectations and reality mismatch (absence of a pair of scissors would generate a much stronger error signal than the scissors being in an odd or otherwise unexpected position within the drawer). These error signals are then explained away by updating our understanding of the world and creating a new model (hypothesis) for where to find the scissors. In this way, most kinds of decision-making involve this type of prediction-based model generation.
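To make Equations (1) and (2) concrete, here is a minimal numerical sketch in Python. It sweeps one input of a black-box model over its value range while the other inputs keep their context values, and reads CI and CU off the definitions above. This is an illustration only, not the CIU authors' implementation: the function names, the toy trouser-scoring model and all of its numbers are our own assumptions.

```python
import numpy as np

def contextual_importance_utility(f, context, i, value_range,
                                  abs_min, abs_max, n_samples=101):
    """Estimate CI (Eq. 1) and CU (Eq. 2) for input feature i of a
    black-box model f in the given context, by sampling feature i over
    value_range while all other inputs keep their context values."""
    lo, hi = value_range
    outputs = []
    for v in np.linspace(lo, hi, n_samples):
        x = list(context)
        x[i] = v
        outputs.append(f(x))
    c_min, c_max = min(outputs), max(outputs)
    out_c = f(context)                               # out_j(C) in Eq. (2)
    ci = (c_max - c_min) / (abs_max - abs_min)       # Eq. (1)
    cu = (out_c - c_min) / (c_max - c_min)           # Eq. (2)
    return ci, cu

# Hypothetical trouser-scoring model: the utility of size is non-linear,
# peaking at the ideal waist size 30, so trousers that are too small and
# too large both score low, as in the example above.
def trouser_score(x):
    size, price = x
    fit = np.exp(-0.5 * ((size - 30.0) / 2.0) ** 2)  # best at size 30
    affordability = 1.0 / (1.0 + price / 50.0)       # cheaper is better
    return 0.7 * fit + 0.3 * affordability

ci, cu = contextual_importance_utility(
    trouser_score, context=[30.0, 40.0], i=0,
    value_range=(26.0, 36.0), abs_min=0.0, abs_max=1.0)
print(f"CI(size) = {ci:.2f}, CU(size=30) = {cu:.2f}")  # high CI, CU = 1.0
```

In CIU proper, Cmax_j and Cmin_j can be estimated in more sophisticated ways, and over sets of inputs rather than a single one; the one-dimensional sweep above is the simplest possible approximation.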
The concepts of importance and utility can be easily incorporated into predictive vision to highlight how these concepts operate. When looking for a pair of scissors, I model my expectations of the scissors being in the kitchen, inside a drawer. To locate the scissors I will navigate into the kitchen and start opening drawers, looking for scissor-shapes. When doing so, the importance and utility of the visual input changes. My visual processing will prioritise focus on scissor-like shapes (pointy objects with two round holes) and objects that have features in common with my scissors. Thus the importance of such objects in my visual field will rise, their weights will be greater and my focus will be drawn to them. If my pair of scissors has an orange handle, then orange objects will also gain some amount of importance. Objects that fit all the criteria will gain the highest weight of all, and thus become the anchor of my focus when found.

When choosing between options during a decision process, utility values come into play. Again, these can be predictive in the sense that we have a (detailed or approximate) model of the ideal candidate in our mind. We may have an idea of what the ideal size of a mug is when we're looking for one, and depending on the type of beverage and how much we want of it, the ideal mug size will vary. For example, a cup of latte and a cup of espresso have very different ideal size values. In our decision-making process our focus will thus disregard mugs that fall too far outside the ideal range, i.e. are too big or too small. This range may alter depending on our available options. For example, if when looking through the cupboard we realise that all the big mugs are missing, we may instead start looking at the biggest available of the small mugs. Alternatively, we may adjust our world model and start looking for the ideal mug in the dish rack or dishwasher. Our decision capacity is thus not split into separate local importance values of a latte-context mug being of size x and an espresso-context cup being of size y; rather, we exploit a global importance of mug size in beverage contexts that is regulated by the contextual utility values, allowing for flexibility when reevaluating the environment.

A representational system is defined by [Haugeland, 1991] as follows: in acting upon the environment, the system makes use of features that are not always reliably present to the system. What this means is that when navigating the world, the system makes use of features that it is aware of but that are not in the system's immediate vicinity (i.e. no input signals of such a feature are currently accessible to the system). For example, when a human decides to walk to a museum, the human's cognitive system will coordinate its navigation with the environmental feature of a museum (including the location of the nearest museum, or a specific one depending on context) even if the museum is physically out of reach and view. In turn this navigation will employ other features such as known paths to the museum, turns to take, etc. In order to do so, there must be something that the system can make use of that stands in for the actual input signal of a museum. This stand-in feature is a mental representation. By employing a representation of a museum, a human can think about and make decisions involving museums without actively perceiving a museum.
By contrast, flowers do not need a representation of the sun in order to track it with their leaves and adjust for optimal sunlight intake, for whenever such coordination with the environment is happening the input signals (sun rays) are present. Haugeland further states that for a system to be representational, these stand-in processes need to be part of a systematic approach where representations are employed in a variety of states, such that a representation is not unique to a singular situation but can be employed in general ways, and that the representations carry meaning within the system in such a way that they could be employed improperly (mistakenly going to Anna's house when intending to go to Hanna's house) or misrepresent (thinking of the moon as being made of cheese).

While this constitutes a general account of what representations are, the proposed structure and dynamics of representations have been iterated upon since their original conception, and old iterations are still relevant in certain contexts and applications. We will go through the most prominent accounts of representations, starting with what could be referred to as the 'classical' account and then following the growing influence of embodiment in cognition.
Historically speaking, it is difficult to determine the exact origin of what can be classified as the classical form of representational systems. On a contemporary scale, a good example of classical representations can be found in [Fodor, 1975; Fodor, 1981]. Classical representational systems are primarily focused on globally effective representations possessing general properties that define a class of objects or concepts that require representing. In a classical system, searching for or thinking about a pair of scissors involves deploying a representational concept that structurally consists of the features an object would possess that identify it as a pair of scissors. This would involve shape, feel, capacity, etc. [Clark, 1997]. Thus, when thinking about a pair of scissors, we would be thinking about an object with two sharp blades, two ring-shaped holes, a capacity to be pulled apart and pushed together as a means of cutting, and so on. More abstract representations could involve entities such as "Wednesday" or even "next Wednesday". Representations are employed in thinking about the things that they relate to, and do so globally in that they apply equally in all contexts involving them. The worry about a deadline next Wednesday employs the representation of "Wednesday" just as much as the happy memories of the date last Wednesday, or the plan to go to the doctor on Wednesday the 23rd.

There is an enduring debate over how the content of representations is determined. There are internalists who view the content of a mental representation as private and dependent on individually intrinsic properties, and there are externalists [Burge, 1979; Putnam, 1975] who view representational content as public and determined (to varying extents) by environmental factors. To an externalist, one person who believes that the moon is made of cheese and another person who believes the moon to be made of rock would still possess the same representation of "moon", because this representation is informed by the combination of natural (the moon in the sky) and social (interactions with others talking about or otherwise referring to the moon) environmental factors. The fact that their beliefs are different is simply due to one of these individuals being misinformed (and thus misrepresenting). By contrast, an internalist would argue that the representational contents would be different in virtue of these different beliefs about the composition of the moon.
While the classical account of representational systems provides a robust model for thinking about things in the world, its proposed global structure seems inefficient when it comes to picking out specific objects from a group. A person may be looking for their mug on a shelf full of other mugs. In this context, many objects in the visual field can be identified as mugs as per the representation. This person's mug may be blue, giving it a unique qualifier, but even then the mug's feature of being blue is a relatively minor feature of the global representation "my mug", which also includes all the distinctive features identifying the representation "mug". Following this, it seems that the majority of computational effort spent identifying "mug" features is redundant. Research on animate vision systems by [Ballard, 1991] provided an alternative to the theory of visual representations proposed by [Marr, 1980; Marr, 1982] and produced a more efficient approach to computational load in vision, involving personalised local representations. These personalised representations put extra emphasis on features that are helpful in local identification, but not necessarily in global identification. For example, in the representation of "my mug" the importance of the colour blue may be much greater for the identification process than other traits common to all mugs, even if the feature of colour itself has low importance in identifying an object as a mug. This puts personalised, or agent-dependent, representations in animate vision in stark contrast with the agent-independent properties of classical systems. In this sense, personalised representations employ context-sensitivity: the representation "my mug is blue" is only effective in the context of looking for one's mug, but it is also preferable over global descriptions during search tasks and, especially, in communication and explanation. After all, when we produce explanations of our world navigation process, for example answering the question "what are you looking for?", we are unlikely to state "I'm looking for my mug" and then proceed to describe general mug features; instead, our explanatory focus is on the personalised aspects, as in "I'm looking for my mug, it's blue". It is also important to note that even if personalised animate vision representations produce a stronger alternative to classical representations in some contexts, this does not make the latter obsolete. In fact, a complete account of human cognitive capacities involving representations would likely employ both global and local ways of representing, as purely localised methods fall short when it comes to context-neutral or open-ended tasks [Clark, 1997].

[Clark, 1997] argues that the complex internal states that constitute representations in some way function as specific information carriers through their correlation both within the system as well as with the body and world, suggesting that representations may be entangled with the world through action. Building upon ideas of embodied couplings with the world [Brooks, 1991b; Brooks, 1991a; Varela et al., 1991], action-oriented representations stress the entanglement between mind, body and world by presenting representational structure that not only describes the world but prescribes possible actions and responses [Clark, 2016].
Instead of perception being data set up for exploitation by independent action systems, action-oriented representations create a representational bridge between perception and action. Clark argues that the Mataric robot [Mataric, 1990; Mataric, 1992], which registers landmarks in a maze as a combination of sensory input and current motion, displays the use of exactly these types of representations. A narrow corridor represents both the visual image of the corridor as well as "forward motion". As the robot creates a map of the maze, the map is thus full of visual information as well as recipes for action [Clark, 2015].

A football can be represented as a round object of certain hardness and visual pattern, but in action-oriented terms a football can also be represented as "affords kicking". Furthermore, contextual weights can determine what kinds of actions are appropriate in a given environment. During a game of football, the prescribed action structure would involve "passing", "dribbling" and "scoring", while in the context of friendly play or a game without rules these action affordances may be much more general or basic. Additionally, certain contexts may change prescriptive actions to proscriptive, such as "pick up" or "throw" during a game with enforced no-hands rules. Such prescriptions and proscriptions, and the recognition of contextual shifts, are part of a learning process where the dynamic bonds between mind and environment are formed. Individuals who are unfamiliar with or new to the context of a football game may during play momentarily regress to a more global set of affordances and perform a proscribed action causing a foul (such as blocking a suddenly incoming ball with their hands) and then exclaim "oh, I forgot!".

Just like the personalised representations above, action-oriented representations also show descriptive power in communication and explanation. For example, when Dan is asked why he gave Stacy a glass of water, an appropriate answer would be "because she was thirsty" or "so that she could have a drink". This refers to the action affordance of water to drink it, and its capacity to slake thirst; thus the important features of water in this explanatory description, in this context, are action-oriented. Even though it would be accurate to say "because water is a liquid and Stacy requires water to slake thirst", this is an inefficient and awkward explanation of the action. In the context of XAI, a robot providing the former type of explanation would generate far more social connection and understanding from a human user than the latter.
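As a loose sketch of how such context-weighted affordances could be encoded, consider the following Python fragment. The structure, the action names and the numeric weights are purely hypothetical illustrations, not drawn from the cited works.

```python
# A toy action-oriented representation of a football: alongside its
# descriptive features it stores afforded actions, each weighted per
# context (positive = prescribed, negative = proscribed).
football = {
    "features": {"shape": "round", "hardness": "medium"},
    "affordances": {
        "kick":    {"match": 1.0,  "backyard": 1.0},
        "pass":    {"match": 0.9,  "backyard": 0.6},
        "pick_up": {"match": -1.0, "backyard": 0.2},  # a foul in a match
    },
}

def appropriate_actions(obj, context):
    """Rank the actions this object affords in the given context,
    dropping proscribed or neutral ones (non-positive weight)."""
    ranked = sorted(obj["affordances"].items(),
                    key=lambda kv: kv[1][context], reverse=True)
    return [(action, w[context]) for action, w in ranked if w[context] > 0]

print(appropriate_actions(football, "match"))     # pick_up is excluded
print(appropriate_actions(football, "backyard"))  # pick_up mildly allowed
```

The same object thus carries different action recipes depending on the context in which it is encountered.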
To put these affordances in terms of contextual importance and utility: the importance of the "pick up" action may be very high during a game of football, with its utility being very low, i.e. it is very important not to pick up the ball. Meanwhile, while passing a ball around in a backyard with friends, the importance of "pick up" would decrease because of a lack of purpose or benefit of this action. Incidentally, the utility would also likely go up slightly, because touching the ball with one's hands is neither a negative (proscribed) act nor especially beneficial. Through this perspective, CIU illustrates in these cases precisely how the weight system of representational structures may alter between contexts, in a way that makes sense for human behaviour.

In American football, the affordances of throwability and catchability are very important when selecting a ball. However, "throwability" and "catchability" are intermediate concepts made up of more basic features that a ball can have, e.g. size, shape, inflatedness, etc. [Främling, 1996]. These basic values may change individually to affect the intermediate value. "Deflategate" was an NFL controversy involving a team using deflated balls in order to gain an advantage in play. Slightly deflated balls of 10.5 PSI are easier to grip compared to the rules-legal standard of 12.5 PSI. This means that the utility value of a 10.5 PSI ball is very good for inflatedness, and thus also for throwability, but in the context of wanting to follow the rules, this value is very bad. Thus when attempting to pick out a 'good' ball, the situation (context) has a great impact on how the intermediate concepts that make the ball 'good' are interpreted.
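The Deflategate example can be given in the same sketch style: an intermediate concept such as "throwability" is computed from a basic feature, and the context decides how its utility is interpreted. Only the 10.5 vs. 12.5 PSI figures come from the example above; the functional forms, the small legal band and the context names are our own illustrative assumptions.

```python
def throwability(psi):
    """Toy intermediate concept built from a basic feature (inflatedness):
    grip, and hence throwability, is assumed to peak around 10.5 PSI."""
    return max(0.0, 1.0 - abs(psi - 10.5) / 5.0)

def rules_compliance(psi):
    """The rules-legal standard is 12.5 PSI (a small band is assumed)."""
    return 1.0 if 12.5 <= psi <= 13.5 else 0.0

def ball_utility(psi, context):
    """The context decides how the intermediate concepts are combined."""
    if context == "maximise grip":
        return throwability(psi)
    if context == "follow the rules":
        # A ball is only 'good' if it is both throwable and legal.
        return throwability(psi) * rules_compliance(psi)
    raise ValueError(f"unknown context: {context}")

for psi in (10.5, 12.5):
    for context in ("maximise grip", "follow the rules"):
        print(f"{psi} PSI, {context}: {ball_utility(psi, context):.2f}")
# The 10.5 PSI ball has the best throwability, but zero utility once
# rule-following is part of the context.
```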
The primary strength of CIU as a method of explanation in AI is the integration of utility into the decision-making narrative. This gives it the upper hand over exclusively importance-focused methods such as LIME [Ribeiro et al., 2016] because, as shown in this paper, to create a complete account of the human decision-making process, representational accounts of human cognition recognise not only importance, but utility as well. The problem to solve about AI is not only a computational problem, but a social one as well. Ultimately, AI interfaces with human lives, and as such it is important for communication between humans and AI to be meaningful. In terms of XAI, meaningful explanations carry a requirement of relatability to their audience. CIU is a proof-of-concept for the new cognitive landscape of context-sensitive representations in decision-making and world navigation, and fills in the mechanical gaps that philosophical theories don't fully provide. In turn, this means that CIU is not only a good method of explanation, but also a method of explanation that complements human mechanisms of understanding and organising the environment. From a philosophical standpoint, we argue that this makes CIU a strong candidate in XAI for communicating explanations not only to experts but for layperson sensibilities as well.

References

[Ballard, 1991] Dana Ballard. Animate vision. Artificial Intelligence, 48:57–86, 1991.
[Brooks, 1991a] R. A. Brooks. Intelligence without reason. In Proceedings of the Twelfth International Joint Conference on Artificial Intelligence, pages 569–595, 1991.
[Brooks, 1991b] R. A. Brooks. Intelligence without representation. Artificial Intelligence, 47:139–159, 1991.
[Burge, 1979] Tyler Burge. Individualism and the mental. Midwest Studies in Philosophy, 4(1):73–122, 1979.
[Clark, 1997] Andy Clark. Being There: Putting Brain, Body, and World Together Again. MIT Press, 1997.
[Clark, 2015] Andy Clark. Radical predictive processing. Southern Journal of Philosophy, 53(S1):3–27, 2015.
[Clark, 2016] Andy Clark. Surfing Uncertainty: Prediction, Action, and the Embodied Mind. Oxford University Press, 2016.
[Fodor, 1975] Jerry A. Fodor. The Language of Thought. Harvard University Press, 1975.
[Fodor, 1981] Jerry A. Fodor. Representations: Philosophical Essays on the Foundations of Cognitive Science. MIT Press, 1981.
[Främling and Graillot, 1995] Kary Främling and Didier Graillot. Extracting explanations from neural networks. In ICANN'95 Conference, page 6, 1995.
[Främling, 1996] Kary Främling. Explaining results of neural networks by contextual importance and utility. In Proceedings of the AISB'96 conference, 1996.
[Friston, 2008] Karl Friston. Hierarchical models in the brain. PLOS Computational Biology, 4(11):1–24, 2008.
[Haugeland, 1991] John Haugeland. Representational genera. In Philosophy and Connectionist Theory, Developments in Connectionist Theory, pages 61–89. Lawrence Erlbaum Associates, Hillsdale, NJ, US, 1991.
[Hilton, 1990] Denis Hilton. Conversational processes and causal explanation. Psychological Bulletin, 107:65–81, 1990.
[Hinton, 2007] Geoffrey Hinton. To recognize shapes, first learn to generate images. Progress in Brain Research, 165:535–547, 2007.
[Hohwy, 2013] Jakob Hohwy. The Predictive Mind. Oxford University Press, 2013.
[Kahneman, 2011] Daniel Kahneman. Thinking, Fast and Slow. Farrar, Straus and Giroux, 2011.
[Knill and Pouget, 2005] David Knill and Alexandre Pouget. The Bayesian brain: the role of uncertainty in neural coding and computation. Trends in Neurosciences, 27:712–719, 2005.
[Marr, 1980] David Marr. Visual information processing: the structure and creation of visual representations. Philosophical Transactions of the Royal Society of London. B, Biological Sciences, 290(1038):199–218, 1980.
[Marr, 1982] David Marr. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. Henry Holt and Co., USA, 1982.
[Mataric, 1990] Maja J. Mataric. Navigating with a rat brain: A neurobiologically-inspired model for robot spatial representation. In Proceedings of the First International Conference on Simulation of Adaptive Behavior on From Animals to Animats, pages 169–175, Cambridge, MA, USA, 1990. MIT Press.
[Mataric, 1992] Maja J. Mataric. Integration of representation into goal-driven behavior-based robots. IEEE Transactions on Robotics and Automation, 8(3):304–312, June 1992.
[Miller, 2019] Tim Miller. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267:1–38, 2019.
[Moreno-Bote et al., 2011] Rubén Moreno-Bote, David Knill, and Alexandre Pouget. Bayesian sampling in visual perception. Proceedings of the National Academy of Sciences of the United States of America, 108:12491–12496, 2011.
[Putnam, 1975] Hilary Putnam. The meaning of 'meaning'. Minnesota Studies in the Philosophy of Science, 7:131–193, 1975.
[Ribeiro et al., 2016] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier, 2016.
[Varela et al., 1991] Francisco Varela, Evan Thompson, and Eleanor Rosch. The Embodied Mind: Cognitive Science and Human Experience. MIT Press, 1991.
[Westberg et al., 2019] Marcus Westberg, Amber Zelvelder, and Amro Najjar. A historical perspective on cognitive science and its influence on XAI research. In Davide Calvaresi, Amro Najjar, Michael Schumacher, and Kary Främling, editors, Explainable, Transparent Autonomous Agents and Multi-Agent Systems. Springer, 2019.