Mehdi Keramati
University College London
Publication
Featured research published by Mehdi Keramati.
Proceedings of the National Academy of Sciences of the United States of America | 2016
Mehdi Keramati; Peter Smittenaar; R. J. Dolan; Peter Dayan
Significance: Solving complex tasks often requires estimates of the future consequences of current actions. Estimates could be learned from past experience, but they then risk being out of date, or they could be calculated by a form of planning into the future, a process that is computationally taxing. We show that humans integrate learned estimates into their planning calculations, saving mental effort and time. We also show that increasing time pressure leads to reliance on learned estimates after fewer steps of planning. We suggest a normative rationale for this effect using a computational model. Our results provide a perspective on how the brain combines different decision processes collaboratively to exploit their comparative computational advantages.
Abstract: Behavioral and neural evidence reveal a prospective goal-directed decision process that relies on mental simulation of the environment, and a retrospective habitual process that caches returns previously garnered from available choices. Artificial systems combine the two by simulating the environment up to some depth and then exploiting habitual values as proxies for consequences that may arise in the further future. Using a three-step task, we provide evidence that human subjects use such a normative plan-until-habit strategy, implying a spectrum of approaches that interpolates between habitual and goal-directed responding. We found that increasing time pressure led to shallower goal-directed planning, suggesting that a speed-accuracy tradeoff controls the depth of planning with deeper search leading to more accurate evaluation, at the cost of slower decision-making. We conclude that subjects integrate habit-based cached values directly into goal-directed evaluations in a normative manner.
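As a rough illustration of the plan-until-habit idea described in this abstract, the Python sketch below expands a decision tree to a fixed depth and then substitutes cached (habitual) values at the frontier. The toy environment, the names plan_until_habit, transitions and cached_q, and all reward values are hypothetical and are not taken from the paper, whose model and task are more detailed.

```python
from typing import Dict, Tuple

# Hypothetical deterministic toy world: transitions[state][action] -> (next_state, reward).
# "end" is terminal because it has no entry in the table.
Transitions = Dict[str, Dict[str, Tuple[str, float]]]
CachedQ = Dict[Tuple[str, str], float]

transitions: Transitions = {
    "s0": {"left": ("s1", 0.0), "right": ("s2", 0.0)},
    "s1": {"left": ("end", 1.0), "right": ("end", 0.0)},
    "s2": {"left": ("end", 0.0), "right": ("end", 5.0)},
}

# Hypothetical cached values that are out of date for the "right" branch.
cached_q: CachedQ = {
    ("s0", "left"): 1.0, ("s0", "right"): 0.5,
    ("s1", "left"): 1.0, ("s1", "right"): 0.0,
    ("s2", "left"): 0.0, ("s2", "right"): 0.5,
}


def plan_until_habit(state: str, depth: int, transitions: Transitions,
                     cached_q: CachedQ, gamma: float = 1.0) -> Dict[str, float]:
    """Estimate action values at `state` by simulating `depth` steps ahead,
    then using cached values as proxies for everything beyond that depth."""
    values: Dict[str, float] = {}
    for action, (next_state, reward) in transitions[state].items():
        if depth == 0:
            # Planning budget exhausted: fall back on the habitual estimate.
            values[action] = cached_q.get((state, action), 0.0)
        elif next_state not in transitions:
            # Terminal successor: only the immediate reward counts.
            values[action] = reward
        else:
            future = plan_until_habit(next_state, depth - 1,
                                      transitions, cached_q, gamma)
            values[action] = reward + gamma * max(future.values())
    return values


if __name__ == "__main__":
    for depth in (0, 1, 2):
        print(depth, plan_until_habit("s0", depth, transitions, cached_q))
```

In this toy setting, shallow planning inherits the stale cached estimate for the "right" branch, while deeper planning recovers its true payoff. That is one way to picture the speed-accuracy tradeoff the abstract describes.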
eLife | 2014
Mehdi Keramati; Boris Gutkin
Efficient regulation of internal homeostasis and defending it against perturbations requires adaptive behavioral strategies. However, the computational principles mediating the interaction between homeostatic and associative learning processes remain undefined. Here we use a definition of primary rewards, as outcomes fulfilling physiological needs, to build a normative theory showing how learning motivated behaviors may be modulated by internal states. Within this framework, we mathematically prove that seeking rewards is equivalent to the fundamental objective of physiological stability, defining the notion of physiological rationality of behavior. We further suggest a formal basis for temporal discounting of rewards by showing that discounting motivates animals to follow the shortest path in the space of physiological variables toward the desired setpoint. We also explain how animals learn to act predictively to preclude prospective homeostatic challenges, and several other behavioral patterns. Finally, we suggest a computational role for interaction between hypothalamus and the brain reward system. DOI: http://dx.doi.org/10.7554/eLife.04811.001
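The link between reward and physiological stability in this abstract can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that "drive" is the Euclidean distance between the internal state and the setpoint, and that reward is the reduction in drive produced by an outcome. The function names and numbers are hypothetical, and the paper's formal definitions may differ.

```python
import math
from typing import Sequence


def drive(state: Sequence[float], setpoint: Sequence[float]) -> float:
    """Assumed drive: Euclidean distance of the internal state from the setpoint."""
    return math.sqrt(sum((s - h) ** 2 for s, h in zip(setpoint, state)))


def homeostatic_reward(state: Sequence[float], outcome: Sequence[float],
                       setpoint: Sequence[float]) -> float:
    """Assumed reward: how much the outcome reduces the drive."""
    next_state = [h + k for h, k in zip(state, outcome)]
    return drive(state, setpoint) - drive(next_state, setpoint)


if __name__ == "__main__":
    setpoint = [10.0]
    # The same outcome is rewarding when deprived and punishing when it overshoots.
    print(homeostatic_reward([4.0], [3.0], setpoint))
    print(homeostatic_reward([9.0], [3.0], setpoint))
```

Under these assumptions the value of an outcome depends on the internal state at the time it is obtained, which is the sense in which learned behavior is modulated by internal states.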
Psychological Review | 2017
Mehdi Keramati; Audrey Durand; Paul Girardeau; Boris Gutkin; Serge H. Ahmed
Drug addiction implicates both reward learning and homeostatic regulation mechanisms of the brain. This has stimulated 2 partially successful theoretical perspectives on addiction. Many important aspects of addiction, however, remain to be explained within a single, unified framework that integrates the 2 mechanisms. Building upon a recently developed homeostatic reinforcement learning theory, the authors focus on a key transition stage of addiction that is well modeled in animals, escalation of drug use, and propose a computational theory of cocaine addiction where cocaine reinforces behavior due to its rapid homeostatic corrective effect, whereas its chronic use induces slow and long-lasting changes in homeostatic setpoint. Simulations show that our new theory accounts for key behavioral and neurobiological features of addiction, most notably, escalation of cocaine use, drug-primed craving and relapse, individual differences underlying dose-response curves, and dopamine D2-receptor downregulation in addicts. The theory also generates unique predictions about cocaine self-administration behavior in rats that are confirmed by new experimental results. Viewing addiction as a homeostatic reinforcement learning disorder coherently explains many behavioral and neurobiological aspects of the transition to cocaine addiction, and suggests a new perspective toward understanding addiction.
Nature Human Behaviour | 2017
Joaquin Navajas; Chandni Hindocha; Hebah Foda; Mehdi Keramati; P.E. Latham; Bahador Bahrami
Confidence is the ‘feeling of knowing’ that accompanies decision-making. Bayesian theory proposes that confidence is a function solely of the perceived probability of being correct. Empirical research has suggested, however, that different individuals may perform different computations to estimate confidence from uncertain evidence. To test this hypothesis, we collected confidence reports in a task in which subjects made categorical decisions about the mean of a sequence. We found that for most individuals, confidence did indeed reflect the perceived probability of being correct. However, in approximately half of them, confidence also reflected a different probabilistic quantity: the perceived uncertainty in the estimated variable. We found that the contribution of both quantities was stable over weeks. We also observed that the influence of the perceived probability of being correct was stable across two tasks, one perceptual and one cognitive. Overall, our findings provide a computational interpretation of individual differences in human confidence. Using behavioural experiments and computational modelling, Navajas and colleagues provide a systematic characterization of individual differences in human confidence.
PLOS Computational Biology | 2017
Julie J. Lee; Mehdi Keramati
Decision-making in the real world presents the challenge of requiring flexible yet prompt behavior, a balance that has been characterized in terms of a trade-off between a slower, prospective goal-directed model-based (MB) strategy and a fast, retrospective habitual model-free (MF) strategy. Theory predicts that flexibility to changes in both reward values and transition contingencies can determine the relative influence of the two systems in reinforcement learning, but few studies have manipulated the latter. Therefore, we developed a novel two-level contingency change task in which transition contingencies between states change every few trials; MB and MF control predict different responses following these contingency changes, allowing their relative influence to be inferred. Additionally, we manipulated the rate of contingency changes in order to determine whether contingency change volatility would play a role in shifting subjects between an MB and an MF strategy. We found that human subjects employed a hybrid MB/MF strategy on the task, corroborating the parallel contribution of MB and MF systems in reinforcement learning. Further, subjects did not remain at one level of MB/MF behavior but rather displayed a shift towards more MB behavior over the first two blocks that was not attributable to the rate of contingency changes but rather to the extent of training. We demonstrate that flexibility to contingency changes can distinguish MB and MF strategies, with human subjects utilizing a hybrid strategy that shifts towards more MB behavior over blocks, consequently corresponding to a higher payoff.
Current Opinion in Neurobiology | 2017
Mehdi Keramati; Serge H. Ahmed; Boris Gutkin
Drug addiction is a complex behavioral and neurobiological disorder which, in an emergent brain-circuit view, reflects a loss of prefrontal top-down control over subcortical circuits governing drug-seeking and drug-taking. We first review previous computational accounts of addiction, focusing on cocaine addiction and on prevalent dopamine-based positive-reinforcement and negative-reinforcement computational models. Then, we discuss a recent computational proposal that the progression to addiction is unlikely to result from a complete withdrawal of the goal-oriented decision system in favor of the habitual one. Rather, the transition to addiction would arise from a drug-induced alteration in the structure of organismal needs which reorganizes the goal structure, ultimately favoring predominance of drug-oriented goals. Finally, we outline unmet challenges for future computational research on addiction.
PLOS ONE | 2018
Uri Hertz; Bahador Bahrami; Mehdi Keramati
Every day we make choices under uncertainty; choosing what route to work or which queue in a supermarket to take, for example. It is unclear how outcome variance, e.g. uncertainty about waiting time in a queue, affects decisions and confidence when outcome is stochastic and continuous. How does one evaluate and choose between an option with unreliable but high expected reward, and an option with more certain but lower expected reward? Here we used an experimental design where two choices’ payoffs took continuous values, to examine the effect of outcome variance on decision and confidence. We found that our participants’ probability of choosing the good (high expected reward) option decreased when the good or the bad options’ payoffs were more variable. Their confidence ratings were affected by outcome variability, but only when choosing the good option. Unlike perceptual detection tasks, confidence ratings correlated only weakly with decisions’ time, but correlated with the consistency of trial-by-trial choices. Inspired by the satisficing heuristic, we propose a “stochastic satisficing” (SSAT) model for evaluating options with continuous uncertain outcomes. In this model, options are evaluated by their probability of exceeding an acceptability threshold, and confidence reports scale with the chosen option’s thus-defined satisficing probability. Participants’ decisions were best explained by an expected reward model, while the SSAT model provided the best prediction of decision confidence. We further tested and verified the predictions of this model in a second experiment. Our model and experimental results generalize the models of metacognition from perceptual detection tasks to continuous-value based decisions. Finally, we discuss how the stochastic satisficing account of decision confidence serves psychological and social purposes associated with the evaluation, communication and justification of decision-making.
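The core quantity of the stochastic satisficing account, an option's probability of exceeding an acceptability threshold, can be sketched as follows. The sketch assumes Gaussian payoffs, which is an illustrative simplification. The function name, threshold, means and standard deviations are hypothetical and are not the fitted values or exact formulation from the paper.

```python
import math


def satisficing_probability(mean: float, sd: float, threshold: float) -> float:
    """P(payoff > threshold) for a payoff assumed to be Normal(mean, sd**2)."""
    return 0.5 * math.erfc((threshold - mean) / (sd * math.sqrt(2.0)))


if __name__ == "__main__":
    threshold = 50.0  # hypothetical acceptability threshold
    # "Good" option (mean above threshold) at two hypothetical variability levels.
    p_good_low_var = satisficing_probability(60.0, 10.0, threshold)
    p_good_high_var = satisficing_probability(60.0, 20.0, threshold)
    print(p_good_low_var, p_good_high_var)
```

With the mean above the threshold, higher payoff variability lowers the satisficing probability of the good option. Under this sketch, confidence in choosing it would fall accordingly, in line with the reported effect of outcome variability on confidence.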
European Journal of Neuroscience | 2018
Arsham Afsardeir; Mehdi Keramati
Goal‐directed planning in behavioural and neural sciences is theorized to involve a prospective mental simulation that, starting from the animal's current state in the environment, expands a decision tree in a forward fashion. Backward planning in the artificial intelligence literature, however, suggests that agents expand a mental tree in a backward fashion starting from a certain goal state they have in mind. Here, we show that several behavioural patterns observed in animals and humans, namely outcome‐specific Pavlovian‐to‐instrumental transfer and differential outcome effect, can be parsimoniously explained by backward planning. Our basic assumption is that the presentation of a cue that has been associated with a certain outcome triggers backward planning from that outcome state. On the basis of evidence pointing to forward and backward planning models, we discuss the possibility of the brain using a bidirectional planning mechanism where forward and backward trees are expanded in parallel to achieve higher efficiency.
bioRxiv | 2017
Uri Hertz; Bahador Bahrami; Mehdi Keramati
Every day we make choices under uncertainty; choosing what route to work or which queue in a supermarket to take, for example. It is unclear how outcome variance, e.g. waiting time in a queue, affects decisions when outcome is stochastic and continuous. For example, how does one choose between an option with unreliable but high expected reward, and an option with more certain but lower expected reward? Here we used an experimental design where two choices’ payoffs took continuous values, to examine the effect of outcome variance on decisions and confidence. Inconsistent with expected utility predictions, our participants’ probability of choosing the good option decreased when both better and worse options’ payoffs were more variable. Confidence ratings were affected by outcome variability only when choosing the good option. Inspired by the satisficing heuristic, we propose a “stochastic satisficing” (SSAT) model for choosing between options with continuous uncertain outcomes. In this model, decisions are made by comparing the available options’ probability of exceeding an acceptability threshold and confidence reports scale with the chosen option’s satisficing probability. The SSAT model best explained choice behaviour and most successfully predicted confidence ratings. We further tested the model’s prediction in a second experiment where choice and confidence behaviours were found to be consistent with the SSAT simulations. Our model and experimental results generalize the cognitive heuristic of satisficing to stochastic contexts and thus provide an account of bounded rationality in the face of uncertainty.
Current Biology | 2017
Armin Lak; Kensaku Nomoto; Mehdi Keramati; Masamichi Sakagami; Adam Kepecs