Mehdi Khamassi
University of Paris
Publications
Featured research published by Mehdi Khamassi.
Frontiers in Neuroscience | 2012
Mark D. Humphries; Mehdi Khamassi; Kevin N. Gurney
We continuously face the dilemma of choosing between actions that gather new information or actions that exploit existing knowledge. This “exploration-exploitation” trade-off depends on the environment: stability favors exploiting knowledge to maximize gains; volatility favors exploring new options and discovering new outcomes. Here we set out to reconcile recent evidence for dopamine’s involvement in the exploration-exploitation trade-off with the existing evidence for basal ganglia control of action selection, by testing the hypothesis that tonic dopamine in the striatum, the basal ganglia’s input nucleus, sets the current exploration-exploitation trade-off. We first advance the idea of interpreting the basal ganglia output as a probability distribution function for action selection. Using computational models of the full basal ganglia circuit, we showed that, under this interpretation, the actions of dopamine within the striatum change the basal ganglia’s output to favor the level of exploration or exploitation encoded in the probability distribution. We also found that our models predict that striatal dopamine controls the exploration-exploitation trade-off if we instead read out the probability distribution from the target nuclei of the basal ganglia, where their inhibitory input shapes the cortical input to these nuclei. Finally, by integrating the basal ganglia within a reinforcement learning model, we showed how dopamine’s effect on the exploration-exploitation trade-off could be measurable in a forced two-choice task. These simulations also showed how tonic dopamine can appear to affect learning while only directly altering the trade-off. Thus, our models support the hypothesis that changes in tonic dopamine within the striatum can alter the exploration-exploitation trade-off by modulating the output of the basal ganglia.
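The core computational claim can be sketched with a softmax action-selection rule whose sharpness is governed by an inverse temperature. The linear mapping from tonic dopamine level to inverse temperature below is a hypothetical illustration, not the paper's basal ganglia circuit model:

```python
import numpy as np

def softmax_policy(action_values, beta):
    """Turn action values into a probability distribution over actions;
    beta is the inverse temperature setting the exploration-exploitation balance."""
    prefs = beta * np.asarray(action_values, dtype=float)
    prefs -= prefs.max()                  # subtract max for numerical stability
    p = np.exp(prefs)
    return p / p.sum()

def beta_from_dopamine(da_level, gain=10.0):
    """Hypothetical linear mapping from tonic striatal dopamine to beta."""
    return gain * da_level

values = [0.6, 0.5, 0.1]
for da in (0.1, 0.5, 1.0):                # low -> high tonic dopamine
    p = softmax_policy(values, beta_from_dopamine(da))
    print(f"DA={da:.1f}  P(actions)={np.round(p, 3)}")
# Low dopamine yields a near-uniform (exploratory) distribution; high dopamine
# concentrates probability mass on the best action (exploitative).
```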
Robotics and Autonomous Systems | 2005
Jean-Arcady Meyer; Agnès Guillot; Benoît Girard; Mehdi Khamassi; Patrick Pirim; Alain Berthoz
Drawing inspiration from biology, the Psikharpax project aims at endowing a robot with sensory-motor equipment and a neural control architecture that will afford some of the capacities of autonomy and adaptation that are exhibited by real rats. The paper summarizes the current state of achievement of the project. It successively describes the robot's future sensors and actuators, and several biomimetic models of the anatomy and physiology of structures in the rat's brain, like the hippocampus and the basal ganglia, which have already been at work on various robots and make navigation and action selection possible. Preliminary results on the implementation of learning mechanisms in these structures are also presented. Finally, the article discusses the potential benefits that a biologically inspired approach affords to traditional autonomous robotics.
Journal of Computational Neuroscience | 2010
Adrien Peyrache; Karim Benchenane; Mehdi Khamassi; Sidney I. Wiener; Francesco P. Battaglia
Simultaneous recordings of many single neurons reveal unique insights into network processing, spanning timescales from single spikes to global oscillations. Neurons dynamically self-organize in subgroups of coactivated elements referred to as cell assemblies. Furthermore, these cell assemblies are reactivated, or replayed, preferentially during subsequent rest or sleep episodes, a proposed mechanism for memory trace consolidation. Here we employ Principal Component Analysis to isolate such patterns of neural activity. In addition, a measure is developed to quantify the similarity of instantaneous activity with a template pattern, and we derive theoretical distributions for the null hypothesis of no correlation between spike trains, allowing one to evaluate the statistical significance of instantaneous coactivations. Hence, when applied in an epoch different from the one where the patterns were identified (e.g., subsequent sleep), this measure makes it possible to identify the times and intensities of reactivation. The distribution of this measure provides information on the dynamics of reactivation events: in sleep these occur as transients rather than as a continuous process.
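The measure lends itself to a compact implementation: bin and z-score spike counts, extract assembly templates by PCA from the task epoch, then project the sleep activity onto each template with the diagonal removed so a single neuron firing alone cannot score. The sketch below follows that common recipe on synthetic data; bin counts and the number of components are illustrative assumptions:

```python
import numpy as np

def assembly_templates(task_counts, n_components):
    """PCA templates from z-scored binned spike counts (neurons x time bins)."""
    z = (task_counts - task_counts.mean(1, keepdims=True)) / task_counts.std(1, keepdims=True)
    corr = np.corrcoef(z)                       # neuron-by-neuron correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)     # eigh returns ascending order
    return eigvecs[:, ::-1][:, :n_components]   # leading components as templates

def reactivation_strength(sleep_counts, template):
    """Instantaneous match R(t) = z(t)^T P z(t), with diag(P) zeroed so that
    single-neuron firing does not contribute."""
    z = (sleep_counts - sleep_counts.mean(1, keepdims=True)) / sleep_counts.std(1, keepdims=True)
    P = np.outer(template, template)
    np.fill_diagonal(P, 0.0)
    return np.einsum('it,ij,jt->t', z, P, z)

rng = np.random.default_rng(0)
task = rng.poisson(2.0, size=(20, 500)).astype(float)    # 20 neurons, 500 task bins
sleep = rng.poisson(2.0, size=(20, 2000)).astype(float)  # subsequent sleep epoch
templates = assembly_templates(task, n_components=3)
R = reactivation_strength(sleep, templates[:, 0])
print("peak reactivation score:", R.max())
```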
Frontiers in Neurorobotics | 2011
Mehdi Khamassi; Stéphane Lallée; Pierre Enel; Emmanuel Procyk; Peter Ford Dominey
A major challenge in modern robotics is to liberate robots from controlled industrial settings and allow them to interact with humans and changing environments in the real world. The current research attempts to determine whether a neurophysiologically motivated model of cortical function in the primate can help to address this challenge. Primates are endowed with cognitive systems that allow them to maximize the feedback from their environment by learning the values of actions in diverse situations and by adjusting their behavioral parameters (i.e., cognitive control) to accommodate unexpected events. In such contexts uncertainty can arise from at least two distinct sources – expected uncertainty resulting from noise during sensory-motor interaction in a known context, and unexpected uncertainty resulting from the changing probabilistic structure of the environment. However, it is not clear how neurophysiological mechanisms of reinforcement learning and cognitive control integrate in the brain to produce efficient behavior. Based on primate neuroanatomy and neurophysiology, we propose a novel computational model of the interaction between lateral prefrontal and anterior cingulate cortex that reconciles previous models dedicated to these two functions. We deployed the model in two robots and demonstrated that, based on adaptive regulation of a meta-parameter β that controls the exploration rate, the model can robustly deal with both kinds of uncertainty in the real world. In addition, the model could reproduce monkey behavioral performance and neurophysiological data in two problem-solving tasks. A final experiment extends this to human–robot interaction with the iCub humanoid, and to novel sources of uncertainty corresponding to “cheating” by the human. The combined results provide concrete evidence for the ability of neurophysiologically inspired cognitive systems to control advanced robots in the real world.
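The adaptive regulation of β can be illustrated with a simple meta-learning rule: compare fast and slow running averages of reward, raising β (exploit more) when performance is improving and lowering it (explore more) when performance drops. This is a hypothetical sketch of the principle with made-up parameters; the published model's exact update rule differs:

```python
import numpy as np

def update_beta(beta, reward, avg_fast, avg_slow, lr_fast=0.3, lr_slow=0.05,
                gain=5.0, beta_min=0.5, beta_max=20.0):
    """One meta-learning step: a fast and a slow running average of reward are
    compared; sustained improvement raises beta, sustained deterioration lowers it."""
    avg_fast += lr_fast * (reward - avg_fast)
    avg_slow += lr_slow * (reward - avg_slow)
    beta = float(np.clip(beta + gain * (avg_fast - avg_slow), beta_min, beta_max))
    return beta, avg_fast, avg_slow

beta, fast, slow = 5.0, 0.5, 0.5
for r in [1, 1, 1, 0, 0, 0, 0, 0]:     # the task rule changes: rewards vanish
    beta, fast, slow = update_beta(beta, r, fast, slow)
print(round(beta, 2))                  # beta falls from its peak -> more exploration
```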
PLOS Computational Biology | 2014
Olivier Sigaud; Shelly B. Flagel; Terry E. Robinson; Mehdi Khamassi
Reinforcement Learning has greatly influenced models of conditioning, providing powerful explanations of acquired behaviour and underlying physiological observations. However, in recent autoshaping experiments in rats, variations in the form of Pavlovian conditioned responses (CRs) and in the associated dopamine activity have called into question the classical hypothesis that phasic dopamine activity corresponds to a reward prediction error-like signal arising from a classical Model-Free system, necessary for Pavlovian conditioning. Over the course of Pavlovian conditioning using food as the unconditioned stimulus (US), some rats (sign-trackers) come to approach and engage the conditioned stimulus (CS) itself – a lever – more and more avidly, whereas other rats (goal-trackers) learn to approach the location of food delivery upon CS presentation. Importantly, although both sign-trackers and goal-trackers learn the CS-US association equally well, only in sign-trackers does phasic dopamine activity show classical reward prediction error-like bursts. Furthermore, neither the acquisition nor the expression of a goal-tracking CR is dopamine-dependent. Here we present a computational model that can account for such individual variations. We show that a combination of a Model-Based system and a revised Model-Free system can account for the development of distinct CRs in rats. Moreover, we show that revising a classical Model-Free system to individually process stimuli by using factored representations can explain why classical dopaminergic patterns may be observed for some rats and not for others depending on the CR they develop. In addition, the model can account for other behavioural and pharmacological results obtained using the same, or similar, autoshaping procedures. Finally, the model makes it possible to draw a set of experimental predictions that may be verified in a modified experimental protocol. We suggest that further investigation of factored representations in computational neuroscience studies may be useful.
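The model's two key ingredients, a Model-Based system and a Model-Free system operating on factored (feature-level) stimulus representations, can be caricatured in a few lines. Everything below (feature names, the weighting parameter omega, learning rates) is an illustrative assumption, not the paper's full model:

```python
# Factored Model-Free values: the lever (CS) and the food magazine are valued
# as separate features rather than as parts of one monolithic state.
mf_values = {"lever": 0.0, "magazine": 0.0}

def mf_update(feature, reward, alpha=0.1):
    """TD-like update on the engaged feature; the prediction error delta
    stands in for a phasic dopamine signal."""
    delta = reward - mf_values[feature]
    mf_values[feature] += alpha * delta
    return delta

def integrated_value(feature, mb_value, omega):
    """omega in [0, 1] weights the Model-Based against the factored Model-Free
    estimate; individual differences in omega yield goal- vs sign-tracking."""
    return omega * mb_value + (1.0 - omega) * mf_values[feature]

mf_update("lever", reward=1.0)                             # CS engagement, then food
print(integrated_value("lever", mb_value=0.9, omega=0.2))  # low omega: MF dominates
```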
Adaptive Behavior | 2005
Mehdi Khamassi; Loic Lacheze; Benoît Girard; Alain Berthoz; Agnès Guillot
Since 1995, numerous Actor–Critic architectures for reinforcement learning have been proposed as models of dopamine-like reinforcement learning mechanisms in the rat’s basal ganglia. However, these models were usually tested in different tasks, making it difficult to compare their efficiency for an autonomous animat. We present here a comparison of four architectures in an animat as it performs the same reward-seeking task. This illustrates the consequences of different hypotheses about the management of different Actor sub-modules and Critic units, and about their more or less autonomously determined coordination. We show that the classical method of coordinating modules by a mixture of experts, depending on each module’s performance, did not allow our task to be solved. We then address the question of which principle should be applied to combine these units efficiently. Improvements for Critic modeling and the accuracy of Actor–Critic models for a natural task are finally discussed in the perspective of our Psikharpax project, an artificial rat that must survive autonomously in unpredictable environments.
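A mixture-of-experts coordination scheme of the kind compared here can be sketched as performance-weighted gating over modules; the gating rule and parameter values below are illustrative assumptions:

```python
import numpy as np

def gate_experts(performances, beta=3.0):
    """Softmax gating: each Actor-Critic module's responsibility grows with its
    recent Critic performance (e.g., negated prediction-error magnitude)."""
    g = np.exp(beta * np.asarray(performances, dtype=float))
    return g / g.sum()

def combined_action_values(module_q, gating):
    """Blend the action values proposed by each expert module."""
    return np.tensordot(gating, module_q, axes=1)   # weighted sum over modules

module_q = np.array([[0.2, 0.8], [0.6, 0.1]])       # 2 modules x 2 actions
gating = gate_experts([-0.1, -0.5])                 # module 0 predicts better
print(combined_action_values(module_q, gating))
```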
Frontiers in Behavioral Neuroscience | 2012
Mehdi Khamassi; Mark D. Humphries
Behavior in spatial navigation is often organized into map-based (place-driven) vs. map-free (cue-driven) strategies; behavior in operant conditioning research is often organized into goal-directed vs. habitual strategies. Here we attempt to unify the two. We review one powerful theory for distinct forms of learning during instrumental conditioning, namely model-based (maintaining a representation of the world) and model-free (reacting to immediate stimuli) learning algorithms. We extend these lines of argument to propose an alternative taxonomy for spatial navigation, showing how various previously identified strategies can be distinguished as “model-based” or “model-free” depending on the usage of information and not on the type of information (e.g., cue vs. place). We argue that identifying “model-free” learning with dorsolateral striatum and “model-based” learning with dorsomedial striatum could reconcile numerous conflicting results in the spatial navigation literature. From this perspective, we further propose that the ventral striatum plays key roles in the model-building process. We propose that the core of the ventral striatum is positioned to learn the probability of action selection for every transition between states of the world. We further review suggestions that the ventral striatal core and shell are positioned to act as “critics” contributing to the computation of a reward prediction error for model-free and model-based systems, respectively.
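The taxonomy rests on a standard computational distinction, which the following sketch makes concrete under textbook reinforcement-learning assumptions (it is not a model from the paper): a model-free learner caches values from direct experience, while a model-based learner plans over a learned world model:

```python
import numpy as np

def model_free_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """Model-free: cache the value of the taken action directly from experience."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

def model_based_values(T, R, gamma=0.95, iters=100):
    """Model-based: plan by value iteration over a learned world model
    (transition tensor T[s, a, s'] and reward table R[s, a])."""
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        V = (R + gamma * np.einsum('san,n->sa', T, V)).max(axis=1)
    return V

Q = np.zeros((3, 2))
model_free_update(Q, s=0, a=1, r=1.0, s_next=2)   # one cue-driven (habit-like) update
```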
European Journal of Neuroscience | 2008
Mehdi Khamassi; Antonius B. Mulder; Eiichi Tabuchi; Vincent Douchamps; Sidney I. Wiener
It has been proposed that the striatum plays a crucial role in learning to select appropriate actions, optimizing rewards according to the principles of ‘Actor–Critic’ models of trial‐and‐error learning. The ventral striatum (VS), as Critic, would employ a temporal difference (TD) learning algorithm to predict rewards and drive dopaminergic neurons. This study examined this model’s adequacy for VS responses to multiple rewards in rats. The respective arms of a plus‐maze provided rewards of varying magnitudes; multiple rewards were provided at 1‐s intervals while the rat stood still. Neurons discharged phasically prior to each reward, during both initial approach and immobile waiting, demonstrating that this signal is predictive and not simply motor‐related. In different neurons, responses could be greater for early, middle or late droplets in the sequence. Strikingly, this activity often reappeared after the final reward, as if in anticipation of yet another. In contrast, previous TD learning models show decremental reward‐prediction profiles during reward consumption due to a temporal‐order signal introduced to reproduce accurate timing in dopaminergic reward‐prediction error signals. To resolve this inconsistency in a biologically plausible manner, we adapted the TD learning model such that input information is nonhomogeneously distributed among different neurons. By suppressing reward temporal‐order signals and varying richness of spatial and visual input information, the model reproduced the experimental data. This validates the feasibility of a TD‐learning architecture where different groups of neurons participate in solving the task based on varied input information.
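The adapted architecture can be approximated by a linear TD(0) learner in which different units receive different feature vectors; the sketch below is a generic stand-in for the paper's nonhomogeneous input distribution, with illustrative features:

```python
import numpy as np

def td_step(w, phi, phi_next, reward, alpha=0.05, gamma=0.9):
    """Linear TD(0): the reward prediction is w . phi. With heterogeneous
    feature vectors (different units receiving different spatial or visual
    inputs) and no reward temporal-order signal, prediction profiles need not
    decrease across a reward sequence."""
    delta = reward + gamma * w @ phi_next - w @ phi   # reward-prediction error
    w += alpha * delta * phi
    return delta

w = np.zeros(4)
phi_spatial = np.array([1.0, 0.0, 0.0, 0.0])   # unit driven by place information
phi_visual = np.array([0.0, 1.0, 1.0, 0.0])    # unit pooling visual cues
td_step(w, phi_spatial, phi_visual, reward=1.0)
```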
Nature Communications | 2015
Stefano Palminteri; Mehdi Khamassi; Mateus Joffily; Giorgio Coricelli
Compared with reward seeking, punishment avoidance learning is less clearly understood at both the computational and neurobiological levels. Here we demonstrate, using computational modelling and fMRI in humans, that learning option values in a relative—context-dependent—scale offers a simple computational solution for avoidance learning. The context (or state) value sets the reference point to which an outcome should be compared before updating the option value. Consequently, in contexts with an overall negative expected value, successful punishment avoidance acquires a positive value, thus reinforcing the response. As revealed by post-learning assessment of option values, contextual influences are enhanced when subjects are informed about the result of the forgone alternative (counterfactual information). This is mirrored at the neural level by a shift in negative outcome encoding from the anterior insula to the ventral striatum, suggesting that value contextualization also limits the need to mobilize an opponent punishment learning system.
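The relative, context-dependent update can be written in a few lines: outcomes are compared with a learned context (state) value before updating the option value, so that avoided punishments acquire positive value. Parameter values and the simulated task below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
v_ctx = 0.0                  # context (state) value: the reference point
v_opt = np.zeros(2)          # option 0 avoids punishment (0); option 1 is punished (-1)
alpha = 0.2

for _ in range(300):
    choice = rng.integers(2)                   # sample both options for illustration
    outcome = 0.0 if choice == 0 else -1.0
    v_opt[choice] += alpha * ((outcome - v_ctx) - v_opt[choice])  # relative update
    v_ctx += alpha * (outcome - v_ctx)         # context value tracks the average outcome

print(np.round(v_opt, 2), round(v_ctx, 2))
# The avoidance option acquires a POSITIVE value (about +0.5), reinforcing the
# avoidance response even though its outcome is never better than zero.
```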
Cerebral Cortex | 2015
Mehdi Khamassi; René Quilodran; Pierre Enel; Peter Ford Dominey; Emmanuel Procyk
To explain the high level of flexibility in primate decision-making, theoretical models often invoke reinforcement-based mechanisms, performance monitoring functions, and core neural features within frontal cortical regions. However, the underlying biological mechanisms remain unknown. In recent models, part of the regulation of behavioral control is based on meta-learning principles, for example, driving exploratory actions by varying a meta-parameter, the inverse temperature, which regulates the contrast between competing action probabilities. Here we investigate how complementary processes between lateral prefrontal cortex (LPFC) and dorsal anterior cingulate cortex (dACC) implement decision regulation during exploratory and exploitative behaviors. Model-based analyses of unit activity recorded in these 2 areas in monkeys first revealed that adaptation of the decision function is reflected in a covariation between LPFC neural activity and the control level estimated from the animal's behavior. Second, dACC more prominently encoded a reflection of outcome uncertainty useful for control regulation based on task monitoring. Model-based analyses also revealed higher information integration before feedback in LPFC, and after feedback in dACC. Overall, the data support a role of dACC in integrating reinforcement-based information to regulate decision functions in LPFC. Our results thus provide biological evidence of how prefrontal cortical subregions may cooperate to regulate decision-making.
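The covariation analysis can be pictured as a regression of unit activity on the behaviorally estimated control level. The snippet below runs on synthetic stand-in data and is purely illustrative of the analysis logic, not the paper's pipeline:

```python
import numpy as np

rng = np.random.default_rng(2)
beta_est = np.cumsum(rng.normal(0.0, 0.2, 200)) + 5.0  # stand-in trial-by-trial
                                                       # behavioral estimate of beta
rates = 0.8 * beta_est + rng.normal(0.0, 1.0, 200)     # synthetic LPFC-like unit

# Ordinary least squares: firing rate ~ intercept + slope * estimated beta.
X = np.column_stack([np.ones_like(beta_est), beta_est])
coef, *_ = np.linalg.lstsq(X, rates, rcond=None)
print(f"slope of firing rate vs estimated control level: {coef[1]:.2f}")
```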