
Publication


Featured research published by Amir Dezfouli.


European Journal of Neuroscience | 2012

Habits, action sequences, and reinforcement learning

Amir Dezfouli; Bernard W. Balleine

It is now widely accepted that instrumental actions can be either goal‐directed or habitual; whereas the former are rapidly acquired and regulated by their outcome, the latter are reflexive, elicited by antecedent stimuli rather than their consequences. Model‐based reinforcement learning (RL) provides an elegant description of goal‐directed action. Through exposure to states, actions and rewards, the agent rapidly constructs a model of the world and can choose an appropriate action based on quite abstract changes in environmental and evaluative demands. This model is powerful but has a problem explaining the development of habitual actions. To account for habits, theorists have argued that another action controller is required, called model‐free RL, that does not form a model of the world but rather caches action values within states allowing a state to select an action based on its reward history rather than its consequences. Nevertheless, there are persistent problems with important predictions from the model; most notably the failure of model‐free RL correctly to predict the insensitivity of habitual actions to changes in the action–reward contingency. Here, we suggest that introducing model‐free RL in instrumental conditioning is unnecessary, and demonstrate that reconceptualizing habits as action sequences allows model‐based RL to be applied to both goal‐directed and habitual actions in a manner consistent with what real animals do. This approach has significant implications for the way habits are currently investigated and generates new experimental predictions.
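
The "cached value" idea contrasted above is easy to state concretely. Below is a minimal, illustrative sketch of tabular model-free Q-learning, the kind of controller the habit theorists invoke: values are cached per state from reward history alone, with no model of action consequences. Parameter names and settings are conventional choices, not taken from the paper.

    import random
    from collections import defaultdict

    alpha, gamma = 0.1, 0.95   # learning rate and discount factor (illustrative)
    Q = defaultdict(float)     # cached value for each (state, action) pair

    def update(state, action, reward, next_state, actions):
        # Q-learning: nudge the cached value toward the received reward plus
        # the best cached value of the next state. Only reward history enters;
        # the identity of the outcome does not, which is why cached values can
        # remain insensitive to changes in the action-reward contingency.
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

    def choose(state, actions, epsilon=0.1):
        # epsilon-greedy selection over the cached values
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])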


PLOS Computational Biology | 2013

Actions, Action Sequences and Habits: Evidence That Goal-Directed and Habitual Action Control Are Hierarchically Organized

Amir Dezfouli; Bernard W. Balleine

Behavioral evidence suggests that instrumental conditioning is governed by two forms of action control: a goal-directed and a habit learning process. Model-based reinforcement learning (RL) has been argued to underlie the goal-directed process; however, the way in which it interacts with habits and the structure of the habitual process has remained unclear. According to a flat architecture, the habitual process corresponds to model-free RL, and its interaction with the goal-directed process is coordinated by an external arbitration mechanism. Alternatively, the interaction between these systems has recently been argued to be hierarchical, such that the formation of action sequences underlies habit learning and a goal-directed process selects between goal-directed actions and habitual sequences of actions to reach the goal. Here we used a two-stage decision-making task to test predictions from these accounts. The hierarchical account predicts that, because they are tied to each other as an action sequence, selecting a habitual action in the first stage will be followed by a habitual action in the second stage, whereas the flat account predicts that the statuses of the first and second stage actions are independent of each other. We found, based on subjects' choices and reaction times, that human subjects combined single actions to build action sequences and that the formation of such action sequences was sufficient to explain habitual actions. Furthermore, based on Bayesian model comparison, a family of hierarchical RL models, assuming a hierarchical interaction between habit and goal-directed processes, provided a better fit of the subjects' behavior than a family of flat models. Although these findings do not rule out all possible model-free accounts of instrumental conditioning, they do show such accounts are not necessary to explain habitual actions and provide a new basis for understanding how goal-directed and habitual action control interact.
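
The diverging predictions lend themselves to a simple check. The sketch below is a hypothetical analysis rather than the authors' code: it assumes each trial has been labelled with the status (habitual vs. goal-directed) of its first- and second-stage actions. The hierarchical account predicts the two conditional rates differ sharply; the flat account predicts they are roughly equal.

    from collections import Counter

    def conditional_habit_rates(trials):
        """trials: iterable of (stage1_habitual, stage2_habitual) booleans."""
        counts = Counter(trials)
        def rate(stage1_habitual):
            hit = counts[(stage1_habitual, True)]
            total = hit + counts[(stage1_habitual, False)]
            return hit / total if total else float('nan')
        # P(stage 2 habitual | stage 1 habitual) vs. | stage 1 goal-directed
        return rate(True), rate(False)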


Neuron | 2015

Medial Orbitofrontal Cortex Mediates Outcome Retrieval in Partially Observable Task Situations

Laura A. Bradfield; Amir Dezfouli; Mieke van Holstein; Billy Chieng; Bernard W. Balleine

Choice between actions often requires the ability to retrieve action consequences in circumstances where they are only partially observable. This capacity has recently been argued to depend on orbitofrontal cortex; however, no direct evidence for this hypothesis has been reported. Here, we examined whether activity in the medial orbitofrontal cortex (mOFC) underlies this critical determinant of decision-making in rats. First, we simulated predictions from this hypothesis for various tests of goal-directed action by removing the assumption that rats could retrieve partially observable outcomes and then tested those predictions experimentally using manipulations of the mOFC. The results closely followed predictions; consistent deficits only emerged when action consequences had to be retrieved. Finally, we put action selection based on observable and unobservable outcomes into conflict and found that whereas intact rats selected actions based on the value of retrieved outcomes, mOFC rats relied solely on the value of observable outcomes.


Philosophical Transactions of the Royal Society B | 2014

Habits as action sequences: hierarchical action control and changes in outcome value

Amir Dezfouli; Nura W. Lingawi; Bernard W. Balleine

Goal-directed action involves making high-level choices that are implemented using previously acquired action sequences to attain desired goals. Such a hierarchical schema is necessary for goal-directed actions to be scalable to real-life situations, but results in decision-making that is less flexible than when action sequences are unfolded and the decision-maker deliberates step-by-step over the outcome of each individual action. In particular, from this perspective, the offline revaluation of any outcomes that fall within action sequence boundaries will be invisible to the high-level planner resulting in decisions that are insensitive to such changes. Here, within the context of a two-stage decision-making task, we demonstrate that this property can explain the emergence of habits. Next, we show how this hierarchical account explains the insensitivity of over-trained actions to changes in outcome value. Finally, we provide new data that show that, under extended extinction conditions, habitual behaviour can revert to goal-directed control, presumably as a consequence of decomposing action sequences into single actions. This hierarchical view suggests that the development of action sequences and the insensitivity of actions to changes in outcome value are essentially two sides of the same coin, explaining why these two aspects of automatic behaviour involve a shared neural structure.
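
The key mechanism, that revaluation inside a sequence boundary is invisible to the high-level planner, can be captured in a toy example. In the sketch below (illustrative values, not a fitted model) a macro-action carries one cached value, so devaluing an interior outcome changes the unfolded, step-by-step evaluation but not the hierarchical one.

    # Per-step outcome values, e.g. a lever press followed by reward collection
    step_outcome_values = {'press': 0.0, 'collect': 1.0}
    sequence_cached_value = 1.0  # cached when 'collect' was still worth 1.0

    def planner_value(unfolded):
        if unfolded:  # deliberate over the outcome of each individual action
            return sum(step_outcome_values.values())
        return sequence_cached_value  # macro-action: a single cached number

    step_outcome_values['collect'] = 0.0  # offline devaluation of the outcome
    assert planner_value(unfolded=True) == 0.0   # step-by-step: sensitive
    assert planner_value(unfolded=False) == 1.0  # sequence-level: insensitive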


Nature Communications | 2014

Action-value comparisons in the dorsolateral prefrontal cortex control choice between goal-directed actions

Richard W. Morris; Amir Dezfouli; Kristi R. Griffiths; Bernard W. Balleine

It is generally assumed that choice between different actions reflects the difference between their action values, yet little direct evidence confirming this assumption has been reported. Here we assess whether the brain calculates the absolute difference between action values or their relative advantage, that is, the probability that one action is better than the other alternatives. We use a two-armed bandit task during functional magnetic resonance imaging and modelled responses to determine both the size of the difference between action values (D) and the probability that one action value is better (P). The results show haemodynamic signals corresponding to P in right dorsolateral prefrontal cortex (dlPFC) together with evidence that these signals modulate motor cortex activity in an action-specific manner. We find no significant activity related to D. These findings demonstrate that a distinct neuronal population mediates action-value comparisons, and reveal how these comparisons are implemented to mediate value-based decision-making.
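
One concrete way to form the two regressors, assuming (purely for illustration) Gaussian beliefs over the two action values, is to take D as the difference of the means and P as the posterior probability that one value exceeds the other. This is not necessarily the paper's exact estimator.

    from math import erf, sqrt

    def value_comparison(mu1, var1, mu2, var2):
        D = mu1 - mu2
        # P(value 1 > value 2) under independent Gaussian beliefs:
        # the standard normal CDF evaluated at D / sqrt(var1 + var2)
        P = 0.5 * (1 + erf(D / sqrt(2 * (var1 + var2))))
        return D, P

    D, P = value_comparison(mu1=0.7, var1=0.04, mu2=0.5, var2=0.04)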


Current Opinion in Behavioral Sciences | 2015

Hierarchical control of goal-directed action in the cortical–basal ganglia network

Bernard W. Balleine; Amir Dezfouli; Makoto Ito; Kenji Doya

Goal-directed control depends on constructing a model of the world that maps actions onto specific outcomes, allowing choice to remain adaptive when the values of outcomes change. In complex environments, however, such models can become computationally unwieldy. One solution to this problem is to develop a hierarchical control structure within which more complex, or abstract, actions are built from simpler ones. Here we review findings suggesting that the acquisition, evaluation and execution of goal-directed actions accord well with predictions from hierarchical models. We describe recent evidence that hierarchical action control is implemented in a series of feedback loops integrating secondary motor areas with the basal ganglia and describe how such a structure not only overcomes issues of dimensionality, but also helps to explain the formation of action sequences, action chunking and the relationship between goal-directed actions and habits.


bioRxiv | 2017

The Algorithmic Neuroanatomy Of Action-Outcome Learning

Richard Morris; Amir Dezfouli; Kristi R. Griffiths; Mike E. Le Pelley; Bernard W. Balleine

Although it is well known that animals can encode the consequences of their actions and can use this information to control action selection and evaluation, it is not known what learning rules control action-outcome (AO) learning. Here we trained participants to encode specific AO associations whilst undergoing functional imaging (fMRI) and used computational modelling to evaluate competing models. This analysis revealed that a Kalman filter, which learned the unique causal effect of each action, best characterized AO learning and found the medial prefrontal cortex differentiated the unique effect of actions from background effects. We subsequently extended these findings to show that mPFC participated in a circuit with parietal cortex and caudate nucleus to segregate distinct contributions to AO learning. The results extend our understanding of goal-directed learning and demonstrate that sensitivity to the causal relationship between actions and outcomes guides goal-directed learning rather than contiguous state-action relations.
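
A Kalman filter of the kind described learns a weight for the unique effect of each putative cause (each action plus the background), with uncertainty-weighted credit assignment. The sketch below is a generic version with illustrative noise settings, not the fitted model from the paper.

    import numpy as np

    n_causes = 3            # e.g., two actions plus a background cause
    w = np.zeros(n_causes)  # mean causal weights
    P = np.eye(n_causes)    # posterior covariance over the weights
    q, r = 0.01, 0.5        # drift and outcome-noise variances (illustrative)

    def kalman_update(x, outcome):
        """x: binary vector marking which causes were present this trial."""
        global w, P
        P = P + q * np.eye(n_causes)   # weights may drift between trials
        error = outcome - w @ x        # prediction error
        k = P @ x / (x @ P @ x + r)    # Kalman gain: credit by uncertainty
        w = w + k * error              # uncertainty-weighted credit assignment
        P = P - np.outer(k, x) @ P     # shrink uncertainty on present causes

    kalman_update(np.array([1.0, 0.0, 1.0]), outcome=1.0)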


Neural Information Processing Systems | 2018

Integrated accounts of behavioral and neuroimaging data using flexible recurrent neural network models

Amir Dezfouli; Richard Morris; Fabio Ramos; Peter Dayan; Bernard W. Balleine

Neuroscience studies of human decision-making abilities commonly involve subjects completing a decision-making task while BOLD signals are recorded using fMRI. Hypotheses are tested about which brain regions mediate the effect of past experience, such as rewards, on future actions. One standard approach to this is model-based fMRI data analysis, in which a model is fitted to the behavioral data, i.e., a subject’s choices, and then the neural data are parsed to find brain regions whose BOLD signals are related to the model’s internal signals. However, the internal mechanics of such purely behavioral models are not constrained by the neural data, and therefore might miss or mischaracterize aspects of the brain. To address this limitation, we introduce a new method using recurrent neural network models that are flexible enough to be fitted jointly to the behavioral and neural data. We trained a model so that its internal states were suitably related to neural activity during the task, while at the same time its output predicted the next action a subject would execute. We then used the fitted model to create a novel visualization of the relationship between the activity in brain regions at different times following a reward and the choices the subject subsequently made. Finally, we validated our method using a previously published dataset. We showed that the model was able to recover the underlying neural substrates that were discovered by explicit model engineering in the previous work, and also derived new results regarding the temporal pattern of brain activity.
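
The core of the method is a single recurrent network whose hidden state serves two heads: one predicting the subject's next action, one reading out the recorded neural signal. The sketch below illustrates the joint objective; the architecture sizes and the loss weighting (lambda_neural) are assumptions, not the paper's specification.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class JointRNN(nn.Module):
        def __init__(self, n_inputs, n_hidden, n_actions, n_rois):
            super().__init__()
            self.rnn = nn.GRU(n_inputs, n_hidden, batch_first=True)
            self.policy = nn.Linear(n_hidden, n_actions)  # next-action head
            self.neural = nn.Linear(n_hidden, n_rois)     # neural readout head

        def forward(self, x):
            h, _ = self.rnn(x)  # h: (batch, time, n_hidden)
            return self.policy(h), self.neural(h)

    def joint_loss(model, inputs, actions, bold, lambda_neural=1.0):
        # Behavioral term: predict the next action; neural term: match the
        # recorded signal. Fitting both ties internal states to the brain data.
        logits, bold_hat = model(inputs)
        behavioral = F.cross_entropy(logits.reshape(-1, logits.shape[-1]),
                                     actions.reshape(-1))
        neural = F.mse_loss(bold_hat, bold)
        return behavioral + lambda_neural * neural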


bioRxiv | 2018

Models that learn how humans learn: the case of depression and bipolar disorders

Amir Dezfouli; Kristi R. Griffiths; Fabio Ramos; Peter Dayan; Bernard W. Balleine

Popular computational models of decision-making make specific assumptions about learning processes that may cause them to underfit observed behaviours. Here we suggest an alternative method using recurrent neural networks (RNNs) to generate a flexible family of models that have sufficient capacity to represent the complex learning and decision-making strategies used by humans. In this approach, an RNN is trained to predict the next action that a subject will take in a decision-making task and, in this way, learns to imitate the processes underlying subjects’ choices and their learning abilities. We demonstrate the benefits of this approach using a new dataset drawn from patients with either unipolar (n=34) or bipolar (n=33) depression and matched healthy controls (n=34) making decisions on a two-armed bandit task. The results indicate that this new approach is better than baseline reinforcement-learning methods in terms of overall performance and its capacity to predict subjects’ choices. We show that the model can be interpreted using off-policy simulations and thereby provides a novel clustering of subjects’ learning processes – something that often eludes traditional approaches to modelling and behavioural analysis.
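
The off-policy interpretation step amounts to probing every subject's fitted network with the same fixed input sequence and clustering the resulting choice trajectories. A hypothetical sketch of that last step (the probe schedule and fitted models are placeholders, not the paper's pipeline):

    import numpy as np
    from sklearn.cluster import KMeans

    def cluster_subjects(choice_trajectories, n_clusters=3):
        """choice_trajectories: (n_subjects, n_trials) array; each row holds a
        fitted model's P(choose arm 1) along the shared probe schedule."""
        return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(
            np.asarray(choice_trajectories))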


Psychonomic Bulletin & Review | 2018

Optimal response vigor and choice under non-stationary outcome values

Amir Dezfouli; Bernard W. Balleine; Richard Nock

Within a rational framework, a decision-maker selects actions based on the reward-maximization principle, which stipulates that they acquire outcomes with the highest value at the lowest cost. Action selection can be divided into two dimensions: selecting an action from various alternatives, and choosing its vigor, i.e., how fast the selected action should be executed. Both of these dimensions depend on the values of outcomes, which are often affected as more outcomes are consumed together with their associated actions. Despite this, previous research has only addressed the computational substrate of optimal actions in the specific condition that the values of outcomes are constant. It is not known what actions are optimal when the values of outcomes are non-stationary. Here, based on an optimal control framework, we derive a computational model for optimal actions when outcome values are non-stationary. The results imply that, even when the values of outcomes are changing, the optimal response rate is constant rather than decreasing. This finding shows that, in contrast to previous theories, commonly observed changes in action rate cannot be attributed solely to changes in outcome value. We then prove that this observation can be explained based on uncertainty about temporal horizons; e.g., the session duration. We further show that, when multiple outcomes are available, the model explains probability matching as well as maximization strategies. The model therefore provides a quantitative analysis of optimal action and explicit predictions for future testing.
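
A toy numerical check makes the constant-rate claim plausible: if the per-response vigor cost is convex in the inter-response interval (here c / tau, a common assumption rather than the paper's derivation) and total reward depends only on how many outcomes are consumed, then for a fixed session length and response count, constant spacing minimizes total cost.

    def total_cost(intervals, c=1.0):
        # convex per-response vigor cost: faster responses cost more
        return sum(c / tau for tau in intervals)

    T, n = 10.0, 5
    constant = [T / n] * n               # constant response rate
    slowing = [1.0, 1.5, 2.0, 2.5, 3.0]  # decreasing rate, same total time
    assert sum(slowing) == T
    assert total_cost(constant) < total_cost(slowing)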

Collaboration


Dive into Amir Dezfouli's collaborations.

Top Co-Authors

Bernard W. Balleine
University of New South Wales

Richard Nock
Australian National University

Peter Dayan
University College London