A. Ross Otto
New York University
Publication
Featured research published by A. Ross Otto.
Proceedings of the National Academy of Sciences of the United States of America | 2013
A. Ross Otto; Candace M. Raio; Alice Chiang; Elizabeth A. Phelps; Nathaniel D. Daw
Significance: The physiological response evoked by short-lived stressful events, referred to as acute stress, impacts human decision-making. Past studies assume that stress causes people to fall back from more cognitive or deliberative modes of choice to more primitive or automatic modes of choice because stress impairs people’s capacity to process information (working memory). We directly examined how acute stress affects choice in a laboratory decision-making task for which the working-memory demands of the two forms of decision-making are well understood, finding that stress impaired use of sophisticated choice strategies that require working memory but did not affect use of simpler, more primitive strategies. Further, this impairment was exacerbated in individuals with smaller working-memory capacity, which is related to general intelligence.

Accounts of decision-making have long posited the operation of separate, competing valuation systems in the control of choice behavior. Recent theoretical and experimental advances suggest that this classic distinction between habitual and goal-directed (or more generally, automatic and controlled) choice may arise from two computational strategies for reinforcement learning, called model-free and model-based learning. Popular neurocomputational accounts of reward processing emphasize the involvement of the dopaminergic system in model-free learning and prefrontal, central executive–dependent control systems in model-based choice. Here we hypothesized that the hypothalamic-pituitary-adrenal (HPA) axis stress response—believed to have detrimental effects on prefrontal cortex function—should selectively attenuate model-based contributions to behavior. To test this, we paired an acute stressor with a sequential decision-making task that affords distinguishing the relative contributions of the two learning strategies. We assessed baseline working-memory (WM) capacity and used salivary cortisol levels to measure the HPA axis stress response. We found that the stress response attenuates model-based, but not model-free, contributions to behavior. Moreover, stress-induced behavioral changes were modulated by individual WM capacity, such that low-WM-capacity individuals were more susceptible to detrimental stress effects than high-WM-capacity individuals. These results enrich existing accounts of the interplay between acute stress, working memory, and prefrontal function and suggest that executive function may be protective against the deleterious effects of acute stress.
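In tasks of this kind, the relative contributions of the two strategies are commonly modeled as a weighted mixture of model-based and model-free action values. Below is a minimal Python sketch of such a hybrid learner for a two-step task; the transition probabilities, payoff probabilities, and parameter values (alpha, beta, and the model-based weight w) are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage-1 actions 0/1 reach stage-2 states 0/1 via common (0.7) vs. rare (0.3)
# transitions; each stage-2 state pays off stochastically.
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])

alpha, beta, w = 0.3, 3.0, 0.6   # learning rate, inverse temperature, MB weight
q_mf = np.zeros(2)               # model-free values of the stage-1 actions
q_s2 = np.zeros(2)               # learned values of the stage-2 states

def softmax(q):
    e = np.exp(beta * (q - q.max()))
    return e / e.sum()

for trial in range(200):
    q_mb = T @ q_s2                         # model-based: plan through the model
    a = rng.choice(2, p=softmax(w * q_mb + (1 - w) * q_mf))
    s2 = rng.choice(2, p=T[a])              # transition to a stage-2 state
    r = float(rng.random() < (0.8 if s2 == 0 else 0.4))  # illustrative payoffs
    # Model-free TD updates (SARSA with full eligibility trace); the
    # model-based values need no direct update beyond q_s2.
    q_mf[a] += alpha * ((q_s2[s2] - q_mf[a]) + (r - q_s2[s2]))
    q_s2[s2] += alpha * (r - q_s2[s2])
```

On an account like this, the HPA-axis stress response would be expected to lower the effective weight w, shifting choice toward the model-free values while leaving model-free learning itself intact.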
Psychological Science | 2013
A. Ross Otto; Samuel J. Gershman; Arthur B. Markman; Nathaniel D. Daw
A number of accounts of human and animal behavior posit the operation of parallel and competing valuation systems in the control of choice behavior. In these accounts, a flexible but computationally expensive model-based reinforcement-learning system has been contrasted with a less flexible but more efficient model-free reinforcement-learning system. The factors governing which system controls behavior—and under what circumstances—are still unclear. Following the hypothesis that model-based reinforcement learning requires cognitive resources, we demonstrated that having human decision makers perform a demanding secondary task engenders increased reliance on a model-free reinforcement-learning strategy. Further, we showed that, across trials, people negotiate the trade-off between the two systems dynamically as a function of concurrent executive-function demands, and people’s choice latencies reflect the computational expense of the strategy they employ. These results demonstrate that competition between multiple learning systems can be controlled on a trial-by-trial basis by modulating the availability of cognitive resources.
Journal of Experimental Psychology: General | 2014
Samuel J. Gershman; Arthur B. Markman; A. Ross Otto
Recent computational theories of decision making in humans and animals have portrayed 2 systems locked in a battle for control of behavior. One system—variously termed model-free or habitual—favors actions that have previously led to reward, whereas a second—called the model-based or goal-directed system—favors actions that causally lead to reward according to the agent’s internal model of the environment. Some evidence suggests that control can be shifted between these systems using neural or behavioral manipulations, but other evidence suggests that the systems are more intertwined than a competitive account would imply. In 4 behavioral experiments, using a retrospective revaluation design and a cognitive load manipulation, we show that human decisions are more consistent with a cooperative architecture in which the model-free system controls behavior, whereas the model-based system trains the model-free system by replaying and simulating experience.
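A cooperative architecture of this kind closely resembles Sutton's Dyna: a model-free learner controls choice, while a learned world model generates simulated experience that trains the model-free values offline. The sketch below is a generic Dyna-Q-style loop under assumed parameter values, not the specific model fit in the paper.

```python
import random
from collections import defaultdict

alpha, gamma, n_replay = 0.1, 0.95, 10   # illustrative parameters
Q = defaultdict(float)                   # model-free values, keyed by (s, a)
model = {}                               # learned one-step model: (s, a) -> (r, s')

def q_update(s, a, r, s_next, actions):
    """One model-free TD update; used for both real and replayed experience."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def step(s, a, r, s_next, actions):
    q_update(s, a, r, s_next, actions)   # learn from real experience
    model[(s, a)] = (r, s_next)          # update the internal model
    for _ in range(n_replay):            # model-based system "teaches" the
        (ss, aa), (rr, nn) = random.choice(list(model.items()))  # model-free one
        q_update(ss, aa, rr, nn, actions)
```

The design choice worth noting is that choice itself always reads out Q, the model-free values; the model's only role is to supply extra training data, which is what makes the architecture cooperative rather than competitive.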
Journal of Cognitive Neuroscience | 2014
A. Ross Otto; Anya Skatova; Seth Madlon-Kay; Nathaniel D. Daw
Accounts of decision-making and its neural substrates have long posited the operation of separate, competing valuation systems in the control of choice behavior. Recent theoretical and experimental work suggests that this classic distinction between behaviorally and neurally dissociable systems for habitual and goal-directed (or more generally, automatic and controlled) choice may arise from two computational strategies for reinforcement learning (RL), called model-free and model-based RL, but the cognitive or computational processes by which one system may dominate over the other in the control of behavior are a matter of ongoing investigation. To elucidate this question, we leverage the theoretical framework of cognitive control, demonstrating that individual differences in utilization of goal-related contextual information—in the service of overcoming habitual, stimulus-driven responses—in established cognitive control paradigms predict model-based behavior in a separate, sequential choice task. The behavioral correspondence between cognitive control and model-based RL compellingly suggests that a common set of processes may underpin the two behaviors. In particular, computational mechanisms originally proposed to underlie controlled behavior may be applicable to understanding the interactions between model-based and model-free choice behavior.
Psychonomic Bulletin & Review | 2013
Darrell A. Worthy; Melissa J. Hawthorne; A. Ross Otto
The Iowa gambling task (IGT) has been used in numerous studies, often to examine decision-making performance in different clinical populations. Reinforcement learning (RL) models such as the expectancy valence (EV) model have often been used to characterize choice behavior in this work, and accordingly, parameter differences from these models have been used to examine differences in decision-making processes between different populations. These RL models assume a strategy whereby participants incrementally update the expected rewards for each option and probabilistically select options with higher expected rewards. Here we show that a formal model that assumes a win-stay/lose-shift (WSLS) strategy—which is sensitive only to the outcome of the previous choice—provides the best fit to IGT data from about half of our sample of healthy young adults, and that a prospect valence learning (PVL) model that utilizes a decay reinforcement learning rule provides the best fit to the other half of the data. Further analyses suggested that the better fits of the WSLS model to many participants’ data were not due to an enhanced ability of the WSLS model to mimic the RL strategy assumed by the PVL and EV models. These results suggest that WSLS is a common strategy in the IGT and that both heuristic-based and RL-based models should be used to inform decision-making behavior in the IGT and similar choice tasks.
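For concreteness, a WSLS model of this kind can be written with just two free parameters: the probability of staying after a win and the probability of shifting after a loss. The sketch below is one illustrative parameterization for a four-deck task like the IGT; the paper's exact formulation may differ.

```python
def wsls_prob(choice, prev_choice, prev_win, p_stay_win, p_shift_loss, n_decks=4):
    """Probability that a win-stay/lose-shift agent picks `choice`, given only
    the previous choice and whether it won. Two free parameters; shifts are
    spread evenly over the remaining decks. (Illustrative parameterization.)"""
    p_stay = p_stay_win if prev_win else 1.0 - p_shift_loss
    if choice == prev_choice:
        return p_stay
    return (1.0 - p_stay) / (n_decks - 1)
```

The per-trial likelihoods from such a rule can then be compared against those of RL models such as EV or PVL using standard fit criteria, which is the kind of model comparison the study reports.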
Cognitive, Affective, & Behavioral Neuroscience | 2015
Claire M. Gillan; A. Ross Otto; Elizabeth A. Phelps; Nathaniel D. Daw
Studies in humans and rodents have suggested that behavior can at times be “goal-directed”—that is, planned and purposeful—and at times “habitual”—that is, inflexible and automatically evoked by stimuli. This distinction is central to conceptions of pathological compulsion, as in drug abuse and obsessive-compulsive disorder. Evidence for the distinction has primarily come from outcome devaluation studies, in which the sensitivity of a previously learned behavior to motivational change is used to assay the dominance of habits versus goal-directed actions. However, little is known about how habits and goal-directed control arise. Specifically, in the present study we sought to reveal the trial-by-trial dynamics of instrumental learning that would promote, and protect against, developing habits. In two complementary experiments with independent samples, participants completed a sequential decision task that dissociated two computational-learning mechanisms, model-based and model-free. We then tested for habits by devaluing one of the rewards that had reinforced behavior. In each case, we found that individual differences in model-based learning predicted the participants’ subsequent sensitivity to outcome devaluation, suggesting that an associative mechanism underlies a bias toward habit formation in healthy individuals.
Journal of Experimental Psychology: Learning, Memory and Cognition | 2010
A. Ross Otto; Arthur B. Markman; Todd M. Gureckis; Bradley C. Love
This work explores the influence of motivation on choice behavior in a dynamic decision-making environment, where the payoffs from each choice depend on one’s recent choice history. Previous research reveals that participants in a regulatory fit exhibit increased levels of exploratory choice and flexible use of multiple strategies over the course of an experiment. The present study placed promotion- and prevention-focused participants in a dynamic environment for which optimal performance is facilitated by systematic exploration of the decision space. These participants either gained or lost points with each choice. Our experiment revealed that participants in a regulatory fit were more likely to engage in systematic exploration of the task environment than were participants in a regulatory mismatch and performed more optimally as a result. Implications for contemporary models of human reinforcement learning are discussed.
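To illustrate the kind of environment involved, the sketch below implements a hypothetical history-dependent payoff schedule (the function name and payoff values are invented for illustration, not the task's actual parameters): the myopically better option pays more on any given trial, but choices of the far-sighted option raise future payoffs for both, so discovering the optimal policy requires systematic exploration.

```python
from collections import deque

def make_dynamic_task(history_len=10):
    """Hypothetical dynamic environment: each option's payoff depends on
    one's recent choice history. Option 0 pays more immediately, but picks
    of option 1 raise the future baseline payoff of both options."""
    history = deque([0] * history_len, maxlen=history_len)
    def choose(option):
        h = sum(history) / history_len   # recent fraction of option-1 picks
        payoff = (40 if option == 0 else 30) + 50 * h
        history.append(option)
        return payoff
    return choose
```

Under these illustrative numbers, always taking option 0 settles at 40 points per trial, while always taking option 1 climbs to 80, so only a learner that explores past the immediate payoff difference finds the better policy.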
Cognition | 2011
A. Ross Otto; Arthur B. Markman
Probability matching is a suboptimal behavior that often plagues human decision-making in simple repeated choice tasks. Despite decades of research, studies have yet to reach agreement on which choice strategies lead to probability matching. We propose a solution, showing that two distinct local choice strategies—which make different demands on executive resources—both result in probability-matching behavior at a global level. By placing participants in a simple binary prediction task under dual- versus single-task conditions, we find that individuals with compromised executive resources are driven away from a one-trial-back strategy (utilized by participants with intact executive resources) and towards a strategy that integrates a longer window of past outcomes into the current prediction. Crucially, both groups of participants exhibited probability-matching behavior to the same extent at a global level of analysis. We suggest that these two forms of probability matching are byproducts of the operation of explicit versus implicit systems.
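A small simulation makes the central claim concrete: a one-trial-back strategy and a strategy that integrates a window of past outcomes both produce choice proportions that match the outcome probability. The outcome probability, window length, and trial count below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
p_outcome, n_trials, window = 0.7, 10000, 10   # illustrative settings

outcomes = (rng.random(n_trials) < p_outcome).astype(int)

# Strategy 1: one-trial-back -- predict whatever occurred on the last trial.
one_back = np.empty(n_trials, dtype=int)
one_back[0] = 1
one_back[1:] = outcomes[:-1]

# Strategy 2: integration -- predict outcome 1 with probability equal to its
# frequency over the last `window` trials.
integrate = np.empty(n_trials, dtype=int)
for t in range(n_trials):
    recent = outcomes[max(0, t - window):t]
    p1 = recent.mean() if len(recent) else 0.5
    integrate[t] = int(rng.random() < p1)

print(one_back.mean(), integrate.mean())   # both come out near 0.7
```

Both printed choice proportions sit near the 0.7 outcome probability, despite the very different demands the two local strategies place on executive resources.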
Frontiers in Psychology | 2012
W. Bradley Knox; A. Ross Otto; Peter Stone; Bradley C. Love
In non-stationary environments, there is a conflict between exploiting currently favored options and gaining information by exploring lesser-known options that in the past have proven less rewarding. Optimal decision-making in such tasks requires considering future states of the environment (i.e., planning) and properly updating beliefs about the state of the environment after observing outcomes associated with choices. Optimal belief-updating is reflective in that beliefs can change without directly observing environmental change. For example, after 10 s elapse, one might correctly believe that a traffic light last observed to be red is now more likely to be green. To understand human decision-making when rewards associated with choice options change over time, we develop a variant of the classic “bandit” task that is both rich enough to encompass relevant phenomena and sufficiently tractable to allow for ideal actor analysis of sequential choice behavior. We evaluate whether people update beliefs about the state of environment in a reflexive (i.e., only in response to observed changes in reward structure) or reflective manner. In contrast to purely “random” accounts of exploratory behavior, model-based analyses of the subjects’ choices and latencies indicate that people are reflective belief updaters. However, unlike the Ideal Actor model, our analyses indicate that people’s choice behavior does not reflect consideration of future environmental states. Thus, although people update beliefs in a reflective manner consistent with the Ideal Actor, they do not engage in optimal long-term planning, but instead myopically choose the option on every trial that is believed to have the highest immediate payoff.
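The reflexive/reflective distinction can be captured in a simple hidden-state account: between observations, the belief that an option is currently the better one drifts toward uncertainty at a rate set by the environment's change probability, so beliefs change even with nothing observed. The sketch below is one such update rule with assumed reward and hazard parameters; it is not the paper's Ideal Actor model itself.

```python
def update_belief(b, observed_reward=None, p_reward_good=0.8,
                  p_reward_bad=0.2, hazard=0.05):
    """b = P(chosen option is currently the better one). Illustrative values.

    Reflective step: the environment may have switched since the last look,
    so the belief decays toward 0.5 (cf. the traffic light that is probably
    green after 10 s have elapsed)."""
    b = b * (1 - hazard) + (1 - b) * hazard
    if observed_reward is not None:
        # Reflexive step: standard Bayes update on the observed outcome.
        like_good = p_reward_good if observed_reward else 1 - p_reward_good
        like_bad = p_reward_bad if observed_reward else 1 - p_reward_bad
        b = b * like_good / (b * like_good + (1 - b) * like_bad)
    return b
```

A purely reflexive updater would skip the first line and revise its belief only when an outcome is observed, which is exactly the contrast the model-based analyses test.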
Psychological Science | 2016
Johannes H. Decker; A. Ross Otto; Nathaniel D. Daw; Catherine A. Hartley
Theoretical models distinguish two decision-making strategies that have been formalized in reinforcement-learning theory. A model-based strategy leverages a cognitive model of potential actions and their consequences to make goal-directed choices, whereas a model-free strategy evaluates actions based solely on their reward history. Research in adults has begun to elucidate the psychological mechanisms and neural substrates underlying these learning processes and factors that influence their relative recruitment. However, the developmental trajectory of these evaluative strategies has not been well characterized. In this study, children, adolescents, and adults performed a sequential reinforcement-learning task that enabled estimation of model-based and model-free contributions to choice. Whereas a model-free strategy was apparent in choice behavior across all age groups, a model-based strategy was absent in children, became evident in adolescents, and strengthened in adults. These results suggest that recruitment of model-based valuation systems represents a critical cognitive component underlying the gradual maturation of goal-directed behavior.
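In sequential tasks of this kind, a common summary of the model-based contribution is the reward-by-transition interaction in stay probabilities: model-free learning predicts staying after any rewarded trial, whereas model-based learning predicts staying after rewarded common and unrewarded rare transitions. The sketch below computes such an index from raw choice data; it follows the standard two-step analysis in spirit and is not necessarily the authors' exact method.

```python
import numpy as np

def model_based_index(stayed, rewarded, common):
    """Reward-by-transition interaction in stay probabilities.

    `stayed`, `rewarded`, `common` are boolean arrays over trials 2..N
    (assumes every reward/transition cell occurs at least once).
    Near-zero values indicate purely model-free choice; positive values
    indicate a model-based contribution."""
    stayed, rewarded, common = map(np.asarray, (stayed, rewarded, common))
    p = lambda r, c: stayed[(rewarded == r) & (common == c)].mean()
    return (p(1, 1) - p(1, 0)) - (p(0, 1) - p(0, 0))
```

Computed per participant, an index like this is what allows the developmental comparison: roughly zero in children, intermediate in adolescents, and largest in adults.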