Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Kenji Doya is active.

Publication


Featured research published by Kenji Doya.


Science | 2005

Representation of Action-Specific Reward Values in the Striatum

Kazuyuki Samejima; Yasumasa Ueda; Kenji Doya; Minoru Kimura

The estimation of the reward an action will yield is critical in decision-making. To elucidate the role of the basal ganglia in this process, we recorded striatal neurons of monkeys who chose between left and right handle turns, based on the estimated reward probabilities of the actions. During a delay period before the choices, the activity of more than one-third of striatal projection neurons was selective to the values of one of the two actions. Fewer neurons were tuned to relative values or action choice. These results suggest representation of action values in the striatum, which can guide action selection in the basal ganglia circuit.


Trends in Neurosciences | 1999

Parallel neural networks for learning sequential procedures

Okihide Hikosaka; Hiroyuki Nakahara; Miya K. Rand; Katsuyuki Sakai; Xiaofeng Lu; Kae Nakamura; Shigehiro Miyachi; Kenji Doya

Recent studies have shown that multiple brain areas contribute to different stages and aspects of procedural learning. On the basis of a series of studies using a trial-and-error sequence-learning task, we propose a hypothetical scheme in which a sequential procedure is acquired independently by two cortical systems, one using spatial coordinates and the other using motor coordinates. They are active preferentially in the early and late stages of learning, respectively. Both systems are supported by loop circuits formed with the basal ganglia and the cerebellum, the former for reward-based evaluation and the latter for processing of timing. The proposed neural architecture would operate in a flexible manner to acquire and execute multiple sequential procedures.


Neural Computation | 2000

Reinforcement Learning in Continuous Time and Space

Kenji Doya

This article presents a reinforcement learning framework for continuous-time dynamical systems without a priori discretization of time, state, and action. Based on the Hamilton-Jacobi-Bellman (HJB) equation for infinite-horizon, discounted reward problems, we derive algorithms for estimating value functions and improving policies with the use of function approximators. The process of value function estimation is formulated as the minimization of a continuous-time form of the temporal difference (TD) error. Update methods based on backward Euler approximation and exponential eligibility traces are derived, and their correspondences with the conventional residual gradient, TD(0), and TD(λ) algorithms are shown. For policy improvement, two methods, a continuous actor-critic method and a value-gradient-based greedy policy, are formulated. As a special case of the latter, a nonlinear feedback control law using the value gradient and the model of the input gain is derived. Advantage updating, a model-free algorithm derived previously, is also formulated in the HJB-based framework. The performance of the proposed algorithms is first tested in a nonlinear control task of swinging a pendulum up with limited torque. It is shown in the simulations that (1) the task is accomplished by the continuous actor-critic method in a number of trials several times fewer than by the conventional discrete actor-critic method; (2) among the continuous policy update methods, the value-gradient-based policy with a known or learned dynamic model performs several times better than the actor-critic method; and (3) a value function update using exponential eligibility traces is more efficient and stable than one based on Euler approximation. The algorithms are then tested in a higher-dimensional task: cart-pole swing-up. This task is accomplished in several hundred trials using the value-gradient-based policy with a learned dynamic model.
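
To make the core update concrete, here is a minimal sketch of the continuous-time TD error with a simple finite-difference approximation and a linear function approximator; the radial-basis features, constants, and update schedule are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (not the paper's code): the continuous-time TD error
#   delta(t) = r(t) - V(t)/tau + dV/dt
# approximated with a finite difference and a linear function approximator.
# Features, constants, and the pendulum state encoding are illustrative assumptions.
import numpy as np

dt, tau, eta = 0.02, 1.0, 0.1          # time step, discount time constant, learning rate

# Gaussian radial-basis features over (angle, angular velocity).
centers = np.array([[a, v] for a in np.linspace(-np.pi, np.pi, 7)
                           for v in np.linspace(-8.0, 8.0, 7)])

def features(x):
    return np.exp(-np.sum((np.asarray(x) - centers) ** 2, axis=1) / 2.0)

w = np.zeros(len(centers))              # value weights: V(x) = w . phi(x)

def td_update(x, r, x_next):
    """One Euler-discretized update of the value weights from a transition."""
    global w
    phi, phi_next = features(x), features(x_next)
    V, V_next = w @ phi, w @ phi_next
    delta = r - V / tau + (V_next - V) / dt   # continuous-time TD error
    w += eta * delta * phi                    # gradient step on the value weights
    return delta
```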


Current Opinion in Neurobiology | 2000

Complementary roles of basal ganglia and cerebellum in learning and motor control

Kenji Doya

The classical notion that the basal ganglia and the cerebellum are dedicated to motor control has been challenged by the accumulation of evidence revealing their involvement in non-motor, cognitive functions. From a computational viewpoint, it has been suggested that the cerebellum, the basal ganglia, and the cerebral cortex are specialized for different types of learning: namely, supervised learning, reinforcement learning and unsupervised learning, respectively. This idea of learning-oriented specialization is helpful in understanding the complementary roles of the basal ganglia and the cerebellum in motor control and cognitive functions.


Neural Networks | 2002

Metalearning and neuromodulation

Kenji Doya

This paper presents a computational theory on the roles of the ascending neuromodulatory systems from the viewpoint that they mediate the global signals that regulate the distributed learning mechanisms in the brain. Based on the review of experimental data and theoretical models, it is proposed that dopamine signals the error in reward prediction, serotonin controls the time scale of reward prediction, noradrenaline controls the randomness in action selection, and acetylcholine controls the speed of memory update. The possible interactions between those neuromodulators and the environment are predicted on the basis of computational theory of metalearning.
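
The proposed mapping can be made concrete with a standard tabular Q-learning agent whose metaparameters correspond to the four neuromodulators; the sketch below is illustrative only, and the task size, parameter values, and the Q-learning rule itself are assumptions rather than the paper's model.

```python
# Minimal sketch (not the paper's model): a tabular Q-learning agent whose
# metaparameters play the roles proposed for the four neuromodulators.
# Task size and parameter values are illustrative assumptions.
import numpy as np

n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

alpha = 0.1    # learning rate       ~ acetylcholine: speed of memory update
gamma = 0.95   # discount factor     ~ serotonin: time scale of reward prediction
beta  = 3.0    # inverse temperature ~ noradrenaline: randomness of action selection

def select_action(s):
    """Softmax (Boltzmann) action selection; larger beta means less random choices."""
    p = np.exp(beta * (Q[s] - Q[s].max()))
    p /= p.sum()
    return rng.choice(n_actions, p=p)

def update(s, a, r, s_next):
    """The TD error delta corresponds to the proposed dopamine signal."""
    delta = r + gamma * Q[s_next].max() - Q[s, a]
    Q[s, a] += alpha * delta
    return delta
```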


Nature Neuroscience | 2008

Modulators of decision making

Kenji Doya

Human and animal decisions are modulated by a variety of environmental and intrinsic contexts. Here I consider computational factors that can affect decision making and review anatomical structures and neurochemical systems that are related to contextual modulation of decision making. Expectation of a high reward can motivate a subject to go for an action despite a large cost, a decision that is influenced by dopamine in the anterior cingulate cortex. Uncertainty of action outcomes can promote risk taking and exploratory choices, in which norepinephrine and the orbitofrontal cortex appear to be involved. Predictable environments should facilitate consideration of longer-delayed rewards, which depends on serotonin in the dorsal striatum and dorsal prefrontal cortex. This article aims to sort out factors that affect the process of decision making from the viewpoint of reinforcement learning theory and to bridge between such computational needs and their neurophysiological substrates.
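
As an illustration only (the article is a review, not a single model), the three factors can be phrased in reinforcement-learning terms as a discounted expected reward minus an action cost, with choice randomness controlled by a softmax temperature; all names and numbers below are hypothetical.

```python
# Illustration only: net value of an option as discounted expected reward minus
# cost, with exploration governed by a softmax temperature.
import numpy as np

def action_value(expected_reward, delay, cost, gamma=0.95):
    """Temporally discounted expected reward minus the cost of the action."""
    return (gamma ** delay) * expected_reward - cost

def choose(values, temperature, rng):
    """Softmax choice; a higher temperature yields more exploratory decisions."""
    p = np.exp(np.asarray(values) / temperature)
    p /= p.sum()
    return rng.choice(len(p), p=p)

rng = np.random.default_rng(0)
options = [action_value(10.0, delay=5, cost=2.0),   # large but delayed and costly
           action_value(4.0, delay=1, cost=0.5)]    # small but quick and cheap
print(choose(options, temperature=1.0, rng=rng))
```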


Neural Computation | 2002

Multiple model-based reinforcement learning

Kenji Doya; Kazuyuki Samejima; Ken-ichi Katagiri; Mitsuo Kawato

We propose a modular reinforcement learning architecture for nonlinear, nonstationary control tasks, which we call multiple model-based reinforcement learning (MMRL). The basic idea is to decompose a complex task into multiple domains in space and time based on the predictability of the environmental dynamics. The system is composed of multiple modules, each of which consists of a state prediction model and a reinforcement learning controller. The responsibility signal, which is given by the softmax function of the prediction errors, is used to weight the outputs of multiple modules, as well as to gate the learning of the prediction models and the reinforcement learning controllers. We formulate MMRL for both the discrete-time, finite-state case and the continuous-time, continuous-state case. The performance of MMRL was demonstrated for the discrete case in a nonstationary hunting task in a grid world and for the continuous case in a nonlinear, nonstationary control task of swinging up a pendulum with variable physical parameters.
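
A minimal sketch of the responsibility signal follows: each module's prediction error is mapped through a softmax so that modules that predict the dynamics well receive larger weights for both action mixing and learning. The scaling parameter sigma and the example numbers are assumptions, not values from the paper.

```python
# Minimal sketch (not the original implementation) of the MMRL responsibility
# signal: smaller prediction error -> larger softmax weight, used both to mix
# module outputs and to gate their learning.
import numpy as np

def responsibilities(pred_errors, sigma=1.0):
    """Softmax over negative squared prediction errors."""
    scores = -np.asarray(pred_errors, dtype=float) ** 2 / (2.0 * sigma ** 2)
    w = np.exp(scores - scores.max())
    return w / w.sum()

errors  = [0.2, 1.5, 0.8]               # per-module state-prediction errors
outputs = np.array([0.1, -0.4, 0.7])    # per-module controller outputs
lam = responsibilities(errors)
action = lam @ outputs                  # responsibility-weighted action
module_learning_rates = 0.1 * lam       # learning gated by responsibility
```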


The Journal of Neuroscience | 2004

A neural correlate of reward-based behavioral learning in caudate nucleus: A functional magnetic resonance imaging study of a stochastic decision task

Masahiko Haruno; Tomoe Kuroda; Kenji Doya; Keisuke Toyama; Minoru Kimura; Kazuyuki Samejima; Hiroshi Imamizu; Mitsuo Kawato

Humans can acquire appropriate behaviors that maximize rewards on a trial-and-error basis. Recent electrophysiological and imaging studies have demonstrated that neural activity in the midbrain and ventral striatum encodes the error of reward prediction. However, it is yet to be examined whether the striatum is the main locus of reward-based behavioral learning. To address this, we conducted functional magnetic resonance imaging (fMRI) of a stochastic decision task involving monetary rewards, in which subjects had to learn behaviors involving different task difficulties that were controlled by probability. We performed a correlation analysis of fMRI data by using the explanatory variables derived from subject behaviors. We found that activity in the caudate nucleus was correlated with short-term reward and, furthermore, paralleled the magnitude of a subject's behavioral change during learning. In addition, we confirmed that this parallelism between learning and activity in the caudate nucleus is robustly maintained even when we vary task difficulty by controlling the probability. These findings suggest that the caudate nucleus is one of the main loci for reward-based behavioral learning.
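
The analysis style can be sketched as follows: a behaviour-derived explanatory variable (here, a hypothetical trial-wise short-term reward estimate) is correlated with each voxel's signal. The sketch omits hemodynamic convolution, preprocessing, and statistical thresholding, and is not the study's actual pipeline.

```python
# Minimal sketch (not the study's pipeline): correlate a behaviour-derived
# explanatory variable with each voxel's signal. Shapes and data are hypothetical.
import numpy as np

def correlation_map(voxel_ts, regressor):
    """voxel_ts: (n_voxels, n_samples); regressor: (n_samples,).
       Returns the Pearson correlation of every voxel with the regressor."""
    v = voxel_ts - voxel_ts.mean(axis=1, keepdims=True)
    r = regressor - regressor.mean()
    return (v @ r) / (np.linalg.norm(v, axis=1) * np.linalg.norm(r) + 1e-12)

short_term_reward = np.array([0.0, 1.0, 1.0, 0.0, 1.0])        # per-trial regressor
bold = np.random.default_rng(0).normal(size=(100, 5))          # 100 voxels x 5 trials
corr = correlation_map(bold, short_term_reward)
```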


Robotics and Autonomous Systems | 2001

Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning

Jun Morimoto; Kenji Doya

In this paper, we propose a hierarchical reinforcement learning architecture that realizes practical learning speed in real hardware control tasks. In order to enable learning in a practical number of trials, we introduce a low-dimensional representation of the state of the robot for higher-level planning. The upper level learns a discrete sequence of sub-goals in a low-dimensional state space for achieving the main goal of the task. The lower-level modules learn local trajectories in the original high-dimensional state space to achieve the sub-goal specified by the upper level. We applied the hierarchical architecture to a three-link, two-joint robot for the task of learning to stand up by trial and error. The upper-level learning was implemented by Q-learning, while the lower-level learning was implemented by a continuous actor–critic method. The robot successfully learned to stand up within 750 trials in simulation and then in an additional 170 trials using real hardware. The effects of the setting of the search steps in the upper level and the use of a supplementary reward for achieving sub-goals are also tested in simulation.
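
A minimal sketch of the two-level architecture: the upper level runs tabular Q-learning over discrete sub-goals in a low-dimensional state space, while a lower-level controller (a stub here) tracks each sub-goal in the full state space. Sizes, parameters, and rewards are illustrative assumptions rather than the robot's actual settings.

```python
# Minimal sketch (not the robot controller) of the two-level architecture:
# tabular Q-learning over sub-goals at the top, a continuous controller stub
# at the bottom.
import numpy as np

n_coarse_states, n_subgoals = 10, 5
Q = np.zeros((n_coarse_states, n_subgoals))
alpha, gamma, eps = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

def upper_select(s):
    """Epsilon-greedy choice of the next sub-goal in the low-dimensional space."""
    return int(rng.integers(n_subgoals)) if rng.random() < eps else int(Q[s].argmax())

def upper_update(s, g, r, s_next):
    """Q-learning over sub-goal transitions; r may include a supplementary
       reward for reaching the sub-goal."""
    Q[s, g] += alpha * (r + gamma * Q[s_next].max() - Q[s, g])

def lower_level_reach(subgoal):
    """Placeholder for the continuous actor-critic controller that learns a
       local trajectory toward the sub-goal in the full state space."""
    raise NotImplementedError
```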


NeuroImage | 2004

Hierarchical Bayesian estimation for MEG inverse problem.

Masa-aki Sato; Taku Yoshioka; Shigeki Kajihara; Keisuke Toyama; Naokazu Goda; Kenji Doya; Mitsuo Kawato

Source current estimation from MEG measurement is an ill-posed problem that requires prior assumptions about brain activity and an efficient estimation algorithm. In this article, we propose a new hierarchical Bayesian method introducing a hierarchical prior that can effectively incorporate both structural and functional MRI data. In our method, the variance of the source current at each source location is considered an unknown parameter and estimated from the observed MEG data and prior information by using the Variational Bayesian method. The fMRI information can be imposed as prior information on the variance distribution rather than the variance itself, so that it gives a soft constraint on the variance. A spatial smoothness constraint, that neural activity within a radius of a few millimeters tends to be similar due to neural connections, can also be implemented as a hierarchical prior. The proposed method provides a unified theory to deal with the following three situations: (1) MEG with no other data, (2) MEG with structural MRI data on cortical surfaces, and (3) MEG with both structural MRI and fMRI data. We investigated the performance of our method and conventional linear inverse methods under these three conditions. Simulation results indicate that our method has better accuracy and spatial resolution than the conventional linear inverse methods under all three conditions. It is also shown that the accuracy of our method improves as MRI and fMRI information becomes available. Simulation results demonstrate that our method appropriately resolves the inverse problem even if fMRI data convey inaccurate information, while the Wiener filter method is seriously degraded by inaccurate fMRI information.
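
Schematically, and in our own notation rather than the paper's exact parameterization, the hierarchical prior can be written as a Gaussian observation model, a Gaussian prior on the source currents with location-wise inverse-variance parameters, and a Gamma hyperprior on those parameters through which the fMRI data enter as a soft constraint.

```latex
% Schematic of the hierarchical prior (our notation, an illustrative assumption):
% B is the MEG data, G the lead field, J the source currents, and alpha the
% location-wise inverse variances on which the fMRI data act as a soft constraint.
\begin{align}
  B &= G J + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0,\, \sigma^2 I) \\
  J_i &\sim \mathcal{N}\!\left(0,\, \alpha_i^{-1}\right) \\
  \alpha_i &\sim \mathrm{Gamma}\!\left(\gamma_{0i},\, \gamma_{0i}/\bar{\alpha}_{0i}\right)
\end{align}
% The fMRI activity at location i sets the prior mean \bar{\alpha}_{0i}, and the
% posterior over J and alpha is approximated with the Variational Bayesian method.
```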

Collaboration


Dive into Kenji Doya's collaborations.

Top Co-Authors

Eiji Uchibe, Okinawa Institute of Science and Technology
Junichiro Yoshimoto, Nara Institute of Science and Technology
Stefan Elfwing, Okinawa Institute of Science and Technology
Nicolas Schweighofer, University of Southern California