Publication


Featured research published by Stefan Elfwing.


Congress on Evolutionary Computation | 2005

Biologically inspired embodied evolution of survival

Stefan Elfwing; Eiji Uchibe; Kenji Doya; Henrik I. Christensen

Embodied evolution is a methodology for evolutionary robotics that mimics the distributed, asynchronous and autonomous properties of biological evolution. The evaluation, selection and reproduction are carried out by and between the robots, without any need for human intervention. In this paper, we propose a biologically inspired embodied evolution framework that fully integrates self-preservation, recharging from external batteries in the environment, and self-reproduction, pair-wise exchange of genetic material, into a survival system. The individuals are explicitly evaluated for their performance on the battery capturing task, and implicitly for the mating task, since an individual that mates frequently has a higher probability of spreading its genes in the population. We have evaluated our method in simulation experiments, and the results show that the solutions obtained by our embodied evolution method were able to optimize the two survival tasks, battery capturing and mating, simultaneously. We have also performed preliminary experiments in hardware, with promising results.


Adaptive Behavior | 2011

Darwinian embodied evolution of the learning ability for survival

Stefan Elfwing; Eiji Uchibe; Kenji Doya; Henrik I. Christensen

In this article, we propose a framework for performing embodied evolution with a limited number of robots, by utilizing time-sharing in subpopulations of virtual agents hosted in each robot. Within this framework, we explore the combination of within-generation learning of basic survival behaviors by reinforcement learning, and evolutionary adaptation over the generations of the basic behavior selection policy, the reward functions, and the meta-parameters for reinforcement learning. We apply a biologically inspired selection scheme, in which there is no explicit communication of the individuals' fitness information. The individuals can only reproduce offspring by mating, a pair-wise exchange of genotypes, and the probability that an individual reproduces offspring in its own subpopulation depends on the individual's "health," that is, its energy level, at the mating occasion. We validate the proposed method by comparing it with evolution using standard centralized selection, in simulation, and by transferring the obtained solutions to hardware using two real robots.
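
A minimal sketch of the health-dependent selection at a mating event, assuming the probability of producing offspring grows linearly with the individual's energy level (the linear form and all names below are illustrative assumptions, not the article's implementation):

    import random

    def maybe_reproduce(own_genotype, partner_genotype, energy, max_energy, crossover):
        # At a mating event the robots exchange genotypes; whether an offspring is
        # added to the individual's own virtual subpopulation depends on its current
        # energy level ("health"). The linear probability is an illustrative choice.
        if random.random() < energy / max_energy:
            return crossover(own_genotype, partner_genotype)
        return None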


IEEE Transactions on Evolutionary Computation | 2007

Evolutionary Development of Hierarchical Learning Structures

Stefan Elfwing; Eiji Uchibe; Kenji Doya; Henrik I. Christensen

Hierarchical reinforcement learning (RL) algorithms can learn a policy faster than standard RL algorithms. However, the applicability of hierarchical RL algorithms is limited by the fact that the task decomposition has to be performed in advance by the human designer. We propose a Lamarckian evolutionary approach for automatic development of the learning structure in hierarchical RL. The proposed method combines the MAXQ hierarchical RL method and genetic programming (GP). In the MAXQ framework, a subtask can optimize its policy independently of its parent task's policy, which makes it possible to reuse learned policies of the subtasks. In the proposed method, the MAXQ method learns the policy based on the task hierarchies obtained by GP, while the GP explores appropriate hierarchies using the result of the MAXQ method. To show the validity of the proposed method, we have performed simulation experiments for a foraging task in three different environmental settings. The results show a strong interconnection between the obtained learning structures and the given task environments. The main conclusion of the experiments is that the GP can find a minimal strategy, i.e., a hierarchy that minimizes the number of primitive subtasks that can be executed for each type of situation. The experimental results for the most challenging environment also show that the policies of the subtasks can continue to improve, even after the structure of the hierarchy has been evolutionarily stabilized, as an effect of the Lamarckian mechanisms.
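
The interaction between GP and MAXQ described above can be summarized as an outer evolutionary loop. The sketch below is a rough illustration; the helper names and the selection scheme are hypothetical, and only the Lamarckian write-back of learned subtask policies into the genotype follows the abstract:

    def evolve_hierarchies(population, generations, learn_with_maxq, vary):
        # population      : list of task-hierarchy genotypes (GP trees)
        # learn_with_maxq : runs MAXQ learning on one hierarchy and returns
        #                   (fitness, learned_subtask_policies)
        # vary            : GP crossover/mutation operator on two parent genotypes
        for _ in range(generations):
            scored = []
            for hierarchy in population:
                fitness, policies = learn_with_maxq(hierarchy)
                # Lamarckian step: store the learned subtask policies in the genotype,
                # so offspring reuse their parents' learned behavior.
                hierarchy.policies = policies
                scored.append((fitness, hierarchy))
            scored.sort(key=lambda pair: pair[0], reverse=True)
            parents = [h for _, h in scored[: max(1, len(scored) // 2)]]
            offspring = [vary(a, b) for a, b in zip(parents, reversed(parents))]
            population = parents + offspring
        return population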


Neural Networks | 2015

Expected energy-based restricted Boltzmann machine for classification

Stefan Elfwing; Eiji Uchibe; Kenji Doya

In classification tasks, restricted Boltzmann machines (RBMs) have predominantly been used in the first stage, either as feature extractors or to provide initialization of neural networks. In this study, we propose a discriminative learning approach to provide a self-contained RBM method for classification, inspired by free-energy based function approximation (FE-RBM), originally proposed for reinforcement learning. For classification, the FE-RBM method computes the output for an input vector and a class vector by the negative free energy of an RBM. Learning is achieved by stochastic gradient descent using a mean-squared error training objective. In an earlier study, we demonstrated that the performance and the robustness of FE-RBM function approximation can be improved by scaling the free energy by a constant that is related to the size of the network. In this study, we propose that the learning performance of RBM function approximation can be further improved by computing the output by the negative expected energy (EE-RBM), instead of the negative free energy. To create a deep learning architecture, we stack several RBMs on top of each other. We also connect the class nodes to all hidden layers to further improve the performance. We validate the classification performance of EE-RBM using the MNIST data set and the NORB data set, achieving competitive performance compared with other classifiers such as standard neural networks, deep belief networks, classification RBMs, and support vector machines. The purpose of using the NORB data set is to demonstrate that EE-RBM with binary input nodes can achieve high performance in the continuous input domain.
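
In terms of the standard binary RBM energy, the two output definitions compared above can be written as follows; this is a sketch in generic notation (v denotes the concatenated input and class vector, h the hidden units), not the paper's exact formulation:

    E(v, h) = -b^T v - c^T h - v^T W h
    FE-RBM output:  -F(v) = b^T v + \sum_j \log\bigl(1 + e^{z_j}\bigr),  where  z_j = c_j + (W^T v)_j
    EE-RBM output:  -\langle E(v, h) \rangle_{p(h|v)} = b^T v + \sum_j \sigma(z_j)\, z_j,  with  \sigma(z) = 1/(1 + e^{-z})

The mean-squared error objective mentioned in the abstract is then applied to whichever of the two outputs is used.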


Adaptive Behavior | 2008

Co-evolution of Shaping Rewards and Meta-Parameters in Reinforcement Learning

Stefan Elfwing; Eiji Uchibe; Kenji Doya; Henrik I. Christensen

In this article, we explore an evolutionary approach to the optimization of potential-based shaping rewards and meta-parameters in reinforcement learning. Shaping rewards are a frequently used approach for improving the learning performance of reinforcement learning, with regard to both initial performance and convergence speed. They provide additional knowledge to the agent in the form of richer reward signals, which guide learning towards high-rewarding states. Reinforcement learning depends critically on a few meta-parameters that modulate the learning updates or the exploration of the environment, such as the learning rate α, the discount factor of future rewards γ, and the temperature τ that controls the trade-off between exploration and exploitation in softmax action selection. We validate the proposed approach in simulation using the mountain-car task. We also transfer shaping rewards and meta-parameters, obtained evolutionarily in simulation, to hardware, using a robotic foraging task.
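
For reference, the standard forms of the quantities being co-evolved, written here in common notation rather than quoted from the article: a potential-based shaping reward built from a potential function Φ, and softmax action selection with temperature τ,

    F(s, s') = \gamma \Phi(s') - \Phi(s)
    \pi(a \mid s) = \frac{\exp\bigl(Q(s, a)/\tau\bigr)}{\sum_{b} \exp\bigl(Q(s, b)/\tau\bigr)}

with the shaping term added to the environment reward in the temporal-difference update that the learning rate α scales, e.g. for Sarsa:

    Q(s, a) \leftarrow Q(s, a) + \alpha \bigl[ r + F(s, s') + \gamma Q(s', a') - Q(s, a) \bigr]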


Genetic and Evolutionary Computation Conference | 2003

An evolutionary approach to automatic construction of the structure in hierarchical reinforcement learning

Stefan Elfwing; Eiji Uchibe; Kenji Doya

We plan to implement our method on real hardware. A foreseeable extension of this study is to generalize the method as a model of the cooperative and competitive mechanisms of the learning modules in the brain.


Frontiers in Neurorobotics | 2013

Scaled free-energy based reinforcement learning for robust and efficient learning in high-dimensional state spaces

Stefan Elfwing; Eiji Uchibe; Kenji Doya

Free-energy based reinforcement learning (FERL) was proposed for learning in high-dimensional state and action spaces, which cannot be handled by standard function approximation methods. In this study, we propose a scaled version of free-energy based reinforcement learning to achieve more robust and more efficient learning performance. The action-value function is approximated by the negative free energy of a restricted Boltzmann machine, divided by a constant scaling factor that is related to the size of the Boltzmann machine (the square root of the number of state nodes in this study). Our first task is a digit floor gridworld task, where the states are represented by images of handwritten digits from the MNIST data set. The purpose of the task is to investigate the proposed method's ability, through the extraction of task-relevant features in the hidden layer, to cluster images of the same digit and to cluster images of different digits that correspond to states with the same optimal action. We also test the method's robustness with respect to different exploration schedules, i.e., different settings of the initial temperature and the temperature discount rate in softmax action selection. Our second task is a robot visual navigation task, where the robot can learn its position by the different colors of the lower part of four landmarks and can infer the correct corner goal area by the color of the upper part of the landmarks. The state space consists of binarized camera images with, at most, nine different colors, which is equal to 6642 binary states. For both tasks, the learning performance is compared with standard FERL and with function approximation where the action-value function is approximated by a two-layered feedforward neural network.
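
Written out, the scaling described above amounts to the following approximation of the action-value function, in generic notation (the abstract states the scaling factor but not the formula):

    Q(s, a) \approx -\frac{F(s, a)}{\sqrt{N_s}}

where F(s, a) is the free energy of the restricted Boltzmann machine with the state-action pair clamped on the visible layer and N_s is the number of state nodes.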


International Conference on Neural Information Processing | 2010

Free-energy based reinforcement learning for vision-based navigation with high-dimensional sensory inputs

Stefan Elfwing; Makoto Otsuka; Eiji Uchibe; Kenji Doya

Free-energy based reinforcement learning was proposed for learning in high-dimensional state and action spaces, which cannot be handled by standard function approximation methods in reinforcement learning. In the free-energy reinforcement learning method, the action-value function is approximated as the negative free energy of a restricted Boltzmann machine. In this paper, we test if it is feasible to use free-energy reinforcement learning for real robot control with raw, high-dimensional sensory inputs through the extraction of task-relevant features in the hidden layer. We first demonstrate, in simulation, that a small mobile robot could efficiently learn a vision-based navigation and battery capturing task. We then demonstrate, for a simpler battery capturing task, that free-energy reinforcement learning can be used for online learning in a real robot. The analysis of learned weights showed that action-oriented state coding was achieved in the hidden layer.


Neural Networks | 2016

From free energy to expected energy: Improving energy-based value function approximation in reinforcement learning

Stefan Elfwing; Eiji Uchibe; Kenji Doya

Free-energy based reinforcement learning (FERL) was proposed for learning in high-dimensional state and action spaces. However, the FERL method only works well with binary, or close to binary, state input, where the number of active states is fewer than the number of non-active states. In the FERL method, the value function is approximated by the negative free energy of a restricted Boltzmann machine (RBM). In our earlier study, we demonstrated that the performance and the robustness of the FERL method can be improved by scaling the free energy by a constant that is related to the size of the network. In this study, we propose that RBM function approximation can be further improved by approximating the value function by the negative expected energy (EERL), instead of the negative free energy, which also makes it possible to handle continuous state input. We validate our proposed method by demonstrating that EERL: (1) outperforms FERL, as well as standard neural network and linear function approximation, for three versions of a gridworld task with high-dimensional image state input; (2) achieves new state-of-the-art results in stochastic SZ-Tetris in both model-free and model-based learning settings; and (3) significantly outperforms FERL and standard neural network function approximation for a robot navigation task with raw and noisy RGB images as state input and a large number of actions.
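
To make the difference between the two value-function outputs concrete, here is a minimal NumPy sketch of the negative free energy (FERL) and the negative expected energy (EERL) of a binary RBM with the state-action vector clamped on the visible layer; the interface and variable names are illustrative, not taken from the paper:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def negative_free_energy(v, W, b, c):
        # -F(v): visible bias term plus a softplus per hidden unit (FERL value estimate).
        z = c + v @ W                          # hidden pre-activations
        return b @ v + np.sum(np.logaddexp(0.0, z))

    def negative_expected_energy(v, W, b, c):
        # -<E(v,h)> under p(h|v): each hidden unit contributes sigmoid(z_j) * z_j (EERL).
        z = c + v @ W
        return b @ v + np.sum(sigmoid(z) * z)

    # Toy example with a random RBM over a concatenated state-action vector.
    rng = np.random.default_rng(0)
    n_visible, n_hidden = 20, 8
    W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
    b = rng.normal(scale=0.1, size=n_visible)  # visible biases
    c = rng.normal(scale=0.1, size=n_hidden)   # hidden biases
    v = rng.integers(0, 2, size=n_visible).astype(float)
    print(negative_free_energy(v, W, b, c), negative_expected_energy(v, W, b, c))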


Genetic and Evolutionary Computation Conference | 2018

Online meta-learning by parallel algorithm competition

Stefan Elfwing; Eiji Uchibe; Kenji Doya

The efficiency of reinforcement learning algorithms depends critically on a few meta-parameters that modulate the learning updates and the trade-off between exploration and exploitation. The adaptation of the meta-parameters is an open question, which arguably has become a more important issue recently with the success of deep reinforcement learning. The long learning times in domains such as Atari 2600 video games make it infeasible to perform comprehensive searches of appropriate meta-parameter values. In this study, we propose the Online Meta-learning by Parallel Algorithm Competition (OMPAC) method, a novel Lamarckian evolutionary approach to online meta-parameter adaptation. The population consists of several instances of a reinforcement learning algorithm that are run in parallel with small differences in initial meta-parameter values. After a fixed number of learning episodes, the instances are selected based on their performance on the task at hand, i.e., the fitness. Before the learning continues, Gaussian noise is added to the meta-parameters with a predefined probability. We validate the OMPAC method by improving the state-of-the-art results in stochastic SZ-Tetris and in 10x10 Tetris by 31% and 84%, respectively, and by improving the learning speed and performance for deep Sarsa(λ) agents in the Atari 2600 domain.
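
A minimal sketch of one OMPAC generation, assuming a population of learner instances that each carry a dict of meta-parameters and their own learned weights; the truncation-style selection and the noise scale below are illustrative assumptions, not the paper's exact settings:

    import copy
    import random

    def ompac_generation(agents, evaluate, episodes, mutation_prob=0.2, noise_scale=0.1):
        # evaluate(agent, episodes) -> fitness: runs the agent's RL algorithm for a
        # fixed number of learning episodes and returns its task performance.
        fitness = [evaluate(agent, episodes) for agent in agents]
        ranked = sorted(range(len(agents)), key=lambda i: fitness[i], reverse=True)
        survivors = ranked[: max(1, len(agents) // 2)]
        # Selection: refill the population with copies of the better instances,
        # carrying over both meta-parameters and learned weights (Lamarckian).
        next_gen = [copy.deepcopy(agents[survivors[i % len(survivors)]])
                    for i in range(len(agents))]
        # Mutation: with a predefined probability, perturb each meta-parameter
        # with Gaussian noise before learning continues.
        for agent in next_gen:
            for key, value in agent.meta.items():
                if random.random() < mutation_prob:
                    agent.meta[key] = value + random.gauss(0.0, noise_scale * abs(value))
        return next_gen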

Collaboration


Dive into Stefan Elfwing's collaborations.

Top Co-Authors

Kenji Doya (Okinawa Institute of Science and Technology)

Eiji Uchibe (Okinawa Institute of Science and Technology)

Henrik I. Christensen (Georgia Institute of Technology)

Makoto Otsuka (Okinawa Institute of Science and Technology)