Publication


Featured research published by Steffen Udluft.


Neural Networks: Tricks of the Trade (2nd ed.) | 2012

Solving Partially Observable Reinforcement Learning Problems with Recurrent Neural Networks

Siegmund Duell; Steffen Udluft; Volkmar Sterzing

The aim of this chapter is to provide a series of tricks and recipes for neural state estimation, particularly for real-world applications of reinforcement learning. We use various topologies of recurrent neural networks, as they allow the continuous-valued, possibly high-dimensional state space of complex dynamical systems to be identified. Recurrent neural networks explicitly offer ways to account for time and memory; in principle, they are able to model any type of dynamical system. Because of these capabilities, recurrent neural networks are a suitable tool for approximating a Markovian state space of dynamical systems. In a second step, reinforcement learning methods can be applied to solve a defined control problem. Besides the trick of using a recurrent neural network for state estimation, various issues arising in real-world problems, such as large sets of observables and long-term dependencies, are addressed.
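
As an illustration only (not the chapter's code), the following minimal PyTorch sketch shows the general idea: a small recurrent network is trained to predict the next observation, so that its hidden state can serve as an approximate Markovian state for a downstream RL method. All shapes, names and hyper-parameters are assumptions.

# Minimal sketch, assuming logged observation sequences; not the authors' implementation.
import torch
import torch.nn as nn

class StateEstimator(nn.Module):
    def __init__(self, obs_dim, state_dim):
        super().__init__()
        self.rnn = nn.RNN(obs_dim, state_dim, batch_first=True)
        self.readout = nn.Linear(state_dim, obs_dim)   # predicts the next observation

    def forward(self, obs_seq):
        states, _ = self.rnn(obs_seq)                  # (batch, T, state_dim)
        return self.readout(states), states

# toy usage with random data standing in for logged observations
obs = torch.randn(32, 50, 6)                           # 32 sequences, 50 steps, 6 observables
model = StateEstimator(obs_dim=6, state_dim=16)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    pred, states = model(obs[:, :-1])                  # predict o_{t+1} from o_{<=t}
    loss = nn.functional.mse_loss(pred, obs[:, 1:])
    opt.zero_grad(); loss.backward(); opt.step()
# states[:, t] now approximates a Markovian state at time t for the RL step.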


International Symposium on Neural Networks | 2007

A Neural Reinforcement Learning Approach to Gas Turbine Control

Anton Maximilian Schaefer; Daniel Schneegass; Volkmar Sterzing; Steffen Udluft

In this paper a new neural-network-based approach to controlling a gas turbine for stable operation at high load is presented. A combination of recurrent neural networks (RNN) and reinforcement learning (RL) is used. The authors start by applying an RNN to identify the minimal state space of a gas turbine's dynamics. Based on this, the optimal control policy is determined by standard RL methods. The authors then proceed to the recurrent control neural network, which combines these two steps into one integrated neural network. This approach has the advantage that, by using neural networks, one can easily deal with the high dimensionality of a gas turbine. Due to the high system-identification quality of RNNs, one can further cope with the limited amount of available data. The proposed methods are demonstrated on an exemplary gas turbine model where, compared to standard controllers, they strongly improve performance.
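
For the second step ("standard RL methods" on the identified state), one common batch scheme is fitted Q-iteration. The sketch below is an illustrative stand-in, not the paper's setup: random tensors take the place of the RNN-identified state, and a small Q-network is refitted against frozen Bellman targets.

# Hedged sketch: neural fitted Q-iteration on batch transitions (s, a, r, s') over a
# small discrete action set. Shapes and hyper-parameters are assumptions.
import torch
import torch.nn as nn

state_dim, n_actions, gamma = 4, 3, 0.95
s  = torch.randn(2048, state_dim)                      # stand-in for the identified state
a  = torch.randint(0, n_actions, (2048,))
r  = torch.randn(2048)
s2 = torch.randn(2048, state_dim)

q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, n_actions))
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

for it in range(20):                                   # outer fitted-Q iterations
    with torch.no_grad():                              # frozen targets from the previous iterate
        target = r + gamma * q_net(s2).max(dim=1).values
    for _ in range(200):                               # inner regression step
        q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        loss = nn.functional.mse_loss(q_sa, target)
        opt.zero_grad(); loss.backward(); opt.step()

policy = lambda state: q_net(state).argmax(dim=-1)     # greedy policy from the learned Q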


Neurocomputing | 2008

Learning long-term dependencies with recurrent neural networks

Anton Maximilian Schaefer; Steffen Udluft; Hans-Georg Zimmermann

Recurrent neural networks (RNN) unfolded in time are in theory able to map any open dynamical system. Still, they are often claimed to be unable to identify long-term dependencies in the data. Especially when they are trained with backpropagation, it is claimed that RNNs unfolded in time fail to learn inter-temporal influences more than ten time steps apart. This paper refutes this often-cited statement by giving counter-examples. We show that basic time-delay RNNs unfolded in time and formulated as state space models are indeed capable of learning time lags of at least 100 time steps. We point out that they even possess a self-regularisation characteristic, which adapts the internal error backflow, and analyse their optimal weight initialisation. In addition, we introduce the idea of inflation for modelling long- and short-term memory and demonstrate that this technique further improves the performance of RNNs.
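
A toy version of such a long-lag experiment (not the authors' setup) might look as follows: the target at the end of each sequence equals the input seen 100 steps earlier, and a basic RNN in state-space form, s_t = tanh(A s_{t-1} + B x_t) with output y_t = C s_t, is trained on it. Hidden size, optimiser and data are illustrative assumptions.

# Hedged sketch of a 100-step-lag learning task; not the paper's experiments.
import torch
import torch.nn as nn

T, lag = 110, 100
x = torch.randn(256, T, 1)
y = x[:, T - 1 - lag, 0]                       # target depends on an input 100 steps back

rnn = nn.RNN(1, 32, batch_first=True)          # state transition s_t = tanh(A s_{t-1} + B x_t)
readout = nn.Linear(32, 1)                     # output equation y_t = C s_t
opt = torch.optim.Adam(list(rnn.parameters()) + list(readout.parameters()), lr=1e-3)

for _ in range(500):
    states, _ = rnn(x)
    pred = readout(states[:, -1]).squeeze(-1)  # read the state after the full unfolding
    loss = nn.functional.mse_loss(pred, y)
    opt.zero_grad(); loss.backward(); opt.step()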


International Conference on Artificial Neural Networks | 2007

Improving optimality of neural rewards regression for data-efficient batch near-optimal policy identification

Daniel Schneegaß; Steffen Udluft; Thomas Martinetz

In this paper we present two substantial extensions of Neural Rewards Regression (NRR) [1]. In order to obtain a less biased estimator of the Bellman residual and to strengthen the regression character of NRR, we incorporate an improved, auxiliared Bellman residual [2] and provide, to the best of our knowledge, the first neural-network-based implementation of this novel Bellman residual minimisation technique. Furthermore, we extend NRR to Policy Gradient Neural Rewards Regression (PGNRR), where the strategy is directly encoded by a policy network. PGNRR profits from both the data efficiency of the Rewards Regression approach and the directness of policy search methods. PGNRR further overcomes a crucial drawback of NRR, as it considerably extends the applicable problem class to continuous action spaces.
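
One way to set up an auxiliared (modified) Bellman residual in a neural setting is sketched below; this is my own simplified reading, not the paper's NRR architecture. An auxiliary network h regresses onto the Bellman target, and the Q-network minimises the squared residual minus h's fit, which reduces the bias of the plain residual on stochastic transitions. All network shapes and training details are assumptions.

# Hedged sketch of a neural auxiliared Bellman residual; not the NRR implementation.
import torch
import torch.nn as nn

state_dim, n_actions, gamma = 4, 2, 0.95
s  = torch.randn(1024, state_dim); a = torch.randint(0, n_actions, (1024,))
r  = torch.randn(1024);            s2 = torch.randn(1024, state_dim)

def mlp():
    return nn.Sequential(nn.Linear(state_dim, 32), nn.Tanh(), nn.Linear(32, n_actions))

q_net, h_net = mlp(), mlp()
opt_q = torch.optim.Adam(q_net.parameters(), lr=1e-3)
opt_h = torch.optim.Adam(h_net.parameters(), lr=1e-3)

for _ in range(1000):
    target = r + gamma * q_net(s2).max(dim=1).values         # Bellman target (gradients kept for Q)
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    h_sa = h_net(s).gather(1, a.unsqueeze(1)).squeeze(1)

    loss_h = ((h_sa - target.detach()) ** 2).mean()          # h chases the current target
    opt_h.zero_grad(); loss_h.backward(); opt_h.step()

    # residual of Q minus the fit achieved by h (auxiliared residual)
    loss_q = ((q_sa - target) ** 2).mean() - ((h_sa.detach() - target) ** 2).mean()
    opt_q.zero_grad(); loss_q.backward(); opt_q.step()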


2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning | 2007

A Recurrent Control Neural Network for Data Efficient Reinforcement Learning

Anton Maximilian Schaefer; Steffen Udluft; Hans-Georg Zimmermann

In this paper we introduce a new model-based approach for data-efficient modelling and control of reinforcement learning problems in discrete time. Our architecture is based on a recurrent neural network (RNN) with dynamically consistent overshooting, which we extend by an additional control network. The latter has the particular task of learning the optimal policy. This approach has the advantage that, by using a neural network, we can easily deal with high dimensions and consequently are able to break Bellman's curse of dimensionality. Further, due to the high system-identification quality of RNNs, our method is highly data-efficient. Because of its properties we refer to our new model as the recurrent control neural network (RCNN). The network is tested on a standard reinforcement learning problem, namely cart-pole balancing, where it shows outstanding results, especially in terms of data efficiency.
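
The sketch below is a heavily simplified stand-in for the integrated idea, not the RCNN itself: a dynamics model (here a plain feedforward network instead of the recurrent model) is fitted first and then frozen, and a small control network is trained by unrolling the frozen model and maximising the predicted reward along the rollout. The reward function, horizon and start-state distribution are assumptions.

# Hedged sketch of "frozen model + trainable control network"; not the paper's RCNN.
import torch
import torch.nn as nn

state_dim, action_dim, horizon = 4, 1, 20
model = nn.Sequential(nn.Linear(state_dim + action_dim, 32), nn.Tanh(), nn.Linear(32, state_dim))
policy = nn.Sequential(nn.Linear(state_dim, 16), nn.Tanh(), nn.Linear(16, action_dim), nn.Tanh())

# ... assume `model` has already been fitted to logged transitions, then freeze it
for p in model.parameters():
    p.requires_grad_(False)

def predicted_reward(state):                  # illustrative reward: keep the first state variable near zero
    return -(state[:, 0] ** 2)

opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for _ in range(300):
    s = torch.randn(64, state_dim)            # start states (assumed distribution)
    ret = 0.0
    for t in range(horizon):                  # unroll the frozen model with the policy (overshooting)
        a = policy(s)
        s = model(torch.cat([s, a], dim=1))
        ret = ret + predicted_reward(s).mean()
    opt.zero_grad(); (-ret).backward(); opt.step()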


International Conference on Machine Learning and Applications | 2010

Ensembles of Neural Networks for Robust Reinforcement Learning

Alexander Hans; Steffen Udluft

Reinforcement learning algorithms that employ neural networks as function approximators have proven to be powerful tools for solving optimal control problems. However, their training and the validation of final policies can be cumbersome, as neural networks can suffer from problems like local minima or overfitting. When using iterative methods, such as neural fitted Q-iteration, the problem becomes even more pronounced since the network has to be trained multiple times and the training process in one iteration builds on the network trained in the previous iteration; therefore, errors can accumulate. In this paper we propose to use ensembles of networks to make the learning process more robust and to produce near-optimal policies more reliably. We describe various ways of combining single networks into an ensemble that yields a final ensemble policy and show the potential of the approach on a benchmark application. Our experiments indicate that majority voting is superior to Q-averaging and that using heterogeneous ensembles (different network topologies) is advisable.
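
The two combination schemes compared in the paper can be illustrated in a few lines; here random numbers stand in for the Q-values of the ensemble members, so this is an illustration of the voting rules only, not the authors' code.

# Hedged illustration of Q-averaging vs. majority voting over greedy actions.
import numpy as np

rng = np.random.default_rng(0)
q_values = rng.normal(size=(5, 3))            # 5 ensemble members, 3 actions, one state

# Q-averaging: average the Q-functions, then act greedily
action_avg = int(np.argmax(q_values.mean(axis=0)))

# majority voting: each member votes with its own greedy action
votes = np.argmax(q_values, axis=1)
action_vote = int(np.bincount(votes, minlength=q_values.shape[1]).argmax())

print(action_avg, action_vote)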


International Symposium on Neural Networks | 2008

Uncertainty propagation for quality assurance in Reinforcement Learning

Daniel Schneegass; Steffen Udluft; Thomas Martinetz

In this paper we address the reliability of policies derived by reinforcement learning from a limited amount of observations. This can be done in a principled manner by taking into account the uncertainty of the derived Q-function, which stems from the uncertainty of the estimators used for the MDP's transition probabilities and the reward function. We apply uncertainty propagation in parallel with the Bellman iteration and obtain confidence intervals for the Q-function. In a second step we modify the Bellman operator so as to obtain a policy guaranteeing the highest minimum performance with a given probability. We demonstrate the functionality of our method on artificial examples and show that, for an important problem class, even an improvement of the expected performance can be obtained. Finally, we verify this observation on an application to gas turbine control.
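
A much-simplified numpy sketch of the idea follows (not the paper's exact scheme): frequentist estimates of P and R with per-entry variances, Gaussian uncertainty propagation through the Bellman iteration with covariances ignored, and a greedy policy on Q minus xi times its standard deviation. All sizes, counts and the value of xi are made up.

# Hedged sketch of diagonal uncertainty propagation through a Bellman iteration.
import numpy as np

nS, nA, gamma, xi = 3, 2, 0.9, 1.0
rng = np.random.default_rng(0)
counts = rng.integers(1, 20, size=(nS, nA, nS)).astype(float)   # observed transition counts
R = rng.normal(size=(nS, nA))                                    # estimated mean rewards
var_R = np.full((nS, nA), 0.05)                                  # assumed reward-estimator variance

n = counts.sum(axis=2, keepdims=True)
P = counts / n                                                   # transition estimates
var_P = P * (1 - P) / n                                          # per-entry multinomial variance

Q = np.zeros((nS, nA)); var_Q = np.zeros((nS, nA))
for _ in range(200):
    a_star = np.argmax(Q - xi * np.sqrt(var_Q), axis=1)          # "certain" greedy action per state
    V = Q[np.arange(nS), a_star]
    var_V = var_Q[np.arange(nS), a_star]
    Q = R + gamma * P @ V
    var_Q = var_R + gamma**2 * (var_P @ V**2 + P**2 @ var_V)     # diagonal propagation only
policy = np.argmax(Q - xi * np.sqrt(var_Q), axis=1)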


International Conference on Artificial Neural Networks | 2009

Efficient Uncertainty Propagation for Reinforcement Learning with Limited Data

Alexander Hans; Steffen Udluft

In a typical reinforcement learning (RL) setting, details of the environment are not given explicitly but have to be estimated from observations. Most RL approaches only optimize the expected value. However, if the number of observations is limited, considering expected values only can lead to false conclusions. Instead, it is crucial to also account for the estimators' uncertainties. In this paper, we present a method to incorporate those uncertainties and propagate them to the conclusions. By being only approximate, the method remains computationally feasible. Furthermore, we describe a Bayesian approach to designing the estimators. Our experiments show that the method considerably increases the robustness of the derived policies compared to the standard approach.
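
One common Bayesian estimator for transition probabilities, shown below purely as an illustration, is a Dirichlet prior over next states; its posterior mean and variance can then feed an uncertainty-propagation scheme like the one sketched above. The prior strength alpha is an assumption, not a value from the paper.

# Hedged sketch of a Dirichlet posterior for P(s'|s,a) from next-state counts.
import numpy as np

counts = np.array([8.0, 1.0, 0.0])           # observed next-state counts for one (s, a) pair
alpha = 0.5                                   # symmetric Dirichlet pseudo-count (assumed)
post = counts + alpha
total = post.sum()
mean_P = post / total                                        # posterior mean of P(s'|s,a)
var_P = post * (total - post) / (total**2 * (total + 1))     # posterior variance, per entry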


Engineering Applications of Artificial Intelligence | 2017

Particle swarm optimization for generating interpretable fuzzy reinforcement learning policies

Daniel Hein; Alexander Hentschel; Thomas A. Runkler; Steffen Udluft

Fuzzy controllers are efficient and interpretable system controllers for continuous state and action spaces. To date, such controllers have been constructed manually or trained automatically, either using expert-generated problem-specific cost functions or incorporating detailed knowledge about the optimal control strategy. Neither requirement for automatic training is met in most real-world reinforcement learning (RL) problems. In such applications, online learning is often prohibited for safety reasons because it requires exploration of the problem's dynamics during policy training. We introduce a fuzzy particle swarm reinforcement learning (FPSRL) approach that can construct fuzzy RL policies solely by training parameters on world models that simulate real system dynamics. These world models are created by employing an autonomous machine learning technique that uses previously generated transition samples of a real system. To the best of our knowledge, this approach is the first to relate self-organizing fuzzy controllers to model-based batch RL. FPSRL is intended to solve problems in domains where online learning is prohibited, system dynamics are relatively easy to model from previously generated default policy transition samples, and a relatively easily interpretable control policy is expected to exist. The efficiency of the proposed approach on problems from such domains is demonstrated using three standard RL benchmarks, i.e., mountain car, cart-pole balancing, and cart-pole swing-up. Our experimental results demonstrate high-performing, interpretable fuzzy policies.
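
The overall recipe can be condensed as follows; this is a heavily simplified sketch, not the FPSRL implementation. A tiny fuzzy policy with Gaussian memberships is evaluated by rolling out an assumed world model, and its parameters are tuned by a bare-bones particle swarm optimizer. The world model, reward and all constants are placeholders.

# Hedged sketch: fuzzy policy parameters optimised by PSO against a stand-in world model.
import numpy as np

rng = np.random.default_rng(0)
state_dim, n_rules, horizon = 2, 3, 50

def fuzzy_action(params, s):
    c = params[:n_rules * state_dim].reshape(n_rules, state_dim)     # rule centres
    w = np.exp(-np.sum((s - c) ** 2, axis=1))                        # Gaussian memberships
    out = params[n_rules * state_dim:]                               # one crisp consequent per rule
    return np.tanh(np.dot(w, out) / (w.sum() + 1e-9))

def world_model(s, a):                        # placeholder dynamics standing in for the learned model
    return np.array([s[1], -0.2 * s[0] + a])

def fitness(params):                          # return of one rollout from a fixed start state
    s, ret = np.array([1.0, 0.0]), 0.0
    for _ in range(horizon):
        s = world_model(s, fuzzy_action(params, s))
        ret -= s[0] ** 2                      # assumed reward: drive the first state variable to zero
    return ret

dim, n_particles = n_rules * state_dim + n_rules, 20
x = rng.normal(size=(n_particles, dim)); v = np.zeros_like(x)
pbest, pbest_f = x.copy(), np.array([fitness(p) for p in x])
for _ in range(100):                          # bare-bones global-best PSO
    gbest = pbest[np.argmax(pbest_f)]
    v = 0.7 * v + 1.5 * rng.random(x.shape) * (pbest - x) + 1.5 * rng.random(x.shape) * (gbest - x)
    x = x + v
    f = np.array([fitness(p) for p in x])
    improved = f > pbest_f
    pbest[improved], pbest_f[improved] = x[improved], f[improved]
best_policy_params = pbest[np.argmax(pbest_f)]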


International Journal of Swarm Intelligence Research | 2016

Reinforcement Learning with Particle Swarm Optimization Policy PSO-P in Continuous State and Action Spaces

Daniel Hein; Alexander Hentschel; Thomas A. Runkler; Steffen Udluft

This article introduces a model-based reinforcement learning (RL) approach for continuous state and action spaces. While most RL methods try to find closed-form policies, the approach taken here employs numerical online optimization of control action sequences. First, a general method for reformulating RL problems as optimization tasks is provided. Subsequently, Particle Swarm Optimization (PSO) is applied to search for optimal solutions. This Particle Swarm Optimization Policy (PSO-P) is effective for high-dimensional state spaces and does not require a priori assumptions about adequate policy representations. Furthermore, by translating RL problems into optimization tasks, the rich collection of real-world-inspired RL benchmarks is made available for benchmarking numerical optimization techniques. The effectiveness of PSO-P is demonstrated on two standard benchmarks: mountain car and cart-pole.
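
In contrast to the previous sketch, which optimizes policy parameters once, the core loop here searches online for a good action sequence for the current state and applies only the first action before re-planning. The sketch below is an illustrative reading of that loop, not the article's code; the model, reward, horizon and PSO constants are all assumptions.

# Hedged sketch of receding-horizon planning with PSO over action sequences.
import numpy as np

rng = np.random.default_rng(1)
horizon = 10

def model_step(s, a):                         # placeholder dynamics standing in for the learned model
    return np.array([s[1], -0.2 * s[0] + a])

def rollout_return(s0, actions):
    s, ret = s0.copy(), 0.0
    for a in actions:
        s = model_step(s, np.tanh(a))
        ret -= s[0] ** 2                      # assumed reward
    return ret

def pso_plan(s0, n_particles=20, iters=50):
    x = rng.normal(size=(n_particles, horizon)); v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.array([rollout_return(s0, p) for p in x])
    for _ in range(iters):
        gbest = pbest[np.argmax(pbest_f)]
        v = 0.7 * v + 1.5 * rng.random(x.shape) * (pbest - x) + 1.5 * rng.random(x.shape) * (gbest - x)
        x = x + v
        f = np.array([rollout_return(s0, p) for p in x])
        better = f > pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
    return np.tanh(pbest[np.argmax(pbest_f)][0])   # first action of the best sequence

state = np.array([1.0, 0.0])
for t in range(5):                            # receding-horizon control loop
    action = pso_plan(state)
    state = model_step(state, action)         # in practice: applied to the real system or simulator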
