Alexander Hans | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Alexander Hans is active.

Explore More

Publication

Featured researches published by Alexander Hans.

international conference on machine learning and applications | 2010

Ensembles of Neural Networks for Robust Reinforcement Learning

Alexander Hans; Steffen Udluft

Reinforcement learning algorithms that employ neural networks as function approximators have proven to be powerful tools for solving optimal control problems. However, their training and the validation of final policies can be cumbersome as neural networks can suffer from problems like local minima or over fitting. When using iterative methods, such as neural fitted Q-iteration, the problem becomes even more pronounced since the network has to be trained multiple times and the training process in one iteration builds on the network trained in the previous iteration. Therefore errors can accumulate. In this paper we propose to use ensembles of networks to make the learning process more robust and produce near-optimal policies more reliably. We name various ways of combining single networks to an ensemble that results in a final ensemble policy and show the potential of the approach using a benchmark application. Our experiments indicate that majority voting is superior to Q-averaging and using heterogeneous ensembles (different network topologies) is advisable.

international conference on artificial neural networks | 2009

Efficient Uncertainty Propagation for Reinforcement Learning with Limited Data

Alexander Hans; Steffen Udluft

In a typical reinforcement learning (RL) setting details of the environment are not given explicitly but have to be estimated from observations. Most RL approaches only optimize the expected value. However, if the number of observations is limited considering expected values only can lead to false conclusions. Instead, it is crucial to also account for the estimators uncertainties. In this paper, we present a method to incorporate those uncertainties and propagate them to the conclusions. By being only approximate, the method is computationally feasible. Furthermore, we describe a Bayesian approach to design the estimators. Our experiments show that the method considerably increases the robustness of the derived policies compared to the standard approach.

ieee symposium on adaptive dynamic programming and reinforcement learning | 2011

Agent self-assessment: Determining policy quality without execution

Alexander Hans; Siegmund Duell; Steffen Udluft

With the development of data-efficient reinforcement learning (RL) methods, a promising data-driven solution for optimal control of complex technical systems has become available. For the application of RL to a technical system, it is usually required to evaluate a policy before actually applying it to ensure it operates the system safely and within required performance bounds. In benchmark applications one can use the system dynamics directly to measure the policy quality. In real applications, however, this might be too expensive or even impossible. Being unable to evaluate the policy without using the actual system hinders the application of RL to autonomous controllers. As a first step toward agent self-assessment, we deal with discrete MDPs in this paper. We propose to use the value function along with its uncertainty to assess a policys quality and show that, when dealing with an MDP estimated from observations, the value function itself can be misleading. We address this problem by determining the value functions uncertainty through uncertainty propagation and evaluate the approach using a number of benchmark applications.

Archive | 2010

Uncertainty in Reinforcement Learning - Awareness, Quantisation, and Control

Daniel Schneegass; Alexander Hans; Steffen Udluft

Reinforcement learning (RL) (Sutton & Barto, 1998) is the machine learning answer to the optimal control problem and has been proven to be a promising solution to a wide variety of industrial application domains (e.g., Schaefer et al., 2007; Stephan et al., 2000), including robot control (e.g., Merke & Riedmiller, 2001; Abbeel et al., 2006; Lee et al., 2006; Peters & Schaal, 2008). In contrast to many classical approaches, building upon extensive domain knowledge, RL aims to derive an optimal policy (i.e., control strategy) from observations only, acquired by the exploration of an unknown environment. For a limited amount of observations the collected information may not be sufficient to fully determine the environment’s properties. Assuming the environment to be a Markov decision process (MDP), it is in general only possible to create estimators for the MDP’s transition probabilities and the reward function. As the true parameters remain uncertain, the derived policy that is optimal w.r.t. the estimators is in general not optimal w.r.t. the real MDP and may even perform insufficiently. This is unacceptable in industrial environments with high requirements not only on performance, but also robustness and quality assurance. To overcome this problem, we incorporate the uncertainties of the estimators into the derived Q-function, which is utilised by many RL methods. In order to guarantee a minimal performance with a given probability, as a solution to quality assurance, we present an approach using statistical uncertainty propagation (UP) (e.g., D’Agostini, 2003) on the Bellman iteration to obtain Q-functions together with their uncertainty. In a second step, we introduce a modified Bellman operator, jointly optimising the Q-function and minimising its uncertainty. This method leads to a policy that is no more optimal in the conventional meaning, but maximizes the guaranteed minimal performance and hence optimises the quality requirements. In addition, we show that the approach can be used for efficient exploration as well. In the following we apply the technique exemplarily on discrete MDPs. This chapter is organised as follows. Within the introduction we give an overview of RL and uncertainty and report on related work. The key section 2 discusses how to bring the concepts of RL and uncertainty together. We explain the application of uncertainty propagation to the Bellman iteration for policy evaluation and policy iteration for discrete MDPs and proceed with section 3, where we introduce the concept of certain-optimality. We further discuss the important observation that certain-optimal policies are stochastic in general (section 4), having a direct impact on the algorithmic solution. Our approach provides a general framework for

the european symposium on artificial neural networks | 2008