
Publication


Featured research published by Daniel Schneegass.


International Symposium on Neural Networks | 2007

A Neural Reinforcement Learning Approach to Gas Turbine Control

Anton Maximilian Schaefer; Daniel Schneegass; Volkmar Sterzing; Steffen Udluft

In this paper, a new neural network based approach for controlling a gas turbine for stable operation at high load is presented. A combination of recurrent neural networks (RNNs) and reinforcement learning (RL) is used. The authors start by applying an RNN to identify the minimal state space of a gas turbine's dynamics. Based on this, the optimal control policy is determined by standard RL methods. The authors then proceed to the recurrent control neural network, which combines these two steps into one integrated neural network. Using neural networks has the advantage of handling the high dimensionality of a gas turbine, and the high system-identification quality of RNNs makes it possible to cope with the limited amount of available data. The proposed methods are demonstrated on an exemplary gas turbine model, where they strongly improve performance compared to standard controllers.
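As an illustration of the two-step structure (not the paper's implementation), the following Python sketch uses a small recurrent model whose hidden state stands in for the identified system state, followed by tabular Q-learning on a discretisation of that state. All sizes and constants, as well as the omitted environment and training code, are assumptions for illustration only.

```python
# Illustrative sketch only: (1) a tiny RNN whose hidden state plays the role
# of the identified minimal system state, (2) standard Q-learning on a
# discretisation of that state.  All sizes and constants are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# --- Step 1: recurrent system identification (weights untrained here) ------
n_obs, n_act, n_hid = 3, 2, 4
W_in  = rng.normal(scale=0.3, size=(n_hid, n_obs + n_act))
W_rec = rng.normal(scale=0.3, size=(n_hid, n_hid))
W_out = rng.normal(scale=0.3, size=(n_obs, n_hid))

def rnn_step(h, obs, act_onehot):
    """Update the hidden (identified) state and predict the next observation."""
    x = np.concatenate([obs, act_onehot])
    h_new = np.tanh(W_in @ x + W_rec @ h)
    return h_new, W_out @ h_new

# --- Step 2: reinforcement learning on the identified state ----------------
n_bins, gamma, alpha = 10, 0.95, 0.1
Q = np.zeros((n_bins, n_act))

def discretise(h):
    """Map the first hidden unit (in [-1, 1]) to one of n_bins state indices."""
    return int(np.clip((h[0] + 1.0) / 2.0 * n_bins, 0, n_bins - 1))

def q_update(s, a, r, s_next):
    """One temporal-difference update of the tabular Q-function."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```

The paper's recurrent control neural network, which merges both steps into a single integrated network, is not reproduced here.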


IEEE Transactions on Neural Networks | 2009

SoftDoubleMaxMinOver: Perceptron-Like Training of Support Vector Machines

Thomas Martinetz; Kai Labusch; Daniel Schneegass

The well-known MinOver algorithm is a slight modification of the perceptron algorithm and provides the maximum-margin classifier without a bias in linearly separable two-class classification problems. DoubleMinOver, an extension of MinOver that includes a bias, is introduced. An O(1/t) convergence is shown, where t is the number of learning steps. The computational effort per step increases only linearly with the number of patterns. In its formulation with kernels, selected training patterns have to be stored. A drawback of MinOver and DoubleMinOver is that this set of patterns does not consist of support vectors only. DoubleMaxMinOver, as an extension of DoubleMinOver, overcomes this drawback by selectively forgetting all non-support vectors after a finite number of training steps. It is shown how this iterative procedure, which is still very similar to the perceptron algorithm, can be extended to classification with soft margins and be used for training least squares support vector machines (SVMs). On benchmarks, the SoftDoubleMaxMinOver algorithm achieves the same performance as standard SVM software.
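For reference, the plain MinOver rule that this family of algorithms builds on can be sketched in a few lines: at each step the training pattern with the smallest margin is selected and added to the weight vector, just like a perceptron update. This is an illustrative sketch of MinOver only (no bias, no forgetting of non-support vectors, no soft margin); the toy data and iteration count are assumptions.

```python
# Sketch of the classic MinOver update, not of SoftDoubleMaxMinOver itself.
import numpy as np

def minover(X, y, n_steps=1000):
    """X: (n_samples, n_features), y in {-1, +1}; returns a weight vector."""
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        margins = y * (X @ w)            # current margin of every pattern
        i = int(np.argmin(margins))      # pattern with the minimal margin
        w += y[i] * X[i]                 # perceptron-like update on that pattern
    return w

# Toy usage on a linearly separable 2-D problem (illustrative data).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2)) + np.array([[2.0, 2.0]] * 20 + [[-2.0, -2.0]] * 20)
y = np.array([1] * 20 + [-1] * 20)
w = minover(X, y)
print("training accuracy:", np.mean(np.sign(X @ w) == y))
```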


International Symposium on Neural Networks | 2008

Uncertainty propagation for quality assurance in Reinforcement Learning

Daniel Schneegass; Steffen Udluft; Thomas Martinetz

In this paper we address the reliability of policies derived by Reinforcement Learning from a limited amount of observations. This can be done in a principled manner by taking into account the derived Q-function's uncertainty, which stems from the uncertainty of the estimators used for the MDP's transition probabilities and the reward function. We apply uncertainty propagation in parallel to the Bellman iteration and obtain confidence intervals for the Q-function. In a second step we modify the Bellman operator so as to obtain a policy guaranteeing the highest minimum performance with a given probability. We demonstrate the functionality of our method on artificial examples and show that, for an important problem class, even an enhancement of the expected performance can be obtained. Finally, we verify this observation on an application to gas turbine control.
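A simplified sketch of the idea, under assumptions not stated in this abstract (independent Gaussian estimator errors and a diagonal covariance approximation), carries an element-wise variance of the Q-function alongside the Bellman iteration for a discrete MDP:

```python
# Sketch only: Q-iteration on estimated transition probabilities and rewards,
# with Gaussian uncertainty propagation of the estimator variances.
# Correlations between estimates are ignored for brevity.
import numpy as np

def uncertain_q_iteration(P_hat, R_hat, var_P, var_R, gamma=0.95, n_iter=200):
    """P_hat, var_P, R_hat, var_R: arrays of shape (S, A, S). Returns Q, sigma_Q."""
    S, A, _ = P_hat.shape
    Q = np.zeros((S, A))
    var_Q = np.zeros((S, A))
    for _ in range(n_iter):
        a_star = Q.argmax(axis=1)                       # greedy policy
        V = Q[np.arange(S), a_star]                     # state values
        var_V = var_Q[np.arange(S), a_star]             # their variances
        target = R_hat + gamma * V[None, None, :]       # per-successor target
        Q = np.einsum('sap,sap->sa', P_hat, target)     # Bellman backup
        # Gaussian uncertainty propagation (diagonal approximation):
        var_Q = (np.einsum('sap,sap->sa', target**2, var_P)
                 + np.einsum('sap,sap->sa', P_hat**2, var_R)
                 + gamma**2 * np.einsum('sap,p->sa', P_hat**2, var_V))
    return Q, np.sqrt(var_Q)
```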


Archive | 2010

Uncertainty in Reinforcement Learning - Awareness, Quantisation, and Control

Daniel Schneegass; Alexander Hans; Steffen Udluft

Reinforcement learning (RL) (Sutton & Barto, 1998) is the machine learning answer to the optimal control problem and has proven to be a promising solution in a wide variety of industrial application domains (e.g., Schaefer et al., 2007; Stephan et al., 2000), including robot control (e.g., Merke & Riedmiller, 2001; Abbeel et al., 2006; Lee et al., 2006; Peters & Schaal, 2008). In contrast to many classical approaches, which build upon extensive domain knowledge, RL aims to derive an optimal policy (i.e., control strategy) from observations only, acquired by exploration of an unknown environment. For a limited amount of observations the collected information may not be sufficient to fully determine the environment's properties. Assuming the environment to be a Markov decision process (MDP), it is in general only possible to create estimators for the MDP's transition probabilities and the reward function. As the true parameters remain uncertain, the policy that is optimal w.r.t. the estimators is in general not optimal w.r.t. the real MDP and may even perform insufficiently. This is unacceptable in industrial environments with high requirements not only on performance, but also on robustness and quality assurance.

To overcome this problem, we incorporate the uncertainties of the estimators into the derived Q-function, which is utilised by many RL methods. In order to guarantee a minimal performance with a given probability, as a solution to quality assurance, we present an approach using statistical uncertainty propagation (UP) (e.g., D'Agostini, 2003) on the Bellman iteration to obtain Q-functions together with their uncertainty. In a second step, we introduce a modified Bellman operator, jointly optimising the Q-function and minimising its uncertainty. This method leads to a policy that is no longer optimal in the conventional sense, but maximises the guaranteed minimal performance and hence satisfies the quality requirements. In addition, we show that the approach can be used for efficient exploration as well. In the following we apply the technique exemplarily to discrete MDPs.

This chapter is organised as follows. In the introduction we give an overview of RL and uncertainty and report on related work. The key section 2 discusses how to bring the concepts of RL and uncertainty together. We explain the application of uncertainty propagation to the Bellman iteration for policy evaluation and policy iteration for discrete MDPs and proceed with section 3, where we introduce the concept of certain-optimality. We further discuss the important observation that certain-optimal policies are stochastic in general (section 4), which has a direct impact on the algorithmic solution. Our approach provides a general framework for
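Under a Gaussian approximation of the Q-estimate, the certain-optimality idea can be summarised by the following greedy criterion; the notation is illustrative, not the chapter's exact formulation, and it omits the stochastic policies discussed in section 4:

```latex
% Certain-optimality sketch: prefer the action whose performance is
% guaranteed with probability p, i.e. a lower quantile of the Q-estimate
% rather than its mean.
\[
  \pi_{\xi}(s) \in \arg\max_{a} \bigl( Q(s,a) - \xi\, \sigma Q(s,a) \bigr),
  \qquad \xi = \Phi^{-1}(p),
\]
% where \sigma Q denotes the propagated uncertainty of the Q-function and
% \Phi^{-1} the standard normal quantile function.
```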


Neurocomputing | 2009

Letters: On the bias of batch Bellman residual minimisation

Daniel Schneegass

This letter addresses the problem of Bellman residual minimisation in reinforcement learning for the model-free batch case. We prove the simple, but not necessarily obvious, result that no unbiased estimate of the Bellman residual exists for a single trajectory of observations. We further pick up the recent suggestion of Antos et al. [Learning near-optimal policies with Bellman-residual minimisation based fitted policy iteration and a single sample path, in: COLT, 2006, pp. 574-588] for approximate Bellman residual minimisation and discuss its properties concerning consistency, bias, and optimality. We finally give a suggestion for improving optimality.
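The source of the bias can be made explicit with a standard bias-variance decomposition (notation illustrative): the single-sample estimate of the squared Bellman residual at a state-action pair overestimates the true squared residual by the variance of the sampled target, which cannot be removed without a second independent successor sample.

```latex
% Single-sample estimate of the squared Bellman residual and its expectation.
% X = r + \gamma V(s') is the sampled target, (TQ)(s,a) = E[X] its mean.
\[
  \mathbb{E}\!\left[ \bigl( r + \gamma V(s') - Q(s,a) \bigr)^{2} \right]
  = \bigl( (TQ)(s,a) - Q(s,a) \bigr)^{2}
  + \operatorname{Var}\!\left[ r + \gamma V(s') \right],
\]
% so the estimator is biased upward by the variance term, the well-known
% double-sampling problem.
```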


Archive | 2008

Method for computer-supported control and/or regulation of a technical system

Daniel Schneegass; Steffen Udluft


Archive | 2008

Method for computer-aided control and/or regulation using neural networks

Daniel Schneegass; Steffen Udluft


Archive | 2009

Method for the computer-aided learning of a control or adjustment of a technical system

Daniel Schneegass; Steffen Udluft


Archive | 2009

Method for the computer-aided learning of a control and/or regulation of a technical system

Daniel Schneegass; Steffen Udluft


Archive | 2008

Method for computer-assisted exploration of states of a technical system

Alexander Hans; Anton Maximilian Schäfer; Daniel Schneegass; Volkmar Sterzing; Steffen Udluft
