Network

External collaborations at the country level.

Hotspot

Research topics where Michel Tokic is active.

Publication


Featured research published by Michel Tokic.


IFAC Proceedings Volumes | 2012

Robust Exploration/Exploitation Trade-Offs in Safety-Critical Applications

Michel Tokic; Philipp Ertle; Günther Palm; Dirk Söffker; Holger Voos

With regard to future service robots, unsafe exceptional circumstances can occur in complex systems that are hard to foresee. In this paper, the assumption of having no knowledge about the environment is investigated using reinforcement learning as an option for learning behavior by trial and error. In such a scenario, action-selection decisions are made based on future reward predictions in order to minimize the costs of reaching a goal. It is shown that the selection of safety-critical actions, which incur highly negative costs from the environment, is directly related to the exploration/exploitation dilemma in temporal-difference learning. To this end, several exploration policies are investigated with regard to worst- and best-case performance in a dynamic environment. Our results show that, in contrast to established exploration policies such as ε-greedy and Softmax, the recently proposed VDBE-Softmax policy appears to be more appropriate for such applications due to the robustness of its exploration parameter in unexpected situations.
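The policies compared in this abstract can be sketched compactly. The snippet below shows plain ε-greedy and softmax (Boltzmann) action selection, plus a VDBE-style update that adapts a state-local ε from the absolute TD-error; the constants (σ, δ) and function names are illustrative, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon):
    """Pick the greedy action with probability 1 - epsilon, else explore uniformly."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def softmax(q_values, tau):
    """Boltzmann action selection with temperature tau."""
    prefs = np.exp((q_values - np.max(q_values)) / tau)  # shift max for stability
    probs = prefs / prefs.sum()
    return int(rng.choice(len(q_values), p=probs))

def vdbe_update(epsilon_s, td_error, sigma=1.0, delta=0.5):
    """VDBE-style state-local epsilon update: large TD-errors push epsilon up
    (more exploration), small ones let it decay toward greedy behavior."""
    f = (1.0 - np.exp(-abs(td_error) / sigma)) / (1.0 + np.exp(-abs(td_error) / sigma))
    return delta * f + (1.0 - delta) * epsilon_s
```

The point of the state-local ε is robustness: a surprising transition raises exploration only where the value estimates are actually in flux, instead of globally.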


IAPR International Workshop on Partially Supervised Learning | 2013

Meta-Learning of Exploration and Exploitation Parameters with Replacing Eligibility Traces

Michel Tokic; Friedhelm Schwenker; Günther Palm

When developing autonomous learning agents, performance depends crucially on the selection of reasonable learning parameters, for example learning rates or exploration parameters. In this work we investigate meta-learning of exploration parameters using the “REINFORCE exploration control” (REC) framework, and combine REC with replacing eligibility traces, a basic mechanism for tackling the problem of delayed rewards in reinforcement learning. We show empirically, for a robot example and the mountain-car problem with two goals, how the proposed combination can help to improve learning performance. Furthermore, we also observe that setting the time constant λ is not straightforward, because it is intimately interrelated with the learning rate α.
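Replacing eligibility traces, the mechanism the paper builds on, can be sketched in a few lines: each visited state-action pair gets its trace set to 1 (rather than incremented, as accumulating traces would), and every TD-error is propagated to all recently visited pairs in proportion to their decayed traces. Parameter names and defaults below are illustrative.

```python
import numpy as np

def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next,
                      alpha=0.1, gamma=0.99, lam=0.9):
    """One Sarsa(lambda) update with replacing eligibility traces.

    Q, E: (n_states, n_actions) arrays of action values and trace values.
    """
    delta = r + gamma * Q[s_next, a_next] - Q[s, a]  # on-policy TD-error
    E *= gamma * lam        # decay every trace toward zero
    E[s, a] = 1.0           # replacing trace: set (not increment) the visited pair
    Q += alpha * delta * E  # credit all recently visited state-action pairs
    return Q, E
```

The interrelation the abstract mentions is visible here: λ controls how far back each TD-error reaches, so the effective step size of an update depends on both α and λ together.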


International Conference on Artificial Neural Networks | 2012

Adaptive exploration using stochastic neurons

Michel Tokic; Günther Palm

Stochastic neurons are deployed for efficient adaptation of exploration parameters by gradient-following algorithms. The approach is evaluated in model-free temporal-difference learning using discrete actions. A particular advantage is memory efficiency, because exploratory data needs to be memorized only for starting states. Hence, if a learning problem consists of only one starting state, the exploratory data can be considered global. Results suggest that the presented approach can be efficiently combined with standard off- and on-policy algorithms such as Q-learning and Sarsa.


Intelligent Robots and Systems | 2012

Towards learning of safety knowledge from human demonstrations

Philipp Ertle; Michel Tokic; Richard Cubek; Holger Voos; Dirk Söffker

Future autonomous service robots are intended to operate in open and complex environments, which in turn complicates ensuring safe operation. The tenor of the few available investigations is the need to assess operational risks dynamically. Furthermore, a new kind of hazard arises from the robot's capability to manipulate its environment: hazardous environmental object interactions. One of the open questions in safety research is how to integrate safety knowledge into robotic systems so that they behave in a safety-conscious manner in hazardous situations. In this paper a safety procedure is described in which safety knowledge is learned from human demonstration. Within the procedure, a task is demonstrated to the robot, which observes object-to-object relations and labels situational data as commanded by the human. Based on this data, several supervised learning techniques are evaluated for extracting safety knowledge. Results indicate that decision trees offer interesting opportunities.
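As a toy illustration of the last step, extracting safety knowledge with decision trees, the sketch below learns a single decision stump (a one-split tree) from hand-labeled object-relation features by maximizing information gain; the feature semantics and labels are invented for the example and are not from the paper.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_stump(samples, labels):
    """Pick the (feature, threshold) split with the highest information gain.

    samples: numeric feature tuples (e.g. object-to-object distances),
    labels:  'safe' / 'hazardous' tags as given by the human demonstrator.
    """
    base = entropy(labels)
    best = (None, None, -1.0)
    for f in range(len(samples[0])):
        for t in sorted({s[f] for s in samples}):
            left = [l for s, l in zip(samples, labels) if s[f] <= t]
            right = [l for s, l in zip(samples, labels) if s[f] > t]
            if not left or not right:
                continue
            gain = base - (len(left) * entropy(left)
                           + len(right) * entropy(right)) / len(labels)
            if gain > best[2]:
                best = (f, t, gain)
    return best

# Invented toy data: feature 0 = distance between a hot object and a flammable one.
data = [(0.1,), (0.2,), (0.9,), (1.0,)]
tags = ["hazardous", "hazardous", "safe", "safe"]
feature, threshold, gain = best_stump(data, tags)
```

A stump like this is readable as a rule ("hazardous if distance ≤ threshold"), which is the interpretability property that makes decision trees attractive for extracted safety knowledge.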


Artificial Neural Networks in Pattern Recognition | 2012

Gradient algorithms for exploration/exploitation trade-offs: global and local variants

Michel Tokic; Günther Palm

Gradient-following algorithms are deployed for efficient adaptation of exploration parameters in temporal-difference learning with discrete action spaces. Global and local variants are evaluated in discrete and continuous state spaces. The global variant is memory efficient in terms of requiring exploratory data only for starting states. In contrast, the local variant requires exploratory data for each state of the state space, but produces exploratory behavior only in states with improvement potential. Our results suggest that gradient-based exploration can be efficiently used in combination with off- and on-policy algorithms such as Q-learning and Sarsa.
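The off- and on-policy algorithms that this and the preceding paper pair with gradient-based exploration differ only in the bootstrap target, which a short sketch makes explicit (α and γ are illustrative defaults):

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Off-policy TD update: bootstraps from the greedy successor action,
    regardless of which action the exploration policy takes next."""
    td_error = r + gamma * max(Q[s_next]) - Q[s][a]
    Q[s][a] += alpha * td_error
    return td_error

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy TD update: bootstraps from the action the exploration
    policy actually selected in the successor state."""
    td_error = r + gamma * Q[s_next][a_next] - Q[s][a]
    Q[s][a] += alpha * td_error
    return td_error
```

Because only the target changes, either update can be driven by the same adapted exploration policy, which is why the papers report results for both.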


Frontiers in Education Conference | 2011

Work in progress — Programming in a confined space — A case study in porting modern robot software to an antique platform

Stacey L. Montresor; Jennifer S. Kay; Michel Tokic; Jonathan M. Summerton

In a typical introductory AI class, the topic of reinforcement learning may be allocated only a few hours of class time. One engaging example of reinforcement learning uses a crawling robot that learns to use its two-degree-of-freedom arm to drag itself forward. Unfortunately, the required hardware is prohibitively expensive for many departments for what is typically a once-a-semester demonstration. We therefore decided to port the algorithm to a platform that many departments may already have on hand: the LEGO Mindstorms RCX 2.0. Initially the task seemed relatively straightforward: build a robot base out of LEGO parts and implement the algorithm in the Not Quite C language. However, the challenges of designing a robot arm without servos and of trimming the code down to a size that fits on the RCX have proven to be as educational for the undergraduates working on the project as we hope the final product will be for students in AI classes. This paper describes the challenges we have faced and the solutions we have implemented, as well as the work that remains to be completed.


KI'10 Proceedings of the 33rd Annual German Conference on Advances in Artificial Intelligence | 2010

Adaptive ε-greedy exploration in reinforcement learning based on value differences

Michel Tokic


KI'11 Proceedings of the 34th Annual German Conference on Advances in Artificial Intelligence | 2011

Value-difference based exploration: adaptive control between epsilon-greedy and softmax

Michel Tokic; Günther Palm


International Conference on Advanced Robotics | 2009

The Teaching-Box: A universal robot learning framework

Wolfgang Ertel; Markus Schneider; Richard Cubek; Michel Tokic


The Florida AI Research Society | 2009

The Crawler, A Class Room Demonstrator for Reinforcement Learning

Michel Tokic; Wolfgang Ertel; Joachim Fessler

Collaboration


Dive into Michel Tokic's collaborations.

Top Co-Authors

Philipp Ertle
University of Duisburg-Essen

Holger Voos
University of Luxembourg

Dirk Söffker
University of Duisburg-Essen

Haitham Bou Ammar
University of Pennsylvania

Joachim Fessler
University of Applied Sciences Ravensburg-Weingarten

Marius Ebel
University of Applied Sciences Ravensburg-Weingarten

Tobias Bystricky
University of Applied Sciences Ravensburg-Weingarten