Herke van Hoof
Technische Universität Darmstadt
Publication
Featured research published by Herke van Hoof.
IEEE Transactions on Robotics | 2014
Herke van Hoof; Oliver Kroemer; Jan Peters
Creating robots that can act autonomously in dynamic unstructured environments requires dealing with novel objects. Thus, an offline learning phase is not sufficient for recognizing and manipulating such objects. Rather, an autonomous robot needs to acquire knowledge through its own interaction with its environment, without using heuristics that encode human insights about the domain. Interaction also elicits information that is not present in static images of a scene. Out of a potentially large set of possible interactions, a robot must select the actions expected to have the most informative outcomes in order to learn efficiently. In the proposed bottom-up probabilistic approach, the robot achieves this goal by quantifying the expected informativeness of its own actions in information-theoretic terms. We use this approach to segment a scene into its constituent objects, retaining a probability distribution over segmentations. We show in real-world experiments that this approach is robust in the presence of noise and uncertainty. Evaluations show that the proposed information-theoretic approach allows a robot to efficiently determine the composite structure of its environment. We also show that our probabilistic model allows straightforward integration of multiple modalities, such as movement data and static scene features. Learned static scene features allow experience from similar environments to speed up learning for new scenes.
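As a rough illustration of the action-selection criterion described above, the sketch below scores candidate actions by their expected reduction in the entropy of a belief over segmentation hypotheses. All quantities (the three hypotheses, two actions, and outcome probabilities) are toy assumptions for illustration, not the paper's actual model.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution (in nats)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Toy belief over three candidate segmentations of a scene.
belief = np.array([0.5, 0.3, 0.2])

# p_outcome[a, h, o]: probability of observing outcome o when action a is
# applied and segmentation hypothesis h is true (invented numbers).
p_outcome = np.array([
    [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]],   # action 0: discriminative
    [[0.6, 0.4], [0.6, 0.4], [0.6, 0.4]],   # action 1: uninformative
])

def expected_information_gain(belief, p_o_given_h):
    """Expected entropy reduction of the belief after seeing the outcome."""
    p_o = belief @ p_o_given_h                       # marginal outcome probs
    gain = entropy(belief)
    for o, po in enumerate(p_o):
        posterior = belief * p_o_given_h[:, o] / po  # Bayes update
        gain -= po * entropy(posterior)
    return gain

gains = [expected_information_gain(belief, p) for p in p_outcome]
print("expected gains:", np.round(gains, 3), "-> pick action", int(np.argmax(gains)))
```

The uninformative action leaves the posterior equal to the prior, so its expected gain is zero; the criterion therefore favors the discriminative action.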
Intelligent Robots and Systems | 2012
Herke van Hoof; Oliver Kroemer; Heni Ben Amor; Jan Peters
Creating robots that can act autonomously in dynamic, unstructured environments is a major challenge. In such environments, learning to recognize and manipulate novel objects is an important capability. A truly autonomous robot acquires knowledge through interaction with its environment without using heuristics or prior information encoding human domain insights. Static images often provide insufficient information for inferring the relevant properties of the objects in a scene. Hence, a robot needs to explore these objects by interacting with them. However, there may be many exploratory actions possible, and a large portion of these actions may be non-informative. To learn quickly and efficiently, a robot must select actions that are expected to have the most informative outcomes. In the proposed bottom-up approach, the robot achieves this goal by quantifying the expected informativeness of its own actions. We use this approach to segment a scene into its constituent objects as a first step in learning the properties and affordances of objects. Evaluations showed that the proposed information-theoretic approach allows a robot to efficiently infer the composite structure of its environment.
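Complementing the one-step criterion sketched above, the following toy loop runs the idea closed-loop: repeatedly pick the push with the highest mutual information between hypotheses and outcome, simulate the outcome, and update the belief with Bayes' rule. The hypothesis set, outcome model, and stopping threshold are illustrative assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Toy model: 3 segmentation hypotheses, 4 candidate pushes, binary outcome
# ("parts moved together" or not). p_move[a, h] is the probability of that
# outcome for push a under hypothesis h (invented numbers).
p_move = rng.uniform(0.05, 0.95, size=(4, 3))
true_h = 1
belief = np.full(3, 1.0 / 3.0)

for push in range(50):
    if entropy(belief) < 0.1:
        break
    # Mutual information I(H; O) = H(O) - E_H[H(O | H)] for each candidate.
    p_o = p_move @ belief                                      # shape (4,)
    h_o = -(p_o * np.log(p_o) + (1 - p_o) * np.log(1 - p_o))
    h_o_h = -(p_move * np.log(p_move)
              + (1 - p_move) * np.log(1 - p_move)) @ belief
    a = int(np.argmax(h_o - h_o_h))                            # best push

    moved = rng.random() < p_move[a, true_h]                   # simulate it
    lik = p_move[a] if moved else 1.0 - p_move[a]
    belief = belief * lik / np.sum(belief * lik)               # Bayes update

print("posterior over hypotheses:", np.round(belief, 3))
```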
International Conference on Robotics and Automation | 2015
Oliver Kroemer; Christian Daniel; Gerhard Neumann; Herke van Hoof; Jan Peters
Most manipulation tasks can be decomposed into a sequence of phases, where the robot's actions have different effects in each phase. The robot can perform actions to transition between phases and, thus, alter the effects of its actions, e.g., grasping an object in order to then lift it. The robot can thus reach a phase that affords the desired manipulation. In this paper, we present an approach for exploiting the phase structure of tasks in order to learn manipulation skills more efficiently. Starting with human demonstrations, the robot learns a probabilistic model of the phases and the phase transitions. The robot then employs model-based reinforcement learning to create a library of motor primitives for transitioning between phases. The learned motor primitives generalize to new situations and tasks. Given this library, the robot uses a value function approach to learn a high-level policy for sequencing the motor primitives. The proposed method was successfully evaluated on a real robot performing a bimanual grasping task.
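The high-level sequencing step can be illustrated with a minimal sketch: if the learned motor primitives are abstracted as stochastic phase transitions, a value-function method such as tabular Q-learning can learn which primitive to apply in each phase. The phases, primitives, and success probabilities below are invented toy values, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy phase MDP: 0 = objects apart, 1 = object grasped, 2 = object lifted.
# The two "primitives" (0 = reach-and-grasp, 1 = lift) are abstracted as
# stochastic phase transitions; all numbers are invented for illustration.
def step(phase, primitive):
    if phase == 0 and primitive == 0 and rng.random() < 0.8:
        return 1, 0.0, False              # grasp succeeded
    if phase == 1 and primitive == 1 and rng.random() < 0.9:
        return 2, 1.0, True               # lift succeeded: task solved
    return phase, -0.1, False             # primitive failed or inapplicable

Q = np.zeros((3, 2))                      # value of each primitive per phase
alpha, gamma, eps = 0.2, 0.95, 0.1

for episode in range(500):
    phase = 0
    for t in range(20):
        a = rng.integers(2) if rng.random() < eps else int(np.argmax(Q[phase]))
        nxt, r, done = step(phase, a)
        target = r + (0.0 if done else gamma * Q[nxt].max())
        Q[phase, a] += alpha * (target - Q[phase, a])
        phase = nxt
        if done:
            break

print("greedy primitive per phase:", Q.argmax(axis=1)[:2])  # expect [0, 1]
```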
Machine Learning | 2016
Christian Daniel; Herke van Hoof; Jan Peters; Gerhard Neumann
Tasks that require many sequential decisions or complex solutions are hard to solve using conventional reinforcement learning algorithms. Based on the semi-Markov decision process (SMDP) setting and the option framework, we propose a model which aims to alleviate these concerns. Instead of learning a single monolithic policy, the agent learns a set of simpler sub-policies as well as the initiation and termination probabilities for each of those sub-policies. While existing option learning algorithms frequently require manual specification of components such as the sub-policies, we present an algorithm which infers all relevant components of the option framework from data. Furthermore, the proposed approach is based on parametric option representations and works well in combination with current policy search methods, which are particularly well suited for continuous real-world tasks. We present results on SMDPs with discrete as well as continuous state-action spaces. The results show that the presented algorithm can combine simple sub-policies to solve complex tasks and can improve learning performance on simpler tasks.
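A minimal sketch of option execution in the SMDP setting is shown below: each option bundles an initiation condition, a sub-policy, and a state-dependent termination probability, and runs for a variable number of primitive steps. The two options here are hand-specified for brevity, whereas the paper's algorithm infers such components from data.

```python
import numpy as np
from dataclasses import dataclass
from typing import Callable

rng = np.random.default_rng(1)

@dataclass
class Option:
    """An option: where it may start, what it does, and when it terminates."""
    initiation: Callable[[int], bool]    # may the option start in this state?
    policy: Callable[[int], int]         # sub-policy: move -1 or +1 on a chain
    beta: Callable[[int], float]         # termination probability per state

# Two hand-specified options on a 10-state chain (the paper's algorithm would
# instead infer sub-policies and termination probabilities from data).
go_right = Option(lambda s: s < 9, lambda s: +1, lambda s: 0.9 if s >= 7 else 0.1)
go_left = Option(lambda s: s > 0, lambda s: -1, lambda s: 0.9 if s <= 2 else 0.1)
options = [go_right, go_left]

def run_option(state, opt):
    """Execute one option until termination fires: a single SMDP-level step."""
    steps = 0
    while True:
        state = int(np.clip(state + opt.policy(state), 0, 9))
        steps += 1
        if rng.random() < opt.beta(state):
            return state, steps

state = 5
for _ in range(3):                       # random high-level policy for brevity
    available = [o for o in options if o.initiation(state)]
    state, k = run_option(state, available[rng.integers(len(available))])
    print(f"option ran {k} primitive steps, ended in state {state}")
```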
IEEE-RAS International Conference on Humanoid Robots | 2015
Herke van Hoof; Tucker Hermans; Gerhard Neumann; Jan Peters
Dexterous manipulation enables repositioning of objects and tools within a robots hand. When applying dexterous manipulation to unknown objects, exact object models are not available. Instead of relying on models, compliance and tactile feedback can be exploited to adapt to unknown objects. However, compliant hands and tactile sensors add complexity and are themselves difficult to model. Hence, we propose acquiring in-hand manipulation skills through reinforcement learning, which does not require analytic dynamics or kinematics models. In this paper, we show that this approach successfully acquires a tactile manipulation skill using a passively compliant hand. Additionally, we show that the learned tactile skill generalizes to novel objects.
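As a generic stand-in for such a model-free learning loop (the paper itself uses a different, non-parametric policy-search algorithm), the sketch below runs REINFORCE-style updates of a linear Gaussian policy on an invented toy "tactile" task; the state dimension, reward, and dynamics are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented toy "tactile" task: 3-D sensor state, scalar action, and a reward
# that is highest when the action cancels a fixed linear function of the state.
w_true = np.array([0.5, -0.3, 0.8])
sigma = 0.3                                        # exploration noise std

def rollout(theta, T=20):
    """Collect one episode; the learner only ever sees states and rewards."""
    episode = []
    for _ in range(T):
        s = rng.normal(size=3)
        a = theta @ s + sigma * rng.normal()       # Gaussian policy
        r = -(a - w_true @ s) ** 2                 # unknown to the learner
        episode.append((s, a, r))
    return episode

theta = np.zeros(3)
for it in range(500):                              # REINFORCE-style updates
    grad = sum(r * ((a - theta @ s) / sigma**2) * s
               for s, a, r in rollout(theta))
    theta += 1e-3 * grad                           # gradient ascent on reward

print("learned weights:", np.round(theta, 2), "target:", w_true)
```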
Intelligent Robots and Systems | 2015
Filipe Veiga; Herke van Hoof; Jan Peters; Tucker Hermans
During grasping and other in-hand manipulation tasks, maintaining a stable grip on the object is crucial for the task's outcome. Inherently connected to grip stability is the concept of slip. Slip occurs when the contact between the fingertip and the object is partially lost, resulting in sudden, undesired changes to the object's state. While several approaches for slip detection have been proposed in the literature, they frequently rely on prior knowledge of the manipulated object. Such prior knowledge may be unavailable, since robots operating in real-world scenarios often must interact with previously unseen objects. In our work, we explore the generalization capabilities of well-known supervised learning methods, using random forest classifiers to create generalizable slip predictors. We utilize these classifiers in the feedback loop of an object-stabilization controller. We show that the controller can successfully stabilize previously unknown objects by predicting and counteracting slip events.
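A hedged sketch of this pipeline, assuming scikit-learn and synthetic tactile data: train a random forest on labeled slip/no-slip feature vectors, then use its predictions inside a simple grip-force loop. The feature generator and the force-increase rule are invented for illustration and are not the paper's controller.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for tactile features: slip shows up as extra
# high-frequency energy in the fingertip signal (invented data).
def tactile_features(slip, n):
    base = rng.normal(0.0, 0.05, size=(n, 8))
    if slip:
        base += rng.normal(0.0, 0.4, size=(n, 8))   # vibration during slip
    return base

X = np.vstack([tactile_features(False, 500), tactile_features(True, 500)])
y = np.array([0] * 500 + [1] * 500)                  # 1 = slip
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Slip predictor in the feedback loop of a toy grip-force controller.
grip_force = 1.0
for t in range(5):
    feat = tactile_features(slip=(t == 2), n=1)      # inject one slip event
    if clf.predict(feat)[0] == 1:
        grip_force *= 1.5                            # counteract predicted slip
    print(f"t={t}: grip force {grip_force:.2f}")
```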
Intelligent Robots and Systems | 2016
Herke van Hoof; Nutan Chen; Maximilian Karl; Patrick van der Smagt; Jan Peters
For many tasks, tactile or visual feedback is helpful or even crucial. However, designing controllers that take such high-dimensional feedback into account is non-trivial. Therefore, robots should be able to learn tactile skills through trial and error by using reinforcement learning algorithms. The input domain for such tasks, however, might include strongly correlated or irrelevant dimensions, making it hard to specify a suitable metric on such domains. Auto-encoders specialize in finding compact representations, where defining such a metric is likely to be easier. Therefore, we propose a reinforcement learning algorithm that can learn non-linear policies in continuous state spaces and that leverages representations learned using auto-encoders. We first evaluate this method on a simulated toy task with visual input. Then, we validate our approach on a real-robot tactile stabilization task.
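The representation-learning step can be sketched as follows: compress high-dimensional observations into a low-dimensional code on which a policy would then operate. For brevity the sketch uses a linear auto-encoder trained on synthetic data, whereas the paper learns non-linear representations; all dimensions and learning rates are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 32-D observations that actually lie near a 2-D latent subspace.
Z_true = rng.normal(size=(1000, 2))
A = rng.normal(size=(2, 32)) / np.sqrt(32)
X = Z_true @ A + 0.01 * rng.normal(size=(1000, 32))

# Linear auto-encoder trained by gradient descent on reconstruction error
# (a toy analogue of the non-linear auto-encoders used in the paper).
W_enc = 0.01 * rng.normal(size=(32, 2))
W_dec = 0.01 * rng.normal(size=(2, 32))
lr = 0.1
for _ in range(2000):
    Z = X @ W_enc                         # encode to a 2-D code
    err = Z @ W_dec - X                   # reconstruction error
    W_dec -= lr * (Z.T @ err) / len(X)
    W_enc -= lr * (X.T @ (err @ W_dec.T)) / len(X)

print("reconstruction MSE:", np.mean((X @ W_enc @ W_dec - X) ** 2))
# An RL policy would now act on the compact code X @ W_enc rather than on
# the raw 32-D input, where a meaningful metric is easier to define.
```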
Intelligent Robots and Systems | 2016
Zhengkun Yi; Roberto Calandra; Filipe Veiga; Herke van Hoof; Tucker Hermans; Yilei Zhang; Jan Peters
Accurate object shape knowledge provides important information for performing stable grasping and dexterous manipulation. When modeling an object using tactile sensors, touching the object surface at a fixed grid of points can be sample inefficient. In this paper, we present an active touch strategy to efficiently reduce the surface geometry uncertainty by leveraging a probabilistic representation of the object surface. In particular, we model the object surface using a Gaussian process and use the associated uncertainty information to efficiently determine the next point to explore. We validate the resulting method for tactile object surface modeling using a real robot to reconstruct multiple, complex object surfaces.
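A minimal sketch of the active-touch loop, assuming scikit-learn's Gaussian process regressor and an invented 1-D surface: fit a GP to the points touched so far, then probe next wherever the predictive standard deviation is largest.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def surface(x):
    return np.sin(3 * x) * 0.1                       # unknown height profile

candidates = np.linspace(0, 1, 100).reshape(-1, 1)   # possible touch locations
X, y = [[0.5]], [surface(0.5)]                       # first touch
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.1), alpha=1e-4)

for touch in range(8):
    gp.fit(np.array(X), np.array(y))
    _, std = gp.predict(candidates, return_std=True)
    nxt = candidates[np.argmax(std)]                 # most uncertain location
    X.append([nxt[0]])
    y.append(surface(nxt[0]))
    print(f"touch {touch}: probe x = {nxt[0]:.2f}, max std = {std.max():.4f}")
```

Each probe shrinks the GP's uncertainty band, so the maximum predictive standard deviation decreases far faster than it would with a fixed grid of touches.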
Machine Learning | 2017
Herke van Hoof; Daniel Tanneberg; Jan Peters
To learn control policies in unknown environments, learning agents need to explore by trying actions deemed suboptimal. In prior work, such exploration is performed by either perturbing the actions at each time-step independently, or by perturbing policy parameters over an entire episode. Since both of these strategies have certain advantages, a more balanced trade-off could be beneficial. We introduce a unifying view on step-based and episode-based exploration that allows for such balanced trade-offs. This trade-off strategy can be used with various reinforcement learning algorithms. In this paper, we study this generalized exploration strategy in a policy gradient method and in relative entropy policy search. We evaluate the exploration strategy on four dynamical systems and compare the results to the established step-based and episode-based exploration strategies. Our results show that a more balanced trade-off can yield faster learning and better final policies, and illustrate some of the effects that cause these performance differences.
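One plausible reading of such a unified exploration scheme can be sketched as follows: draw a parameter perturbation and, at every step, redraw it with some probability, so that the two extremes recover step-based (always redraw) and episode-based (never redraw) exploration. The resampling mechanism and numbers below are illustrative and may differ from the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def perturbations(T, dim, resample_prob, sigma=0.1):
    """Exploration noise for one episode of length T.

    resample_prob = 1.0 -> fresh noise every step (step-based exploration);
    resample_prob = 0.0 -> one draw held all episode (episode-based);
    intermediate values trade off the two regimes.
    """
    eps = rng.normal(0.0, sigma, size=dim)
    out = []
    for _ in range(T):
        if rng.random() < resample_prob:
            eps = rng.normal(0.0, sigma, size=dim)
        out.append(eps.copy())
    return np.array(out)

for p in (0.0, 0.1, 1.0):
    noise = perturbations(T=50, dim=1, resample_prob=p)
    changes = np.count_nonzero(np.diff(noise[:, 0]))
    print(f"resample_prob={p}: noise changed on {changes} of 49 steps")
```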
International Conference on Artificial Intelligence and Statistics | 2015
Herke van Hoof; Jan Peters; Gerhard Neumann