Anna Harutyunyan
Vrije Universiteit Brussel
Publication
Featured research published by Anna Harutyunyan.
international symposium on neural networks | 2014
Tim Brys; Anna Harutyunyan; Peter Vrancx; Matthew E. Taylor; Daniel Kudenko; Ann Nowé
Multi-objectivization is the process of transforming a single-objective problem into a multi-objective problem. Research in evolutionary optimization has demonstrated that adding objectives that are correlated with the original objective can make the resulting problem easier to solve than the original single-objective problem. In this paper we investigate the multi-objectivization of reinforcement learning problems. We propose a novel method for the multi-objectivization of Markov decision problems through the use of multiple reward shaping functions. Reward shaping is a technique to speed up reinforcement learning by including additional heuristic knowledge in the reward signal. The resulting composite reward signal is expected to be more informative during learning, leading the learner to identify good actions more quickly. Good reward shaping functions are by definition correlated with the target value function for the base reward signal, and we show in this paper that adding several correlated signals can help to solve the basic single-objective problem faster and better. We prove that the total ordering of solutions, and consequently the optimality of solutions, is preserved in this process, and empirically demonstrate the usefulness of this approach on two reinforcement learning tasks: a pathfinding problem and the Mario domain.
european conference on artificial intelligence | 2014
Anna Harutyunyan; Tim Brys; Peter Vrancx; Ann Nowé
In this work we propose learning an ensemble of policies related through potential-based shaping rewards via the off-policy Horde framework.
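As an illustration of the potential-based shaping rewards mentioned above, the standard form adds F(s, s') = γΦ(s') − Φ(s) to the environment reward, where Φ is a potential function over states. The following is a minimal sketch only: the grid world, the Manhattan-distance potential, and the function names are assumptions chosen for illustration, not taken from the paper.

```python
def shaped_reward(reward, potential, state, next_state, gamma=0.99):
    """Return r + F(s, s'), with F(s, s') = gamma * Phi(s') - Phi(s)."""
    return reward + gamma * potential(next_state) - potential(state)


def manhattan_potential(goal):
    """Hypothetical potential: negative Manhattan distance to `goal` on a grid."""
    def potential(state):
        return -(abs(state[0] - goal[0]) + abs(state[1] - goal[1]))
    return potential


phi = manhattan_potential(goal=(3, 3))
# With gamma = 1, a step toward the goal earns a bonus and a step away a penalty.
closer = shaped_reward(0.0, phi, state=(0, 0), next_state=(0, 1), gamma=1.0)
farther = shaped_reward(0.0, phi, state=(0, 1), next_state=(0, 0), gamma=1.0)
```

An ensemble in the sense of the abstract could then be built by pairing each learner with a different potential; how the policies are actually learned within Horde is not described here.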
international conference on robotics and automation | 2016
Kevin Tanghe; Anna Harutyunyan; Erwin Aertbeliën; Friedl De Groote; Joris De Schutter; Peter Vrancx; Ann Nowé
Accurate and reliable event prediction is imperative for supporting movement with an exoskeleton. Two events are important during a sit-to-stand movement: seat-off, the event at which the subject leaves the chair, and start-of-assistance for hip and knee, the earliest time at which assistance may be provided. This letter analyzes two methods to predict and detect these events. Both methods only have joint encoder data as input. The model-based method uses probabilistic principal component analysis with a Kalman filter. Based on a statistically learned model, a joint trajectory is predicted. The seat-off event is predicted using its correlation with maximum hip angle. Since the start-of-assistance event has no clear correlation with joint trajectories, it cannot be detected with this method. The model-free method is a feed-forward neural network, which learns a mapping between inputs and events directly. It is applied to both seat-off prediction and start-of-assistance detection. The methods have been evaluated on 311 lab-recorded movements. For the seat-off event, the model-based method is more reliable than the model-free method. For the start-of-assistance event, the model-free method performs well, except in an outlier case for one subject. Both of these methods allow accurate and reliable event prediction, using only joint encoder data as inputs.
algorithmic learning theory | 2016
Anna Harutyunyan; Marc G. Bellemare; Tom Stepleton; Rémi Munos
We propose and analyze an alternate approach to off-policy multi-step temporal difference learning, in which off-policy returns are corrected with the current Q-function in terms of rewards, rather than with the target policy in terms of transition probabilities. We prove that such approximate corrections are sufficient for off-policy convergence both in policy evaluation and control, provided certain conditions hold. These conditions relate the distance between the target and behavior policies, the eligibility trace parameter and the discount factor, and formalize an underlying tradeoff in off-policy TD(\(\lambda\)). We illustrate this theoretical relationship empirically on a continuous-state control task.
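One way to read "corrected with the current Q-function in terms of rewards" is as a multi-step return built from TD errors whose bootstrap term is the expectation of Q under the target policy, with no importance weights on the behavior data. The sketch below is a tabular, forward-view illustration under that reading; the function name, array layout, and the absence of terminal-state handling are assumptions, and it should not be taken as the authors' exact algorithm.

```python
import numpy as np


def corrected_return_update(Q, pi, trajectory, gamma=0.9, lam=0.5):
    """Sum of (gamma * lam)^t-weighted TD errors for the first state-action pair.

    Q:   array [n_states, n_actions], current action-value estimates.
    pi:  array [n_states, n_actions], target policy probabilities.
    trajectory: list of (state, action, reward, next_state) tuples collected
                by any behavior policy; no importance weights are applied.
    """
    total = 0.0
    for t, (s, a, r, s_next) in enumerate(trajectory):
        expected_next = np.dot(pi[s_next], Q[s_next])  # E_pi[Q(s', .)]
        delta = r + gamma * expected_next - Q[s, a]    # expected TD error
        total += (gamma * lam) ** t * delta
    return total


# Tiny example: two states, two actions, Q initialised to zero.
Q = np.zeros((2, 2))
pi = np.full((2, 2), 0.5)
steps = [(0, 0, 1.0, 1), (1, 1, 2.0, 0)]
increment = corrected_return_update(Q, pi, steps, gamma=0.5, lam=0.5)
```

The product of the discount factor and the trace parameter controls how quickly later TD errors are down-weighted, which is where the tradeoff named in the abstract enters.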
Neurocomputing | 2017
Tim Brys; Anna Harutyunyan; Peter Vrancx; Ann Nowé; Matthew E. Taylor
Ensemble techniques are a powerful approach to creating better decision makers in machine learning. Multiple decision makers are trained to solve a given task, grouped in an ensemble, and their decisions are aggregated. The ensemble derives its power from the diversity of its components, as the assumption is that they make mistakes on different inputs, and that the majority is more likely to be correct than any individual component. Diversity usually comes from the different algorithms employed by the decision makers, or the different inputs used to train the decision makers. We advocate a third way to achieve this diversity, called diversity of evaluation, using the principle of multi-objectivization. This is the process of taking a single-objective problem and transforming it into a multi-objective problem in order to solve the original problem faster and/or better. This is either done through decomposition of the original objective, or the addition of extra objectives, typically based on some (heuristic) domain knowledge. This process basically creates a diverse set of feedback signals for what is underneath still a single-objective problem. In the context of ensemble techniques, these various ways to evaluate a (solution to a) problem allow different components of the ensemble to look at the problem in different ways, generating the necessary diversity for the ensemble. In this paper, we argue for the combination of multi-objectivization and ensemble techniques as a powerful tool to boost solving performance in reinforcement learning. We inject various pieces of heuristic information through reward shaping, creating several distinct enriched reward signals, which can strategically be combined using ensemble techniques to reduce sample complexity. We provide theoretical guarantees and demonstrate the potential of the approach with a range of experiments.
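To make the aggregation step concrete, the sketch below combines the greedy choices of several learners by majority vote. Majority voting is only one simple aggregation rule, picked here for illustration; the abstract does not say which rule the paper uses, and the function names and tie-breaking convention are assumptions.

```python
from collections import Counter


def greedy_action(q_values):
    """Index of the largest action value; ties go to the lowest index."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def majority_vote(per_learner_q_values):
    """Each learner votes for its greedy action; return the most-voted action.

    per_learner_q_values: one list of action values per ensemble member,
    all evaluated in the same state. Vote ties go to the lowest action index.
    """
    votes = Counter(greedy_action(q) for q in per_learner_q_values)
    best_count = max(votes.values())
    return min(a for a, count in votes.items() if count == best_count)


# Three hypothetical learners, each trained on a differently shaped reward.
chosen = majority_vote([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
```

Here two of the three learners prefer action 1, so the ensemble selects it even though one member disagrees.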
international workshop on combinatorial algorithms | 2013
Glencora Borradaile; Anna Harutyunyan
Minimum cuts have been closely related to shortest paths in planar graphs via planar duality - so long as the graphs are undirected. Even maximum flows are closely related to shortest paths for the same reason - so long as the source and the sink are on a common face. In this paper, we give a correspondence between maximum flows and shortest paths via duality in directed planar graphs with no constraints on the source and sink. We believe this is a promising avenue for developing algorithms that are more practical than the current asymptotically best algorithms for maximum st-flow.
international workshop on combinatorial algorithms | 2013
Glencora Borradaile; Anna Harutyunyan
We give an iterative algorithm for finding the maximum flow between a set of sources and sinks that lie on the boundary of a planar graph. Our algorithm uses only O(n) queries to simple data structures, achieving an O(n log n) running time that we expect to be practical given the use of simple primitives. The only existing algorithm for this problem uses divide and conquer and, in order to achieve an O(n log n) running time, requires the use of the (complicated) linear-time shortest-paths algorithm for planar graphs.
Sensors | 2017
Stefan Lambrecht; Anna Harutyunyan; Kevin Tanghe; Maarten Afschrift; Joris De Schutter; Ilse Jonkers
Real-time detection of multiple stance events, more specifically initial contact (IC), foot flat (FF), heel off (HO), and toe off (TO), could greatly benefit neurorobotic (NR) and neuroprosthetic (NP) control. Three real-time threshold-based algorithms have been developed, detecting the aforementioned events based on kinematic data in combination with a biomechanical model. Data from seven subjects walking at three speeds on an instrumented treadmill were used to validate the presented algorithms, amounting to a total of 558 steps. The reference for the gait events was obtained using marker and force plate data. All algorithms had excellent precision and no false positives were observed. Timing delays of the presented algorithms were similar to current state-of-the-art algorithms for the detection of IC and TO, whereas smaller delays were achieved for the detection of FF. Our results indicate that, based on their high precision and low delays, these algorithms can be used for the control of an NR/NP, with the exception of the HO event. Kinematic data is used in most NR/NP control schemes and is thus available at no additional cost, resulting in a minimal computational burden. The presented methods can also be applied for screening pathological gait or for gait analysis in general, inside or outside the laboratory.
international conference on artificial intelligence | 2015
Tim Brys; Anna Harutyunyan; Halit Bener Suay; Sonia Chernova; Matthew E. Taylor; Ann Nowé
national conference on artificial intelligence | 2015
Anna Harutyunyan; Sam Devlin; Peter Vrancx; Ann Nowé