Thomas Degris
University of Alberta
Publications
Featured research published by Thomas Degris.
ieee international conference on rehabilitation robotics | 2011
Patrick M. Pilarski; Michael R. W. Dawson; Thomas Degris; Farbod Fahimi; Jason P. Carey; Richard S. Sutton
As a contribution toward the goal of adaptable, intelligent artificial limbs, this work introduces a continuous actor-critic reinforcement learning method for optimizing the control of multi-function myoelectric devices. Using a simulated upper-arm robotic prosthesis, we demonstrate how it is possible to derive successful limb controllers from myoelectric data using only a sparse human-delivered training signal, without requiring detailed knowledge about the task domain. This reinforcement-based machine learning framework is well suited for use by both patients and clinical staff, and may be easily adapted to different application domains and the needs of individual amputees. To our knowledge, this is the first myoelectric control approach that facilitates the online learning of new amputee-specific motions based only on a one-dimensional (scalar) feedback signal provided by the user of the prosthesis.
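The following is a minimal sketch of a continuous-action actor-critic update of the kind the abstract describes: a Gaussian policy whose mean is a linear function of state features, trained from a scalar reward. All names, sizes, and the fixed exploration standard deviation are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

n = 8                           # number of state features (assumed)
w_v  = np.zeros(n)              # critic weights: state-value estimate
w_mu = np.zeros(n)              # actor weights: Gaussian policy mean
sigma = 0.5                     # fixed exploration std-dev (a simplification)
alpha_v, alpha_mu, gamma = 0.1, 0.01, 0.99

def act(x):
    """Sample a continuous action from the Gaussian policy."""
    return np.random.normal(w_mu @ x, sigma)

def update(x, a, reward, x_next):
    """One actor-critic step; `reward` may be a sparse human-delivered signal."""
    global w_v, w_mu
    delta = reward + gamma * (w_v @ x_next) - (w_v @ x)   # TD error
    w_v  += alpha_v * delta * x                           # critic update
    # policy-gradient update of the Gaussian mean, scaled by the TD error
    w_mu += alpha_mu * delta * ((a - w_mu @ x) / sigma**2) * x
```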
international conference on machine learning | 2006
Thomas Degris; Olivier Sigaud; Pierre-Henri Wuillemin
Recent decision-theoretic planning algorithms are able to find optimal solutions in large problems, using Factored Markov Decision Processes (FMDPs). However, these algorithms need a perfect knowledge of the structure of the problem. In this paper, we propose SDYNA, a general framework for addressing large reinforcement learning problems by trial-and-error and with no initial knowledge of their structure. SDYNA integrates incremental planning algorithms based on FMDPs with supervised learning techniques building structured representations of the problem. We describe SPITI, an instantiation of SDYNA that uses incremental decision tree induction to learn the structure of a problem, combined with an incremental version of the Structured Value Iteration algorithm. We show that SPITI can build a factored representation of a reinforcement learning problem and may improve the policy faster than tabular reinforcement learning algorithms by exploiting the generalization property of decision tree induction algorithms.
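A schematic sketch of the SDYNA-style loop: act by trial-and-error, hand each observed transition to a supervised learner that captures the problem's structure, and periodically rebuild the model. Here scikit-learn's batch decision tree stands in for the incremental induction SPITI uses, only one state feature's dynamics is modelled, and the structured planning step is omitted; the toy environment and all names are assumptions for illustration.

```python
from collections import deque
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
history = deque(maxlen=1000)          # observed (state, action) -> next-feature data
model = DecisionTreeClassifier(max_depth=4)

def toy_env_step(s, a):
    """Tiny stand-in environment: action a toggles bit a of the state."""
    s2 = s.copy()
    s2[a] ^= 1
    return s2, float(s2.all())        # reward 1 when all bits are on

s = np.zeros(3, dtype=int)
for t in range(500):
    a = int(rng.integers(3))          # exploration policy (random here)
    s2, reward = toy_env_step(s, a)
    history.append((np.append(s, a), s2[0]))   # learn P(feature 0' | s, a)
    if t % 50 == 49:                  # periodic refit stands in for
        X, y = map(np.array, zip(*history))    # incremental tree induction
        model.fit(X, y)
    s = s2
```

In the full algorithm, a tree is induced for each state feature and for the reward, and an incremental Structured Value Iteration pass over those trees produces the improved policy.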
advances in computing and communications | 2012
Thomas Degris; Patrick M. Pilarski; Richard S. Sutton
Reinforcement learning methods are often considered a potential solution for enabling a robot to adapt in real time to changes in an unpredictable environment. However, with continuous actions, only a few existing algorithms are practical for real-time learning. In such a setting, most effective methods have used a parameterized policy structure, often with a separate parameterized value function. The goal of this paper is to assess such actor-critic methods in order to form a fully specified practical algorithm. Our specific contributions include 1) the extension of existing incremental policy-gradient algorithms to use eligibility traces, 2) an empirical comparison of the resulting algorithms using continuous actions, and 3) the evaluation of a gradient-scaling technique that can significantly improve performance. Finally, we apply our actor-critic algorithm to learn on a robotic platform with a fast sensorimotor cycle (10 ms). Overall, these results constitute an important step towards practical real-time learning control with continuous actions.
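A minimal sketch of contribution 1): the same linear Gaussian actor-critic as above, extended with accumulating eligibility traces for both the critic and the actor. Parameter values and names are assumptions; the paper also adapts the policy's standard deviation and studies gradient scaling, both omitted here.

```python
import numpy as np

n = 8
w_v,  e_v  = np.zeros(n), np.zeros(n)   # critic weights and trace
w_mu, e_mu = np.zeros(n), np.zeros(n)   # actor (Gaussian mean) weights and trace
sigma = 0.5
alpha_v, alpha_mu, gamma, lam = 0.1, 0.01, 0.99, 0.7

def update(x, a, reward, x_next):
    """One actor-critic(lambda) step over a single sensorimotor cycle."""
    global w_v, w_mu
    delta = reward + gamma * (w_v @ x_next) - (w_v @ x)   # TD error
    e_v[:] = gamma * lam * e_v + x                        # critic trace
    w_v += alpha_v * delta * e_v
    grad_mu = ((a - w_mu @ x) / sigma**2) * x             # score of the mean
    e_mu[:] = gamma * lam * e_mu + grad_mu                # actor trace
    w_mu += alpha_mu * delta * e_mu
```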
IEEE Robotics & Automation Magazine | 2013
Patrick M. Pilarski; Michael R. W. Dawson; Thomas Degris; Jason P. Carey; K. M. Chan; Jacqueline S. Hebert; Richard S. Sutton
Predicting the future has long been regarded as a powerful means to improvement and success. The ability to make accurate and timely predictions enhances our ability to control our situation and our environment. Assistive robotics is one prominent area in which foresight of this kind can bring improved quality of life. In this article, we present a new approach to acquiring and maintaining predictive knowledge during the online ongoing operation of an assistive robot. The ability to learn accurate, temporally abstracted predictions is shown through two case studies: 1) able-bodied myoelectric control of a robot arm and 2) an amputee's interactions with a myoelectric training robot. To our knowledge, this research is the first demonstration of a practical method for real-time prediction learning during myoelectric control. Our approach therefore represents a fundamental tool for addressing one major unsolved problem: amputee-specific adaptation during the ongoing operation of a prosthetic device. The findings in this article also contribute a first explicit look at prediction learning in prosthetics as an important goal in its own right, independent of its intended use within a specific controller or system. Our results suggest that real-time learning of predictions and anticipations is a significant step toward more intuitive myoelectric prostheses and other assistive robotic devices.
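The kind of temporally abstracted prediction the article describes can be learned online with linear TD(lambda), where the predicted signal is any measured quantity (e.g., a grip sensor reading) and the discount sets the prediction's time scale. A minimal sketch under those assumptions; names and constants are illustrative, not the paper's code.

```python
import numpy as np

n = 32                       # feature vector size (assumed)
w = np.zeros(n)              # prediction weights
z = np.zeros(n)              # eligibility trace
alpha, lam = 0.1, 0.9
gamma = 0.97                 # horizon of roughly 1/(1 - gamma) ~ 33 timesteps

def predict_step(x, x_next, signal):
    """Update a prediction of signal's discounted future sum, online."""
    global w
    delta = signal + gamma * (w @ x_next) - (w @ x)
    z[:] = gamma * lam * z + x
    w += alpha * delta * z
    return w @ x_next        # the current prediction
```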
international conference on acoustics, speech, and signal processing | 2012
Ashique Rupam Mahmood; Richard S. Sutton; Thomas Degris; Patrick M. Pilarski
Incremental learning algorithms based on gradient descent are effective and popular in online supervised learning, reinforcement learning, signal processing, and many other application areas. An oft-noted drawback of these algorithms is that they include a step-size parameter that needs to be tuned for best performance, which may require manual intervention and significant domain knowledge or additional data. In many cases, an entire vector of step-size parameters (e.g., one for each input feature) needs to be tuned in order to attain the best performance of the algorithm. To address this, several methods have been proposed for adapting step sizes online. For example, Sutton's IDBD method can find the best vector step size for the LMS algorithm, and Schraudolph's ELK1 method, an extension of IDBD to neural networks, has proven effective on large applications, such as 3D hand tracking. However, to date all such step-size adaptation methods have included a tunable step-size parameter of their own, which we call the meta-step-size parameter. In this paper we show that the performance of existing step-size adaptation methods is strongly dependent on the choice of their meta-step-size parameter and that their meta-step-size parameter cannot be set reliably in a problem-independent way. We introduce a series of modifications and normalizations to the IDBD method that together eliminate the need to tune the meta-step-size parameter to the particular problem. We show that the resulting overall algorithm, called Autostep, performs as well or better than the existing step-size adaptation methods on a number of idealized and robot prediction problems and does not require any tuning of its meta-step-size parameter. The ideas behind Autostep are not restricted to the IDBD method and the same principles are potentially applicable to other incremental learning settings, such as reinforcement learning.
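For reference, a minimal sketch of the IDBD update that Autostep modifies: each weight carries its own log step-size beta_i, adapted by the meta-step-size theta that Autostep's normalizations aim to make tuning-free. Variable names are illustrative, and the Autostep-specific normalizations are not shown.

```python
import numpy as np

def idbd_update(w, beta, h, x, y, theta=0.01):
    """One IDBD step for LMS: adapt per-feature step-sizes alpha_i = exp(beta_i)."""
    delta = y - w @ x                    # prediction error
    beta += theta * delta * x * h        # meta-gradient on the log step-sizes
    alpha = np.exp(beta)
    w += alpha * delta * x               # LMS update with per-feature rates
    h[:] = h * np.clip(1.0 - alpha * x**2, 0.0, None) + alpha * delta * x
    return w, beta, h
```

Typical initialization is beta = log(alpha_0) for a small initial step-size alpha_0, with h = 0.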
Nature | 2018
Andrea Banino; Caswell Barry; Benigno Uria; Charles Blundell; Timothy P. Lillicrap; Piotr Mirowski; Alexander Pritzel; Martin J. Chadwick; Thomas Degris; Joseph Modayil; Greg Wayne; Hubert Soyer; Fabio Viola; Brian Zhang; Ross Goroshin; Neil C. Rabinowitz; Razvan Pascanu; Charlie Beattie; Stig Petersen; Amir Sadik; Stephen Gaffney; Helen King; Koray Kavukcuoglu; Demis Hassabis; Raia Hadsell; Dharshan Kumaran
Deep neural networks have achieved impressive successes in fields ranging from object recognition to complex games such as Go [1, 2]. Navigation, however, remains a substantial challenge for artificial agents, with deep neural networks trained by reinforcement learning [3–5] failing to rival the proficiency of mammalian spatial behaviour, which is underpinned by grid cells in the entorhinal cortex [6]. Grid cells are thought to provide a multi-scale periodic representation that functions as a metric for coding space [7, 8] and is critical for integrating self-motion (path integration) [6, 7, 9] and planning direct trajectories to goals (vector-based navigation) [7, 10, 11]. Here we set out to leverage the computational functions of grid cells to develop a deep reinforcement learning agent with mammal-like navigational abilities. We first trained a recurrent network to perform path integration, leading to the emergence of representations resembling grid cells, as well as other entorhinal cell types [12]. We then showed that this representation provided an effective basis for an agent to locate goals in challenging, unfamiliar, and changeable environments, optimizing the primary objective of navigation through deep reinforcement learning. The performance of agents endowed with grid-like representations surpassed that of an expert human and comparison agents, with the metric quantities necessary for vector-based navigation derived from grid-like units within the network. Furthermore, grid-like representations enabled agents to conduct shortcut behaviours reminiscent of those performed by mammals. Our findings show that emergent grid-like representations furnish agents with a Euclidean spatial metric and associated vector operations, providing a foundation for proficient navigation. As such, our results support neuroscientific theories that see grid cells as critical for vector-based navigation [7, 10, 11], demonstrating that the latter can be combined with path-based strategies to support navigation in challenging environments.
Grid-like representations emerge spontaneously within a neural network trained to self-localize, enabling the agent to take shortcuts to destinations using vector-based navigation.
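A schematic of the first stage described in the abstract: a recurrent network trained with supervised learning to integrate velocity inputs into position. Everything here (library choice, sizes, synthetic trajectories) is an assumption for illustration; the paper's agent is substantially larger and reports grid-like units emerging in a linear bottleneck layer.

```python
import torch
import torch.nn as nn

rnn = nn.LSTM(input_size=2, hidden_size=128, batch_first=True)
readout = nn.Linear(128, 2)                  # decode 2-D position from the state
opt = torch.optim.Adam(list(rnn.parameters()) + list(readout.parameters()), lr=1e-3)

for step in range(200):
    vel = 0.1 * torch.randn(32, 100, 2)      # random 2-D velocity trajectories
    pos = torch.cumsum(vel, dim=1)           # ground-truth integrated path
    out, _ = rnn(vel)                        # recurrent state along the path
    loss = ((readout(out) - pos) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```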
Neurocomputing | 2004
Thomas Degris; Olivier Sigaud; Sidney I. Wiener; Angelo Arleo
In this model of the head direction cells in the limbic areas of the rat brain, the intrinsic dynamics of the system is determined by a continuous attractor network of spiking neurons. Synaptic excitation is mediated by formal NMDA and AMPA receptors, while inhibition depends on GABA receptors. We focus on the temporal aspects of state transitions of the system following reorientation of visual cues. The model reproduces the short latencies (80 ms) observed in recordings of the anterodorsal thalamic nucleus. The model makes an experimentally testable prediction concerning the state update dynamics as a function of the magnitude of the reorientation angle.
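A rate-based toy version of such a continuous attractor: neurons tile the circle of head directions, recurrent weights are cosine-shaped (local excitation, broad inhibition), and a reoriented visual cue pulls the activity bump to a new direction. The paper's model uses spiking neurons with NMDA/AMPA excitation and GABA inhibition; this simplified rate model is an illustrative assumption only.

```python
import numpy as np

N = 100                                          # neurons tiling 360 degrees
theta = np.linspace(0, 2 * np.pi, N, endpoint=False)
# cosine recurrent weights: local excitation, uniform inhibition
W = (np.cos(theta[:, None] - theta[None, :]) - 0.2) / N

r = np.exp(-((theta - np.pi) ** 2))              # activity bump at 180 degrees
cue = np.exp(-((theta - np.pi / 2) ** 2))        # visual cue reoriented to 90 degrees
dt, tau = 1.0, 10.0                              # milliseconds
for t in range(200):                             # ~200 ms of simulated dynamics
    drive = W @ r + 0.5 * cue
    r += (dt / tau) * (-r + np.maximum(drive, 0.0))   # leaky rate dynamics
print(f"bump now at {np.degrees(theta[np.argmax(r)]):.0f} degrees")
```

The latency with which the bump settles at the cued direction is the analogue of the state-transition timing measured in the paper.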
international conference on machine learning | 2014
David Silver; Guy Lever; Nicolas Heess; Thomas Degris; Daniël Pieter Wierstra; Martin A. Riedmiller
adaptive agents and multi agents systems | 2011
Richard S. Sutton; Joseph Modayil; Michael Delp; Thomas Degris; Patrick M. Pilarski; Adam White; Doina Precup
international conference on machine learning | 2012
Thomas Degris; Martha White; Richard S. Sutton