Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Philip S. Thomas is active.

Publication


Featured research published by Philip S. Thomas.


International Conference on Development and Learning | 2012

Motor primitive discovery

Philip S. Thomas; Andrew G. Barto

We present a method for autonomous on-line discovery of motor primitives for Markov decision processes with high-dimensional continuous action spaces. These biologically inspired motor primitives require some overhead to compute, but they form a compressed representation of the action set that improves performance on subsequent learning tasks with similar dynamics.
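
The compressed action representation described above can be illustrated with a generic dimensionality-reduction sketch. The code below is not the authors' discovery algorithm; it only shows, under assumed dimensions and synthetic data, how logged high-dimensional actions could be compressed into a few primitive directions with PCA and then expanded back into full actions.

```python
import numpy as np

def discover_primitives(actions, k):
    """Fit k primitives as the top principal directions of logged actions.

    actions: array of shape (num_samples, action_dim), e.g. joint torques
    Returns (mean, components) defining a compressed action representation.
    """
    mean = actions.mean(axis=0)
    # Principal directions of the centered action data
    _, _, vt = np.linalg.svd(actions - mean, full_matrices=False)
    return mean, vt[:k]          # k primitives, each of length action_dim

def primitive_to_action(weights, mean, components):
    """Expand a k-dimensional primitive activation into a full action vector."""
    return mean + weights @ components

# Hypothetical example: 1000 logged 30-dimensional actions compressed to 5 primitives
rng = np.random.default_rng(0)
logged = rng.normal(size=(1000, 30))
mean, prims = discover_primitives(logged, k=5)
action = primitive_to_action(rng.normal(size=5), mean, prims)   # shape (30,)
```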


IEEE Transactions on Human-Machine Systems | 2016

Human-Like Rewards to Train a Reinforcement Learning Controller for Planar Arm Movement

Kathleen M. Jagodnik; Philip S. Thomas; Antonie J. van den Bogert; Michael S. Branicky; Robert F. Kirsch

High-level spinal cord injury (SCI) in humans causes paralysis below the neck. Functional electrical stimulation (FES) technology applies electrical current to nerves and muscles to restore movement, and controllers for upper extremity FES neuroprostheses calculate stimulation patterns to produce desired arm movement. However, currently available FES controllers have yet to restore natural movements. Reinforcement learning (RL) is a reward-driven control technique; it can employ user-generated rewards, and human preferences can be used in training. To test this concept with FES, we conducted simulation experiments using computer-generated “pseudo-human” rewards. Rewards with varying properties were used with an actor-critic RL controller for a planar two-degree-of-freedom biomechanical human arm model performing reaching movements. Results demonstrate that sparse, delayed pseudo-human rewards permit stable and effective RL controller learning. The frequency of reward is proportional to learning success, and human-scale sparse rewards permit greater learning than exclusively automated rewards. Diversity of training task sets did not affect learning. Long-term stability of trained controllers was observed. Using human-generated rewards to train RL controllers for upper-extremity FES systems may be useful. Our findings represent progress toward achieving human-machine teaming in control of upper-extremity FES systems for more natural arm movements based on human user preferences and RL algorithm learning capabilities.
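
As a rough illustration of the "sparse, delayed pseudo-human rewards" mentioned above, the sketch below scores a simulated reach only at the end of the movement. The distance threshold and scoring scale are assumptions for illustration, not the reward design used in the study.

```python
import numpy as np

def pseudo_human_reward(hand_xy, target_xy, step, episode_len, reach_tol=0.05):
    """Sparse, delayed reward: zero during the movement, a coarse score at the end.

    hand_xy, target_xy: 2-D positions of the hand and the reaching target (meters)
    step, episode_len: current time step and total episode length
    reach_tol: distance (m) within which the reach counts as successful (assumed)
    """
    if step < episode_len - 1:
        return 0.0                 # delayed: no feedback until the movement ends
    error = np.linalg.norm(np.asarray(hand_xy) - np.asarray(target_xy))
    if error < reach_tol:
        return 1.0                 # sparse, human-scale "good reach" score
    return -min(error, 1.0)        # coarse penalty growing with the miss distance
```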


International Joint Conference on Artificial Intelligence | 2018

Importance Sampling for Fair Policy Selection

Shayan Doroudi; Philip S. Thomas; Emma Brunskill

We consider the problem of off-policy policy selection in reinforcement learning: using historical data generated from running one policy to compare two or more policies. We show that approaches based on importance sampling can be unfair—they can select the worse of two policies more often than not. We give two examples where the unfairness of importance sampling could be practically concerning. We then present sufficient conditions to theoretically guarantee fairness and a related notion of safety. Finally, we provide a practical importance sampling-based estimator to help mitigate one of the systematic sources of unfairness resulting from using importance sampling for policy selection.
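
The estimator under study is ordinary per-trajectory importance sampling. A minimal sketch of using it to choose between two candidate policies from historical data is below, assuming the behavior policy's action probabilities are available; the paper's point is that this baseline selection rule can systematically favor the worse policy, and its fairness-corrected estimator is not shown here.

```python
import numpy as np

def is_estimate(trajectories, eval_policy_prob, behavior_policy_prob):
    """Per-trajectory importance sampling estimate of a policy's expected return.

    trajectories: list of trajectories, each a list of (state, action, reward)
                  tuples collected by the behavior policy
    eval_policy_prob(s, a), behavior_policy_prob(s, a): action probabilities
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for s, a, r in traj:
            weight *= eval_policy_prob(s, a) / behavior_policy_prob(s, a)
            ret += r
        estimates.append(weight * ret)
    return float(np.mean(estimates))

def select_policy(trajectories, pi1_prob, pi2_prob, behavior_prob):
    """Pick the candidate with the higher importance-sampled return estimate."""
    v1 = is_estimate(trajectories, pi1_prob, behavior_prob)
    v2 = is_estimate(trajectories, pi2_prob, behavior_prob)
    return "pi1" if v1 >= v2 else "pi2"
```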


IEEE Transactions on Neural Systems and Rehabilitation Engineering | 2017

Training an Actor-Critic Reinforcement Learning Controller for Arm Movement Using Human-Generated Rewards

Kathleen M. Jagodnik; Philip S. Thomas; Antonie J. van den Bogert; Michael S. Branicky; Robert F. Kirsch

Functional Electrical Stimulation (FES) employs neuroprostheses to apply electrical current to the nerves and muscles of individuals paralyzed by spinal cord injury to restore voluntary movement. Neuroprosthesis controllers calculate stimulation patterns to produce desired actions. To date, no existing controller is able to efficiently adapt its control strategy to the wide range of possible physiological arm characteristics, reaching movements, and user preferences that vary over time. Reinforcement learning (RL) is a control strategy that can incorporate human reward signals as inputs to allow human users to shape controller behavior. In this paper, ten neurologically intact human participants assigned subjective numerical rewards to train RL controllers, evaluating animations of goal-oriented reaching tasks performed using a planar musculoskeletal human arm simulation. The RL controller learning achieved using human trainers was compared with learning accomplished using human-like rewards generated by an algorithm; metrics included success at reaching the specified target; time required to reach the target; and target overshoot. Both sets of controllers learned efficiently and with minimal differences, significantly outperforming standard controllers. Reward positivity and consistency were found to be unrelated to learning success. These results suggest that human rewards can be used effectively to train RL-based FES controllers.
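
A minimal sketch of an actor-critic update that consumes an externally supplied scalar reward (such as a trainer's numerical score) is shown below, using linear function approximation and a Gaussian policy. The feature representation, step sizes, and exploration noise are illustrative assumptions, not the controller used in the experiments.

```python
import numpy as np

class LinearActorCritic:
    """One-step actor-critic with a linear critic and Gaussian actor (sketch)."""

    def __init__(self, feature_dim, action_dim, alpha_v=0.1, alpha_pi=0.01, gamma=0.99):
        self.v_w = np.zeros(feature_dim)                 # critic weights
        self.pi_w = np.zeros((action_dim, feature_dim))  # actor mean weights
        self.alpha_v, self.alpha_pi, self.gamma = alpha_v, alpha_pi, gamma
        self.sigma = 0.1                                 # fixed exploration noise (assumed)

    def act(self, phi):
        mean = self.pi_w @ phi
        return mean + self.sigma * np.random.randn(len(mean))

    def update(self, phi, action, human_reward, phi_next, done):
        """human_reward: scalar score assigned by the trainer after the movement."""
        target = human_reward + (0.0 if done else self.gamma * self.v_w @ phi_next)
        td_error = target - self.v_w @ phi
        self.v_w += self.alpha_v * td_error * phi                    # critic step
        grad_log_pi = np.outer((action - self.pi_w @ phi) / self.sigma**2, phi)
        self.pi_w += self.alpha_pi * td_error * grad_log_pi          # actor step
```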


National Conference on Artificial Intelligence | 2011

Value function approximation in reinforcement learning using the Fourier basis

George Konidaris; Sarah Osentoski; Philip S. Thomas
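
The Fourier basis named in the title builds value-function features from cosines of integer-weighted combinations of the normalized state variables. A minimal sketch of the order-n construction is below, assuming the state has been rescaled to [0, 1]^d.

```python
import itertools
import numpy as np

def fourier_basis(order, state_dim):
    """Return a feature function for the order-n Fourier basis.

    Each feature is cos(pi * c . s) for an integer coefficient vector c with
    entries in {0, ..., order}; the state s must lie in [0, 1]^state_dim.
    """
    coeffs = np.array(list(itertools.product(range(order + 1), repeat=state_dim)))

    def phi(state):
        return np.cos(np.pi * coeffs @ np.asarray(state))

    return phi

# Usage: linear value estimate v(s) = w . phi(s) for a 2-D state, order-3 basis
phi = fourier_basis(order=3, state_dim=2)
features = phi([0.4, 0.7])       # (3 + 1)^2 = 16 features
```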


International Conference on Machine Learning | 2014

Bias in Natural Actor-Critic Algorithms

Philip S. Thomas


National Conference on Artificial Intelligence | 2016

Increasing the action gap: new operators for reinforcement learning

Marc G. Bellemare; Georg Ostrovski; Arthur Guez; Philip S. Thomas; Rémi Munos


International Conference on Artificial Intelligence | 2015

Personalized ad recommendation systems for life-time value optimization with guarantees

Georgios Theocharous; Philip S. Thomas; Mohammad Ghavamzadeh


Innovative Applications of Artificial Intelligence | 2009

Application of the Actor-Critic Architecture to Functional Electrical Stimulation Control of a Human Arm

Philip S. Thomas; Antonie J. van den Bogert; Kathleen M. Jagodnik; Michael S. Branicky


National Conference on Artificial Intelligence | 2015

High confidence off-policy evaluation

Philip S. Thomas; Georgios Theocharous; Mohammad Ghavamzadeh
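
High-confidence off-policy evaluation pairs importance-sampled returns with a statistical lower bound on the evaluation policy's performance. The sketch below applies a one-sided Student's t lower bound to per-trajectory importance-weighted returns as a simple stand-in; the paper develops tighter bounds than this illustration.

```python
import numpy as np
from scipy import stats

def lower_bound_on_performance(weighted_returns, delta=0.05):
    """One-sided (1 - delta) lower confidence bound on the mean
    importance-weighted return.

    weighted_returns: one importance-weighted return per logged trajectory.
    The Student's t bound here is an illustrative stand-in, not the bound
    proposed in the paper.
    """
    x = np.asarray(weighted_returns, dtype=float)
    n = len(x)
    mean = x.mean()
    sem = x.std(ddof=1) / np.sqrt(n)
    return mean - stats.t.ppf(1.0 - delta, df=n - 1) * sem
```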

Collaboration


Dive into Philip S. Thomas's collaborations.

Top Co-Authors

Emma Brunskill
Carnegie Mellon University

Kathleen M. Jagodnik
Icahn School of Medicine at Mount Sinai

Scott Niekum
University of Massachusetts Amherst

Andrew G. Barto
University of Massachusetts Amherst

Christoph Dann
Carnegie Mellon University