Proceedings of the First International Conference on Distributed Artificial Intelligence | 2019

An efficient reinforcement learning algorithm for learning deterministic policies in continuous domains


Abstract


In this paper, we present an improvement to an existing reinforcement learning algorithm that can efficiently learn deterministic policies in continuous domains. It builds on two recently proposed techniques. First, it can be seen as a variation of an actor-critic algorithm called Penalized Neural-Fitted Actor Critic (PeNFAC) [24], which showed excellent experimental performance in the Roboschool environments. Second, it incorporates a better estimate of the current policy's value function, the V-trace target [3], which allows the reuse of off-policy data generated by recent previous policies. We experimentally compare two implementations of V-trace: one based on n-step returns and the other on λ-returns. Finally, we show that our proposed algorithm can outperform several state-of-the-art algorithms (TD3, DDPG, PPO, PeNFAC, NFAC) in three environments of the Roboschool benchmark (Hopper, HalfCheetah, Humanoid).
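For concreteness, below is a minimal sketch of the n-step V-trace target as defined in [3]; the NumPy function, its name, and the default truncation levels (rho_bar = c_bar = 1) are illustrative assumptions, not the implementation evaluated in the paper.

import numpy as np

def vtrace_targets(rewards, values, bootstrap_value,
                   behaviour_logp, target_logp,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    # n-step V-trace targets (Espeholt et al. [3]) for a length-T
    # trajectory collected under a behaviour policy mu, evaluated
    # with respect to the current (target) policy pi.
    # rewards, values, behaviour_logp, target_logp: float arrays of shape (T,)
    # bootstrap_value: critic estimate V(x_T) at the final state.
    ratios = np.exp(target_logp - behaviour_logp)   # pi/mu importance ratios
    rhos = np.minimum(rho_bar, ratios)              # truncated ratios for the TD errors
    cs = np.minimum(c_bar, ratios)                  # truncated "trace" coefficients
    next_values = np.append(values[1:], bootstrap_value)
    deltas = rhos * (rewards + gamma * next_values - values)
    # Backward recursion: vs_t = V(x_t) + delta_t + gamma * c_t * (vs_{t+1} - V(x_{t+1}))
    vs = np.zeros_like(values)
    acc = 0.0
    for t in reversed(range(len(rewards))):
        acc = deltas[t] + gamma * cs[t] * acc
        vs[t] = values[t] + acc
    return vs

The λ-return variant compared in the abstract can be obtained by scaling the trace coefficients, i.e. replacing cs with λ * cs, following the remark in [3] that this interpolates between a one-step TD target (λ = 0) and the full V-trace target (λ = 1).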

DOI 10.1145/3356464.3357704
Language English
Journal Proceedings of the First International Conference on Distributed Artificial Intelligence
