Science China Information Sciences | 2019

Online adaptive Q-learning method for fully cooperative linear quadratic dynamic games


Abstract


A model-based offline policy iteration (PI) algorithm and a model-free online Q-learning algorithm are proposed for solving fully cooperative linear quadratic dynamic games. The PI-based adaptive Q-learning method learns the feedback Nash equilibrium online from state samples generated by behavior policies, without requiring access to the system model. Unlike existing Q-learning methods, the proposed algorithm executes both policy evaluation and policy improvement in an adaptive manner. We prove the convergence of the offline PI algorithm by showing its equivalence to Newton's method for solving the game algebraic Riccati equation (GARE). Furthermore, we prove that the proposed Q-learning method converges to the Nash equilibrium under a small learning rate, provided that certain persistence of excitation conditions are satisfied; these conditions can easily be met by choosing suitable behavior policies. Simulation results demonstrate the good performance of the proposed online adaptive Q-learning algorithm.
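
The offline PI step summarized above (policy evaluation via a Lyapunov equation, policy improvement via a gain update, with each iteration equivalent to one Newton step on the GARE) can be sketched numerically. The following Python sketch assumes a discrete-time formulation with two players whose inputs are stacked into a single control channel; the function name cooperative_pi, all system matrices, the tolerance, and the discrete-time setting are illustrative assumptions, not details taken from the paper.

import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def cooperative_pi(A, B, Q, R, K0, tol=1e-10, max_iter=100):
    # Model-based policy iteration (Newton iteration) for a discrete-time
    # GARE with the players' inputs stacked into one channel:
    #   P = A'PA + Q - A'PB (R + B'PB)^{-1} B'PA.
    # K0 must make A - B @ K0 Schur stable; each loop is one Newton step.
    K = K0
    P_prev = np.zeros_like(Q)
    for _ in range(max_iter):
        Ac = A - B @ K                      # closed-loop matrix under current policy
        # Policy evaluation: solve Ac' P Ac - P + Q + K' R K = 0
        P = solve_discrete_lyapunov(Ac.T, Q + K.T @ R @ K)
        # Policy improvement: K <- (R + B'PB)^{-1} B'PA
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        if np.linalg.norm(P - P_prev) < tol:
            break
        P_prev = P
    return P, K

# Hypothetical two-player system: the fully cooperative game reduces to a
# single LQR problem with B = [B1, B2] and block-diagonal weight R.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B1 = np.array([[0.0], [1.0]])
B2 = np.array([[1.0], [0.0]])
B = np.hstack([B1, B2])
Q = np.eye(2)
R = np.diag([1.0, 2.0])                     # R = blkdiag(R11, R22)
K0 = np.zeros((2, 2))                       # A is Schur stable, so K0 = 0 works
P, K = cooperative_pi(A, B, Q, R, K0)
K1, K2 = K[:1, :], K[1:, :]                 # each player's share of the Nash gain

In the model-free setting described in the abstract, the Lyapunov solve and the gain update would instead be approximated online from state samples generated by behavior policies, which is what the adaptive Q-learning algorithm accomplishes.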

Volume 62
DOI 10.1007/s11432-018-9865-9
Language English
Journal Science China Information Sciences
