Kyriakos G. Vamvoudakis
Virginia Tech
Publications
Featured research published by Kyriakos G. Vamvoudakis.
IEEE Control Systems Magazine | 2012
Frank L. Lewis; Draguna Vrabie; Kyriakos G. Vamvoudakis
This article describes the use of principles of reinforcement learning to design feedback controllers for discrete- and continuous-time dynamical systems that combine features of adaptive control and optimal control. Adaptive control [1], [2] and optimal control [3] represent different philosophies for designing feedback controllers. Optimal controllers are normally designed offline by solving Hamilton-Jacobi-Bellman (HJB) equations, for example, the Riccati equation, using complete knowledge of the system dynamics. Determining optimal control policies for nonlinear systems requires the offline solution of nonlinear HJB equations, which are often difficult or impossible to solve. By contrast, adaptive controllers learn online to control unknown systems using data measured in real time along the system trajectories. Adaptive controllers are not usually designed to be optimal in the sense of minimizing user-prescribed performance functions. Indirect adaptive controllers use system identification techniques to first identify the system parameters and then use the obtained model to solve optimal design equations [1]. Adaptive controllers may satisfy certain inverse optimality conditions [4].
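For orientation, in the linear-quadratic special case mentioned above the HJB equation reduces to the algebraic Riccati equation; the following is the standard textbook form, not a result taken from the article itself. For dynamics $\dot{x} = Ax + Bu$ and cost $\int_0^\infty (x^\top Q x + u^\top R u)\,dt$,

\[
A^\top P + P A - P B R^{-1} B^\top P + Q = 0, \qquad u^*(x) = -R^{-1} B^\top P x .
\]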
Automatica | 2013
Shubhendu Bhasin; Rushikesh Kamalapurkar; Marcus Johnson; Kyriakos G. Vamvoudakis; Frank L. Lewis; Warren E. Dixon
An online adaptive reinforcement learning-based solution is developed for the infinite-horizon optimal control problem for continuous-time uncertain nonlinear systems. A novel actor-critic-identifier (ACI) is proposed to approximate the Hamilton-Jacobi-Bellman equation using three neural network (NN) structures: actor and critic NNs approximate the optimal control and the optimal value function, respectively, and a robust dynamic neural network identifier asymptotically approximates the uncertain system dynamics. An advantage of using the ACI architecture is that learning by the actor, critic, and identifier is continuous and simultaneous, without requiring knowledge of system drift dynamics. Convergence of the algorithm is analyzed using Lyapunov-based adaptive control methods. A persistence of excitation condition is required to guarantee exponential convergence to a bounded region in the neighborhood of the optimal control and uniformly ultimately bounded (UUB) stability of the closed-loop system. Simulation results demonstrate the performance of the actor-critic-identifier method for approximate optimal control.
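As background, for an input-affine system $\dot{x} = f(x) + g(x)u$ with cost integrand $Q(x) + u^\top R u$, the HJB equation and the actor/critic parameterizations commonly used in this line of work take the generic form below; the exact structures in the paper, in particular the identifier, may differ.

\[
0 = Q(x) + u^{*\top} R\, u^* + \nabla V^{*\top}\!\left(f(x) + g(x)u^*\right), \qquad
u^*(x) = -\tfrac{1}{2} R^{-1} g^\top(x)\, \nabla V^*(x),
\]
\[
\hat{V}(x) = \hat{W}_c^\top \phi(x), \qquad
\hat{u}(x) = -\tfrac{1}{2} R^{-1} g^\top(x)\, \nabla\phi^\top(x)\, \hat{W}_a .
\]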
Automatica | 2011
Kyriakos G. Vamvoudakis; Frank L. Lewis
In this paper we present an online adaptive control algorithm based on policy iteration reinforcement learning techniques to solve the continuous-time (CT) multi-player non-zero-sum (NZS) game with infinite horizon for linear and nonlinear systems. NZS games allow for players to have a cooperative team component and an individual selfish component of strategy. The adaptive algorithm learns online the solution of coupled Riccati equations and coupled Hamilton-Jacobi equations for linear and nonlinear systems, respectively. This adaptive control method finds, in real time, approximations of the optimal value and the NZS Nash equilibrium, while also guaranteeing closed-loop stability. The optimal-adaptive algorithm is implemented as a separate actor/critic parametric network approximator structure for every player, and involves simultaneous continuous-time adaptation of the actor/critic networks. A persistence of excitation condition is shown to guarantee convergence of every critic to the actual optimal value function for that player. A detailed mathematical analysis is done for 2-player NZS games. Novel tuning algorithms are given for the actor/critic networks. The convergence to the Nash equilibrium is proven and stability of the system is also guaranteed. This provides optimal adaptive control solutions for both non-zero-sum games and their special case, zero-sum games. Simulation examples show the effectiveness of the new algorithm.
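For reference, with dynamics $\dot{x} = f(x) + \sum_j g_j(x)u_j$ and cost integrand $Q_i(x) + \sum_j u_j^\top R_{ij} u_j$ for player $i$, the coupled Hamilton-Jacobi equations learned by the algorithm take the following generic form (a standard formulation in this literature, not quoted from the paper); in the linear-quadratic case they reduce to coupled algebraic Riccati equations.

\[
0 = Q_i(x) + \sum_j u_j^{*\top} R_{ij}\, u_j^* + \nabla V_i^\top\Big(f(x) + \sum_j g_j(x)\, u_j^*\Big), \qquad
u_i^*(x) = -\tfrac{1}{2} R_{ii}^{-1} g_i^\top(x)\, \nabla V_i(x), \quad i = 1,\dots,N .
\]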
international symposium on neural networks | 2009
Kyriakos G. Vamvoudakis; Frank L. Lewis
In this paper we discuss an online algorithm based on policy iteration for learning the continuous-time (CT) optimal control solution with infinite horizon cost for nonlinear systems with known dynamics. We present an online adaptive algorithm implemented as an actor/critic structure which involves simultaneous continuous-time adaptation of both actor and critic neural networks. We call this ‘synchronous’ policy iteration. A persistence of excitation condition is shown to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for both critic and actor networks, with extra terms in the actor tuning law being required to guarantee closed-loop dynamical stability. The convergence to the optimal controller is proven, and stability of the system is also guaranteed. Simulation examples show the effectiveness of the new algorithm.
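To make the synchronous actor/critic idea concrete, the sketch below runs a heavily simplified continuous-time adaptation on a scalar linear-quadratic problem whose exact value function is known from the Riccati equation. The tuning laws used here (normalized gradient descent on the Hamiltonian residual for the critic, a plain pull of the actor toward the critic, and periodic state resets standing in for persistence of excitation) are illustrative assumptions rather than the laws proved in the paper, so the learned weights should only be expected to land in the vicinity of the Riccati value.

```python
import numpy as np

# Scalar LQR test problem: dx/dt = a*x + b*u, cost = integral of q*x^2 + r*u^2.
a, b, q, r = 1.0, 1.0, 1.0, 1.0
p_star = (a * r + np.sqrt(a**2 * r**2 + b**2 * q * r)) / b**2   # Riccati solution

# Critic V(x) ~ Wc*x^2 and actor u(x) ~ -(b/r)*Wa*x, both initialized admissibly.
Wc, Wa = 3.0, 3.0
alpha_c, alpha_a = 10.0, 10.0
dt, T = 1e-3, 40.0
rng = np.random.default_rng(0)
x = 1.0

for k in range(int(T / dt)):
    t = k * dt
    if k % int(2.0 / dt) == 0:                     # periodic state resets keep
        x = rng.uniform(-1.5, 1.5)                 # the regressor exciting
    probe = 0.1 * np.sin(7 * t) + 0.1 * np.cos(3 * t)
    u = -(b / r) * Wa * x + probe                  # actor policy plus probing signal
    xdot = a * x + b * u
    e = q * x**2 + r * u**2 + 2 * Wc * x * xdot    # Hamiltonian (Bellman) residual
    w = 2 * x * xdot                               # regressor d(e)/d(Wc)
    Wc += dt * (-alpha_c * w * e / (1 + w * w) ** 2)   # normalized gradient descent
    Wa += dt * (-alpha_a * (Wa - Wc))                  # actor pulled toward critic
    x += dt * xdot

print(f"learned Wc = {Wc:.3f}, Wa = {Wa:.3f}, Riccati solution p* = {p_star:.3f}")
```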
IEEE/CAA Journal of Automatica Sinica | 2014
Kyriakos G. Vamvoudakis
This paper proposes a novel optimal adaptive event-triggered control algorithm for nonlinear continuous-time systems. The goal is to reduce the number of controller updates by sampling the state only when an event is triggered, so that stability and optimality are maintained. The online algorithm is implemented based on an actor/critic neural network structure. A critic neural network is used to approximate the cost and an actor neural network is used to approximate the optimal event-triggered controller. Since the proposed algorithm involves dynamics that exhibit both continuous evolution, described by ordinary differential equations, and instantaneous jumps or impulses, an impulsive systems approach is used. A Lyapunov stability proof ensures that the closed-loop system is asymptotically stable. Finally, we illustrate the effectiveness of the proposed solution compared to a time-triggered controller.
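As an illustration of the event-triggered mechanism (though not of the optimal adaptive triggering condition derived in the paper), the sketch below uses a generic relative-threshold rule from the event-triggered control literature: the control is recomputed only when the gap between the current state and the last sampled state exceeds a fraction of the current state magnitude.

```python
import numpy as np

# Scalar plant dx/dt = a*x + b*u with a stabilizing gain K; the zero-order-hold
# control is refreshed only when the measurement gap grows too large.
a, b, K = 1.0, 1.0, 2.0          # closed loop a - b*K = -1
sigma = 0.3                      # relative triggering threshold (design choice)
dt, T = 1e-3, 10.0

x, x_hat = 1.0, 1.0              # true state and last sampled state
events = 0

for k in range(int(T / dt)):
    if abs(x - x_hat) > sigma * abs(x):   # event: re-sample and update the control
        x_hat = x
        events += 1
    u = -K * x_hat                        # control held constant between events
    x += dt * (a * x + b * u)

print(f"controller updates: {events} events vs {int(T / dt)} time-triggered steps")
print(f"final |x| = {abs(x):.2e}")
```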
mediterranean conference on control and automation | 2009
Draguna Vrabie; Kyriakos G. Vamvoudakis; Frank L. Lewis
In this paper we present two adaptive algorithms which offer solutions to the continuous-time optimal control problem for nonlinear, input-affine, time-invariant systems. Both algorithms were developed based on the Generalized Policy Iteration technique and involve adaptation of two neural network structures, namely the Actor, which provides the control signal, and the Critic, which evaluates the control performance. Despite the similarities, the two adaptive algorithms differ in the manner in which the adaptation takes place, the required knowledge of the system dynamics, and the formulation of the persistence of excitation requirement. The main difference is that one algorithm uses sequential adaptation of the actor and critic structures, i.e., while one is trained the other is kept constant, while for the second algorithm the two neural networks are trained synchronously in a continuous-time fashion. The two algorithms are described in detail and proofs of convergence are provided. Simulation results of applying the two algorithms for finding the optimal state feedback controller of a nonlinear system are also presented.
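As background, one policy-iteration cycle that both algorithms build on alternates, for an admissible policy $u^{(i)}$, a policy-evaluation step and a policy-improvement step of the following generic form (standard in this literature rather than quoted from the paper):

\[
0 = Q(x) + u^{(i)\top} R\, u^{(i)} + \nabla V^{(i)\top}\!\left(f(x) + g(x)\,u^{(i)}\right) \quad \text{(evaluation)},
\]
\[
u^{(i+1)}(x) = -\tfrac{1}{2} R^{-1} g^\top(x)\, \nabla V^{(i)}(x) \quad \text{(improvement)} .
\]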
Automatica | 2014
Mohammed I. Abouheaf; Frank L. Lewis; Kyriakos G. Vamvoudakis; Sofie Haesaert; Robert Babuska
This paper introduces a new class of multi-agent discrete-time dynamic games, known in the literature as dynamic graphical games. In these games, a local performance index is defined for each agent that depends only on the local information available to that agent. Nash equilibrium policies and best-response policies are given in terms of the solutions to the discrete-time coupled Hamilton-Jacobi equations. Since the interactions between the agents are prescribed by a communication graph structure, a new notion of Nash equilibrium has to be introduced. It is proved that this notion holds if all agents are in Nash equilibrium and the graph is strongly connected. A novel reinforcement learning value iteration algorithm is given to solve the dynamic graphical games in an online manner along with its proof of convergence. The policies of the agents form a Nash equilibrium when all the agents in the neighborhood update their policies, and a best response outcome when the agents in the neighborhood are kept constant. The paper brings together discrete Hamiltonian mechanics, distributed multi-agent control, optimal control theory, and game theory to formulate and solve these multi-agent dynamic graphical games. A simulation example shows the effectiveness of the proposed approach in a leader-synchronization case along with optimality guarantees.
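For concreteness, a representative local performance index of the kind used in such graphical games weighs the local neighborhood tracking error $\delta_i$ of agent $i$, its own control $u_i$, and the controls of its graph neighbors $j \in N_i$; the exact form used in the paper may differ.

\[
J_i = \tfrac{1}{2}\sum_{k=0}^{\infty}\Big(\delta_i^\top(k)\, Q_{ii}\,\delta_i(k) + u_i^\top(k)\, R_{ii}\, u_i(k) + \sum_{j\in N_i} u_j^\top(k)\, R_{ij}\, u_j(k)\Big).
\]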
Archive | 2009
Kyriakos G. Vamvoudakis; Frank L. Lewis
In this chapter, we discuss an online algorithm based on policy iteration (PI) for learning the continuous-time (CT) optimal control solution for nonlinear systems with infinite horizon costs. We present an online adaptive algorithm implemented as an actor/critic structure which involves simultaneous continuous-time adaptation of both actor and critic neural networks. We call this “synchronous” PI. A persistence of excitation condition is shown to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for both critic and actor networks, with extra terms in the actor tuning law being required to guarantee closed-loop dynamical stability. The convergence to the optimal controller is proven, and stability of the system is also guaranteed. Simulation examples show the effectiveness of the new algorithm.
IEEE Transactions on Automatic Control | 2014
Kyriakos G. Vamvoudakis; João P. Hespanha; Bruno Sinopoli; Yilin Mo
We propose new game-theoretic approaches to estimate a binary random variable based on sensor measurements that may have been corrupted by a cyber-attacker. The estimation problem is formulated as a zero-sum partial information game in which a detector attempts to minimize the probability of an estimation error and an attacker attempts to maximize this probability. While this problem can be solved exactly by reducing it to the computation of the value of a matrix game, this approach is computationally feasible only for a small number of sensors. The two key results of this paper provide complementary computationally efficient solutions to the construction of the optimal detector. The first result provides an explicit formula for the optimal detector but is only valid when the number of sensors is roughly smaller than two divided by the probability of a sensor error. In contrast, the detector provided by the second result is valid for an arbitrary number of sensors. While it may result in a probability of estimation error that is ϵ above the minimum achievable, we show that this error ϵ is small when the number of sensors is large, which is precisely the case for which the first result does not apply.
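Since the exact solution reduces to computing the value of a finite zero-sum matrix game, the snippet below shows the standard linear-programming computation of such a value and a maximin mixed strategy. This is generic background rather than the paper's detector construction; in the paper's setting the payoff would be the error probability, with the detector minimizing and the attacker maximizing, and the toy payoff matrix in the demo is made up.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value and a maximin mixed strategy for the row (maximizing) player
    of a finite zero-sum matrix game with payoff matrix A."""
    m, n = A.shape
    # Variables [x_1, ..., x_m, v]; maximize v  <=>  minimize -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # Guarantee v against every pure column strategy: v - (A^T x)_j <= 0.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Mixed strategy sums to one.
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:m]

# Toy example (matching pennies): value 0, uniform maximin strategy.
value, strategy = matrix_game_value(np.array([[1.0, -1.0], [-1.0, 1.0]]))
print(value, strategy)
```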
Automatica | 2016
Qiang Jiao; Hamidreza Modares; Shengyuan Xu; Frank L. Lewis; Kyriakos G. Vamvoudakis
This paper addresses distributed optimal tracking control of multi-agent linear systems subject to external disturbances. The concept of differential game theory is utilized to formulate this distributed control problem as a multi-player zero-sum differential graphical game, which provides a new perspective on distributed tracking of multiple agents influenced by disturbances. In the presented differential graphical game, the dynamics and performance indices for each node depend on local neighbor information and disturbances. It is shown that the solution to the multi-agent differential graphical games in the presence of disturbances requires the solution to coupled Hamilton-Jacobi-Isaacs (HJI) equations. A multi-agent learning policy iteration (PI) algorithm is provided to find the solution to these coupled HJI equations and its convergence is proven. It is also shown that L2-bounded synchronization errors can be guaranteed using this technique. An online PI algorithm is given to solve the zero-sum game in real time. A simulation example is provided to show the effectiveness of the online approach.
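For reference, the single-system prototype of the HJI equation mentioned here, for dynamics $\dot{x} = f(x) + g(x)u + k(x)d$ with control $u$, disturbance $d$, and attenuation level $\gamma$, reads as follows; in the graphical game the corresponding equations are coupled through the neighbor information, and this generic form is background rather than the paper's exact statement.

\[
0 = Q(x) + u^{*\top} R\, u^* - \gamma^2 d^{*\top} d^* + \nabla V^\top\!\left(f(x) + g(x)u^* + k(x)d^*\right),
\]
\[
u^*(x) = -\tfrac{1}{2} R^{-1} g^\top(x)\, \nabla V(x), \qquad d^*(x) = \tfrac{1}{2\gamma^2}\, k^\top(x)\, \nabla V(x) .
\]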