
Publication


Featured research published by Frank L. Lewis.


Archive | 2012

Synchronous online learning for multiplayer non-zero-sum games

Draguna Vrabie; Kyriakos G. Vamvoudakis; Frank L. Lewis

This chapter shows how to solve multiplayer non-zero-sum (NZS) games online using novel adaptive control structures based on reinforcement learning. For the most part, interest in the control systems community has been in the (non-cooperative) zero-sum games, which provide the solution of the H-infinity robust control problem. However, dynamic team games may have some cooperative objectives and some selfish objectives among the players. This cooperative/non-cooperative balance is captured in the NZS games, as detailed herein.
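
As a concrete illustration of that balance, a standard N-player formulation (assumed here for illustration, since the abstract does not spell out the chapter's exact cost structure) gives each player its own performance index over shared dynamics driven by all the players:

```latex
% Standard N-player non-zero-sum setup (assumed formulation, for illustration only):
% shared dynamics driven by all players, one cost functional per player.
\dot{x} = f(x) + \sum_{j=1}^{N} g_j(x)\, u_j , \qquad
J_i(u_1,\dots,u_N) = \int_0^{\infty} \Big( Q_i(x) + \sum_{j=1}^{N} u_j^{\top} R_{ij}\, u_j \Big)\, dt .

% Nash equilibrium: no player can lower its own cost by deviating unilaterally.
J_i(u_1^{*},\dots,u_i^{*},\dots,u_N^{*}) \;\le\; J_i(u_1^{*},\dots,u_i,\dots,u_N^{*})
\qquad \text{for every admissible } u_i \text{ and every } i .
```

The two-player zero-sum case corresponds to J_1 = -J_2; general NZS games let the weights Q_i and R_ij encode the mixture of shared and selfish objectives described above.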


Archive | 2012

Integral reinforcement learning (IRL) for non-linear continuous-time systems

Draguna Vrabie; Kyriakos G. Vamvoudakis; Frank L. Lewis

This chapter presents an adaptive method based on actor-critic reinforcement learning (RL) for solving online the optimal control problem for non-linear continuous-time systems in the state-space form ẋ(t) = f(x) + g(x)u(t). The algorithm, first presented in Vrabie et al. (2008, 2009), Vrabie (2009), and Vrabie and Lewis (2009), solves the optimal control problem without requiring knowledge of the drift dynamics f(x). The method is based on policy iteration (PI), an RL algorithm that alternates between the steps of policy evaluation and policy improvement. The PI method starts by evaluating the cost of a given admissible initial policy and then uses this information to obtain a new control policy that has a smaller associated cost than the previous one. These two steps are repeated until the policy improvement step no longer changes the present policy, indicating that the optimal control behavior has been obtained.
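
A compact way to see why f(x) is not needed is the integral reinforcement form of the Bellman equation sketched below, written for a cost integrand of the form Q(x) + u^T R u (a standard choice assumed here; the abstract does not state the cost structure):

```latex
% Integral reinforcement form of the Bellman equation over a reinforcement interval T > 0:
V\big(x(t)\big) \;=\; \int_{t}^{t+T} \big( Q(x) + u^{\top} R\, u \big)\, d\tau \;+\; V\big(x(t+T)\big)

% Policy iteration:
%  1. Policy evaluation: apply u = \mu^{(i)}(x), measure the state trajectory, and
%     solve the equation above for V^{(i)}.
%  2. Policy improvement:
\mu^{(i+1)}(x) \;=\; -\tfrac{1}{2} R^{-1} g(x)^{\top} \nabla V^{(i)}(x)
```

Neither step contains f(x) explicitly: policy evaluation uses measured trajectory data in place of the drift dynamics, and policy improvement needs only the input gain g(x).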


Archive | 2012

Optimal adaptive control using integral reinforcement learning for linear systems

Draguna Vrabie; Kyriakos G. Vamvoudakis; Frank L. Lewis

This chapter presents a new policy iteration technique that solves the continuous-time LQR problem online without using knowledge of the system's internal dynamics (the system matrix A). The algorithm is derived by writing the value function in integral reinforcement form to yield a new form of the Bellman equation for CT systems. This allows the derivation of an integral reinforcement learning (IRL) algorithm, an adaptive controller that converges online to the optimal LQR controller. IRL is based on an adaptive critic scheme in which the actor performs continuous-time control while the critic incrementally corrects the actor's behavior at discrete moments in time until the best performance is obtained. The critic evaluates the actor's performance over a period of time and formulates it in a parameterized form. Based on the critic's evaluation, the actor's behavior policy is updated for improved control performance.
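
A minimal numerical sketch of this scheme, assuming a hypothetical second-order plant, is given below. The matrices A, B, Q, R, the reinforcement interval T, and the initial stabilizing gain are all illustrative choices; A is used only to simulate measurement data and never appears inside the learning loop.

```python
import numpy as np

# Hypothetical plant used only to generate simulated measurements; the learning
# loop below never touches A, mirroring the IRL idea described in the chapter.
A = np.array([[0.0, 1.0],
              [-1.0, -0.5]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
dt, T = 0.001, 0.05                 # Euler step and reinforcement interval

def quad_basis(x):
    """Quadratic monomials [x1^2, x1*x2, x2^2] parameterizing V(x) = x' P x."""
    return np.array([x[0]**2, x[0]*x[1], x[1]**2])

def p_to_P(p):
    """Unpack the kernel vector into the symmetric matrix P."""
    return np.array([[p[0], p[1]/2.0],
                     [p[1]/2.0, p[2]]])

def collect_interval(x, K):
    """Run one reinforcement interval under u = -Kx; return (next state, integral cost)."""
    cost = 0.0
    for _ in range(int(T / dt)):
        u = -K @ x
        cost += float(x @ Q @ x + u @ R @ u) * dt
        x = x + (A @ x + B @ u) * dt    # Euler step of the "unknown" dynamics
    return x, cost

K = np.array([[0.0, 1.0]])              # admissible (stabilizing) initial policy
for _ in range(10):
    # Policy evaluation: least-squares fit of the kernel P from integral reinforcement data
    Phi, rho = [], []
    for _ in range(30):                 # intervals started from random initial states
        x0 = np.random.uniform(-1.0, 1.0, size=2)
        x1, c = collect_interval(x0, K)
        Phi.append(quad_basis(x0) - quad_basis(x1))
        rho.append(c)
    p, *_ = np.linalg.lstsq(np.array(Phi), np.array(rho), rcond=None)
    P = p_to_P(p)
    # Policy improvement: needs the input matrix B, but not the drift matrix A
    K = np.linalg.solve(R, B.T @ P)

print("Learned gain K:", K)
```

Each iteration fits the quadratic value kernel P by least squares from the identity V(x(t)) - V(x(t+T)) = integral reinforcement over the interval, then improves the gain via K = R^{-1} B^T P, which uses B but not A.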


Archive | 2012

Value iteration for continuous-time systems

Draguna Vrabie; Kyriakos G. Vamvoudakis; Frank L. Lewis

The idea of value iteration has been applied to the online learning of optimal controllers for discrete-time (DT) systems for many years. In the work of Werbos (1974, 1989, 1991, 1992, 2009) a family of DT learning control algorithms based on value iteration ideas has been developed. These techniques are known as approximate dynamic programming or adaptive dynamic programming (ADP). ADP includes heuristic dynamic programming (HDP) (which is value iteration), dual heuristic programming, and action-based variants of those algorithms, which are equivalent to Q-learning for the DT dynamical system x_{k+1} = f(x_k) + g(x_k)u_k. Value iteration algorithms rely on the special form of the DT Bellman equation V(x_k) = r(x_k, u_k) + γV(x_{k+1}), with r(x_k, u_k) the utility or stage cost of the value function. This equation involves the value function evaluated at the two times k and k + 1 and does not depend explicitly on the system dynamics f(x_k), g(x_k).
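
For the linear-quadratic special case, the value-iteration (HDP) recursion can be written in closed form. The sketch below uses made-up system matrices purely to show the recursion; it is model-based, unlike the data-driven ADP implementations cited above.

```python
import numpy as np

# Hypothetical discrete-time linear system, used only to illustrate the
# value-iteration (HDP) recursion in its linear-quadratic special case.
A = np.array([[1.0, 0.1],
              [0.0, 0.9]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

P = np.zeros((2, 2))   # V_0(x) = 0: value iteration needs no stabilizing initial policy
for _ in range(200):
    # Greedy policy with respect to the current value estimate V_j(x) = x' P x
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    # Value update: V_{j+1}(x_k) = x_k' Q x_k + u_k' R u_k + V_j(x_{k+1}) under that policy
    Acl = A - B @ K
    P = Q + K.T @ R @ K + Acl.T @ P @ Acl

print("Converged value kernel P:\n", P)
print("Greedy gain K:", K)
```

Starting from V_0 = 0 is what distinguishes value iteration from policy iteration in practice: no admissible (stabilizing) initial policy is required.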


Archive | 2012

Integral reinforcement learning for zero-sum two-player games

Draguna Vrabie; Kyriakos G. Vamvoudakis; Frank L. Lewis

In this chapter we present a continuous-time adaptive dynamic programming (ADP) procedure that uses the idea of integral reinforcement learning (IRL) to find online the Nash-equilibrium solution for the two-player zero-sum (ZS) differential game. We consider continuous-time (CT) linear dynamics of the form ẋ = Ax + B_1 w + B_2 u, where u(t), w(t) are the control actions of the two players, and an infinite-horizon quadratic cost. This work is from Vrabie and Lewis (2010).
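
For this linear-quadratic setting, the saddle-point solution has a well-known characterization. Assuming a quadratic value function V(x) = x^T P x and a cost integrand of the form x^T Q x + u^T R u - γ^2 w^T w (the standard zero-sum quadratic structure; the abstract does not spell it out), it reads:

```latex
% Game algebraic Riccati equation for the two-player zero-sum LQ game:
A^{\top} P + P A + Q - P B_2 R^{-1} B_2^{\top} P + \gamma^{-2} P B_1 B_1^{\top} P = 0

% Saddle-point (Nash) policies of the two players:
u^{*}(x) = -R^{-1} B_2^{\top} P\, x , \qquad
w^{*}(x) = \gamma^{-2} B_1^{\top} P\, x
```

The IRL procedure described in the chapter learns P online from measured trajectory data rather than by solving this equation directly.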


2012 6th IEEE Multi-Conference on Systems and Control, MSC 2012 | 2012

2012 IEEE Multi-Conference on Systems and Control, MSC 2012

Kyriakos G. Vamvoudakis; Frank L. Lewis

In this paper we introduce an online algorithm that uses integral reinforcement learning to learn online the continuous-time Nash game (zero-sum and non-zero-sum) solution for nonlinear systems with infinite-horizon costs and partial knowledge of the system dynamics. This algorithm is a data-based approach to the solution of the coupled Hamilton-Jacobi equations and does not require explicit knowledge of the system's drift dynamics. A novel adaptive control algorithm is given that is based on policy iteration and implemented using an actor/critic structure for every player in the game, giving 2N adaptive approximator structures in total. All 2N approximation networks are adapted simultaneously. Novel adaptive tuning algorithms are given for the critic and actor networks. Convergence to the Nash solution of the game is proven, and stability of the system is also guaranteed. Simulation examples support the theoretical results.
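
In the standard N-player formulation assumed earlier (cost integrands Q_i(x) + Σ_j u_j^T R_ij u_j), the coupled Hamilton-Jacobi equations referred to here take the following form, one equation per player:

```latex
% Coupled Hamilton-Jacobi equations for the N-player game (standard formulation,
% assumed here for illustration), with one value function V_i per player:
0 = \nabla V_i^{\top}\Big( f(x) + \sum_{j=1}^{N} g_j(x)\, u_j^{*} \Big)
    + Q_i(x) + \sum_{j=1}^{N} (u_j^{*})^{\top} R_{ij}\, u_j^{*} ,
\qquad
u_j^{*}(x) = -\tfrac{1}{2} R_{jj}^{-1} g_j(x)^{\top} \nabla V_j(x) ,
\qquad i = 1,\dots,N .
```

Because each policy u_j^* depends on the gradient of V_j, the N equations are coupled, which is what motivates tuning all 2N actor and critic approximators simultaneously rather than one player at a time.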


Archive | 2011

Integral Reinforcement Learning for Finding Online the Feedback Nash Equilibrium of Nonzero-Sum Differential Games

Draguna Vrabie; Frank L. Lewis

Adaptive/Approximate Dynamic Programming (ADP) is the class of methods that provide online solutions to optimal control problems while making use of measured information from the system and computing in a forward-in-time fashion, as opposed to the backward-in-time procedure that characterizes the classical Dynamic Programming approach (Bellman, 2003). These methods were initially developed for systems with finite state and action spaces and are based on Sutton's temporal difference learning (Sutton, 1988), Werbos' Heuristic Dynamic Programming (HDP) (Werbos, 1992), and Watkins' Q-learning (Watkins, 1989). The applicability of these online learning methods to real-world problems is enabled by approximation tools and theory. The value associated with a given admissible control policy is determined using value function approximation, online learning techniques, and data measured from the system. A control policy is then determined based on the information about control performance encapsulated in the value function approximator. Given the universal approximation property of neural networks (Hornik et al., 1990), they are generally used in the reinforcement learning literature to represent value functions (Werbos, 1992; Bertsekas and Tsitsiklis, 1996; Prokhorov and Wunsch, 1997; Hanselmann et al., 2007). Another type of approximation structure is a linear combination of a set of basis functions, used in (Beard et al., 1997), (Abu-Khalaf et al., 2006), and (Vrabie et al., 2009). The approximation structure used for performance estimation, endowed with learning capabilities, is often referred to as a critic. Critic structures provide performance information to the control structure that computes the input of the system. The performance information from the critic is used in learning procedures to determine improved action policies. Methods that make use of critic structures to determine online optimal behaviour strategies are also referred to as adaptive critics (Prokhorov and Wunsch, 1997; Al-Tamimi et al., 2007; Kulkarni and Venayagamoorthy, 2010).

Most of the previous research on continuous-time reinforcement learning algorithms that provide an online approach to the solution of optimal control problems assumed that the dynamical system is affected by only a single control strategy. In a game theory setup, the controlled system is affected by a number of control inputs computed by different controllers.


conference on decision and control | 1993

A numerical algorithm for the solution of generalized Lyapunov equations

Pradeep Misra; Vassilis Syrmos; Frank L. Lewis

This paper studies the numerical solution of generalized Lyapunov equations arising in the study of the stability of continuous- and discrete-time descriptor (generalized state-space or implicit) systems. The algorithms proposed in this paper are shown to be computationally efficient and numerically backward stable. Numerical examples are presented to illustrate the performance of the proposed algorithms.
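
The paper's backward-stable algorithm is not reproduced in this listing. Purely as a point of reference, a dense vectorized solve of the continuous-time generalized Lyapunov equation A^T X E + E^T X A + Q = 0 for a small made-up example looks like this (E is taken nonsingular for simplicity, although descriptor systems generally allow a singular E):

```python
import numpy as np

# Illustrative only: solve A' X E + E' X A + Q = 0 by Kronecker vectorization.
# This is NOT the numerically backward-stable algorithm of the paper; it merely
# shows the equation being solved for a small hypothetical example.
A = np.array([[-1.0, 0.0],
              [0.0, -2.0]])
E = np.array([[1.0, 0.0],
              [0.0, 0.5]])
Q = np.eye(2)

n = A.shape[0]
# vec(A' X E) = (E' kron A') vec(X)  and  vec(E' X A) = (A' kron E') vec(X),
# using column-major (Fortran-order) vectorization.
M = np.kron(E.T, A.T) + np.kron(A.T, E.T)
x = np.linalg.solve(M, -Q.reshape(-1, order="F"))
X = x.reshape(n, n, order="F")

print("X =\n", X)
print("residual =\n", A.T @ X @ E + E.T @ X @ A + Q)
```

The Kronecker system has dimension n^2, which is exactly why structured, backward-stable algorithms such as the ones proposed in this paper are preferred beyond toy sizes.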


Archive | 1986

Optimal Control

Frank L. Lewis; Draguna Vrabie; Vassilis L. Syrmos


Archive | 2012

Optimal Control: Lewis/Optimal Control 3e

Frank L. Lewis; Draguna Vrabie; Vassilis L. Syrmos

Collaboration


Dive into Frank L. Lewis's collaborations.

Top Co-Authors

Draguna Vrabie

University of Texas at Arlington


Vassilis L. Syrmos

University of Hawaii at Manoa
