Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Biao Luo is active.

Publication


Featured research published by Biao Luo.


IEEE Transactions on Systems, Man, and Cybernetics | 2015

Off-Policy Reinforcement Learning for H∞ Control Design

Biao Luo; Huai-Ning Wu; Tingwen Huang

The H∞ control design problem is considered for nonlinear systems with an unknown internal system model. It is known that the nonlinear H∞ control problem can be transformed into solving the so-called Hamilton-Jacobi-Isaacs (HJI) equation, a nonlinear partial differential equation that is generally impossible to solve analytically. Even worse, model-based approaches cannot be used to approximately solve the HJI equation when an accurate system model is unavailable or costly to obtain in practice. To overcome these difficulties, an off-policy reinforcement learning (RL) method is introduced to learn the solution of the HJI equation from real system data instead of a mathematical system model, and its convergence is proved. In the off-policy RL method, the system data can be generated with arbitrary policies rather than the policy being evaluated, which is extremely important and promising for practical systems. For implementation purposes, a neural network (NN)-based actor-critic structure is employed and a least-squares NN weight update algorithm is derived based on the method of weighted residuals. Finally, the developed NN-based off-policy RL method is tested on a linear F16 aircraft plant and further applied to a rotational/translational actuator system.
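A minimal sketch of the least-squares critic-weight fit mentioned above, assuming hypothetical basis functions and synthetic behavior-policy data (it illustrates the regression step only, not the paper's full weighted-residual off-policy iteration):

import numpy as np

# Minimal sketch: least-squares fit of critic weights w so that V(x) ~ w^T phi(x)
# matches target values collected from arbitrary behavior-policy data, as in an
# off-policy policy-evaluation step.

def phi(x):
    """Hypothetical polynomial basis for a 2-D state."""
    x1, x2 = x
    return np.array([x1**2, x1 * x2, x2**2])

def fit_critic_weights(states, targets):
    """Solve min_w sum_i (w^T phi(x_i) - target_i)^2 via least squares."""
    Phi = np.array([phi(x) for x in states])           # N x 3 regression matrix
    w, *_ = np.linalg.lstsq(Phi, targets, rcond=None)  # least-squares solution
    return w

# Synthetic data standing in for trajectories generated by a behavior policy.
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(200, 2))
true_w = np.array([1.0, 0.4, 2.0])
targets = np.array([true_w @ phi(x) for x in states]) + 0.01 * rng.standard_normal(200)

print(fit_critic_weights(states, targets))  # should be close to true_w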


IEEE Transactions on Neural Networks | 2012

Neural Network Based Online Simultaneous Policy Update Algorithm for Solving the HJI Equation in Nonlinear H∞ Control

Huai-Ning Wu; Biao Luo

It is well known that the nonlinear H∞ state feedback control problem relies on the solution of the Hamilton-Jacobi-Isaacs (HJI) equation, a nonlinear partial differential equation that has proven impossible to solve analytically. In this paper, a neural network (NN)-based online simultaneous policy update algorithm (SPUA) is developed to solve the HJI equation, in which knowledge of the internal system dynamics is not required. First, we propose an online SPUA which can be viewed as a reinforcement learning technique for two players to learn their optimal actions in an unknown environment. The proposed online SPUA updates the control and disturbance policies simultaneously; thus, only one iterative loop is needed. Second, the convergence of the online SPUA is established by proving that it is mathematically equivalent to Newton's method for finding a fixed point in a Banach space. Third, we develop an actor-critic structure for the implementation of the online SPUA, in which only one critic NN is needed for approximating the cost function, and a least-squares method is given for estimating the NN weight parameters. Finally, simulation studies are provided to demonstrate the effectiveness of the proposed algorithm.
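For intuition, a toy rendering of the simultaneous policy update loop on a scalar linear system with a quadratic value function V(x) = p x^2; the system numbers are invented and the NN-based online implementation is not reproduced here:

# Toy simultaneous policy update for the scalar H-infinity problem
#   dx/dt = a*x + b*u + d*w,  cost integrand  q*x^2 + r*u^2 - gamma^2*w^2.
# With V(x) = p*x^2, both policies are updated from the SAME value estimate,
#   u_i = -(b*p_i/r)*x,   w_i = (d*p_i/gamma**2)*x,
# and a Lyapunov-type policy evaluation then yields the next p, so only one
# iterative loop is needed.

a, b, d = -1.0, 1.0, 1.0
q, r, gamma = 1.0, 1.0, 2.0

p = 0.0  # initial value-function coefficient
for i in range(20):
    a_cl = a - (b**2 / r) * p + (d**2 / gamma**2) * p         # closed-loop drift
    stage = q + (b**2 / r) * p**2 - (d**2 / gamma**2) * p**2  # evaluated stage cost
    p = -stage / (2.0 * a_cl)                                 # policy evaluation
    print(i, p)

# The fixed point solves the scalar HJI equation
#   2*a*p + q + (d**2/gamma**2 - b**2/r)*p**2 = 0.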


IEEE Transactions on Neural Networks | 2015

Adaptive Optimal Control of Highly Dissipative Nonlinear Spatially Distributed Processes With Neuro-Dynamic Programming

Biao Luo; Huai-Ning Wu; Han-Xiong Li

Highly dissipative nonlinear partial differential equations (PDEs) are widely employed to describe the system dynamics of industrial spatially distributed processes (SDPs). In this paper, we consider the optimal control problem of general highly dissipative SDPs and propose an adaptive optimal control approach based on neuro-dynamic programming (NDP). Initially, Karhunen-Loève decomposition is employed to compute empirical eigenfunctions (EEFs) of the SDP based on the method of snapshots. These EEFs, together with the singular perturbation technique, are then used to obtain a finite-dimensional slow subsystem of ordinary differential equations that accurately describes the dominant dynamics of the PDE system. Subsequently, the optimal control problem is reformulated on the basis of the slow subsystem and further converted into solving a Hamilton-Jacobi-Bellman (HJB) equation. The HJB equation is a nonlinear PDE that has proven impossible to solve analytically. Thus, an adaptive optimal control method is developed via NDP that solves the HJB equation online, using a neural network (NN) to approximate the value function; an online NN weight tuning law is proposed that does not require an initial stabilizing control policy. Moreover, by accounting for the NN estimation error, we prove that the original closed-loop PDE system with the adaptive optimal control policy is semiglobally uniformly ultimately bounded. Finally, the developed method is tested on a nonlinear diffusion-convection-reaction process and applied to the temperature cooling fin of a high-speed aerospace vehicle, and the achieved results show its effectiveness.
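As a rough sketch of the Karhunen-Loève/method-of-snapshots step described above (the snapshot data below is synthetic and the mode count is arbitrary; this is not the paper's SDP model):

import numpy as np

# Karhunen-Loeve decomposition via the method of snapshots; the "snapshots"
# stand in for sampled PDE solution profiles u(z, t_k).

nz, nt = 100, 40                       # spatial grid points, number of snapshots
z = np.linspace(0.0, np.pi, nz)
t = np.linspace(0.0, 1.0, nt)

# Fabricated data: two dominant spatial modes with time-varying amplitudes.
U = (np.outer(np.sin(z), np.exp(-t)) +
     0.3 * np.outer(np.sin(2 * z), np.cos(3 * t)))   # shape (nz, nt)

# Method of snapshots: eigen-decompose the small (nt x nt) temporal correlation
# matrix instead of the large (nz x nz) spatial one.
C = U.T @ U / nt
lam, V = np.linalg.eigh(C)
order = np.argsort(lam)[::-1]          # sort eigenvalues in decreasing order
lam, V = lam[order], V[:, order]

# Empirical eigenfunctions = snapshots projected onto the temporal eigenvectors.
eefs = U @ V / np.sqrt(np.maximum(lam * nt, 1e-12))

energy = lam / lam.sum()
print("captured energy of first two EEFs:", energy[:2].sum())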


IEEE Transactions on Systems, Man, and Cybernetics | 2012

Approximate Optimal Control Design for Nonlinear One-Dimensional Parabolic PDE Systems Using Empirical Eigenfunctions and Neural Network

Biao Luo; Huai-Ning Wu

This paper addresses the approximate optimal control problem for a class of parabolic partial differential equation (PDE) systems with nonlinear spatial differential operators. An approximate optimal control design method is proposed on the basis of the empirical eigenfunctions (EEFs) and neural network (NN). First, based on the data collected from the PDE system, the Karhunen-Loève decomposition is used to compute the EEFs. With those EEFs, the PDE system is formulated as a high-order ordinary differential equation (ODE) system. To further reduce its dimension, the singular perturbation (SP) technique is employed to derive a reduced-order model (ROM), which can accurately describe the dominant dynamics of the PDE system. Second, the Hamilton-Jacobi-Bellman (HJB) method is applied to synthesize an optimal controller based on the ROM, where the closed-loop asymptotic stability of the high-order ODE system can be guaranteed by the SP theory. By dividing the optimal control law into two parts, the linear part is obtained by solving an algebraic Riccati equation, and a new type of HJB-like equation is derived for designing the nonlinear part. Third, a control update strategy based on successive approximation is proposed to solve the HJB-like equation, and its convergence is proved. Furthermore, an NN approach is used to approximate the cost function. Finally, we apply the developed approximate optimal control method to a diffusion-reaction process with a nonlinear spatial operator, and the simulation results illustrate its effectiveness.
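The linear part of the control law described above is obtained from an algebraic Riccati equation. A minimal sketch of that step, assuming a made-up three-mode reduced-order model (A, B) and using SciPy's standard ARE solver rather than the paper's own derivation:

import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical reduced-order model obtained after EEF/singular-perturbation
# reduction; the numbers are illustrative only.
A = np.array([[-1.0, 0.5, 0.0],
              [0.0, -2.0, 0.3],
              [0.1, 0.0, -3.0]])
B = np.array([[1.0], [0.0], [0.5]])
Q = np.eye(3)          # state weighting
R = np.array([[1.0]])  # control weighting

# Solve A'P + PA - PBR^{-1}B'P + Q = 0 for the linear feedback u = -Kx.
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)
print("linear gain K =", K)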


Neural Networks | 2015

Reinforcement learning solution for HJB equation arising in constrained optimal control problem

Biao Luo; Huai-Ning Wu; Tingwen Huang; Derong Liu

The constrained optimal control problem depends on the solution of the complicated Hamilton-Jacobi-Bellman equation (HJBE). In this paper, a data-based off-policy reinforcement learning (RL) method is proposed, which learns the solution of the HJBE and the optimal control policy from real system data. One important feature of off-policy RL is that its policy evaluation can be realized with data generated by other behavior policies, not necessarily the target policy, which solves the insufficient exploration problem. The convergence of the off-policy RL method is proved by demonstrating its equivalence to the successive approximation approach. Its implementation procedure is based on an actor-critic neural network structure, where the function approximation is conducted with linearly independent basis functions. Subsequently, the convergence of the implementation procedure with function approximation is also proved. Finally, its effectiveness is verified through computer simulations.
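One common way the input constraint is handled in this literature is a nonquadratic cost that saturates the policy through tanh; the sketch below extracts such a bounded policy from a value-gradient estimate and is illustrative only (the system, basis, and gradient are hypothetical, and this may not match the paper's exact construction):

import numpy as np

# Bounded policy u(x) = -u_max * tanh( (1/(2*u_max)) * R^{-1} g(x)^T dV/dx ),
# a standard construction for input-constrained optimal control.
u_max = 2.0
R_inv = np.array([[1.0]])

def g(x):
    """Hypothetical input gain of a 2-D system."""
    return np.array([[0.0], [1.0 + 0.1 * x[0] ** 2]])

def grad_V(x):
    """Stand-in critic gradient, e.g. from a trained NN approximator."""
    return np.array([2.0 * x[0], 4.0 * x[1]])

def bounded_policy(x):
    raw = R_inv @ g(x).T @ grad_V(x)            # unconstrained direction
    return -u_max * np.tanh(raw / (2.0 * u_max))

print(bounded_policy(np.array([0.5, -1.0])))    # stays within [-u_max, u_max]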


IEEE Transactions on Neural Networks | 2016

Model-Free Optimal Tracking Control via Critic-Only Q-Learning

Biao Luo; Derong Liu; Tingwen Huang; Ding Wang

Model-free control is an important and promising topic in the control field and has attracted extensive attention in the past few years. In this paper, we aim to solve the model-free optimal tracking control problem of nonaffine nonlinear discrete-time systems. A critic-only Q-learning (CoQL) method is developed, which learns the optimal tracking control from real system data and thus avoids solving the tracking Hamilton-Jacobi-Bellman equation. First, the Q-learning algorithm is proposed based on the augmented system, and its convergence is established. Using only one neural network for approximating the Q-function, the CoQL method is developed to implement the Q-learning algorithm. Furthermore, the convergence of the CoQL method is proved with the neural network approximation error taken into account. With the convergent Q-function obtained from the CoQL method, the adaptive optimal tracking control is designed based on a gradient descent scheme. Finally, the effectiveness of the developed CoQL method is demonstrated through simulation studies. The developed CoQL method learns from off-policy data and is implemented with a critic-only structure, so it is easy to realize and overcomes the inadequate exploration problem.
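The gradient-descent extraction of the control from a learned Q-function can be pictured with a quadratic Q standing in for the converged critic; the matrix H below is made up:

import numpy as np

# Toy quadratic Q-function Q(x, u) = [x; u]^T H [x; u] standing in for the
# converged critic; the control is extracted by gradient descent on u.
H = np.array([[2.0, 0.2, 0.5],
              [0.2, 1.5, 0.3],
              [0.5, 0.3, 1.0]])   # symmetric; last row/column is the u block

def dQ_du(x, u):
    z = np.concatenate([x, [u]])
    return 2.0 * H[-1] @ z        # derivative of z^T H z with respect to u

x = np.array([1.0, -0.5])
u, alpha = 0.0, 0.2               # initial control guess and step size
for _ in range(50):
    u -= alpha * dQ_du(x, u)      # gradient-descent step on the Q-function

# Closed-form minimizer for comparison: u* = -(H_ux @ x) / H_uu
print(u, -(H[-1, :2] @ x) / H[-1, -1])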


International Journal of Systems Science | 2016

Bipartite Output Consensus in Networked Multi-Agent Systems of High-Order Power Integrators With Signed Digraph and Input Noises

Hongwen Ma; Derong Liu; Ding Wang; Biao Luo

In this paper, we concentrate on investigating bipartite output consensus in networked multi-agent systems of high-order power integrators. Systems with power integrators are ubiquitous among weakly coupled, unstable, and underactuated mechanical systems. In the presence of input noises, an adaptive disturbance compensator and the technique of adding a power integrator are introduced into the complex nonlinear multi-agent systems to reduce the deterioration of system performance. Additionally, due to the existence of negative communication weights among agents, whether bipartite output consensus of high-order power integrators can be achieved remains unknown, so it is of great importance to study this issue. The underlying idea of the distributed controller design is to combine each agent's own output information and that of its neighbours, the state feedback of its internal system, and the adaptive input-noise compensator. When the signed digraph is structurally balanced, bipartite output consensus can be reached. Finally, numerical simulations are provided to verify the validity of the developed criteria.
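The structural-balance condition on the signed digraph can be checked by attempting to split the agents into two camps so that positive edges stay within a camp and negative edges cross camps; a small sketch on an invented sign pattern:

from collections import deque

# Structural balance check for a signed graph given as an adjacency dict
# {node: [(neighbour, sign), ...]} with sign in {+1, -1}; edges are treated as
# undirected for the balance test. The example graph is made up.
def is_structurally_balanced(adj):
    side = {}                                   # node -> 0 or 1 (the two camps)
    for start in adj:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, sign in adj[u]:
                want = side[u] if sign > 0 else 1 - side[u]
                if v not in side:
                    side[v] = want
                    queue.append(v)
                elif side[v] != want:
                    return False, None          # inconsistent camp assignment
    return True, side

adj = {
    1: [(2, +1), (3, -1)],
    2: [(1, +1), (3, -1)],
    3: [(1, -1), (2, -1)],
}
print(is_structurally_balanced(adj))   # balanced: {1, 2} versus {3}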


Neural Networks | 2017

Pinning Synchronization of Memristor-Based Neural Networks With Time-Varying Delays

Zhanyu Yang; Biao Luo; Derong Liu; Yueheng Li

In this paper, the synchronization of memristor-based neural networks with time-varying delays via pinning control is investigated. A novel pinning method is introduced to synchronize two memristor-based neural networks, which denote the drive system and the response system, respectively. The dynamics are studied using the theories of differential inclusions and nonsmooth analysis. In addition, some sufficient conditions are derived to guarantee asymptotic synchronization and exponential synchronization of the memristor-based neural networks via the presented pinning control. Furthermore, some improvements to the proposed control method are also discussed. Finally, the effectiveness of the obtained results is demonstrated by numerical simulations.
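A minimal simulation sketch of the pinning idea: a drive network and a response network share the same simplified dynamics (no memristive switching and no delays, unlike the paper), and only a subset of response nodes receives error feedback; all parameters are invented:

import numpy as np

# Simplified drive/response networks  dx/dt = -x + W @ tanh(x); the response
# system is pinned on a subset of nodes with error feedback  -k * (y_i - x_i).
# Coupling is kept weak so that this toy example synchronizes.
rng = np.random.default_rng(1)
n = 6
W = 0.15 * rng.standard_normal((n, n))
pinned = np.zeros(n)
pinned[:3] = 1.0                  # pin the first three nodes only
k, dt, steps = 8.0, 1e-3, 20000

x = rng.standard_normal(n)        # drive state
y = rng.standard_normal(n)        # response state

for step in range(steps):
    fx = -x + W @ np.tanh(x)
    fy = -y + W @ np.tanh(y) - k * pinned * (y - x)   # pinning feedback
    x, y = x + dt * fx, y + dt * fy
    if step % 5000 == 0:
        print(step, np.linalg.norm(y - x))            # synchronization error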


Information Sciences | 2017

Multi-Step Heuristic Dynamic Programming for Optimal Control of Nonlinear Discrete-Time Systems

Biao Luo; Derong Liu; Tingwen Huang; Xiong Yang; Hongwen Ma

Policy iteration and value iteration are the two main iterative adaptive dynamic programming frameworks for solving optimal control problems. Policy iteration converges fast but requires an initial stabilizing control policy, which is a strict constraint in practice. Value iteration avoids the requirement of an initial admissible control policy but converges much more slowly. This paper aims to utilize the advantages of policy iteration and value iteration while avoiding their drawbacks. Therefore, a multi-step heuristic dynamic programming (MsHDP) method is developed for solving the optimal control problem of nonlinear discrete-time systems. MsHDP speeds up value iteration and, at the same time, avoids the requirement of an initial admissible control policy in policy iteration. The convergence theory of MsHDP is established by proving that it converges to the solution of the Bellman equation. For implementation purposes, an actor-critic neural network (NN) structure is developed. The critic NN is employed to estimate the value function, and its weight vector is computed with a least-squares scheme. The actor NN is used to estimate the control policy, and a gradient descent method is proposed for updating its weight vector. Comparative simulation studies on two examples verify the effectiveness and advantages of MsHDP.
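A toy rendering of the multi-step idea on a scalar discrete-time LQ problem, where the value function reduces to a single coefficient V(x) = p x^2 (the paper's NN machinery is omitted and the system numbers are made up):

# Scalar discrete-time system x_{k+1} = a*x_k + b*u_k with cost q*x^2 + r*u^2.
# Multi-step HDP: roll the current greedy policy out for M steps and bootstrap
# with the current value estimate, so no initial stabilizing policy is needed.
a, b, q, r = 1.1, 1.0, 1.0, 1.0
M = 3
p = 0.0                                        # V_0(x) = 0 (not stabilizing)

for j in range(60):
    K = -(a * b * p) / (r + b**2 * p)          # greedy policy from current value
    c = a + b * K                              # closed-loop factor
    stage = q + r * K**2                       # per-step cost coefficient
    # p_{j+1} x^2 = sum_{k=0}^{M-1} stage * c^(2k) x^2 + p_j * c^(2M) x^2
    p = stage * sum(c ** (2 * k) for k in range(M)) + p * c ** (2 * M)

# Closed-form fixed point of the scalar Riccati equation (valid for b = r = 1).
p_star = ((q + a**2 - 1) + ((q + a**2 - 1) ** 2 + 4 * q) ** 0.5) / 2
print(p, p_star)   # the iterate should approach p_star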


International Journal of Intelligent Computing and Cybernetics | 2013

H∞ Control of the Air-Breathing Hypersonic Vehicle Based on Online Simultaneous Policy Update Algorithm

Chao Guo; Huai-Ning Wu; Biao Luo; Lei Guo

Purpose – The air-breathing hypersonic vehicle (AHV) includes intricate inherent coupling between the propulsion system and the airframe dynamics, which results in an intractable nonlinear system for the controller design. The purpose of this paper is to propose an H∞ control method for the AHV based on the online simultaneous policy update algorithm (SPUA). Design/methodology/approach – Initially, the H∞ state feedback control problem of the AHV is converted to the problem of solving the Hamilton-Jacobi-Isaacs (HJI) equation, which is notoriously difficult to solve both numerically and analytically. To overcome this difficulty, the online SPUA is introduced to solve the HJI equation without requiring accurate knowledge of the internal system dynamics. Subsequently, the online SPUA is implemented on the basis of an actor-critic structure, in which a neural network (NN) is employed for approximating the cost function and a least-squares method is used to calculate the NN weight parameters. Findings – Simulation...

Collaboration


Dive into Biao Luo's collaborations.

Top Co-Authors

Derong Liu
Chinese Academy of Sciences

Ding Wang
Chinese Academy of Sciences

Zhanyu Yang
University of Science and Technology Beijing

Yueheng Li
University of Science and Technology Beijing

Hongwen Ma
Chinese Academy of Sciences

Han-Xiong Li
City University of Hong Kong

Chao Li
Chinese Academy of Sciences