Yuanheng Zhu | Researchain

Publication

Featured researches published by Yuanheng Zhu.

IEEE Transactions on Systems, Man, and Cybernetics | 2016

Experience Replay for Optimal Control of Nonzero-Sum Game Systems With Unknown Dynamics

Dongbin Zhao; Qichao Zhang; Ding Wang; Yuanheng Zhu

In this paper, an approximate online equilibrium solution is developed for N-player nonzero-sum (NZS) game systems with completely unknown dynamics. First, a model identifier based on a three-layer neural network (NN) is established to reconstruct the unknown NZS game systems. The identifier weight vector is updated with the experience replay technique, which relaxes the traditional persistence of excitation condition to a simpler condition on recorded data. Then, a single-network adaptive dynamic programming (ADP) algorithm with experience replay is proposed for each player to solve the coupled nonlinear Hamilton-Jacobi (HJ) equations, where only the critic NN weight vector needs to be tuned for each player. The feedback Nash equilibrium is provided by the solution of the coupled HJ equations. Based on the experience replay technique, a novel critic NN weight tuning law is proposed to guarantee the stability of the closed-loop system and the convergence of the value functions. Furthermore, a Lyapunov-based stability analysis shows that the closed-loop system is uniformly ultimately bounded. Finally, two simulation examples are given to verify the effectiveness of the proposed control scheme.
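To make the experience replay idea in this abstract concrete, here is a minimal sketch and not the paper's tuning law. It assumes a linear-in-parameters model `y = w . phi` in place of the three-layer NN identifier, and the names `replay_update`, `memory`, and `lr` are hypothetical. Each step descends the squared error of the current sample together with all recorded samples, so the estimate can converge when the recorded regressors span the parameter space, even if the live signal alone is not persistently exciting.

```python
def replay_update(w, current, memory, lr=0.1):
    """One gradient step on the summed squared error of the current
    sample and every recorded sample (illustrative assumption, not
    the update law from the paper)."""
    grad = [0.0] * len(w)
    for phi, y in [current] + list(memory):
        # Prediction error of the linear model on this sample.
        err = sum(wi * pi for wi, pi in zip(w, phi)) - y
        for i, pi in enumerate(phi):
            grad[i] += err * pi
    return [wi - lr * gi for wi, gi in zip(w, grad)]


# Recorded data whose regressors span both parameter directions.
memory = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)]
w = [0.0, 0.0]
for _ in range(300):
    # The live sample never changes, so on its own it is not exciting.
    w = replay_update(w, ([1.0, 1.0], 1.0), memory)
```

In this toy setup the data are generated by the weights `[2, -1]`, and the replayed samples are what make those weights identifiable.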

Neurocomputing | 2014

Full-range adaptive cruise control based on supervised adaptive dynamic programming

Dongbin Zhao; Zhaohui Hu; Zhongpu Xia; Cesare Alippi; Yuanheng Zhu; Ding Wang

The paper proposes a supervised adaptive dynamic programming (SADP) algorithm for a full-range adaptive cruise control (ACC) system, which can be formulated as a dynamic programming problem with stochastic demands. The suggested ACC system has been designed to allow the host vehicle to drive both on highways and in Stop and Go (SG) urban scenarios. The ACC system can autonomously drive the host vehicle to a desired speed and/or a given distance from the target vehicle in both operational cases. Traditional adaptive dynamic programming (ADP) is a suitable tool to address the problem, but training usually suffers from low convergence rates and rarely yields an effective controller. An SADP algorithm, which introduces the concept of an inducing region, is proposed here to overcome these training drawbacks. The SADP algorithm performs very well in all simulation scenarios and always better than more traditional controllers. The conclusion is that the proposed SADP algorithm is an effective control methodology for the full-range ACC problem.

IEEE Transactions on Neural Networks | 2015

MEC—A Near-Optimal Online Reinforcement Learning Algorithm for Continuous Deterministic Systems

Dongbin Zhao; Yuanheng Zhu

In this paper, the first probably approximately correct (PAC) algorithm for continuous deterministic systems that does not rely on any system dynamics is proposed. It combines the state aggregation technique with the efficient exploration principle, and makes efficient use of online observed samples. We use a grid to partition the continuous state space into different cells to save samples. A near-upper Q operator is defined to produce a near-upper Q function using samples in each cell. The corresponding greedy policy effectively balances exploration and exploitation. Through rigorous analysis, we prove that there is a polynomial bound on the time spent executing nonoptimal actions in our algorithm. After finitely many steps, the final policy is near optimal in the PAC framework. The implementation requires no knowledge of the system and has lower computational complexity. Simulation studies confirm that it performs better than other similar PAC algorithms.
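The grid-based state aggregation mentioned in this abstract can be sketched as follows. This is only an illustration of partitioning a continuous state space into cells and storing samples per cell; it does not implement the near-upper Q operator, and the names `cell_index`, `store`, `low`, `high`, and `bins` are hypothetical.

```python
from math import floor


def cell_index(state, low, high, bins):
    """Map a continuous state to the integer index of its grid cell,
    one index per dimension."""
    idx = []
    for s, lo, hi, n in zip(state, low, high, bins):
        width = (hi - lo) / n
        i = floor((s - lo) / width)
        # Clamp states that lie on or beyond the boundary.
        idx.append(min(max(i, 0), n - 1))
    return tuple(idx)


def store(samples, sample, low, high, bins):
    """File a (state, action, reward, next_state) sample under its cell."""
    key = cell_index(sample[0], low, high, bins)
    samples.setdefault(key, []).append(sample)
    return key


samples = {}
low, high, bins = (0.0, -1.0), (1.0, 1.0), (4, 4)
store(samples, ((0.25, -0.9), 0, 0.0, (0.30, -0.8)), low, high, bins)
```

Samples that fall in the same cell share one entry, which is what lets a cell-level Q estimate reuse them.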

IEEE Transactions on Neural Networks | 2017

Iterative Adaptive Dynamic Programming for Solving Unknown Nonlinear Zero-Sum Game Based on Online Data

Yuanheng Zhu; Dongbin Zhao; Xiangjun Li

$H_\infty$ control is a powerful method to solve the disturbance attenuation problems that occur in some control systems. The design of such controllers relies on solving the zero-sum game (ZSG). But in practical applications, the exact dynamics are mostly unknown. Identification of the dynamics also produces errors that are detrimental to the control performance. To overcome this problem, an iterative adaptive dynamic programming algorithm is proposed in this paper to solve the continuous-time, unknown nonlinear ZSG with only online data. A model-free approach to the Hamilton–Jacobi–Isaacs equation is developed based on the policy iteration method. The control policy, the disturbance policy, and the value function are approximated by neural networks (NNs) under the critic–actor–disturber structure. The NN weights are solved by the least-squares method. According to the theoretical analysis, our algorithm is equivalent to a Gauss–Newton method solving an optimization problem, and it converges uniformly to the optimal solution. The online data can also be used repeatedly, which is highly efficient. Simulation results demonstrate its feasibility to solve the unknown nonlinear ZSG. When compared with other algorithms, it saves a significant amount of online measurement time.

IEEE Transactions on Industrial Electronics | 2017

Event-Triggered Optimal Control for Partially Unknown Constrained-Input Systems via Adaptive Dynamic Programming

Yuanheng Zhu; Dongbin Zhao; Haibo He; Junhong Ji

Event-triggered control has been an effective tool in dealing with problems with finite communication and computation resources. In this paper, we design an event-triggered control for nonlinear constrained-input continuous-time systems based on the optimal policy. Constraints on controls are handled using a bounded function. To learn the optimal solution with partially unknown dynamics, an online adaptive dynamic programming algorithm is proposed. The identifier network, the critic network, and the actor network are employed to approximate the unknown drift dynamics, the optimal value, and the optimal policy, respectively. The identifier is tuned based on online data, which further trains the critic and actor at triggering instants. A concurrent learning technique repeatedly uses past data to train the critic. Stability of the closed-loop system and convergence of the neural networks to the optimal solutions are proved by Lyapunov analysis. Finally, the algorithm is applied to an overhead crane system to observe its performance. The event-triggered optimal controller with constraints stabilizes the system and requires far fewer sampling instants.

Neurocomputing | 2015

Convergence analysis and application of fuzzy-HDP for nonlinear discrete-time HJB systems

Yuanheng Zhu; Dongbin Zhao; Derong Liu

In this paper, a type of fuzzy system structure is applied to the heuristic dynamic programming (HDP) algorithm to solve nonlinear discrete-time Hamilton-Jacobi-Bellman (DT-HJB) problems. The fuzzy system adopted here is a 0-order T-S (Takagi-Sugeno) fuzzy system using triangle membership functions (MFs). The convergence of HDP and the approximability of the multivariate 0-order T-S fuzzy system are analyzed in this paper. It is derived that the cost function and control policy of HDP can be iterated to the DT-HJB solution and the optimal policy. The multivariate 0-order T-S fuzzy system using triangle MFs is proven to be a universal approximator, which guarantees the convergence of the Fuzzy-HDP mechanism. Some simulations are implemented to observe the performance of the proposed method on both a mathematical example and a practical problem. It is concluded that Fuzzy-HDP outperforms traditional optimal control in more complex systems.

Neural Computing and Applications | 2015

A data-based online reinforcement learning algorithm satisfying probably approximately correct principle

Yuanheng Zhu; Dongbin Zhao

This paper proposes, for the first time, a probably approximately correct (PAC) algorithm that uses online data directly and efficiently to solve the optimal control problem of continuous deterministic systems without system parameters. Dependence on specific approximation structures critically limits the wide application of online reinforcement learning (RL) algorithms. We utilize the online data directly with the kd-tree technique to remove this limitation. Moreover, we design the algorithm according to the PAC principle. Complete theoretical proofs are presented, and three examples are simulated to verify its good performance. The conclusion is that the proposed RL algorithm specifies the maximum running time needed to reach a near-optimal control policy with only online data.

international symposium on neural networks | 2012

Neural and fuzzy dynamic programming for under-actuated systems

Dongbin Zhao; Yuanheng Zhu; Haibo He

This paper aims to integrate fuzzy control with the adaptive dynamic programming (ADP) scheme, to provide optimized fuzzy control performance together with faster convergence of ADP with the help of fuzzy prior knowledge. ADP usually consists of two neural networks: the Actor as the controller and the Critic as the performance evaluator. A fuzzy controller, as applied in many fields, can be used instead as the Actor to speed up the learning convergence, because of its simplicity and its prior information on fuzzy membership and rules. The parameters of the fuzzy rules are learned by the ADP scheme to approach optimal control performance. The nature of the fuzzy controller makes the system steady and robust to system states and uncertainties. Simulations on two under-actuated systems, a cart-pole plant and a pendubot plant, are implemented. It is verified that the proposed scheme is capable of balancing under-actuated systems and has a wider control zone.

ieee symposium series on computational intelligence | 2016

Deep reinforcement learning with experience replay based on SARSA

Dongbin Zhao; Haitao Wang; Kun Shao; Yuanheng Zhu

Neurocomputing | 2017

Data-driven adaptive dynamic programming for continuous-time fully cooperative games with partially constrained inputs

Qichao Zhang; Dongbin Zhao; Yuanheng Zhu
