Bahare Kiumarsi
University of Texas at Arlington
Publications
Featured research published by Bahare Kiumarsi.
Automatica | 2014
Bahare Kiumarsi; Frank L. Lewis; Hamidreza Modares; Ali Karimpour; Mohammad Bagher Naghibi-Sistani
In this paper, a novel approach based on the Q-learning algorithm is proposed to solve the infinite-horizon linear quadratic tracker (LQT) for unknown discrete-time systems in a causal manner. It is assumed that the reference trajectory is generated by a linear command generator system. An augmented system composed of the original system and the command generator is constructed, and it is shown that the value function for the LQT is quadratic in terms of the state of the augmented system. Using the quadratic structure of the value function, a Bellman equation and an augmented algebraic Riccati equation (ARE) for solving the LQT are derived. In contrast to the standard solution of the LQT, which requires solving an ARE and a noncausal difference equation simultaneously, in the proposed method the optimal control input is obtained by solving only an augmented ARE. A Q-learning algorithm is developed to solve the augmented ARE online without any knowledge of the system dynamics or the command generator. Convergence to the optimal solution is shown. A simulation example is used to verify the effectiveness of the proposed control scheme.
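To make the learning mechanism concrete, here is a minimal numerical sketch of Q-learning policy iteration on a quadratic Q-function of the augmented state [x; r]. All matrices, weights, and the discount factor are illustrative assumptions rather than the paper's example, and the simulated dynamics are used only to generate data, never by the learner.

```python
# Sketch only: Q-learning PI for a discounted LQT on an assumed toy system.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]]); B = np.array([[0.0], [1.0]])  # plant (hidden from learner)
F = np.array([[1.0]])                               # command generator r_{k+1} = F r_k
T = np.block([[A, np.zeros((2, 1))], [np.zeros((1, 2)), F]])
B1 = np.vstack([B, [[0.0]]])
C1 = np.array([[1.0, 0.0, -1.0]])                   # tracking error e_k = C1 @ [x_k; r_k]
Qe, R, gamma = 10.0, 1.0, 0.9                       # stage cost Qe*e^2 + R*u^2, discount gamma

n, m = 3, 1
pairs = [(i, j) for i in range(n + m) for j in range(i, n + m)]
def phi(z):                                         # basis such that z' H z == phi(z) @ vech(H)
    return np.array([(1 if i == j else 2) * z[i] * z[j] for i, j in pairs])

K = np.zeros((m, n))                                # K0 = 0 is admissible for this discounted problem
for _ in range(10):                                 # policy iteration
    rows, targets, X = [], [], rng.normal(size=n)
    for _ in range(60):                             # collect excited data under the current policy
        u = -K @ X + 0.5 * rng.normal(size=m)       # probing noise keeps the regression well posed
        cost = Qe * (C1 @ X).item() ** 2 + R * u.item() ** 2
        Xn = T @ X + B1 @ u
        un = -K @ Xn                                # next action follows the policy being evaluated
        rows.append(phi(np.concatenate([X, u])) - gamma * phi(np.concatenate([Xn, un])))
        targets.append(cost)
        X = Xn if rng.random() > 0.1 else rng.normal(size=n)
    vechH, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    H = np.zeros((n + m, n + m))
    for (i, j), v in zip(pairs, vechH):
        H[i, j] = H[j, i] = v
    K = np.linalg.solve(H[n:, n:], H[n:, :n])       # greedy improvement: u = -K @ [x; r]

print("learned tracking gain K:", K)
```

The fixed point of this iteration corresponds to the augmented ARE solution, so the printed gain contains both the feedback and the reference feedforward parts in a single matrix.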
IEEE Transactions on Neural Networks | 2015
Bahare Kiumarsi; Frank L. Lewis
This paper presents a partially model-free adaptive optimal control solution to the deterministic nonlinear discrete-time (DT) tracking control problem in the presence of input constraints. The tracking error dynamics and reference trajectory dynamics are first combined to form an augmented system. Then, a new discounted performance function based on the augmented system is presented for the optimal nonlinear tracking problem. In contrast to the standard solution, which finds the feedforward and feedback terms of the control input separately, the minimization of the proposed discounted performance function gives both feedback and feedforward parts of the control input simultaneously. This enables us to encode the input constraints into the optimization problem using a nonquadratic performance function. The DT tracking Bellman equation and tracking Hamilton-Jacobi-Bellman (HJB) equation are derived. An actor-critic-based reinforcement learning algorithm is used to learn the solution to the tracking HJB equation online without requiring knowledge of the system drift dynamics. That is, two neural networks (NNs), namely, actor NN and critic NN, are tuned online and simultaneously to generate the optimal bounded control policy. A simulation example is given to show the effectiveness of the proposed method.
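A hedged sketch of the constraint-encoding ingredient: a nonquadratic integral input penalty, shown below in closed form, together with a tanh-saturated actor parameterization. The saturation level, weight, and basis are assumptions for illustration, not the paper's choices.

```python
# Sketch only: nonquadratic input penalty and a bounded actor, with assumed values.
import numpy as np

lam, R = 1.0, 1.0                              # saturation level |u| <= lam and input weight

def input_penalty(u):
    """W(u) = 2*lam*R * integral_0^u atanh(v/lam) dv, evaluated in closed form.
    It is finite only for |u| < lam and grows steeply near the bound."""
    return 2 * lam * R * u * np.arctanh(u / lam) + lam**2 * R * np.log1p(-(u / lam) ** 2)

def bounded_actor(x, w):
    """Actor saturated by construction: u = lam * tanh(w . sigma(x))."""
    sigma = np.tanh(x)                         # a simple NN basis (assumption)
    return lam * np.tanh(w @ sigma)

print(input_penalty(np.linspace(-0.99, 0.99, 5)))     # cheap near 0, expensive near |u| = lam
print(bounded_actor(np.array([0.3, -1.2]), np.array([0.5, 0.2])))
```

Because dW/du = 2*lam*R*atanh(u/lam), the stationarity condition of a tracking HJB with this penalty yields a tanh-saturated policy, which is why an actor of the above form respects the constraint by construction.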
IEEE Transactions on Systems, Man, and Cybernetics | 2015
Bahare Kiumarsi; Frank L. Lewis; Mohammad Bagher Naghibi-Sistani; Ali Karimpour
In this paper, an output-feedback solution to the infinite-horizon linear quadratic tracking (LQT) problem for unknown discrete-time systems is proposed. An augmented system composed of the system dynamics and the reference trajectory dynamics is constructed. The state of the augmented system is reconstructed from a limited number of past measurements of its input, output, and reference trajectory. A novel Bellman equation is developed that evaluates the value function associated with a fixed policy using only the input, output, and reference trajectory data of the augmented system. By using approximate dynamic programming, a class of reinforcement learning methods, the LQT problem is solved online without requiring knowledge of the augmented system dynamics, using only measurements of the input, output, and reference trajectory. We develop both policy iteration (PI) and value iteration (VI) algorithms that converge to an optimal controller while requiring only measured input, output, and reference trajectory data. The convergence of the proposed PI and VI algorithms is shown. A simulation example is used to verify the effectiveness of the proposed control scheme.
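The output-feedback ingredient can be pictured with a short sketch: the value function is treated as quadratic in a vector of the last N measured inputs, outputs, and reference samples rather than in the unmeasurable augmented state. The horizon N and the pushed values below are placeholders.

```python
# Sketch only: a measured-data vector standing in for the augmented state.
import numpy as np
from collections import deque

N = 3                                             # history horizon (illustrative choice)
u_hist, y_hist, r_hist = (deque([0.0] * N, maxlen=N) for _ in range(3))

def data_state():
    """z_k assembled purely from measured signals; it replaces the augmented
    state X_k in the Bellman regression of the PI/VI algorithms."""
    return np.concatenate([np.asarray(h) for h in (u_hist, y_hist, r_hist)])

u_hist.append(0.1); y_hist.append(0.7); r_hist.append(1.0)   # one measurement step
z = data_state()                                  # length 3*N vector of past u, y, r
print(z)
# A VI step would then regress, e.g., phi(z_k) @ vech(P_next) against
# cost_k + gamma * phi(z_{k+1}) @ vech(P_current), using only measured data.
```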
Neurocomputing | 2015
Bahare Kiumarsi; Frank L. Lewis; Daniel S. Levine
In this paper, motivated by recently discovered neurocognitive models of mechanisms in the brain, a new reinforcement learning (RL) method is presented based on a novel critic neural network (NN) structure to solve the optimal tracking problem of a nonlinear discrete time-varying system in an online manner. A multiple-model approach combined with an adaptive self-organizing map (ASOM) neural network is used to detect changes in the dynamics of the system. The number of sub-models is determined adaptively and grows once a mismatch between the stored sub-models and the new data is detected. Using the ASOM neural network, a novel value function approximation (VFA) scheme is presented in which each sub-model contributes to the overall value function according to a responsibility signal obtained from the ASOM. Novel policy iteration and value iteration algorithms are presented to find the optimal control for partially unknown nonlinear discrete time-varying systems in an online manner. Simulation results demonstrate the effectiveness of the proposed control scheme.
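As a toy picture of the growing multiple-model idea, the sketch below keeps one prototype and one critic per sub-model, spawns a new sub-model when data is far from every prototype, and blends critics through softmax-style responsibility weights. The mismatch threshold, responsibility scale, and quadratic basis are all assumptions, not the ASOM of the paper.

```python
# Sketch only: responsibility-weighted value function with a growing set of sub-models.
import numpy as np

class GrowingVFA:
    def __init__(self, dim, tau=2.0, beta=5.0):
        self.protos, self.critics = [], []        # one prototype + one critic per sub-model
        self.dim, self.tau, self.beta = dim, tau, beta

    def update_models(self, x):
        if not self.protos or min(np.linalg.norm(x - p) for p in self.protos) > self.tau:
            self.protos.append(x.copy())          # mismatch detected: grow a new sub-model
            self.critics.append(np.zeros(self.dim))

    def responsibilities(self, x):
        d = np.array([np.linalg.norm(x - p) for p in self.protos])
        w = np.exp(-self.beta * d)                # closer prototypes take more responsibility
        return w / w.sum()

    def value(self, x):
        lam = self.responsibilities(x)            # responsibility signal per sub-model
        feats = x ** 2                            # simple quadratic basis (assumption)
        return sum(l * (w @ feats) for l, w in zip(lam, self.critics))

vfa = GrowingVFA(dim=2)
for x in (np.zeros(2), np.array([0.1, 0.0]), np.array([5.0, 5.0])):
    vfa.update_models(x)
print(len(vfa.protos), "sub-models; V =", vfa.value(np.array([0.05, 0.0])))
```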
Advances in Computing and Communications | 2015
Kyriakos G. Vamvoudakis; Panos J. Antsaklis; Warren E. Dixon; João P. Hespanha; Frank L. Lewis; Hamidreza Modares; Bahare Kiumarsi
This tutorial paper will discuss the development of novel state-of-the-art control approaches and theory for complex systems based on machine intelligence in order to enable full autonomy. Given the presence of modeling uncertainties, the unavailability of the model, the possibility of cooperative/non-cooperative goals, and malicious attacks compromising the security of teams of complex systems, there is a need for approaches that respond to situations not programmed or anticipated in design. Unfortunately, existing schemes for complex systems do not take into account recent advances in machine intelligence. We shall discuss how to draw inspiration from the human brain and combine interdisciplinary ideas from computational intelligence, game theory, control theory, and information theory to develop new self-configuring algorithms for decision and control given the unavailability of a model, the presence of enemy components, and the possibility of network attacks. Due to the adaptive nature of the algorithms, the complex systems will be capable of breaking or splitting into parts that are themselves autonomous and resilient. The algorithms discussed will be characterized by strong abilities of learning and adaptivity. As a result, the complex systems will be fully autonomous and tolerant to communication failures.
Journal of Dynamic Systems, Measurement, and Control - Transactions of the ASME | 2012
Ahmad Darabi; Alireza Alfi; Bahare Kiumarsi; Hamidreza Modares
Winding inductances of the exciter machine of a brushless generator normally contain nonsinusoidal terms in the rotor position angle, so evaluating the inductances necessitates detailed modeling and complicated parameter identification procedures. In this paper, an adaptive particle swarm optimization (APSO), a novel heuristic computation technique, is proposed to identify the parameters of an exciter machine. The proposed approach evaluates the model parameters knowing only the main field impedance and the measured exciter field voltage and current. APSO is employed to solve the optimization problem of minimizing the difference between the output quantity (exciter field current) of the model and that of the real system. Two modifications are incorporated into the conventional particle swarm optimization (PSO) scheme that prevent premature local convergence and improve the quality of the final result. The performance of the proposed APSO is compared with those of a real-coded genetic algorithm (GA) and PSO with linearly decreasing inertia weight (LDW-PSO) in terms of parameter accuracy and convergence speed. Simulation results presented in the paper show that the proposed APSO is more successful than LDW-PSO and GA. [DOI: 10.1115/1.4005371]
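For orientation, here is a compact PSO sketch of the identification setup: particles are candidate parameter vectors and fitness is the mismatch between measured and simulated output. The first-order model, the LDW inertia schedule, and all constants are simplified stand-ins; the paper's adaptive (APSO) modifications are not reproduced.

```python
# Sketch only: PSO parameter identification against synthetic measurements.
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 100)
def model(p):                                  # placeholder first-order response (assumption)
    return p[0] * (1.0 - np.exp(-p[1] * t))
measured = model(np.array([2.0, 5.0])) + 0.01 * rng.normal(size=t.size)

def fitness(p):                                # mismatch between model and measured output
    return np.mean((model(p) - measured) ** 2)

n, dim = 30, 2
pos = rng.uniform(0.0, 10.0, (n, dim)); vel = np.zeros((n, dim))
pbest, pbest_f = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_f.argmin()].copy()
for it in range(100):
    w = 0.9 - 0.5 * it / 100                   # linearly decreasing inertia (the LDW baseline)
    r1, r2 = rng.random((2, n, dim))
    vel = w * vel + 2.0 * r1 * (pbest - pos) + 2.0 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 0.0, 10.0)        # keep particles in the assumed search box
    f = np.array([fitness(p) for p in pos])
    better = f < pbest_f
    pbest[better], pbest_f[better] = pos[better], f[better]
    gbest = pbest[pbest_f.argmin()].copy()

print("identified parameters:", gbest)         # should approach [2.0, 5.0]
```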
IEEE Control Systems Magazine | 2017
Kyriakos G. Vamvoudakis; Hamidreza Modares; Bahare Kiumarsi; Frank L. Lewis
Complex human-engineered systems involve an interconnection of multiple decision makers (or agents) whose collective behavior depends on a compilation of local decisions that are based on partial information about each other and the state of the environment [1]-[4]. Strategic interactions among agents in these systems can be modeled as a multiplayer simultaneous-move game [5]-[8]. The agents involved can have conflicting objectives, and it is natural to make decisions based upon optimizing individual payoffs or costs.
Automatica | 2017
Bahare Kiumarsi; Frank L. Lewis; Zhong Ping Jiang
In this paper, a model-free solution to the H∞ control of linear discrete-time systems is presented. The proposed approach employs off-policy reinforcement learning (RL) to solve the game algebraic Riccati equation online using measured data along the system trajectories. Like existing model-free RL algorithms, no knowledge of the system dynamics is required. However, the proposed method has two main advantages. First, the disturbance input does not need to be adjusted in a specific manner. This makes it more practical, as the disturbance cannot be specified in most real-world applications. Second, there is no bias as a result of adding a probing noise to the control input to maintain the persistence of excitation (PE) condition. Consequently, the convergence of the proposed algorithm is not affected by probing noise. An example of H∞ control for an F-16 aircraft is given. It is seen that the convergence of the new off-policy RL algorithm is insensitive to probing noise.
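To make the object being learned concrete, the sketch below iterates the game algebraic Riccati equation directly with assumed system matrices; the paper's contribution is to find the same solution from measured data, without these matrices and without steering the disturbance.

```python
# Sketch only: value iteration on the game ARE that the off-policy RL solves from data.
import numpy as np

A = np.array([[0.95, 0.1], [0.0, 0.9]])       # illustrative matrices (unknown to the RL method)
B = np.array([[0.0], [1.0]])                  # control channel
D = np.array([[0.1], [0.0]])                  # disturbance channel
Q, R, gam2 = np.eye(2), np.eye(1), 4.0        # gam2: squared attenuation level

G = np.hstack([B, D])
S = np.block([[R, np.zeros((1, 1))], [np.zeros((1, 1)), -gam2 * np.eye(1)]])

P = np.zeros((2, 2))
for _ in range(500):                          # value iteration on the GARE
    M = S + G.T @ P @ G
    P = Q + A.T @ P @ A - A.T @ P @ G @ np.linalg.solve(M, G.T @ P @ A)

KL = np.linalg.solve(S + G.T @ P @ G, G.T @ P @ A)
print("control gain K:", KL[:1])              # saddle-point policy u = -K x
print("worst-case gain L:", KL[1:])           # w = -L x; never applied in the data-driven method
```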
Robotics, Automation and Mechatronics | 2015
Bahare Kiumarsi; Hamidreza Modares; Frank L. Lewis; Zhong Ping Jiang
This paper proposes a model-free H∞ control design for linear discrete-time systems using reinforcement learning (RL). A novel off-policy RL algorithm is used to solve the game algebraic Riccati equation (GARE) online using measured data along the system trajectories. The proposed RL algorithm has the following advantages compared to existing model-free RL methods for solving the H∞ control problem: 1) it is data efficient and fast, since a stream of experiences obtained by executing a fixed behavior policy is reused to sequentially update many value functions corresponding to different learning policies; 2) the disturbance input does not need to be adjusted in a specific manner; and 3) there is no bias as a result of adding a probing noise to the control input to maintain persistence of excitation conditions. A simulation example is used to verify the effectiveness of the proposed control scheme.
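The data-efficiency claim, point 1 above, can be illustrated in isolation: one batch of transitions generated by a single fixed behavior policy is replayed to evaluate a sequence of different target gains. The sketch below does this for a plain LQR loop with assumed matrices and no disturbance channel, so it shows only the replay pattern, not the full H∞ design.

```python
# Sketch only: one exploratory batch reused across every policy-evaluation step.
import numpy as np

rng = np.random.default_rng(2)
A, B = np.array([[0.9, 0.1], [0.0, 0.8]]), np.array([[0.0], [1.0]])  # used only to make the batch
Q, R, gamma = np.eye(2), np.eye(1), 0.9

batch, x = [], rng.normal(size=2)
for _ in range(200):                           # collected ONCE, under a fixed behavior policy
    u = 0.5 * rng.normal(size=1)               # behavior policy: pure exploration
    xn = A @ x + B @ u
    batch.append((x, u, x @ Q @ x + u @ R @ u, xn))
    x = xn if rng.random() > 0.05 else rng.normal(size=2)

pairs = [(i, j) for i in range(3) for j in range(i, 3)]
def phi(z):                                    # quadratic basis: z' H z == phi(z) @ vech(H)
    return np.array([(1 if i == j else 2) * z[i] * z[j] for i, j in pairs])

def evaluate(K):                               # Q-function of u = -K x from the SAME batch
    rows = [phi(np.r_[xk, uk]) - gamma * phi(np.r_[xn, -K @ xn]) for xk, uk, _, xn in batch]
    costs = [c for _, _, c, _ in batch]
    vechH, *_ = np.linalg.lstsq(np.array(rows), np.array(costs), rcond=None)
    H = np.zeros((3, 3))
    for (i, j), v in zip(pairs, vechH):
        H[i, j] = H[j, i] = v
    return H

K = np.zeros((1, 2))
for _ in range(8):                             # PI: no new experiments between updates
    H = evaluate(K)
    K = np.linalg.solve(H[2:, 2:], H[2:, :2])
print("gain after batch-reuse PI:", K)
```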
IEEE Transactions on Systems, Man, and Cybernetics | 2017
Jinna Li; Bahare Kiumarsi; Tianyou Chai; Frank L. Lewis; Jialu Fan
Industrial flow lines are composed of unit processes operating on a fast time scale and performance measurements, known as operational indices, measured at a slower time scale. This paper presents a model-free optimal solution to a class of two time-scale industrial processes using off-policy reinforcement learning (RL). First, the lower-layer unit process control loop with a fast sampling period and the upper-layer operational index dynamics at a slow time scale are modeled. Second, a general optimal operational control problem is formulated to optimally prescribe the set-points for the unit industrial process. Then, a zero-sum game off-policy RL algorithm is developed to find the optimal set-points by using data measured in real time. Finally, a simulation experiment on an industrial flotation process is employed to show the effectiveness of the proposed method.
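A structural sketch of the two time-scale loop may help fix ideas: a fast inner loop tracks the current set-point, while the slow layer updates the operational index and reassigns the set-point once per slow period. All dynamics, gains, and the set-point rule below are illustrative stand-ins; the paper's zero-sum off-policy RL update would replace the fixed policy parameter.

```python
# Sketch only: two time-scale simulation skeleton with a placeholder set-point policy.
import numpy as np

N_FAST = 20                                    # fast control steps per slow step
x = np.zeros(2)                                # unit-process state
y_index = 0.0                                  # slow operational index
theta = 0.5                                    # set-point policy parameter (would be learned)

def fast_loop(x, setpoint):
    """Inner tracking loop at the fast sampling period (stand-in dynamics)."""
    for _ in range(N_FAST):
        u = 1.0 * (setpoint - x[0])            # simple proportional set-point tracking
        x = np.array([0.8 * x[0] + 0.2 * u, 0.9 * x[1] + 0.1 * x[0]])
    return x

for k in range(50):                            # slow time scale
    setpoint = theta * (1.0 - y_index)         # policy maps index error to a set-point
    x = fast_loop(x, setpoint)
    y_index = 0.7 * y_index + 0.3 * x[1]       # slow operational-index dynamics (stand-in)
    # a zero-sum off-policy RL update of theta would go here, driven only by
    # the measured (setpoint, y_index) data collected along this trajectory
print("final operational index:", y_index)
```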