IFAC-PapersOnLine | 2021

Constrained Q-Learning for Batch Process Optimization


Abstract


Chemical process optimization and control often require constraint satisfaction for safe operation. Reinforcement learning (RL) has been shown to be a powerful control technique that can handle nonlinear stochastic optimal control problems. Despite this promise, RL has yet to see significant translation to industrial practice, owing in part to its inability to satisfy state constraints. This work aims to address that challenge. We propose an “oracle”-assisted constrained Q-learning algorithm that guarantees the satisfaction of joint chance constraints with high probability, as required for safety-critical tasks. To that end, constraint tightenings (backoffs) are introduced and adjusted using Broyden’s method, making the backoffs self-tuning. The result is a general methodology that can be integrated into approximate dynamic programming-based algorithms to guarantee constraint satisfaction with high probability. Finally, a case study compares the performance of the proposed approach with that of model predictive control (MPC). The superior constraint handling of the proposed algorithm marks a step toward the use of RL in the real-world optimization and control of systems where constraints are critical to ensuring safety.
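To make the backoff-tuning idea concrete, below is a minimal Python sketch of the self-tuning loop the abstract describes: the backoff vector is treated as the unknown in a root-finding problem F(b) = P_sat(b) - target = 0, solved with Broyden's (good) method, where P_sat is a Monte Carlo estimate of the per-constraint satisfaction probability under the trained policy. The closure evaluate_policy, the estimator, and all parameter names are illustrative assumptions, not the paper's implementation.

import numpy as np

def satisfaction_probs(backoffs, evaluate_policy, n_mc=500):
    # Monte Carlo estimate of the per-constraint satisfaction probability
    # under the policy trained with tightened constraints
    # g_i(x) + backoffs[i] <= 0. `evaluate_policy(backoffs)` is a
    # hypothetical user-supplied closure returning, for one stochastic
    # rollout, the worst-case value of each ORIGINAL constraint g_i.
    sat = np.zeros(len(backoffs))
    for _ in range(n_mc):
        g_worst = np.asarray(evaluate_policy(backoffs))
        sat += (g_worst <= 0.0)
    return sat / n_mc

def tune_backoffs(evaluate_policy, n_constraints, target=0.99,
                  b0=0.05, tol=1e-3, max_iter=20):
    # Self-tune the backoff vector b so that F(b) = P_sat(b) - target = 0,
    # using Broyden's (good) method with a Sherman-Morrison update of the
    # inverse-Jacobian estimate. A sketch, not the paper's exact scheme.
    b = np.full(n_constraints, b0)
    J_inv = np.eye(n_constraints)      # initial guess: larger backoff
                                       # -> higher satisfaction probability
    F = satisfaction_probs(b, evaluate_policy) - target
    for _ in range(max_iter):
        if np.linalg.norm(F, np.inf) < tol:
            break
        s = -J_inv @ F                 # quasi-Newton (Broyden) step
        b_new = np.maximum(b + s, 0.0) # backoffs stay non-negative
        s = b_new - b                  # actual step after clipping
        F_new = satisfaction_probs(b_new, evaluate_policy) - target
        dF = F_new - F
        denom = s @ J_inv @ dF
        if abs(denom) > 1e-12:         # Sherman-Morrison rank-1 update
            J_inv += np.outer(s - J_inv @ dF, s @ J_inv) / denom
        b, F = b_new, F_new
    return b

In practice, evaluate_policy would wrap the oracle-assisted Q-learning training plus a closed-loop simulation of the batch process; if the per-constraint probabilities are replaced by a single joint-satisfaction estimate, the same loop reduces to a scalar secant iteration.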

DOI 10.1016/j.ifacol.2021.08.290
Language English
Journal IFAC-PapersOnLine
