Optimal control problems for sequential decision making under uncertainty typically seek control laws offline, assuming the underlying dynamic models are available. When those models are unavailable or only partially known, adaptive control approaches are employed online instead. Online RL methods can therefore be viewed as a form of adaptive optimal control, in the sense that (sub)optimal control laws are obtained online from real-time measurements without a model. Stability analysis of optimal and adaptive controllers is crucial in safety-critical and potentially hazardous applications. Informally, stability requires containment: for bounded initial conditions, the system state remains bounded for all future times. When a learned policy is interconnected with a nonlinear dynamical system, it has been shown that regulating the input-output gradients of the policy yields robust stability guarantees, certified by solving a semidefinite programming (SDP) feasibility problem. This approach certifies a large set of stabilizing controllers by exploiting problem-specific structure.
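As a concrete illustration of such a gradient-regulation certificate, the sketch below bounds the input-output gain (Lipschitz constant) of a one-hidden-layer ReLU policy via an SDP, in the spirit of LipSDP (Fazlyab et al., 2019). This is a minimal, assumption-laden example rather than the certification method of the cited survey: the policy dimensions, the random weights W0 and W1, and the cvxpy formulation are all illustrative.

```python
# Minimal sketch: certify an upper bound on the Lipschitz constant of a
# one-hidden-layer ReLU policy u = W1 * relu(W0 * x) by solving an SDP.
# Weights and dimensions are hypothetical placeholders.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n_in, n_hid = 4, 16                     # illustrative state / hidden sizes
n_out = 2                               # illustrative control dimension
W0 = rng.standard_normal((n_hid, n_in)) / np.sqrt(n_in)
W1 = rng.standard_normal((n_out, n_hid)) / np.sqrt(n_hid)

# Decision variables: rho = L^2 and a diagonal multiplier T >= 0 arising
# from the incremental quadratic constraint on ReLU (slope in [0, 1]).
rho = cp.Variable(nonneg=True)
t = cp.Variable(n_hid, nonneg=True)
T = cp.diag(t)

# LipSDP-style condition (activation slope bounds alpha=0, beta=1):
# M <= 0 certifies ||f(x) - f(y)|| <= sqrt(rho) * ||x - y|| for all x, y.
M = cp.bmat([
    [-rho * np.eye(n_in), W0.T @ T],
    [T @ W0, W1.T @ W1 - 2 * T],
])

prob = cp.Problem(cp.Minimize(rho), [M << 0])
prob.solve(solver=cp.SCS)
print("certified Lipschitz bound:", np.sqrt(rho.value))
print("naive product-of-norms bound:",
      np.linalg.norm(W0, 2) * np.linalg.norm(W1, 2))
```

The diagonal multiplier T encodes the incremental quadratic constraint satisfied by the slope-restricted activation, so feasibility of M ⪯ 0 certifies that the policy's input-output gain is at most √ρ; such SDP bounds are typically much tighter than the naive product of layer norms, which is what allows a larger set of controllers to be certified as stabilizing.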
For a detailed treatment of stability guarantees for reinforcement-learning-based controllers, see Busoniu, L., de Bruin, T., Tolić, D., Kober, J., & Palunko, I. (2018). Reinforcement learning for control: Performance, stability, and deep approximators. Annual Reviews in Control. https://doi.org/10.1016/j.arcontrol.2018.09.005.