Reinforcement Learning-Based Constrained Optimal Control: An Interpretable Reward Design

Jingjie Ni

East China University of Science and Technology

Tuesday, September 30, 2025, 11:00 - 12:30

SR 01-012

Abstract:

This study introduces an interpretable reward design framework for solving constrained optimal control problems with reinforcement learning. The central challenge is to achieve both cost reduction and constraint satisfaction through a single reward function within the reinforcement learning framework. To address this, we propose interpretable reward schemes for deterministic discrete-state and continuous-state systems and prove their optimality. For deterministic discrete-state systems, we take minimum-cost state-flipped control for reachability in Boolean control networks as a case study. Rewards are designed from the upper bound on the number of reachability steps without cycles, which guarantees that the terminal constraint is met while the flipping cost is minimized. For large-scale networks, we introduce adaptive variable rewards based on the known maximum number of steps needed to reach the target state set without cycles, which accelerates convergence. For deterministic continuous-state systems, the reward function comprises four components: a terminal constraint reward, a guidance reward, a penalty for state constraint violations, and a cost reduction incentive. We present a theoretically justified reward design that establishes bounds on these components. Recognizing the importance of prior knowledge in reward design, we solve two subproblems sequentially, using the solution of the first to inform the reward design of the second. We further integrate reinforcement learning with curriculum learning, exploiting solutions of simpler subproblems to tackle more complex ones and thereby facilitate convergence.
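To make the four-component reward for the continuous-state case concrete, the sketch below shows one way such a reward could be assembled. It is a minimal illustration under assumed conventions, not the design presented in the talk: the function name composite_reward, the box-constraint form of the state constraints, and all weights and tolerances (w_terminal, w_guidance, w_violation, w_cost, goal_tol) are placeholders introduced only for this example.

```python
import numpy as np

# Minimal sketch of a four-component reward for a constrained optimal control
# task. All weights, tolerances, and shaping choices below are illustrative
# assumptions, not the bounds derived in the talk.

def composite_reward(state, action, next_state, goal,
                     state_lower, state_upper,
                     w_terminal=100.0, w_guidance=1.0,
                     w_violation=10.0, w_cost=0.1,
                     goal_tol=0.05):
    """Return a scalar reward combining the four components named in the abstract."""
    dist_before = np.linalg.norm(state - goal)
    dist_after = np.linalg.norm(next_state - goal)

    # 1) Terminal constraint reward: a bonus once the goal set is reached.
    terminal = w_terminal if dist_after <= goal_tol else 0.0

    # 2) Guidance reward: progress toward the goal between consecutive states.
    guidance = w_guidance * (dist_before - dist_after)

    # 3) Penalty for state constraint violations: magnitude of box-constraint breach.
    breach = (np.maximum(next_state - state_upper, 0.0)
              + np.maximum(state_lower - next_state, 0.0))
    violation = -w_violation * float(np.sum(breach))

    # 4) Cost reduction incentive: penalize control effort (quadratic cost here).
    cost = -w_cost * float(np.dot(action, action))

    return terminal + guidance + violation + cost
```

In the framework described in the abstract, the essential point is that the relative magnitudes of these components are bounded so that constraint satisfaction is not traded away for cost reduction; the fixed weights above stand in for those theoretically justified bounds.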