Initialization Strategies for Differentiable MPC in RL

Master's thesis defense

Leonard Fichtner

Friday, May 23, 2025, 10:00

IMBIT, Georges-Köhler-Allee 201, Room 42

The complementary strengths and weaknesses of model predictive control (MPC) and reinforcement learning (RL) motivate the development of hybrid methods that aim to combine the strengths of both paradigms. One promising direction is hierarchical MPC-RL, where a neural network predicts parameters for an MPC controller embedded within a policy. In this setting, differentiable MPC methods have attracted growing interest in recent years. These methods enable gradients to pass through the optimization problem by treating the MPC controller as a differentiable layer of a neural network. However, implementing these methods requires significant expertise in both MPC and RL, and so far only limited software support is available to the research community. Furthermore, differentiable MPC-RL methods place high demands on the MPC controller, which is tightly integrated into the training of the networks; high computational efficiency and reliable convergence properties are therefore desirable.

This thesis addresses these challenges in three ways. First, the state-of-the-art MPC solver acados has been integrated with the PyTorch learning framework in a new open-source software package, "Learning Predictive Control" (leap-c). It provides a flexible interface for combining RL and MPC, leveraging efficient sensitivity computations and supporting differentiation through the MPC optimization problems in a convenient way.

Second, a novel algorithm, SAC-FOP, introduced in the context of leap-c, is discussed. It combines differentiable MPC with off-policy RL. The algorithm is evaluated on different learning tasks and compared to a standard model-free RL method and to a variant of SAC-FOP that does not use differentiable MPC. SAC-FOP shows significantly increased sample efficiency and improved final performance compared to the model-free off-policy RL algorithm, and, to a lesser degree, also compared to the other MPC-RL algorithm.

Third, the thesis investigates the impact of initialization strategies on this novel algorithm, which can be crucial for training stability and speed. To this end, several new initialization strategies are introduced and systematically evaluated on tasks of varying computational complexity, alongside standard, commonly used initialization strategies. On the computationally easier tasks, no significant differences in runtime or reward could be observed, whereas on the harder task the methods differ significantly. Surprisingly, even simple, easy-to-implement standard strategies can lead to significant improvements. Some of the newly proposed initialization methods achieve competitive performance across all tasks while maintaining low computational overhead and implementation complexity. The new learning-based initialization strategies show strong potential, but they come with substantial computational overhead and maintenance costs.
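For readers unfamiliar with the "differentiable layer" view of MPC, the following minimal PyTorch sketch illustrates the idea. It is not the leap-c or acados API: a scalar quadratic problem with a closed-form minimizer stands in for the full MPC solve, and the backward pass applies the implicit function theorem to the stationarity condition, the same mechanism that sensitivity-based differentiation of an MPC solver relies on.

    import torch

    class QuadraticMPCLayer(torch.autograd.Function):
        # Stand-in "MPC": min_u 0.5*q*(x + u)^2 + 0.5*R*u^2 with scalar
        # state x, learned cost weight q, and fixed control penalty R.
        # Closed-form minimizer: u*(q, x) = -q*x / (q + R).
        R = 1.0  # fixed control penalty (an assumption of this sketch)

        @staticmethod
        def forward(ctx, q, x):
            u = -q * x / (q + QuadraticMPCLayer.R)  # "solve" the problem
            ctx.save_for_backward(q, x, u)
            return u

        @staticmethod
        def backward(ctx, grad_out):
            q, x, u = ctx.saved_tensors
            R = QuadraticMPCLayer.R
            # Implicit function theorem on the stationarity condition
            # q*(x + u) + R*u = 0 gives the solution sensitivities:
            du_dq = -(x + u) / (q + R)
            du_dx = -q / (q + R)
            return grad_out * du_dq, grad_out * du_dx

    # In hierarchical MPC-RL, q would be predicted by a neural network;
    # gradients flow through the "solver" back to that network.
    x = torch.tensor(2.0)
    q = torch.tensor(1.0, requires_grad=True)
    u = QuadraticMPCLayer.apply(q, x)
    u.backward()
    print(f"u* = {u.item():.3f}, du*/dq = {q.grad.item():.3f}")

In a real differentiable MPC layer the closed-form solve is replaced by a numerical solver and the sensitivities are obtained from the KKT conditions of the parametric optimization problem; the gradient path through the solution is the same.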

Zoom: https://uni-freiburg.zoom.us/j/6373625484?pwd=bElVd0NEZk14OUhpNnVxVXRpMUErQT09