Guided NMPC for inducing learned control policy behaviors

Barbara Barros Carlos

Robotics Lab, Sapienza Università di Roma, Italy

Tuesday, May 21, 2019, 11:00 - 12:30

Building 102 - SR 01-012

Reinforcement Learning (RL) is a branch of machine learning in which an agent learns to maximize its cumulative reward by taking a sequence of actions based on the current state. Nonlinear model predictive control (NMPC), on the other hand, is a model-based control method that solves an optimal control problem at each time step, minimizing a cost function over a prediction horizon. To tackle the problem of model inaccuracy while performing a task and to improve performance on the real system, an approach that combines control theory and RL is proposed. This approach addresses a real-time optimization problem in which the trajectory of the optimization problem is adapted rather than the model. In this talk, the Pendubot, a two-link underactuated robotic arm, is used as a benchmark to perform a swing-up task.
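To make the receding-horizon idea concrete, the following is a minimal sketch of an NMPC loop: at each time step an optimal control problem is solved over the prediction horizon, only the first control is applied, and the solution warm-starts the next solve. All names, the 1-D double-integrator plant, weights, and bounds are illustrative assumptions, not the Pendubot model or the solver used in the talk.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative plant: 1-D double integrator (a stand-in for the Pendubot dynamics)
DT, N = 0.05, 20  # sampling time [s], prediction horizon length


def step_model(x, u):
    """One explicit-Euler step of the toy model."""
    return np.array([x[0] + DT * x[1], x[1] + DT * u])


def ocp_cost(u_seq, x0):
    """Quadratic stage cost on state and input, plus a terminal cost."""
    x, c = np.array(x0, dtype=float), 0.0
    for u in u_seq:
        x = step_model(x, u)
        c += x @ x + 0.1 * u**2
    return c + 10.0 * (x @ x)  # terminal penalty


def nmpc_step(x0, u_guess):
    """Solve the horizon OCP; return first control and the warm start."""
    res = minimize(ocp_cost, u_guess, args=(x0,),
                   bounds=[(-2.0, 2.0)] * N)  # input constraints
    return res.x[0], res.x


# Closed loop: apply the first control, shift, and re-solve from the new state
x, u_guess = np.array([1.0, 0.0]), np.zeros(N)
for _ in range(30):
    u0, u_guess = nmpc_step(x, u_guess)
    x = step_model(x, u0)
```

Warm-starting each solve with the previous solution is one of the standard ingredients that makes real-time NMPC rates achievable.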
In the control design phase, the proposed NMPC is responsible for eliciting different learned policy behaviors during the swing-up through the enlargement or reduction of the constraints. In the offline learning phase, behavioral training is performed. The learned policy is then invoked at runtime to generate trajectories within the NMPC prediction horizon, making the desired behavior emerge. This approach thereby guarantees constraint satisfaction under parametric uncertainty. The complete scheme is tested in real time on an experimental setup and runs in the millisecond range.
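One way to read the guided scheme above: the learned policy is rolled out over the prediction horizon to produce a reference trajectory, and the NMPC tracks that reference subject to its own (possibly tighter) constraints, so the behavior comes from the policy while constraint satisfaction is enforced by the controller. The sketch below illustrates this division of labor; the linear-feedback "policy", the double-integrator model, and all weights and bounds are hypothetical placeholders, not the talk's actual training result or Pendubot model.

```python
import numpy as np
from scipy.optimize import minimize

DT, N = 0.05, 20  # sampling time [s], prediction horizon length


def policy(x):
    """Stand-in for the learned policy (a trained network in practice)."""
    return np.clip(-3.0 * x[0] - 2.0 * x[1], -2.0, 2.0)


def step_model(x, u):
    """Illustrative double-integrator stand-in for the Pendubot model."""
    return np.array([x[0] + DT * x[1], x[1] + DT * u])


def policy_reference(x0):
    """Roll the policy forward to get a reference trajectory over the horizon."""
    ref, x = [], np.array(x0, dtype=float)
    for _ in range(N):
        x = step_model(x, policy(x))
        ref.append(x.copy())
    return ref


def tracking_cost(u_seq, x0, ref):
    """NMPC tracks the policy-generated reference instead of a fixed setpoint."""
    x, c = np.array(x0, dtype=float), 0.0
    for u, xr in zip(u_seq, ref):
        x = step_model(x, u)
        e = x - xr
        c += e @ e + 0.05 * u**2
    return c


def guided_nmpc_step(x0):
    ref = policy_reference(x0)  # desired behavior injected here
    res = minimize(tracking_cost, np.zeros(N), args=(x0, ref),
                   bounds=[(-1.5, 1.5)] * N)  # tighter input bounds than the
                                              # policy's: the NMPC enforces them
    return res.x[0]


# Closed loop: the policy shapes the motion, the NMPC keeps it feasible
x = np.array([1.0, 0.0])
for _ in range(30):
    x = step_model(x, guided_nmpc_step(x))
```

Note that the policy here may request inputs outside the NMPC bounds; the tracking formulation follows the intended behavior as closely as the constraints allow, which is the sense in which constraint satisfaction is guaranteed while the learned behavior still emerges.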