Model Predictive Control and Reinforcement Learning

 

Lectures: Joschka Boedecker and Moritz Diehl

Guest Lectures: Sebastien Gros (NTNU Trondheim) and Sergey Levine (UC Berkeley)

Exercises: Katrin Baumgärtner and Jasper Hoffmann

University of Freiburg, July 26 to August 4, 2021

(available online; all times are Central European Summer Time)


 

This eight-day block course is intended for master's and PhD students from engineering, computer science, mathematics, physics, and other mathematical sciences. The aim is that participants understand the main concepts of model predictive control (MPC) and reinforcement learning (RL), as well as the similarities and differences between the two approaches. In hands-on exercises and project work, they learn to apply the methods to practical optimal control problems from science and engineering.

The course consists of lectures, exercises, and project work. The lectures in the first week will be given by Prof. Dr. Joschka Boedecker and Prof. Dr. Moritz Diehl from the University of Freiburg. In the second week of the course, invited guest lectures will be given by Prof. Dr. Sebastien Gros from NTNU Trondheim (Norway) and Prof. Dr. Sergey Levine from UC Berkeley (California, US).

Topics include

  • Optimal Control Problem (OCP) formulations - constrained, infinite horizon, discrete time, stochastic, robust
  • Markov Decision Processes (MDP)
  • From continuous to discrete: discretization in space and time
  • Dynamic Programming (DP) concepts and algorithms - value iteration and policy iteration
  • Linear Quadratic Regulator (LQR) and Riccati equations
  • Convexity considerations in DP for constrained linear systems
  • Model predictive control (MPC) formulations and stability guarantees
  • MPC algorithms - quadratic programming, direct multiple shooting, Gauss-Newton, real-time iterations
  • Differential Dynamic Programming (DDP) for the solution of unconstrained MPC problems 
  • Reinforcement Learning (RL) formulations and approaches  
  • Model-free RL: Monte Carlo, temporal differences, model learning, direct policy search
  • RL with function approximation
  • Model-based RL and combinations of model-based and model-free methods
  • Similarities and differences between MPC and RL
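As a small taste of the DP topics above, here is a minimal value-iteration sketch for a hypothetical two-state, two-action MDP; the transition matrices and rewards below are made up for illustration and are not course material:

```python
import numpy as np

# Toy MDP: 2 states, 2 actions (all numbers illustrative).
# P[a][s, s'] is the transition probability, R[a][s] the expected reward.
P = [np.array([[0.9, 0.1],
               [0.2, 0.8]]),
     np.array([[0.5, 0.5],
               [0.4, 0.6]])]
R = [np.array([1.0, 0.0]),
     np.array([0.0, 2.0])]
gamma = 0.9  # discount factor

# Value iteration: repeatedly apply V <- max_a (R_a + gamma * P_a V).
V = np.zeros(2)
for _ in range(1000):
    Q = np.stack([R[a] + gamma * P[a] @ V for a in range(2)])
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-10:  # converged to the fixed point
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=0)  # greedy policy w.r.t. the converged values
```

Policy iteration, also covered in the lectures, instead alternates full policy evaluation with greedy policy improvement.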

The course will be conducted as a mixed virtual and in-person event if the Corona situation in July 2021 allows, and otherwise fully virtually. All lecture videos and exercises will be made openly available. Each course day starts at 9:00 and ends at 17:00 (5 p.m.) Freiburg time (CEST). Lectures are typically followed by computer exercises in Python. A mandatory requirement for officially passing the course is successful participation in an online microexam on Friday, July 30, 2021, at 9:00. In the second week, on August 2-4, 2021, participants will work on application projects, applying at least one of the MPC and RL methods to a self-chosen application problem from any area of science or engineering. Projects can be carried out alone or in teams of two (preferred). The results will be presented in a public poster session on the last day of the course and summarized in a short report due two weeks after the course. The report determines the final grade.


   This course has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 953348.

The course can be followed at different levels of participation:

  • Level A - Online Listening: no passing requirements; no certificate; unlimited number of participants.
  • Level B - First Week with Exam: exercises and microexam on July 30; "Certificate B", without grade; max. 120 participants.
  • Level C - Full Time with Project: Level B requirements plus project presentation on Aug. 4; "Certificate C", without grade; max. 90 participants.
  • Level D - Full Time with Report: Level C requirements plus report on Aug. 18; "Certificate D", with or without grade, 3 ECTS; max. 60 participants.

The maximum number of physical participants in Freiburg is 24. For Level D participation, priority will be given to students of the University of Freiburg.

Registration for the course is closed!

Any questions on the course can be addressed to Katrin Baumgärtner (katrin.baumgaertner@imtek.uni-freiburg.de).

Certificates:

If you have participated in the course and want to receive a certificate, please fill out the form by August 20.


Microexam:

Form to fill in the answers

Exam

 


Projects: guidelines, teams, slides

Please upload your project reports here by August 18.


Preparation for the exercises:

For the exercises you need:

  1. An editor that can handle Jupyter notebooks.
  2. A way to install Python packages, e.g. pip/pip3. The required packages are listed in requirements.txt.

It is helpful to first familiarize yourself with how to install Python packages. Then download the requirements file linked above and install the packages as described here. This should also install Jupyter Lab, an editor that runs in your browser. To start Jupyter Lab, type  jupyter lab  in your terminal. Further instructions for Jupyter Lab can be found here.
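Assuming pip3 is on your PATH (on some systems the command is pip) and requirements.txt is in your current directory, the setup boils down to two terminal commands:

```shell
# install the packages listed in requirements.txt
pip3 install -r requirements.txt

# start Jupyter Lab; it opens as an editor in your browser
jupyter lab
```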

Optional: There is an exercise using the open-source software acados for high-performance embedded NMPC, which is developed in the group of Prof. Diehl. If you would like to try it, please follow the installation instructions here.

Exercises:

Monday: exercise01 (solution), exercise02 (solution)

Tuesday: exercise03 (solution), exercise04 (solution)

Wednesday: exercise05 (solution), exercise06 (solution)

Thursday: exercise07 (solution), exercise08 (solution)


Lecture location:

HS 1015, Kollegiengebäude I, Platz der Universität 3 , D-79098 Freiburg, Germany

At the lecture location, there will be basic catering such as coffee, water, and fruit.

All lectures and exercise sessions are broadcast via Zoom:

Join Zoom Meeting
https://uni-freiburg.zoom.us/j/65439158240?pwd=OHJrb2p2U2xDUEpVQTRBWWlvZGo0UT09

Meeting ID: 654 3915 8240
Passcode: 0uu1tq0ek

Get-to-know-each-other session:

There will be a get-to-know-each-other session on the first Monday. Participants joining via Zoom will meet in wonder.me under the following link:

https://www.wonder.me/r?id=20f5f8da-411c-4c79-8560-05fbe4475ebf

For everyone else, we will just meet in the lecture room.


Schedule

Please note that the lecture recordings might not show video if you open them in your browser. Downloading the recordings and watching them locally should work fine.

Monday

Lecture 1 - Introduction - Joschka Boedecker and Moritz Diehl

Video

 

Lecture 2 - Dynamic Systems and Simulation (Moritz Diehl)

Video

 

Exercise 1 - Dynamic System Simulation - Katrin Baumgärtner and Jasper Hoffmann

 

 

Lecture 3 - Numerical Optimization (Moritz Diehl)

Video

 

Exercise 2 - Numerical Optimization - Katrin Baumgärtner

 

Tuesday

Lecture 4 - Dynamic Programming and LQR - Moritz Diehl

Video

 

Exercise 3 - Dynamic Programming and LQR - Katrin Baumgärtner

 

 

Lecture 5 - MDPs, PI and VI  - Joschka Boedecker

Video

 

Lecture 6 - Monte Carlo RL, Temporal Difference and Q-Learning - Joschka Boedecker

 

 

Exercise 4 - Q-Learning - Jasper Hoffmann

 

Wednesday

Lecture 7 - Numerical Optimal Control - Moritz Diehl

Slides1, Slides2, Blackboard, Slides3

 

Video, Video

 

Exercise 5 - Numerical Optimal Control - Katrin Baumgärtner

 

 

Lecture 8 - MPC Stability Theory - Moritz Diehl (Blackboard)

Video

 

Lecture 9 - MPC Algorithms - Moritz Diehl - cancelled -

 

 

Exercise 6 - Model Predictive Control - Katrin Baumgärtner

 

Thursday

Lecture 10 - On-policy RL with Function Approximation - Joschka Boedecker

Video

 

Lecture 11 - Off-policy RL with Function Approximation - Joschka Boedecker

Video

 

Exercise 7 - RL with Function Approximators - Jasper Hoffmann

 

 

Lecture 12 - Policy Gradient Methods - Joschka Boedecker

Video

 

Lecture 13 - Advanced Value-based Methods - Joschka Boedecker

Video

Friday

Lecture 14 - Recent Algorithms for Nonlinear and Robust MPC - Moritz Diehl. Slides: PartA1, PartA2, PartB

VideoPartA, VideoPartC

Paper PDFs: PA1, PA2, PA3, PA4

 

Lecture 15 - Planning and Learning - Joschka Boedecker

Video

 

Lecture 16 - Differences and Similarities of MPC and RL - Joschka Boedecker and Moritz Diehl

Video

 

Extra: LP formulation of DP - Moritz Diehl

Video

Monday

Guest Lecture - Sebastien Gros: Adaptation of MPC via RL: fundamental principles

Video

 

Guest Lecture - Sebastien Gros: RL and MPC: safety, stability, and some more recent results

Video

 

Guest Lecture - Sergey Levine: Model-Free and Model-Based Reinforcement Learning from Offline Data

 


 

 

Time Slots (first week)

09:00-09:45

  Monday: Lecture 1 - Introduction - Joschka Boedecker and Moritz Diehl (Video)
  Tuesday: Lecture 4 - Dynamic Programming and LQR - Moritz Diehl (Video)
  Wednesday: Lecture 7 - Numerical Optimal Control - Moritz Diehl (Slides1, Slides2, Blackboard, Video)
  Thursday: Lecture 10 - On-policy RL with Function Approximation - Joschka Boedecker (Video)
  Friday: Microexam

10:00-10:45

  Monday: Lecture 2 - Dynamic Systems and Simulation (Moritz Diehl) (Video)
  Tuesday: Lecture 4 (continued) - Moritz Diehl
  Wednesday: Lecture 7 (continued) - Moritz Diehl (Slides3, Video)
  Thursday: Lecture 11 - Off-policy RL with Function Approximation - Joschka Boedecker (Video)
  Friday: Lecture 14 - Recent Algorithms for Nonlinear and Robust MPC - Moritz Diehl (Slides: PartA1, PartA2, PartB; VideoPartA, VideoPartC; Paper PDFs: PA1, PA2, PA3, PA4)

11:15-12:00

  Monday: Exercise 1 - Dynamic System Simulation - Katrin Baumgärtner and Jasper Hoffmann
  Tuesday: Exercise 3 - Dynamic Programming and LQR - Katrin Baumgärtner
  Wednesday: Exercise 5 - Numerical Optimal Control - Katrin Baumgärtner
  Thursday: Exercise 7 - RL with Function Approximators - Jasper Hoffmann
  Friday: Project Guidelines

14:00-14:45

  Monday: Lecture 3 - Numerical Optimization (Moritz Diehl) (Video)
  Tuesday: Lecture 5 - MDPs, PI and VI - Joschka Boedecker (Video)
  Wednesday: Lecture 8 - MPC Stability Theory - Moritz Diehl (Blackboard, Video)
  Thursday: Lecture 12 - Policy Gradient Methods - Joschka Boedecker (Video)
  Friday: Lecture 15 - Planning and Learning - Joschka Boedecker (Video)

15:00-15:45

  Monday: Extended Coffee Break / Get-to-Know-Each-Other Session
  Tuesday: Lecture 6 - Monte Carlo RL, Temporal Difference and Q-Learning - Joschka Boedecker (Video)
  Wednesday: Lecture 9 - MPC Algorithms - Moritz Diehl - cancelled -
  Thursday: Lecture 13 - Advanced Value-based Methods - Joschka Boedecker (Video)
  Friday: Lecture 16 - Differences and Similarities of MPC and RL - Joschka Boedecker and Moritz Diehl (Video); DP as LP (Video)

16:15-17:00

  Monday: Exercise 2 - Numerical Optimization - Katrin Baumgärtner
  Tuesday: Exercise 4 - Q-Learning - Jasper Hoffmann
  Wednesday: Exercise 6 - Model Predictive Control - Katrin Baumgärtner
  Thursday: Exercise 8 - Policy Gradient - Jasper Hoffmann
  Friday: Project Pitch Presentations

Time Slots (second week)

09:00-09:45

  Monday: Guest Lecture - Sebastien Gros: Adaptation of MPC via RL: fundamental principles (Video, Slides)
  Tuesday: Project Work
  Wednesday: Project Work

10:00-10:45

  Monday: Project Commitments
  Tuesday: Project Status Updates
  Wednesday: Project Presentations (slides)

11:15-12:00

  Monday: Project Work; Question session: Jasper Hoffmann & Katrin Baumgärtner
  Tuesday: Project Work; Question session: Jasper Hoffmann & Katrin Baumgärtner
  Wednesday: Project Presentations

14:00-14:45

  Monday: Guest Lecture - Sebastien Gros: RL and MPC: safety, stability, and some more recent results (Video, Slides)
  Tuesday: Project Work
  Wednesday: Project Presentations

15:00-15:45

  Monday: Project Work
  Tuesday: Project Work; Question Session: Jasper Hoffmann & Katrin Baumgärtner
  Wednesday: Certificate Handout (A, B, C)

16:15-17:00

  Monday: Project Work; Question Session: Jasper Hoffmann & Katrin Baumgärtner
  Tuesday: Guest Lecture - Sergey Levine: Model-Free and Model-Based Reinforcement Learning from Offline Data (Slides); start moved to 17:15
  Wednesday: End at 16:00