Tuesday, August 03, 2021, 17:15
virtual, see link below
Abstract: I will cover the challenges and opportunities that arise when we train policies and models with reinforcement learning using previously collected (offline) data. This problem, referred to as offline reinforcement learning, arises in many realistic settings: autonomous driving using data collected from human drivers, inventory management and other operations research tasks using data previously collected by hand-designed policies, and a range of other applications. Offline reinforcement learning presents major challenges due to distributional shift, since the policy that collected the data differs systematically from the policy that is currently being evaluated. Addressing this issue requires some form of constraint or conservatism, and I will discuss recent methods that approach this challenge from a variety of perspectives.
This talk is part of the summer school on model predictive control and reinforcement learning and is openly available.
Meeting ID: 654 3915 8240