Offline Reinforcement Learning
- Speakers: Max Patacchiola (University of Cambridge), Stephen Chung (University of Cambridge), Adam Jelley (University of Edinburgh)
- Date & Time: Wednesday 15 February 2023, 11:00-12:30
- Venue: Cambridge University Engineering Department, CBL Seminar room BE4-38.
Abstract
In the first part of the talk we will introduce the common terminology of standard online RL. We will then define the offline RL setting and describe its applications and benchmarks, before focusing on behavioural cloning (BC), a simple and stable baseline for learning a policy from offline interaction data. As a particular instance of BC, we will describe the decision transformer, a recently proposed method that leverages the transformer architecture to tackle the offline RL setting. In the second part of the talk, we will explore how off-policy RL algorithms originally designed for the online setting (such as SAC) can be adapted to handle the distribution shift required to improve on the policy that generated the offline data, without online feedback. We will find that this reduces to a problem of quantifying and managing uncertainty. In the third and final part of the talk, we will first review classical offline RL methods, including ways to evaluate and improve policies from offline data via importance sampling, and discuss their challenges and applicability. We will then review modern offline RL methods, including policy-constraint methods and model-based offline RL. Policy-constraint methods encourage the new policy to stay close to the policy observed in the offline dataset, while model-based offline RL methods quantify model uncertainty and use it to discourage the new policy from visiting uncertain regions.
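As a small illustration of the importance-sampling idea mentioned above, the sketch below (plain Python on a made-up two-action bandit; the function and variable names are ours, not from the talk) estimates a target policy's value from trajectories logged under a different behaviour policy, by reweighting each return with the product of per-step probability ratios:

```python
import random

def is_estimate(trajectories, target_probs, behavior_probs):
    """Ordinary importance-sampling estimate of a target policy's value.

    Each trajectory is a list of (action, reward) pairs logged under the
    behaviour policy; its weight is the product of per-step ratios
    pi(a) / b(a) between target and behaviour action probabilities.
    """
    total = 0.0
    for traj in trajectories:
        weight = 1.0
        ret = 0.0
        for action, reward in traj:
            weight *= target_probs[action] / behavior_probs[action]
            ret += reward
        total += weight * ret
    return total / len(trajectories)

# Toy two-action bandit: behaviour policy is uniform, action 1 pays 1.0.
random.seed(0)
behavior = {0: 0.5, 1: 0.5}
target = {0: 0.1, 1: 0.9}  # target policy favours the rewarding action
data = []
for _ in range(10_000):
    a = random.choice([0, 1])
    r = 1.0 if a == 1 else 0.0
    data.append([(a, r)])

print(is_estimate(data, target, behavior))  # close to 0.9, the true value
```

The estimator is unbiased but, as the talk's discussion of challenges suggests, its variance grows quickly with horizon, since the importance weights are products over all steps of a trajectory.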
References:
Levine, S., Kumar, A., Tucker, G., & Fu, J. (2020). Offline reinforcement learning: Tutorial, review, and perspectives on open problems. arXiv preprint arXiv:2005.01643.
Chen, L., Lu, K., Rajeswaran, A., Lee, K., Grover, A., Laskin, M., ... & Mordatch, I. (2021). Decision transformer: Reinforcement learning via sequence modeling. Advances in Neural Information Processing Systems, 34, 15084-15097.
Fujimoto, S., & Gu, S. S. (2021). A minimalist approach to offline reinforcement learning. Advances in Neural Information Processing Systems, 34, 20132-20145.
An, G., Moon, S., Kim, J. H., & Song, H. O. (2021). Uncertainty-based offline reinforcement learning with diversified Q-ensemble. Advances in Neural Information Processing Systems, 34, 7436-7447.
Series: This talk is part of the Machine Learning Reading Group @ CUED series.
