BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Offline Reinforcement Learning - Max Patacchiola (University of Ca
 mbridge)\, Stephen Chung (University of Cambridge)\, Adam Jelley (Universi
 ty of Edinburgh)
DTSTART:20230215T110000Z
DTEND:20230215T123000Z
UID:TALK197326@talks.cam.ac.uk
CONTACT:James Allingham
DESCRIPTION:In the first part of the talk\, we will introduce the comm
 on terminology of standard online RL. We will then define the offli
 ne RL setting and describe its applications and benchmarks. Next\, w
 e will focus on behavioural cloning (BC) as a simple and stable base
 line for learning a policy from offline interaction data. As a parti
 cular instance of BC\, we will describe the decision transformer\, a re
 cently proposed method that leverages the transformer architecture t
 o tackle the offline RL setting. In the second part of the talk\, we w
 ill explore how off-policy RL algorithms originally designed for the on
 line setting (such as SAC) can be adapted to handle the distribution s
 hift required to improve on the policy underlying the offline data\, w
 ithout online feedback. We will find that this reduces to a problem o
 f quantifying and managing uncertainty. In the third and final part o
 f the talk\, we will first review classical offline reinforcement lea
 rning methods\, including ways to evaluate and improve policies from o
 ffline data via importance sampling\, and discuss their challenges an
 d applicability. We will then review modern offline RL methods\, incl
 uding policy constraint methods and model-based offline RL methods. P
 olicy constraint methods encourage the new policy to stay close to th
 e policy observed in the offline dataset\, while model-based offline R
 L methods quantify the model's uncertainty and use it to discourage t
 he new policy from visiting uncertain regions.\n\nReferences:\n\nLevi
 ne\, S.\, Kumar\, A.\, Tucker\
 , G.\, & Fu\, J. (2020). Offline reinforcement learning: Tutorial\, re
 view\, and perspectives on open problems. arXiv preprint arXiv:2005.0
 1643.\n\nChen\, L.\, Lu\, K.\, Rajeswaran\, A.\, Lee\, K.\, Grover\, A
 .\, Laskin\, M.\, ... & Mordatch\, I. (2021). Decision transformer: R
 einforcement learning via sequence modeling. Advances in Neural Infor
 mation Processing Systems\, 34\, 15084-15097.\n\nFujimoto\, S.\, & Gu
 \, S. S. (2021). A minimalist approach to offline reinforcement learn
 ing. Advances in Neural Information Processing Systems\, 34\, 20132-2
 0145.\n\nAn\, G.\, Moon\, S.\, Kim\, J. H.\, & Song\, H. O. (2021). U
 ncertainty-based offline reinforcement learning with diversified Q-en
 semble. Advances in Neural Information Processing Systems\, 34\, 7436
 -7447.
LOCATION:Cambridge University Engineering Department\, CBL Seminar room BE
 4-38.
END:VEVENT
END:VCALENDAR
