BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Reward Modelling - Usman Anwar\, University of Cambridge
DTSTART:20230524T100000Z
DTEND:20230524T113000Z
UID:TALK201370@talks.cam.ac.uk
CONTACT:Isaac Reid
DESCRIPTION:Reward modelling broadly refers to the methods and practices f
 or specifying the goals and objectives of a learning system and determinin
 g what constitutes a desirable outcome. Within reinforcement learning (RL)
 \, it refers to the process of designing and defining the rewards or reinf
 orcement signals. In this talk\, I will provide an overview of the popular
  methods for reward modelling\, differentiating between implicit reward mo
 delling methods such as imitation learning and cooperative inverse reinfor
 cement learning\, and explicit reward modelling methods such as inverse RL
  and RL from human feedback. I will further highlight various theoretical 
 challenges in reward modelling\, and discuss the use of reward modelling i
 n language models such as GPT-4 and the connections between the reward mo
 delling problem and AI alignment.
LOCATION:Cambridge University Engineering Department\, CBL Seminar room BE
 4-38.
END:VEVENT
END:VCALENDAR