BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Pluralistic Alignment through Personalized Reinforcement Learning 
 from Human Feedback - Natasha Jaques\, University of Washington
DTSTART:20241031T170000Z
DTEND:20241031T180000Z
UID:TALK222274@talks.cam.ac.uk
CONTACT:Tiancheng Hu
DESCRIPTION:Abstract: Reinforcement Learning from Human Feedback (RLHF) is
  a powerful paradigm for aligning foundation models to human values and pr
 eferences. However\, current RLHF techniques cannot account for the natura
 lly occurring differences in individual human preferences across a diverse
  population. When these differences arise\, traditional RLHF frameworks si
 mply average over them\, leading to inaccurate rewards and poor performanc
 e for minority groups. To address the need for pluralistic alignment\, we 
 develop a novel multimodal RLHF method\, which we term Variational Prefere
 nce Learning (VPL). In this talk\, I will first give an overview of past a
 pproaches to RLHF\, and then show how VPL addresses issues of value
  monism. VPL uses a few preference labels to infer a novel user-specific
  latent var
 iable\, and learns reward models and policies conditioned on this latent w
 ithout additional user-specific data. While the approach is conceptually
  simple\, we show that in practice\, this reward modeling requires careful
  algorithmic consi
 derations around model architecture and reward scaling. To empirically val
 idate our proposed technique\, we first show that it can provide a way to 
 combat underspecification in simulated control problems\, inferring and op
 timizing user-specific reward functions. Next\, we conduct experiments on 
 pluralistic language datasets representing diverse user preferences and de
 monstrate improved reward function accuracy. We additionally show the bene
 fits of this probabilistic framework in terms of measuring uncertainty\, a
 nd actively learning user preferences. This work enables learning from div
 erse populations of users with divergent preferences\, an important challe
 nge that naturally occurs in problems from robot learning to foundation mo
 del alignment.\n\nBio: Natasha Jaques is an Assistant Professor of Compute
 r Science and Engineering at the University of Washington\, and a Senior R
 esearch Scientist at Google DeepMind. Her research focuses on Social Reinf
 orcement Learning in multi-agent and human-AI interactions. During her PhD
  at MIT\, she developed techniques for learning from human feedback signal
 s to train language models\, which OpenAI later built upon in its work on
  Reinforcement Learning from Human Feedback (RLHF). In the multi
 -agent space\, she has developed techniques for improving coordination thr
 ough the optimization of social influence\, and adversarial environment ge
 neration for improving the robustness of RL agents. Natasha's work has rec
 eived various awards\, including Best Demo at NeurIPS\, an honourable ment
 ion for Best Paper at ICML\, and the Outstanding PhD Dissertation Award fr
 om the Association for the Advancement of Affective Computing. Her work ha
 s been featured in Science Magazine\, MIT Technology Review\, Quartz\, IEE
 E Spectrum\, Boston Magazine\, and on CBC radio\, among others. Natasha ea
 rned her Master's degree from the University of British Columbia\, and und
 ergraduate degrees in Computer Science and Psychology from the University
  of Regina.\n\n
LOCATION:https://cam-ac-uk.zoom.us/j/97599459216?pwd=QTRsOWZCOXRTREVnbTJBd
 XVpOXFvdz09
END:VEVENT
END:VCALENDAR
