BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Exploring and Controlling Social Values in Large Language Models 
 through Role-Playing - Paul Röttger (Oxford University)
DTSTART:20230120T120000Z
DTEND:20230120T130000Z
UID:TALK195865@talks.cam.ac.uk
CONTACT:Michael Schlichtkrull
DESCRIPTION:Abstract: \n\nSocial values are a key factor in human decisio
 n-making. Some people\, for example\, oppose the death penalty while othe
 rs support it\, and there is no single objective truth. Large language m
 odels are pre-trained on texts authored by many different people with dif
 ferent social values. But when prompted to answer an ethical question or 
 complete a subjective task\, model responses will necessarily align with s
 ome social values\, and not others. This leads to two questions that I wa
 nt to answer in my research: 1) What social values are reflected in mode
 l behaviour? 2) How can we control these values\, and by extension model 
 behaviour? In my talk\, I will introduce role-playing as a framework fo
 r exploring these questions\, differentiating between generic roles tha
 t models play by default\, and specific roles that we ask them to play\
 , for example based on sociodemographic attributes. I will discuss requi
 rements for successful role-playing\, including role stability\, internal
  and external alignment\, as well as the limitations of role-playing. Las
 tly\, I will present initial role-playing experiments for hate speech det
 ection\, a highly subjective task.\n\nBio: \n\nPaul Röttger is a fina
 l-year DPhil student at the University of Oxford\, working on natural lan
 guage processing. In his thesis\, he focused on evaluating and improving h
 ate speech detection models\, adapting language models to language change
 \, and managing subjectivity in data annotation. His main research intere
 st now is in exploring and controlling the behaviour of large language m
 odels in relation to social values\, as part of a larger goal to make mod
 els more helpful and less harmful.
LOCATION:Computer Lab\, SS03
END:VEVENT
END:VCALENDAR
