BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:What Happens When They're Smarter Than Us? - Dr Konstantinos Voudo
 uris (UK AI Security Institute)
DTSTART:20260227T120000Z
DTEND:20260227T130000Z
UID:TALK242905@talks.cam.ac.uk
CONTACT:Suchir Salhan
DESCRIPTION:Abstract: We are building machine learning models that increas
 ingly outperform humans on particular tasks. I argue that this creates a p
 articularly hard version of the principal-agent problem\, in which we\, th
 e principal\, have to supervise capable agents that lack our norms and inc
 entives\, while only being able to monitor a fraction of their outputs. Ag
 ents that are more capable than their principals can learn to exploit this
  capability gap\, which creates attendant risks. I present a promising ove
 rsight strategy\, the debate protocol\, in which matched AI debaters argue
  before a weaker human judge. I sketch an analysis showing that debate is 
 vulnerable to exploitation when the judge makes systematic errors\, becaus
 e the debaters can steer arguments towards the judge's cognitive blind spo
 ts. I propose a partial remedy\, debate-by-jury\, in which juries of human
  judges oversee debates. Juries help when jurors' errors are relatively un
 correlated\, but when biases are shared across jurors\, aggregation can am
 plify rather than correct error.\n\n**Speaker Bio:** Dr Konstantinos Voudo
 uris is the cognitive scientist on the Alignment Team at the UK AI Securit
 y Institute. He holds a PhD in psychology (2024) from the University of Ca
 mbridge. His research focuses on advancing the sciences of AI alignment\, 
 scalable oversight\, and AI evaluation\, using tools from the cognitive sc
 iences. Combining these diverse fields can help build better\, safer\, and
  more human-like AI systems and inform sensible AI policy.
LOCATION:SS03 Hybrid (In-Person + Online). Google Meet link:
  https://meet.google.com/cru-hcuo-rhu
END:VEVENT
END:VCALENDAR
