BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Interpreting and Controlling Intermediate Representations in Large
  Language Models - Nicola Cancedda - Meta's Fundamental AI Research (FAIR)
  team
DTSTART:20241126T140000Z
DTEND:20241126T150000Z
UID:TALK224317@talks.cam.ac.uk
CONTACT:Sally Matthews
DESCRIPTION:Large Language Models (LLMs) have reshaped the AI landscape
  and become dinner-table topics\, yet we do not really understand how
  they work. Propelled by this consideration\, the field of AI
  interpretability is enjoying a revival.\nIn this talk I will introduce
  some fundamental interpretability concepts and discuss how insights from
  studying the internal activations of models led us to develop a training
  framework that significantly increases the robustness of LLMs to
  'jailbreaking' attacks. I will also illustrate some explorations of the
  internal workings of transformer-based autoregressive LLMs that
  unexpectedly led to an explanation of 'attention sinking'\, a mechanism
  necessary for their proper functioning. Finally\, I will offer my
  perspective on interesting future directions.\n\nNicola Cancedda is a
  researcher with Meta's Fundamental AI Research (FAIR) team. His current
  focus is on better understanding how Large Language Models realize
  complex behaviors\, in order to make them more capable\, safer\, and
  more efficient. He is an alumnus of the University of Rome "La
  Sapienza"\, and has held applied and fundamental research and management
  positions at Meta\, Xerox\, and Microsoft\, pushing the state of the art
  in Machine Learning\, Machine Translation\, and Natural Language
  Processing\, and leading the transfer of research results to large-scale
  production environments.
LOCATION:Computer Lab\, LT1
END:VEVENT
END:VCALENDAR
