BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Model Interpretability: from Illusions to Opportunities - Dr. Asma
  Ghandeharioun (Google DeepMind)
DTSTART:20250612T130000Z
DTEND:20250612T140000Z
UID:TALK231046@talks.cam.ac.uk
CONTACT:Shun Shao
DESCRIPTION:Abstract\n\nWhile the capabilities of today’s large
  language models (LLMs) are reaching\, and even surpassing\, what
  was once thought impossible\, concerns remain about their
  misalignment\, such as generating misinformation or harmful text\,
  and addressing it remains an open area of research. Understanding
  LLMs’ internal representations can help explain their behavior\,
  verify their alignment with human values\, and mitigate instances
  where they produce errors. In this talk\, I begin by challenging
  common misconceptions about the connections between LLMs' hidden
  representations and their downstream behavior\, highlighting
  several “interpretability illusions.”\n\nNext\, I introduce
  Patchscopes\, a framework we developed that leverages the model
  itself to explain its internal representations in natural
  language. I’ll show how it can be used to answer a wide range of
  questions about an LLM's computation. Beyond unifying prior
  inspection techniques\, Patchscopes opens up new possibilities\,
  such as using a more capable model to explain the representations
  of a smaller model. I show how Patchscopes can be used as a tool
  for inspection\, discovery\, and even error correction. Examples
  include fixing multi-hop reasoning errors\, examining the
  interaction between user personas and latent misalignment\, and
  understanding why different classes of contextualization errors
  happen.\n\nI hope that by the end of this talk\, the audience
  shares my excitement in appreciating the beauty of the internal
  mechanisms of AI systems\, understands the nuances of model
  interpretability and why some observations might lead to
  illusions\, and takes away Patchscopes\, a powerful tool for
  qualitative analysis of how and why LLMs work and fail in
  different scenarios.\n\n\nBio\n\nAsma Ghandeharioun\, Ph.D.\, is a
  senior research scientist with the People + AI Research team at
  Google DeepMind. She works on aligning AI with human values
  through better understanding and controlling (language) models\,
  uniquely by demystifying their inner workings and correcting
  collective misconceptions along the way. While her current
  research focuses mostly on machine learning interpretability\,
  her previous work spans conversational AI\, affective computing\,
  and\, more broadly\, human-centered AI. She holds a doctorate and
  a master’s degree from MIT and a bachelor’s degree from Sharif
  University of Technology. She trained as a computer scientist and
  engineer and has research experience at MIT\, Google Research\,
  Microsoft Research\, and École Polytechnique Fédérale de Lausanne
  (EPFL)\, among others.\n\nHer work has been published in premier
  peer-reviewed machine learning venues such as NeurIPS\, ICLR\,
  ICML\, NAACL\, EMNLP\, AAAI\, ACII\, and AISTATS. She has received
  awards at NeurIPS\, and her work has been featured in Quanta
  Magazine\, Wired\, The Wall Street Journal\, and New Scientist.\n
LOCATION:https://cam-ac-uk.zoom.us/j/97599459216?pwd=QTRsOWZCOXRTREVnbTJBd
 XVpOXFvdz09
END:VEVENT
END:VCALENDAR
