BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Actionable Interpretability for AI Safety - Prof. Mor Geva (Tel Av
 iv University)
DTSTART:20260226T110000Z
DTEND:20260226T120000Z
UID:TALK243601@talks.cam.ac.uk
CONTACT:Lucas Resck
DESCRIPTION:Abstract: \nInterpretability research for large language model
 s (LLMs) has advanced rapidly in recent years. Yet a central open question
  remains: how can these insights be transformed into practical tools f
 or improving AI safety? In this talk\, I present ongoing efforts to levera
 ge interpretability for both immediate and long-term safety goals. First\,
  I show how disentangling model parameters enables precise knowledge erasu
 re\, achieving finer-grained and more robust control than common fine-tuni
 ng and editing methods. Next\, I introduce a scalable approach for decompo
 sing residual stream activations through their local geometry\, demonstrat
 ing its advantages for localizing and steering model behavior. Lastly\, I 
 turn to the increasingly debated question of AI consciousness\, using inte
 rpretability to test a neuroscientifically inspired indicator of agency an
 d meta-cognitive monitoring in LLMs.\n\n\nBio: \nMor Geva is an Assistant 
 Professor at the School of Computer Science and AI at Tel Aviv University.
  Her research focuses on understanding the inner workings of large languag
 e models to increase their transparency and efficiency\, control their ope
 ration\, and improve their reasoning abilities. Mor completed a Ph.D. in C
 omputer Science at Tel Aviv University\, was a postdoctoral researcher at 
 Google DeepMind and the Allen Institute for AI\, and worked as a Research 
 Scientist at Google Research. She is a recipient of Intel's Rising Star Fa
 culty Award (2024)\, the Alon Scholarship for Outstanding Faculty (2024)\,
  EMNLP Best Paper Award (2024)\, EACL Outstanding Paper Award (2023)\, MIT
  Rising Star in EECS nomination (2021)\, and the Dan David Prize for Gradu
 ate Students in the field of AI (2020).
LOCATION:https://cam-ac-uk.zoom.us/j/86890624365?pwd=oYGWpY7d5r3JOaUCaJXTD
 0sRECFxab.1
END:VEVENT
END:VCALENDAR
