How Do Language Models Reason and Compute? A Mechanistic Interpretability Approach
- đ¤ Speaker: Julia Dima (DIS MPhil)
- đ Date & Time: Wednesday 11 March 2026, 11:00 - 12:00
- đ Venue: MR10, Centre for Mathematical Sciences
Abstract
Mechanistic interpretability aims to uncover the internal algorithms implemented by neural networks by identifying the circuits responsible for specific behaviours.
In this talk, we introduce the goals and methods of mechanistic interpretability for LLMs, including recent approaches based on sparse feature decompositions, circuit analysis, and attribution graphs. We discuss how these tools can help us better understand the internal mechanisms behind specific model behaviours, such as reasoning or arithmetic, and why uncovering these mechanisms matters for scientific insight into LLMs.
We will base our discussion on a medium-scale language model (Qwen3-4B) and build on ideas from *On the Biology of a Large Language Model* (Anthropic, 2025).
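For readers unfamiliar with sparse feature decompositions, the sketch below shows the basic shape of a sparse autoencoder applied to residual-stream activations. It is a minimal illustration, not material from the talk: the hidden size (2560, plausible for a model of Qwen3-4B's scale), the feature count, and the PyTorch framing are all illustrative assumptions.

```python
# Minimal sketch of a sparse-autoencoder feature decomposition.
# Dimensions and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Decomposes activations into a sparse, overcomplete set of features."""
    def __init__(self, d_model: int = 2560, d_features: int = 16384, l1_coeff: float = 1e-3):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)
        self.l1_coeff = l1_coeff

    def forward(self, activations: torch.Tensor):
        # Encode to a non-negative, sparse feature vector, then reconstruct.
        features = torch.relu(self.encoder(activations))
        reconstruction = self.decoder(features)
        # Loss = reconstruction error + L1 penalty encouraging sparse feature use.
        loss = ((reconstruction - activations) ** 2).mean() + self.l1_coeff * features.abs().mean()
        return features, reconstruction, loss

# Example: decompose a batch of stand-in activations (random here, captured
# from a language model's residual stream in practice).
sae = SparseAutoencoder()
acts = torch.randn(8, 2560)
features, recon, loss = sae(acts)
print(features.shape, loss.item())
```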
Series
This talk is part of the DAMTP ML for Science Reading Group series.