How Do Language Models Reason and Compute? A Mechanistic Interpretability Approach

If you have a question about this talk, please contact Liz Tan.

Mechanistic interpretability aims to uncover the internal algorithms implemented by neural networks by identifying the circuits responsible for specific behaviours.

In this talk, we introduce the goals and methods of mechanistic interpretability for LLMs, including recent approaches based on sparse feature decompositions, circuit analysis, and attribution graphs. We discuss how these tools can help us better understand the internal mechanisms behind specific model behaviours, such as reasoning or arithmetic, and why these mechanisms matter for scientific insight into LLMs.

We will base our discussion on a medium-scale language model (Qwen3-4B) and build on ideas from On the Biology of a Large Language Model (Anthropic, 2025).
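As a concrete illustration of the sparse feature decompositions mentioned above, the sketch below shows a minimal sparse autoencoder (SAE) in PyTorch. The dimensions, L1 penalty coefficient, and random activations are illustrative assumptions, not details taken from the talk or from Qwen3-4B's actual configuration.

```python
# Minimal sketch of a sparse feature decomposition (sparse autoencoder, SAE).
# A residual-stream activation x is approximated as a sparse, non-negative
# combination of learned feature directions: x ~ decoder(f), where f = ReLU(encoder(x))
# and most entries of f are (near-)zero. All sizes below are illustrative.

import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 2560, d_features: int = 16384, l1_coeff: float = 1e-3):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)  # maps activations into feature space
        self.decoder = nn.Linear(d_features, d_model)  # reconstructs activations from features
        self.l1_coeff = l1_coeff

    def forward(self, x: torch.Tensor):
        f = torch.relu(self.encoder(x))           # sparse, non-negative feature activations
        x_hat = self.decoder(f)                   # reconstruction of the original activation
        recon_loss = (x - x_hat).pow(2).mean()    # reconstruction error
        sparsity_loss = self.l1_coeff * f.abs().sum(dim=-1).mean()  # encourages few active features
        return x_hat, f, recon_loss + sparsity_loss


# Toy usage: decompose a batch of stand-in activations (random here, rather than
# activations actually collected from a language model).
sae = SparseAutoencoder()
activations = torch.randn(8, 2560)
x_hat, features, loss = sae(activations)
print(features.shape, loss.item())
```

In practice, the learned feature directions are the objects that circuit analysis and attribution graphs then connect to one another to trace how a behaviour is computed across layers.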

This talk is part of the DAMTP ML for Science Reading Group series.

