An Introduction to Mechanistic Interpretability

If you have a question about this talk, please contact Rachel C. Zhang.

In the first journal club, we will be discussing mechanistic interpretability for language models. The meeting will be structured as follows:

- Overview of mechanistic interpretability and a deep dive into transformer circuits (see ‘A Mathematical Framework for Transformer Circuits’, Anthropic 2021: https://transformer-circuits.pub/2021/framework/index.html)

- Discussion of a recent paper on how LLMs develop perceptual abilities, which investigates how Claude 3.5 Haiku learns to perform line breaking in fixed-width text (see ‘When Models Manipulate Manifolds: The Geometry of a Counting Task’, Anthropic 2025: https://transformer-circuits.pub/2025/linebreaks/index.html)

It is not necessary to read the above literature before the session!

This talk is part of the DAMTP ML for Science Reading Group series.
