An Introduction to Mechanistic Interpretability

Rachel C. Zhang (DAMTP PhD), Liz Tan (DAMTP PhD)
Tuesday 28 October 2025, 15:30-16:30
B1.19 Potters Room, Centre for Mathematical Sciences, Cambridge CB3 0WA.

If you have a question about this talk, please contact Rachel C. Zhang.

In the first journal club, we will discuss mechanistic interpretability for language models. The meeting will be structured as follows:

- Overview of mechanistic interpretability and a deep dive into transformer circuits (see ‘A Mathematical Framework for Transformer Circuits’, Anthropic 2021: https://transformer-circuits.pub/2021/framework/index.html)

- Discussion of a recent paper on how LLMs develop perceptual abilities, which investigates how Claude 3.5 Haiku learns to perform linebreaking in fixed-width text (see ‘When Models Manipulate Manifolds: The Geometry of a Counting Task’, Anthropic 2025: https://transformer-circuits.pub/2025/linebreaks/index.html)

It is not necessary to read the above literature before the session!

This talk is part of the DAMTP ML for Science Reading Group series.
