Natural Experiments in NLP and Where to Find Them
- 👤 Speaker: Pietro Lesci, University of Cambridge
- 📅 Date & Time: Friday 15 November 2024, 15:30 - 17:00
- 📍 Venue: MR12, Centre for Mathematical Sciences, Wilberforce Road, Cambridge
Abstract
Zoom Link available upon request
When training language models, choices such as the random seed for data ordering or the token vocabulary size significantly influence model behaviour. Answering counterfactual questions like “How would the model perform if this instance were excluded from training?” is computationally expensive, as it requires re-training the model. Once these training configurations are set, they are effectively fixed, creating a “natural experiment” in which modifying the experimental conditions incurs high computational costs. Econometric techniques for estimating causal effects from observational data let us analyse the impact of these choices without full experimental control or repeated model training. In this talk, I will present our paper, Causal Estimation of Memorisation Profiles (Best Paper Award at ACL 2024), which introduces a novel method based on the difference-in-differences technique from econometrics to estimate memorisation without requiring model re-training. I will also cover the necessary econometric concepts and key literature on memorisation in language models.
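For readers new to difference-in-differences, the sketch below shows the textbook two-group, two-period version of the idea in Python. It is only an illustration under stated assumptions, not the estimator from the paper: the function name `did_estimate`, the framing of "treated" instances (seen in training between two checkpoints) versus "control" instances (not seen), and all numbers are made up for this example.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Textbook 2x2 difference-in-differences estimate.

    Subtracts the control group's change over time from the treated
    group's change, so that a trend shared by both groups cancels out.
    (Hypothetical helper for illustration; not the paper's method.)
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Made-up per-instance log-likelihoods at two checkpoints (illustrative only).
treated_pre = [-4.0, -3.0]    # before the treated instances are trained on
treated_post = [-2.0, -1.0]   # after the treated instances are trained on
control_pre = [-4.0, -4.0]    # instances not trained on, earlier checkpoint
control_post = [-3.5, -3.5]   # same instances, later checkpoint

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(effect)  # 2.0 (treated change) - 0.5 (control change) = 1.5
```

The estimate is only meaningful under the usual parallel-trends assumption: without treatment, the treated group would have changed by the same amount as the control group.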
Suggested readings:
- Counterfactual memorization in neural language models (https://proceedings.neurips.cc/paper_files/paper/2023/file/7bc4f74e35bcfe8cfe43b0a860786d6a-Paper-Conference.pdf)
- Quantifying memorization across neural language models (https://arxiv.org/pdf/2202.07646)
This talk is part of the Causal Inference Reading Group series.