BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Natural Experiments in NLP and Where to Find Them - Pietro Lesci\,
  University of Cambridge
DTSTART:20241115T153000Z
DTEND:20241115T170000Z
UID:TALK224398@talks.cam.ac.uk
CONTACT:Martina Scauda
DESCRIPTION:Zoom link available upon request.\n\nIn training language
  models\, training choices\, such as the random seed for data ordering
  or the token vocabulary size\, significantly influence model
  behaviour. Answering counterfactual questions like “How would the
  model perform if this instance were excluded from training?” is
  computationally expensive\, as it requires re-training the model.
  Once set\, these training configurations are effectively fixed\,
  creating a “natural experiment” in which modifying the experimental
  conditions incurs high computational costs. Econometric techniques
  for estimating causal effects from observational data enable us to
  analyse the impact of these choices without full experimental
  control or repeated model training. In this talk\, I will present
  our paper\, Causal Estimation of Memorisation Profiles (Best Paper
  Award at ACL 2024)\, which introduces a novel method based on the
  difference-in-differences technique from econometrics to estimate
  memorisation without requiring model re-training. I will also cover
  the necessary econometric concepts and key literature on
  memorisation in language models.\n\nSuggested readings:\n\nCounterfactual
  memorization in neural language models
  (https://proceedings.neurips.cc/paper_files/paper/2023/file/7bc4f74e
 35bcfe8cfe43b0a860786d6a-Paper-Conference.pdf)\n\nQuantifying
  memorization across neural language models
  (https://arxiv.org/pdf/2202.07646)
LOCATION:MR12\, Centre for Mathematical Sciences\, Wilberforce Road\, Camb
 ridge
END:VEVENT
END:VCALENDAR
