BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Building and Understanding Human-scale Language Models - Aaron Mue
 ller (Boston University)
DTSTART:20251205T150000Z
DTEND:20251205T160000Z
UID:TALK235891@talks.cam.ac.uk
CONTACT:Suchir Salhan
DESCRIPTION:**Abstract:**\nHumans learn language from less than 100 millio
 n words. Today’s state-of-the-art language models are exposed to trillio
 ns of words. What do today’s human-scale language models learn—and wha
 t don’t they? How can we close this gap in data efficiency? In this talk
 \, I will start by presenting insights from 3 years of the BabyLM Challeng
 e. The purpose of BabyLM is to encourage researchers to train language mod
 els using only as much data as a human would need when first learning lang
 uage\, and to democratize access to language modeling research. Participan
 ts have submitted a wide variety of systems\; the best-performing s
 ystems tend to come from innovations to the architecture or training objec
 tive. Then\, I will present recent work on the training dynamics of both h
 uman-scale and large-scale language models. I will present a method for un
 derstanding what concepts a model is learning at specific points in traini
 ng. Using subject-verb agreement as a case study\, I will show that simple
 r word-matching features are learned early in training\, while more abstra
 ct grammatical number detectors—including more abstract cross-linguistic
  number features—are learned far later in training. I will conclude by d
 iscussing the future of BabyLM\, and the future of interpretability as a t
 ool for understanding—and improving—language model training. \n\n**Bio
 :**\nAaron Mueller is an Assistant Professor (Lecturer) of Computer Scienc
 e (Informatics) and\, by courtesy\, of Data Science at Boston University. 
 His research centers on developing language modeling methods and evaluatio
 ns inspired by causal and linguistic principles\, and applying these to pr
 ecisely control and improve the generalization of computational models of 
 language. He completed his Ph.D. at Johns Hopkins University. His work has
  been published in ML and NLP venues (such as ICML\, ACL\, and EMNLP) and 
 has won awards at TMLR and ACL. He is a recurring organizer of the Blackbo
 xNLP and BabyLM workshops.
LOCATION:ONLINE ONLY. Google Meet Link: https://meet.google.com/sbr-mvua-f
 cc
END:VEVENT
END:VCALENDAR
