BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Formal symbolic models for LLMs: pretraining\, evaluation and post
 -training - Prof. Tal Linzen (NYU & Google)
DTSTART:20260312T150000Z
DTEND:20260312T160000Z
UID:TALK243604@talks.cam.ac.uk
CONTACT:Lucas Resck
DESCRIPTION:Abstract: Formal symbolic models—ideal versions of the compu
 tational problems involved in learning about and interacting with the worl
 d—are central in linguistics and cognitive science. These models make it
  possible to generate unlimited amounts of synthetic data of variable comp
 lexity\, as well as define verifiably correct outcomes for each instance o
 f the problem. I will discuss three studies that leverage these properties
  of formal models. First\, I will show that by pretraining transformer LLM
 s on formal languages before training them on natural languages\, we can m
 ake training both more compute-efficient and more data-efficient overall. 
 Second\, I will introduce context-free language recognition as an
  evaluation task for LLMs. I will show that the complexity of the grammar
  reliably predicts the model’s accuracy on this task\, and that even the
  strongest reasoning models available struggle to perform this task as
  the complexity
  of the language increases. Finally\, I will demonstrate how symbolic
  Bayesian models can be used to evaluate and improve the ability of LLMs
  to update th
 eir probabilistic beliefs when interacting with users.\n\nBio: Tal Linzen 
 is an Associate Professor of Linguistics and Data Science at New York Univ
 ersity and a Staff Research Scientist at Google. He studies the connection
 s between machine learning and human language comprehension and acquisitio
 n\, as well as cognitively motivated approaches for language model evaluat
 ion\, post-training\, and interpretability.
LOCATION:https://cam-ac-uk.zoom.us/j/86890624365?pwd=oYGWpY7d5r3JOaUCaJXTD
 0sRECFxab.1
END:VEVENT
END:VCALENDAR
