BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Learning Rate Schedules\, Scaling Laws\, and Techniques for Pretra
 ining LLMs - Alex Hägele\, EPFL
DTSTART:20250408T130000Z
DTEND:20250408T140000Z
UID:TALK229840@talks.cam.ac.uk
CONTACT:Sally Matthews
DESCRIPTION:Large Language Model (LLM) pretraining relies on complex
  strategies for large-scale optimization\, with the learning rate
  schedule being particularly important yet often set by convention.\nIn
  this talk\, I will discuss our recent NeurIPS Spotlight that
  investigates a simple but effective strategy: a constant learning rate
  followed by strategic cooldowns. Our analysis demonstrates that this
  approach not only performs reliably but also offers practical
  advantages: it requires no predetermined training length and readily
  allows continual training. Importantly\, these findings enable more
  efficient scaling law experiments\, as they allow training runs to be
  reused and thereby substantially reduce compute and GPU hours. In
  follow-up work\, we investigate theoretical explanations for the unique
  behavior of such learning rate schedules\, leveraging last-iterate
  convergence bounds that closely match real experiments.\nI will
  conclude by introducing the Swiss AI initiative
  (https://www.swiss-ai.org/)\, which deploys the world's first national
  research infrastructure with 10\,000 NVIDIA Grace Hopper GPUs. This
  initiative leverages our research innovations\, such as the above\, to
  develop state-of-the-art open and multilingual LLMs\, with the goal of
  advancing fully transparent scientific research on foundation
  models.\n\nBio: Alex Hägele is a PhD student at EPFL in the Machine
  Learning and Optimization group (MLO)\, supervised by Martin Jaggi.
  Currently\, he is part of the inaugural Anthropic Fellowship for AI
  Safety research\, based in London. Previously\, he completed his BSc
  and MSc in Computer Science at ETH Zürich and was a visiting Student
  Researcher at Apple MLR in Paris. His research explores the scaling
  behavior and training of language models\, spanning optimization\,
  data\, and architectures.
LOCATION:Computer Lab\, FW26
END:VEVENT
END:VCALENDAR
