Learning Rate Schedules, Scaling Laws, and Techniques for Pretraining LLMs
- 👤 Speaker: Alex Hägele, EPFL
- 📅 Date & Time: Tuesday 08 April 2025, 14:00 - 15:00
- 📍 Venue: Computer Lab, FW26
Abstract
Large Language Model (LLM) pretraining relies on complex strategies for large-scale optimization, with the learning rate schedule being particularly important yet often following conventional rules. In this talk, I will discuss our recent NeurIPS Spotlight that investigates a simple but effective strategy: a constant learning rate followed by a strategic cooldown. Our analysis demonstrates that this approach not only performs reliably but also offers practical advantages: it does not require a predetermined training length and easily allows continual training. Importantly, these findings enable more efficient scaling law experiments, as they allow training runs to be reused and thereby substantially reduce compute and GPU hours. In follow-up work, we investigate theoretical explanations for the unique behavior of such learning rate schedules, leveraging last-iterate convergence bounds which closely match real experiments. I will conclude by introducing the Swiss AI initiative (https://www.swiss-ai.org/), which deploys the world’s first national research infrastructure with 10,000 NVIDIA Grace Hopper GPUs. This initiative leverages our research innovations, such as the above, to develop state-of-the-art open and multilingual LLMs, with the goal of advancing fully transparent scientific research on foundation models.
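The schedule described in the abstract can be sketched as a simple step-to-learning-rate function. This is a minimal illustration, not the paper's implementation: the warmup length, cooldown fraction, and linear cooldown shape below are assumptions for the sketch (the work also considers other cooldown shapes).

```python
def lr_at_step(step, total_steps, peak_lr=1e-3,
               warmup_steps=100, cooldown_frac=0.2):
    """Warmup -> constant -> cooldown schedule (illustrative sketch).

    Hypothetical defaults; the cooldown here decays linearly to zero
    over the final `cooldown_frac` of training.
    """
    cooldown_start = int(total_steps * (1 - cooldown_frac))
    if step < warmup_steps:
        # linear warmup from 0 to the peak learning rate
        return peak_lr * step / warmup_steps
    if step < cooldown_start:
        # constant phase: training length need not be fixed in advance,
        # since the cooldown can be launched from any checkpoint
        return peak_lr
    # linear cooldown from the peak learning rate to zero
    return peak_lr * (total_steps - step) / (total_steps - cooldown_start)
```

Because the constant phase is flat, a single long run can be branched at several checkpoints, each given its own cooldown, which is what makes the reuse of runs for scaling law experiments cheap.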
Bio: Alex Hägele is a PhD student at EPFL in the Machine Learning and Optimization group (MLO), supervised by Martin Jaggi. Currently, he is part of the inaugural Anthropic Fellowship for AI Safety research, based in London. Previously, he completed his BSc and MSc in Computer Science at ETH Zürich and was a visiting Student Researcher at Apple MLR in Paris. His research explores scaling behavior and training of language models, spanning optimization, data, and architectures.
Series: This talk is part of the Cambridge ML Systems Seminar Series.
Included in Lists
- All Talks (aka the CURE list)
- bld31
- Cambridge talks
- Computer Lab, FW26
- Department of Computer Science and Technology talks and seminars
- Interested Talks
- School of Technology
- Trust & Technology Initiative - interesting events
- yk449