BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Evolution Strategies at the Hyperscale - Bidipta Sarkar (Universit
 y of Oxford)
DTSTART:20260318T110000Z
DTEND:20260318T123000Z
UID:TALK245848@talks.cam.ac.uk
CONTACT:Xianda Sun
DESCRIPTION:Evolution Strategies (ES) is a class of powerful black-box opt
 imisation methods that are highly parallelisable and can handle non-differ
 entiable and noisy objectives. However\, naïve ES becomes prohibitively e
 xpensive at scale on GPUs due to the low arithmetic intensity of batched m
 atrix multiplications with unstructured random perturbations. We introduce
  Evolution Guided GeneRal Optimisation via Low-rank Learning (EGGROLL)\, w
 hich improves arithmetic intensity by structuring individual perturbations
  as rank-r matrices\, resulting in a hundredfold increase in training spee
 d for billion-parameter models at large population sizes\, achieving up to
  91% of the throughput of pure batch inference. We provide a rigorous theo
 retical analysis of Gaussian ES for high-dimensional parameter objectives\
 , investigating conditions needed for ES updates to converge in high dimen
 sions. Our results reveal a linearising effect and prove consistency betwe
 en EGGROLL and ES as the parameter dimension increases. Our experiments sh
 ow that EGGROLL: (1) enables the stable pretraining of nonlinear recurrent
  language models that operate purely in integer datatypes\, (2) is competi
 tive with GRPO for post-training LLMs on reasoning tasks\, and (3) does no
 t compromise performance compared to ES in tabula rasa RL settings\, despi
 te being faster. Our code is available at https://eshyperscale.github.io/.
 \n
LOCATION:Cambridge University Engineering Department\, CBL Seminar room BE
 4-38.
END:VEVENT
END:VCALENDAR
