BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Meta Flow Maps enable scalable reward alignment - Yee Whye Teh (Ox
 ford/DeepMind)
DTSTART:20260306T140000Z
DTEND:20260306T150000Z
UID:TALK244315@talks.cam.ac.uk
CONTACT:Po-Ling Loh
DESCRIPTION:Controlling generative models is computationally expensive. Th
 is is because optimal alignment with a reward function--whether via infere
 nce-time steering or fine-tuning--requires estimating the value function. 
 This task demands access to the conditional posterior p1|t(x1|xt)\, the di
 stribution of clean data x1 consistent with an intermediate state xt\, a r
 equirement that typically compels methods to resort to costly trajectory s
 imulations. To address this bottleneck\, we introduce Meta Flow Maps (MFMs
 )\, a framework extending consistency models and flow maps into the stocha
 stic regime. MFMs are trained to perform stochastic one-step posterior sam
 pling\, generating arbitrarily many i.i.d. draws of clean data x1 from any
  intermediate state. Crucially\, these samples provide a differentiable re
 parametrization that unlocks efficient value function estimation. We lever
 age this capability to solve bottlenecks in both paradigms: enabling infer
 ence-time steering without inner rollouts\, and facilitating unbiased\, of
 f-policy fine-tuning to general rewards. Empirically\, our single-particle
  steered-MFM sampler outperforms a Best-of-1000 baseline on ImageNet acros
 s multiple rewards at a fraction of the compute.\n\nArXiv manuscript: http
 s://arxiv.org/abs/2601.14430
LOCATION:MR12\, Centre for Mathematical Sciences
END:VEVENT
END:VCALENDAR
