BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Variance in Policy Gradient Methods and Learning Sequential Latent
  Variable Models - George Tucker\, Google Brain
DTSTART:20180704T100000Z
DTEND:20180704T110000Z
UID:TALK107542@talks.cam.ac.uk
CONTACT:39846
DESCRIPTION:I will discuss two efforts to improve learning in RL. In the
  first part\, I'll talk about our work towards understanding variance in
  policy gradient estimators. PPO and TRPO provide strong performance at
  the cost of requiring many on-policy samples\, which makes them
  challenging to use in real-world applications. The high sample
  requirement arises from high-variance gradient estimates. We explore
  where this variance comes from\, and how we can reduce it.\n\nSwitching
  gears\, in the second part\, I'll talk about learning models of the
  world\, which can simplify control by lifting the problem to a
  lower-dimensional embedding space. Three groups independently introduced
  the idea of using a particle filter to train highly flexible non-linear
  sequential latent variable models. A key deficiency with this work is
  that the training procedure cannot properly account for temporal
  dependencies in the data because it uses the filtering distributions. We
  introduce learned tilting functions\, which allow us to control the
  target distributions that sequential Monte Carlo passes through. In
  principle\, we can train everything jointly with a coherent objective.
  I'll discuss preliminary results and challenges that we have yet to
  resolve.\n\nBio:\nGeorge Tucker is a researcher on the Google Brain team
  focusing on reinforcement learning and sequence models. He received his
  PhD in Mathematics from MIT and previously worked as a researcher at
  Amazon in the speech group.
LOCATION:Engineering Department\, CBL Room BE-438.
END:VEVENT
END:VCALENDAR
