BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Backprop through the Void: Optimizing Control Variates for Black-B
 ox Gradient Estimation - Geoff Roeder (University of Toronto)
DTSTART:20171127T110000Z
DTEND:20171127T120000Z
UID:TALK95800@talks.cam.ac.uk
CONTACT:39846
DESCRIPTION:Gradient-based optimization is the foundation of deep learning
  and reinforcement learning. Even when the mechanism being optimized is un
 known or not differentiable\, optimization using high-variance or biased g
 radient estimates is still often the best strategy. We introduce a general
  framework for learning low-variance\, unbiased gradient estimators for bl
 ack-box functions of random variables. Our method uses gradients of a neur
 al network trained jointly with model parameters or policies\, and is appl
 icable in both discrete and continuous settings. We demonstrate this frame
 work for training discrete latent-variable models. We also give an unbiase
 d\, action-conditional extension of the advantage actor-critic reinforceme
 nt learning algorithm.
LOCATION:CBL Seminar Room
END:VEVENT
END:VCALENDAR