BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:DiCE: The Infinitely Differentiable Monte-Carlo Estimator - Jakob 
 Foerster\, University of Oxford
DTSTART:20180619T120000Z
DTEND:20180619T130000Z
UID:TALK107470@talks.cam.ac.uk
CONTACT:Microsoft Research Cambridge Talks Admins
DESCRIPTION:The score function estimator is widely used for estimating gra
 dients of stochastic objectives in Stochastic Computation Graphs (SCG)\, e.
 g. in reinforcement learning and meta-learning. While deriving the first-o
 rder gradient estimators by differentiating a surrogate loss (SL) objectiv
 e is computationally and conceptually simple\, using the same approach for
  higher-order gradients is more challenging. Firstly\, analytically derivi
 ng and implementing such estimators is laborious and not compliant with au
 tomatic differentiation. Secondly\, repeatedly applying SL to construct ne
 w objectives for each order gradient involves increasingly cumbersome grap
 h manipulations. Lastly\, to match the first-order gradient under differen
 tiation\, SL treats part of the cost as a fixed sample\, which we show lea
 ds to missing and wrong terms for higher-order gradient estimators. To add
 ress all these shortcomings in a unified way\, we introduce DiCE\, which p
 rovides a single objective that can be differentiated repeatedly\, generat
 ing correct gradient estimators of any order in SCGs. Unlike SL\, DiCE rel
 ies on automatic differentiation for performing the requisite graph manipu
 lations. We verify the correctness of DiCE both through a proof and throug
 h numerical evaluation of the DiCE gradient estimates. We also use DiCE to
  propose and evaluate a novel approach for multi-agent learning. Our code 
 is available at this URL.
LOCATION:Auditorium\, Microsoft Research Ltd\, 21 Station Road\, Cambridge
 \, CB1 2FB
END:VEVENT
END:VCALENDAR
