BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Fastest Convergence for Reinforcement Learning - Prof. Sean Meyn\
 , University of Florida
DTSTART:20181008T140000Z
DTEND:20181008T150000Z
UID:TALK112225@talks.cam.ac.uk
CONTACT:Prof. Ramji Venkataramanan
DESCRIPTION: \nThere are two well-known stochastic approximation tech
 niques with the optimal rate of convergence (measured in terms of as
 ymptotic variance): the Stochastic Newton-Raphson (SNR) algorithm [a
  matrix-gain algorithm that resembles the deterministic Newton-Raphso
 n method]\, and the Ruppert-Polyak averaging technique. This talk wil
 l present new applications of these concepts to reinforcement learnin
 g.\n \n1. Introducing Zap Q-Learning. In recent work\, first presente
 d at NIPS 2017\, it is shown that a new formulation of SNR provides a
  new approach to Q-learning with a provably optimal rate of convergen
 ce under general assumptions\, and astonishingly quick convergence in
  numerical examples. In particular\, the standard Q-learning algorith
 m of Watkins typically has infinite asymptotic covariance\, and in si
 mulations the Zap Q-Learning algorithm exhibits much faster convergen
 ce than the Ruppert-Polyak averaging method. The only difficulty is t
 he matrix inversion required in the SNR recursion.\n \n2. A remedy is
  proposed based on a variant of Polyak’s heavy-ball method. For a spe
 cial choice of the “momentum” gain sequence\, it is shown that the pa
 rameter estimates obtained from the algorithm are essentially identic
 al to those obtained using SNR. This new algorithm does not require m
 atrix inversion. In simulations it is found that the sample paths of
  the two algorithms couple. A theoretical explanation for this coupli
 ng is established for linear recursions.\n\n*Biosketch*\n\nSean Meyn
  received the BA degree in mathematics from the University of Califor
 nia\, Los Angeles\, in 1982 and the PhD degree in electrical engineer
 ing from McGill University\, Canada\, in 1987 (with Prof. P. Caines).
  He is now Professor and Robert C. Pittman Eminent Scholar Chair in t
 he Department of Electrical and Computer Engineering at the Universit
 y of Florida\, director of the Laboratory for Cognition and Control\,
  and director of the Florida Institute for Sustainable Energy. His ac
 ademic research interests include theory and applications of decision
  and control\, stochastic processes\, and optimization. He has receiv
 ed many awards for his research on these topics\, and is a fellow of
  the IEEE. He has held visiting positions at universities all over th
 e world\, including the Indian Institute of Science\, Bangalore\, dur
 ing 1997-1998\, where he was a Fulbright Research Scholar. During his
  most recent sabbatical\, in the 2006-2007 academic year\, he was a v
 isiting professor at MIT and United Technologies Research Center (UTR
 C). His award-winning 1993 monograph with Richard Tweedie\, Markov Ch
 ains and Stochastic Stability\, has been cited thousands of times in
  journals from a range of fields. The latest version is published in
  the Cambridge Mathematical Library. For the past ten years his appli
 ed research has focused on engineering\, markets\, and policy in ene
 rgy systems. He regularly engages with industry\, government\, and ac
 ademic panels on these topics\, and hosts an annual workshop at the U
 niversity of Florida.\n
LOCATION:LT6\, Baker Building\, CUED
END:VEVENT
END:VCALENDAR
