Fastest Convergence for Reinforcement Learning
- 👤 Speaker: Prof. Sean Meyn, University of Florida
- 📅 Date & Time: Monday 08 October 2018, 15:00 - 16:00
- 📍 Venue: LT6, Baker Building, CUED
Abstract
There are two well-known Stochastic Approximation techniques with an optimal rate of convergence (measured in terms of asymptotic variance): the Stochastic Newton-Raphson (SNR) algorithm [a matrix-gain algorithm that resembles the deterministic Newton-Raphson method], and the Ruppert-Polyak averaging technique. This talk will present new applications of these concepts to reinforcement learning.
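As a minimal sketch of the second technique, consider Ruppert-Polyak averaging on a toy scalar root-finding problem (this example and its parameters are illustrative assumptions, not taken from the talk): the raw SA iterate uses a slowly vanishing step size, and the running average of the iterates is what attains the optimal asymptotic variance.

```python
import numpy as np

def sa_with_averaging(n_iter=50000, seed=0):
    """Ruppert-Polyak averaging on a toy stochastic approximation problem.

    Hypothetical setup: find theta* solving E[f(theta, W)] = 0 with
    f(theta, w) = theta - w and W ~ N(mu, 1), so theta* = mu.
    """
    rng = np.random.default_rng(seed)
    mu = 2.0                         # the (unknown) root the SA scheme seeks
    theta = 0.0
    running_sum = 0.0
    for n in range(1, n_iter + 1):
        w = mu + rng.standard_normal()
        a_n = 1.0 / n**0.7           # slowly vanishing step size a_n = n^{-0.7}
        theta -= a_n * (theta - w)   # raw SA recursion
        running_sum += theta
    theta_avg = running_sum / n_iter  # Ruppert-Polyak average of the iterates
    return theta, theta_avg
```

The key design point is that the raw recursion deliberately uses a step size decaying slower than 1/n; the averaging step then recovers the optimal 1/n variance without any matrix gain.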
1. Introducing Zap Q-Learning. In recent work, first presented at NIPS 2017, it is shown that a new formulation of SNR provides a new approach to Q-learning with a provably optimal rate of convergence under general assumptions, and astonishingly quick convergence in numerical examples. In particular, the standard Q-learning algorithm of Watkins typically has infinite asymptotic covariance, and in simulations the Zap Q-Learning algorithm exhibits much faster convergence than the Ruppert-Polyak averaging method. The only difficulty is the matrix inversion required in the SNR recursion.
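The matrix-gain idea behind SNR (and, by extension, Zap) can be sketched on an assumed linear toy problem; this is not the Q-learning setting of the talk, and the step-size choices below are illustrative. A running estimate of the mean update matrix is maintained on a faster timescale, and its inverse scales each parameter update, mimicking a Newton-Raphson step.

```python
import numpy as np

def snr_style_linear_sa(n_iter=20000, seed=1):
    """Matrix-gain (SNR-style) stochastic approximation on a toy linear problem.

    Hypothetical setup: solve A theta = b from noisy samples A_n, b_n via
        Ahat_{n+1} = Ahat_n + g_n (A_n - Ahat_n)     # matrix estimate
        theta_{n+1} = theta_n - a_n Ahat^{-1} (A_n theta_n - b_n)
    The matrix inversion in each step is the cost that motivates part 2.
    """
    rng = np.random.default_rng(seed)
    A = np.array([[3.0, 1.0], [0.0, 2.0]])
    b = np.array([1.0, 4.0])           # solution: theta* = A^{-1} b = [-1/3, 2]
    theta = np.zeros(2)
    Ahat = np.eye(2)
    for n in range(1, n_iter + 1):
        A_n = A + 0.1 * rng.standard_normal((2, 2))
        b_n = b + 0.1 * rng.standard_normal(2)
        g_n = 1.0 / n**0.85            # matrix estimate on the faster timescale
        a_n = 1.0 / n                  # parameter updates on the slower timescale
        Ahat += g_n * (A_n - Ahat)
        # Newton-Raphson-like step: solve Ahat x = (A_n theta - b_n)
        theta -= a_n * np.linalg.solve(Ahat, A_n @ theta - b_n)
    return theta
```

The two-timescale structure (g_n decaying more slowly than a_n) is the ingredient that distinguishes this family from ordinary scalar-gain SA.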
2. A remedy is proposed based on a variant of Polyak’s heavy-ball method. For a special choice of the “momentum” gain sequence, it is shown that the parameter estimates obtained from the algorithm are essentially identical to those obtained using SNR. This new algorithm does not require matrix inversion. In simulations it is found that the sample paths of the two algorithms couple. A theoretical explanation for coupling is established for linear recursions.
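A generic heavy-ball recursion on the same kind of assumed linear toy problem illustrates how momentum can stand in for a matrix gain while each update costs only a matrix-vector product. Note the talk's result concerns a specific vanishing momentum gain sequence; the constant momentum parameter below is a simplification for illustration only.

```python
import numpy as np

def heavy_ball_linear_sa(n_iter=50000, seed=2):
    """Heavy-ball (momentum) stochastic approximation, no matrix inversion.

    Hypothetical setup: solve A theta = b from noisy samples. The momentum
    term beta * (theta_n - theta_{n-1}) plays a role analogous to the matrix
    gain in SNR, but the update avoids inverting any matrix.
    """
    rng = np.random.default_rng(seed)
    A = np.array([[3.0, 1.0], [0.0, 2.0]])
    b = np.array([1.0, 4.0])           # solution: theta* = A^{-1} b = [-1/3, 2]
    theta = np.zeros(2)
    theta_prev = np.zeros(2)
    beta = 0.9                         # constant momentum, for illustration only
    for n in range(1, n_iter + 1):
        A_n = A + 0.1 * rng.standard_normal((2, 2))
        b_n = b + 0.1 * rng.standard_normal(2)
        a_n = 1.0 / n
        # gradient-like step plus heavy-ball momentum; only a matvec per step
        update = -a_n * (A_n @ theta - b_n) + beta * (theta - theta_prev)
        theta_prev = theta
        theta = theta + update
    return theta
```

With momentum parameter beta, the effective step size is amplified by roughly 1/(1 - beta), which is one intuition for why a tuned momentum sequence can reproduce the behavior of a matrix gain.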
Biosketch
Sean Meyn received the BA degree in mathematics from the University of California, Los Angeles, in 1982 and the PhD degree in electrical engineering from McGill University, Canada, in 1987 (with Prof. P. Caines). He is now Professor and Robert C. Pittman Eminent Scholar Chair in the Department of Electrical and Computer Engineering at the University of Florida, director of the Laboratory for Cognition and Control, and director of the Florida Institute for Sustainable Energy. His academic research interests include theory and applications of decision and control, stochastic processes, and optimization. He has received many awards for his research on these topics, and is a fellow of the IEEE. He has held visiting positions at universities all over the world, including the Indian Institute of Science, Bangalore, in 1997-1998, where he was a Fulbright Research Scholar. During his most recent sabbatical, in the 2006-2007 academic year, he was a visiting professor at MIT and United Technologies Research Center (UTRC). His award-winning 1993 monograph with Richard Tweedie, Markov Chains and Stochastic Stability, has been cited thousands of times in journals from a range of fields. The latest version is published in the Cambridge Mathematical Library. For the past ten years his applied research has focused on engineering, markets, and policy in energy systems. He regularly engages in industry, government, and academic panels on these topics, and hosts an annual workshop at the University of Florida.
Series
This talk is part of the Probabilistic Systems, Information, and Inference Group Seminars series.
Included in Lists
- All Talks (aka the CURE list)
- bld31
- Cambridge Centre for Data-Driven Discovery (C2D3)
- Cambridge talks
- Cambridge University Engineering Department Talks
- Centre for Smart Infrastructure & Construction
- Chris Davis' list
- Computational Continuum Mechanics Group Seminars
- Featured lists
- Information Engineering Division seminar list
- Interested Talks
- LT6, Baker Building, CUED
- ndk22's list
- ob366-ai4er
- Probabilistic Systems, Information, and Inference Group Seminars
- rp587
- School of Technology
- Trust & Technology Initiative - interesting events
- yk449