BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:New Relative Value Iteration and Q-Learning Algorithms for Ergodic
  Risk Sensitive Control of Markov Chains - Guodong Pang (Rice University)
DTSTART:20251113T105000Z
DTEND:20251113T113000Z
UID:TALK238522@talks.cam.ac.uk
DESCRIPTION:In this talk\, we will present new Jacobi-like relative value 
 iteration (RVI) algorithms for the ergodic risk-sensitive control 
 problem of discrete-time Markov chains\, together with the associated 
 Q-learning algorithms. In the case of a finite state space\, we prove 
 that the iterates of the new RVI algorithms converge geometrically\, 
 and in the case of a countable state space\, we prove convergence for 
 the appropriately truncated problem. We employ the entropy variational 
 formula to tackle the multiplicative nature of the risk-sensitive 
 Bellman operator\, albeit at the cost of an additional optimization 
 problem over a corresponding set of probability vectors. We then 
 discuss the entropy-based risk-sensitive Q-learning algorithms 
 corresponding to the existing and new Jacobi-like RVI algorithms. These 
 Q-learning algorithms have two coupled components: the usual Q-function 
 iterates and the new probability iterates arising from the entropy 
 variational formula. We prove the convergence of the coupled iterates 
 by investigating the multi-scale stochastic approximations for these 
 iterates.
LOCATION:Seminar Room 1\, Newton Institute
END:VEVENT
END:VCALENDAR
