New Relative Value Iteration and Q-Learning Algorithms for Ergodic Risk Sensitive Control of Markov Chains
- Speaker: Guodong Pang (Rice University)
- Date & Time: Thursday 13 November 2025, 10:50 - 11:30
- Venue: Seminar Room 1, Newton Institute
Abstract
In this talk, we will present new Jacobi-like relative value iteration (RVI) algorithms for the ergodic risk-sensitive control problem of discrete-time Markov chains, together with the associated Q-learning algorithms. In the case of a finite state space, we prove that the iterates of the new RVI algorithms converge geometrically, and in the case of a countable state space, we prove convergence for an appropriately truncated problem. We employ the entropy variational formula to tackle the multiplicative nature of the risk-sensitive Bellman operator, at the cost of an additional optimization problem over a corresponding set of probability vectors. We then discuss the entropy-based risk-sensitive Q-learning algorithms corresponding to the existing and new Jacobi-like RVI algorithms. These Q-learning algorithms have two coupled components: the usual Q-function iterates and the new probability iterates arising from the entropy variational formula. We prove the convergence of the coupled iterates by analyzing the multi-scale stochastic approximations involved.
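To make the objects in the abstract concrete, here is a minimal numerical sketch, not the talk's new Jacobi-like algorithms: it runs the classical multiplicative RVI for a small finite risk-sensitive MDP and numerically checks the entropy (Donsker-Varadhan) variational formula that underlies the entropy-based approach. All problem data (state/action counts, costs, transitions, the risk parameter `beta`) are illustrative assumptions, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA = 4, 2     # illustrative small MDP (assumption)
beta = 0.5        # risk-sensitivity parameter (assumption)

# Random costs c(x,u) in [0,1] and transition kernels P[u][x, y]
c = rng.uniform(0.0, 1.0, size=(nS, nA))
P = rng.uniform(size=(nA, nS, nS))
P /= P.sum(axis=2, keepdims=True)

def bellman(V):
    # Multiplicative risk-sensitive Bellman operator:
    # (T V)(x) = min_u exp(beta * c(x,u)) * sum_y P(y | x,u) V(y)
    Q = np.exp(beta * c) * np.stack([P[u] @ V for u in range(nA)], axis=1)
    return Q.min(axis=1)

# Classical relative value iteration: normalize at a reference state so the
# iterates stay bounded and converge to the principal eigenvector of T.
V = np.ones(nS)
ref = 0
for _ in range(500):
    TV = bellman(V)
    V = TV / TV[ref]

# Optimal risk-sensitive ergodic cost: T V = exp(beta * rho) * V at the fixed point
rho = np.log(bellman(V)[ref]) / beta

# Entropy variational formula (used to linearize the multiplicative operator):
#   log sum_y p_y exp(f_y) = max_q [ sum_y q_y f_y - KL(q || p) ],
# with the maximum attained at q_y proportional to p_y exp(f_y).
f = rng.normal(size=nS)
p = P[0][0]
q = p * np.exp(f)
q /= q.sum()
lhs = np.log(p @ np.exp(f))
rhs = q @ f - q @ np.log(q / p)
```

At the fixed point, `V` solves the multiplicative Poisson equation up to normalization, and `rho` approximates the optimal ergodic risk-sensitive cost; the `lhs`/`rhs` pair verifies the variational identity at its maximizer. The talk's Jacobi-like RVI and the coupled Q-learning iterates replace this synchronous sweep with componentwise and stochastic-approximation updates, respectively.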
Series: This talk is part of the Isaac Newton Institute Seminar Series.
Included in Lists
- All CMS events
- bld31
- dh539
- Featured lists
- INI info aggregator
- Isaac Newton Institute Seminar Series
- School of Physical Sciences
- Seminar Room 1, Newton Institute
Note: Ex-directory lists are not shown.