BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Quantized Q-Learning for Stochastic Control with Borel Spaces and 
 General Information Structures - Serdar Yuksel (Queen's University\, Canad
 a)
DTSTART:20251113T090000Z
DTEND:20251113T094000Z
UID:TALK238513@talks.cam.ac.uk
DESCRIPTION:Reinforcement learning algorithms often require finiteness of 
 state and action spaces in Markov decision processes. In this presentation
 \, we show that under mild regularity conditions (in particular\, involvin
 g only weak continuity or Wasserstein continuity of the transition kernel 
 of an MDP)\, Q-learning for standard Borel MDPs via quantization of 
 states and actions (called Quantized Q-Learning) converges to a limit 
 under mild ergodicity conditions\, and furthermore this limit satisfies 
 an optimality equation which leads to near optimality\, either with 
 explicit performance bounds or with guaranteed asymptotic optimality. 
 Our approach 
 builds on (i) near-optimality of finite state model approximations for MDP
 s with weakly continuous kernels\, and (ii) convergence of quantized Q-lea
 rning to a limit which corresponds to the fixed point of a constructed app
 roximate finite MDP which depends on the exploration policy used during le
 arning. This result also implies near optimality of empirical model learni
 ng where one fits a finite MDP model to data as an alternative to quantize
 d Q-learning\, for which we also obtain sample complexity bounds. Thus\, w
 e present a general rigorous convergence and near optimality result for th
 e applicability of Q-learning and model learning for continuous MDPs. Our 
 analysis applies also to problems with non-compact state spaces via non-un
 iform quantization with convergence bounds\, to non-Markovian stochastic c
 ontrol problems which can be lifted to measure-valued MDPs under appropria
 te topologies (as in POMDPs and decentralized stochastic control)\, and 
 to controlled diffusions via time-discretization. [Joint work with Ali 
 Kara\, E
 mre Demirci\, Omar Mrani-Zentar\, Naci Saldi\, and Somnath Pradhan]
LOCATION:Seminar Room 1\, Newton Institute
END:VEVENT
END:VCALENDAR
