BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Simple Reinforcement Learning Algorithms for Continuous State and 
 Action Space Systems - Prof. Rahul Jain\, University of Southern Californi
 a
DTSTART:20190617T110000Z
DTEND:20190617T120000Z
UID:TALK126373@talks.cam.ac.uk
CONTACT:Prof. Ramji Venkataramanan
DESCRIPTION:Reinforcement Learning (RL) problems for continuous state
  and action space systems are quite challenging. Recently\, deep
  reinforcement learning methods have been shown to be quite effective
  for certain RL problems with very large or continuous state and action
  spaces. But such methods require extensive hyper-parameter tuning and
  huge amounts of data\, and they come with no performance guarantees.
  We note that such methods are mostly trained 'offline' on experience
  replay buffers.\n\nIn this talk\, I will describe a series of simple
  reinforcement learning schemes for various settings. Our premise is
  that we have access to a generative model that can give us simulated
  samples of the next state. We will start with finite state and action
  space MDPs. An 'empirical value learning' (EVL) algorithm can be
  derived quite simply by replacing the expectation in the Bellman
  operator with an empirical estimate. We note that the EVL algorithm
  has remarkably good numerical performance in practice.
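 \n\nAs a rough sketch (notation and names below are for illustration
  only\, not from the talk): writing the Bellman optimality operator as
  (Tv)(s) = max_a [ r(s\,a) + gamma E[v(X') | s\,a] ]\, EVL replaces
  the expectation with an average of v over n simulated next states
  X_1\,...\,X_n drawn from the generative model at (s\,a). A minimal
  Python sketch for a finite MDP\, assuming a hypothetical sampler
  sample_next(s\, a\, n) that returns an array of n next-state
  indices:\n\nimport numpy as np\n\ndef evl_step(v\, reward\,
  sample_next\, gamma\, n=64):\n    # One EVL backup: the Bellman
  expectation is replaced by an\n    # average over n simulated next
  states from the generative model.\n    n_states\, n_actions =
  reward.shape\n    v_new = np.empty(n_states)\n    for s in
  range(n_states):\n        q = []\n        for a in
  range(n_actions):\n            xs = sample_next(s\, a\, n)  # draws
  from the hypothetical sampler\n            q.append(reward[s\, a] +
  gamma * v[xs].mean())\n        v_new[s] = max(q)\n    return
  v_new\n\n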
 We next extend this to continuous state spaces by considering
  randomized function approximation in a reproducing kernel Hilbert
  space (RKHS). This allows for arbitrarily good approximation with
  high probability for any problem\, due to the universal function
  approximation property of the RKHS.
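  One standard construction of this kind (an illustration\, not
  necessarily the one used in this work): with random Fourier features
  for a shift-invariant kernel\, one approximates v(s) by
  (1/sqrt(J)) sum_{j=1..J} w_j cos(theta_j.s + b_j)\, where the pairs
  (theta_j\, b_j) are drawn at random from a distribution determined by
  the kernel\, and only the weights w_j are fitted.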
  Next\, we consider continuous action spaces. In each iteration of
  EVL\, we sample actions from the continuous action space and take a
  supremum over the sampled actions. Under mild assumptions on the
  MDP\, we show that this performs quite well numerically\, with
  provable performance guarantees.
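  Schematically\, in the same illustrative notation\, the backup
  becomes v(s) <- max_{j=1..m} [ r(s\,a_j) + gamma (1/n) sum_{i=1..n}
  v(X_i) ] for sampled actions a_1\,...\,a_m: the supremum over the
  full action space is replaced by a maximum over the m sampled
  actions.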
  Finally\, we consider the 'Online-EVL' algorithm\, which learns from
  a single trajectory of state-action-reward tuples. Under mild mixing
  conditions on the trajectory\, we can provide performance bounds and
  also show that its performance is competitive with (and in fact
  marginally better than) that of the Deep Q-Network algorithm on a
  benchmark RL problem.
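  Heuristically\, the online variant forms its empirical Bellman
  backups from transitions observed along the trajectory itself rather
  than from fresh generative-model draws\, so the mixing conditions
  play the role that independent samples play in the generative-model
  setting.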
  I will conclude with a brief overview of the framework of
  probabilistic contraction analysis of iterated random operators that
  underpins the theoretical analysis.
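  In outline\, and again in illustrative notation\, the iterates of
  such algorithms have the form v_{k+1} = T_k(v_k) for random operators
  T_1\, T_2\, ...\, and when these operators concentrate suitably
  around a contraction T\, the iterates can be shown to reach a
  neighborhood of the fixed point of T with high probability.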
 \n\nThis talk is based on work with a number of collaborators\,
  including Vivek Borkar (IIT Bombay)\, Peter Glynn (Stanford)\,
  Abhishek Gupta (Ohio State)\, William Haskell (Purdue)\, Dileep
  Kalathil (Texas A&M)\, and Hiteshi Sharma (USC).
LOCATION:LR3A\, Inglis Building\, CUED
END:VEVENT
END:VCALENDAR
