BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Reinforcement Learning with Exogenous States and Rewards - Profess
 or Thomas G. Dietterich\, School of EECS\, Oregon State University
DTSTART:20260401T123000Z
DTEND:20260401T133000Z
UID:TALK243643@talks.cam.ac.uk
CONTACT:Kimberly Cole
DESCRIPTION:Exogenous state variables and rewards can slow reinforcement l
 earning by injecting uncontrolled variation into the reward signal. In thi
 s talk\, I’ll describe our work on formalizing exogenous state variables
  and rewards. Then I’ll discuss our main result: if the reward function 
 decomposes additively into endogenous and exogenous components\, the MDP c
 an be decomposed into an exogenous Markov Reward Process (based on the exo
 genous reward) and an endogenous Markov Decision Process (optimizing the e
 ndogenous reward). Any optimal policy for the endogenous MDP is also an op
 timal policy for the original MDP\, but because the endogenous reward typi
 cally has reduced variance\, the endogenous MDP is easier to solve. The se
 cond half of the talk will introduce two algorithms for causal discovery o
 f the exogenous subspace of the state space. Once discovered\, we can mode
 l the exogenous reward function and remove it from the MDP so that RL can 
 focus on the endogenous reward only. Experiments on a variety of challengi
 ng synthetic MDPs show that these methods\, applied online\, discover larg
 e exogenous state spaces and produce substantial speedups in reinforcement
  learning. (Joint work with George Trimponias (Intercom.io))\n\nThis will 
 be followed by a discussion from 2.30pm to 3pm about the future of researc
 h in the presence of automated AI/ML research:\n\nHow should we choose res
 earch topics to study (either topics for automation or topics that are n
 ot amenable to automation)? How should the research be reported? What ar
 e the work products? How should we evaluate automated research? How can w
 e separate r
 eal research from fake imitations of research? How can we assimilate an ex
 ponentially exploding number of research results?
LOCATION:Department of Engineering - LT1
END:VEVENT
END:VCALENDAR