BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Structured Offline Reinforcement Learning via Reward Filtering and
  Orthogonal Q-Contrasts - Angela Zhou (University of Southern California)
DTSTART:20260303T110000Z
DTEND:20260303T114500Z
UID:TALK244372@talks.cam.ac.uk
DESCRIPTION:We study offline reinforcement learning under structural co
 nditions where the dynamics may depend on many state variables\, but o
 ptimal decisions depend only on a sparse\, reward-relevant subset of t
 he state. This "decision-theoretic sparsity" means that optimal polic
 y and value functions admit lower-dimensional structure\, even though
  full-state transition estimation can be difficult. First\, we develop
  a reward-relevance-filtered approach for linear function approximatio
 n that modifies thresholded Lasso within least-squares policy evaluati
 on and fitted Q-iteration to focus estimation on reward-relevant compo
 nents. Second\, to improve robustness\, we propose a structured differ
 ence-of-Q framework via orthogonal learning: a dynamic generalization
  of R-learning that targets Q-function contrasts sufficient for policy
  optimization\, accommodates black-box nuisance estimators of Q and th
 e behavior policy\, and yields robust policy optimization guarantees u
 nder a margin condition. Together\, these methods formalize and exploi
 t reward-relevant structure to improve statistical efficiency and robu
 stness in offline RL.
LOCATION:Seminar Room 1\, Newton Institute
END:VEVENT
END:VCALENDAR
