Structured Offline Reinforcement Learning via Reward Filtering and Orthogonal Q-Contrasts
- 🎤 Speaker: Angela Zhou (University of Southern California)
- 📅 Date & Time: Tuesday 03 March 2026, 11:00 - 11:45
- 📍 Venue: Seminar Room 1, Newton Institute
Abstract
We study offline reinforcement learning under structural conditions where the dynamics may depend on many state variables, but optimal decisions depend only on a sparse, reward-relevant subset of the state. This “decision-theoretic sparsity” means that the optimal policy and value functions admit lower-dimensional structure, even though full-state transition estimation can be difficult. First, we develop a reward-relevance-filtered approach for linear function approximation that adapts thresholded Lasso within least-squares policy evaluation and fitted Q-iteration to focus estimation on reward-relevant components. Second, to improve robustness, we propose a structured difference-of-Q framework via orthogonal learning: a dynamic generalization of R-learning that targets Q-function contrasts sufficient for policy optimization, accommodates black-box nuisance estimators of Q and the behavior policy, and yields robust policy optimization guarantees under a margin condition. Together, these methods formalize and exploit reward-relevant structure to improve statistical efficiency and robustness in offline RL.
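The filtering idea in the first part can be pictured with a short sketch. This is a minimal illustration, not the speaker's implementation: it assumes offline transitions summarized by a linear feature map, uses scikit-learn's Lasso for the screening step, and the names (`filtered_fqi`, `phi_next_greedy`) and the threshold `tau` are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lasso

def filtered_fqi(phi, phi_next_greedy, r, n_iters=50, gamma=0.99,
                 lasso_alpha=0.01, tau=0.05):
    """phi:             (n, d) features of the observed (state, action) pairs
    phi_next_greedy: callable w -> (n, d) features of the greedy action
                     at the next state under the current weights w
    r:               (n,) observed rewards"""
    # Step 1 (filtering): fit a sparse reward model and keep only the
    # coordinates whose thresholded Lasso coefficients survive, i.e. the
    # reward-relevant components of the feature map.
    reward_coef = Lasso(alpha=lasso_alpha).fit(phi, r).coef_
    support = np.abs(reward_coef) > tau

    # Step 2: fitted Q-iteration restricted to the screened support, so
    # least-squares estimation concentrates on reward-relevant directions.
    w = np.zeros(phi.shape[1])
    for _ in range(n_iters):
        target = r + gamma * (phi_next_greedy(w) @ w)   # Bellman targets
        coef, *_ = np.linalg.lstsq(phi[:, support], target, rcond=None)
        w = np.zeros_like(w)
        w[support] = coef                               # zero off support
    return w
```

In the talk's setting the screening interacts with the Bellman recursion more carefully; the point of the sketch is only the two-step structure: screen features on reward relevance, then run least-squares value iteration on the screened coordinates.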
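The second part's difference-of-Q idea descends from the R-learner of causal inference. The sketch below is the single-stage, binary-action version under illustrative assumptions (gradient boosting as the black-box nuisance learners, a linear model for the contrast; all names are hypothetical); the dynamic generalization in the talk iterates such contrast regressions backward through the horizon.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

def r_learner_contrast(X, a, y):
    """X: (n, p) states; a: (n,) binary actions; y: (n,) targets
    (in the dynamic case y would be a Bellman backup, not a raw reward)."""
    # Black-box nuisances: outcome mean m(x) = E[y | x] and behavior
    # policy e(x) = P(a = 1 | x). Cross-fitting is omitted for brevity
    # but matters for the orthogonality guarantees.
    m_hat = GradientBoostingRegressor().fit(X, y).predict(X)
    e_hat = GradientBoostingClassifier().fit(X, a).predict_proba(X)[:, 1]

    # Residual-on-residual regression: (y - m) ~ tau(x) * (a - e).
    # Nuisance errors enter only through products, so the fitted
    # contrast tau is robust to slow black-box estimation rates.
    resid_a = a - e_hat
    design = np.column_stack([resid_a, X * resid_a[:, None]])
    fit = LinearRegression(fit_intercept=False).fit(design, y - m_hat)
    return fit.coef_  # tau(x) = coef_[0] + x @ coef_[1:]
```

Knowing the contrast Q(s, 1) - Q(s, 0) is enough to act greedily, which is why targeting it directly rather than each Q-function separately can be statistically cheaper; the margin condition in the abstract controls how estimation error near the decision boundary translates into policy regret.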
Series
This talk is part of the Isaac Newton Institute Seminar Series.