Structured Offline Reinforcement Learning via Reward Filtering and Orthogonal Q-Contrasts

CIFW02 - Causal identification and discovery

We study offline reinforcement learning under structural conditions where the dynamics may depend on many state variables, but optimal decisions depend only on a sparse, reward-relevant subset of the state. Under this “decision-theoretic sparsity,” optimal policy and value functions admit lower-dimensional structure even though full-state transition estimation can be difficult. First, we develop a reward-relevance-filtered approach for linear function approximation that modifies thresholded Lasso within least-squares policy evaluation and fitted Q-iteration to focus estimation on reward-relevant components. Second, to improve robustness, we propose a structured difference-of-Q framework via orthogonal learning: a dynamic generalization of R-learning that targets Q-function contrasts sufficient for policy optimization, accommodates black-box nuisance estimators of Q and the behavior policy, and yields robust policy-optimization guarantees under a margin condition. Together, these methods formalize and exploit reward-relevant structure to improve statistical efficiency and robustness in offline RL.
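A minimal Python sketch of the first method may help fix ideas: fitted Q-iteration in which each per-action regression is a Lasso whose small coefficients are hard-thresholded away, so that estimation concentrates on coordinates flagged as reward-relevant. The function names, the feature map phi, and the particular thresholding rule are illustrative assumptions, not the speakers' implementation.

import numpy as np
from sklearn.linear_model import Lasso

def thresholded_lasso_fqi(phi, actions, rewards, phi_next, n_actions,
                          gamma=0.99, alpha=0.1, thresh=0.05, n_iters=50):
    """Fitted Q-iteration with per-action thresholded Lasso (illustrative).

    phi, phi_next : (n, d) feature matrices for current / next states
    actions       : (n,) integer actions from the offline dataset
    rewards       : (n,) observed rewards
    """
    d = phi.shape[1]
    w = np.zeros((n_actions, d))  # linear Q(s, a) = phi(s) @ w[a]
    for _ in range(n_iters):
        # Bellman targets computed from the current Q estimate
        q_next = np.stack([phi_next @ w[a] for a in range(n_actions)], axis=1)
        y = rewards + gamma * q_next.max(axis=1)
        for a in range(n_actions):
            mask = actions == a
            if not mask.any():
                continue
            fit = Lasso(alpha=alpha, fit_intercept=False).fit(phi[mask], y[mask])
            coef = fit.coef_.copy()
            # hard-threshold small coefficients: keep only the (estimated)
            # reward-relevant coordinates in the Q-function
            coef[np.abs(coef) < thresh] = 0.0
            w[a] = coef
    return w

For the second method, the orthogonal-learning idea descends from the R-learner. A one-step (bandit) version of the Q-contrast objective, with cross-fitted nuisance predictions m_hat_s for the outcome regression E[Y | S] and e_hat_s for the behavior policy P(A=1 | S), looks roughly as follows; the talk's dynamic generalization replaces the outcome with Bellman-style targets, and all names here are assumptions.

def orthogonal_contrast_loss(tau_s, a, y, m_hat_s, e_hat_s):
    """One-step R-learner-style loss for a binary-action Q-contrast.

    tau_s   : (n,) candidate contrast tau(S) = Q(S, 1) - Q(S, 0)
    a       : (n,) binary actions taken by the behavior policy
    y       : (n,) observed outcomes / returns
    """
    resid = (y - m_hat_s) - (a - e_hat_s) * tau_s
    return np.mean(resid ** 2)

Because nuisance errors enter this residual only through a product, first-order errors in either m_hat_s or e_hat_s cancel, which is the sense in which black-box nuisance estimators can be plugged in while retaining policy-optimization guarantees.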


This talk is part of the Isaac Newton Institute Seminar Series.
