BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:A Graphical Approach to State Variable Selection in Off-policy Lea
 rning - Joakim Blach Andersen (Statistical Laboratory)
DTSTART:20250321T153000Z
DTEND:20250321T170000Z
UID:TALK229540@talks.cam.ac.uk
CONTACT:Martina Scauda
DESCRIPTION:Preprint available at: https://arxiv.org/abs/2501.00854\n\nSeq
 uential decision problems are widely studied across many areas of science.
  A key challenge when learning policies from historical data - a practice 
 commonly referred to as off-policy learning - is how to "identify" the i
 mpact of a policy of interest when the observed data are not randomized. O
 ff-policy learning has mainly been studied in two settings: dynamic treatm
 ent regimes (DTRs)\, where the focus is on controlling confounding in medi
 cal problems with short decision horizons\, and offline reinforcement lear
 ning (RL)\, where the focus is on dimension reduction in closed systems su
 ch as games. The gap between these two well-studied settings has limited t
 he wider application of off-policy learning to many real-world problems. U
 sing the theory of causal inference based on acyclic directed mixed graphs
  (ADMGs)\, we provide a set of graphical identification criteria in genera
 l decision processes that encompass both DTRs and MDPs. We discuss how our
  results relate to the often implicit causal assumptions made in the DTR a
 nd RL literatures and further clarify several common misconceptions. Final
 ly\, we present a realistic simulation study for the dynamic pricing probl
 em encountered in container logistics\, and demonstrate how violations of 
 our graphical criteria can lead to suboptimal policies.
LOCATION:MR12\, Centre for Mathematical Sciences\, Wilberforce Road\, Camb
 ridge
END:VEVENT
END:VCALENDAR
