BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Contextual Bilevel Reinforcement Learning for Incentive Alignment 
 - Yifan Hu (Rutgers\, The State University of New Jersey)
DTSTART:20251113T113000Z
DTEND:20251113T121000Z
UID:TALK238525@talks.cam.ac.uk
DESCRIPTION:The optimal policy in various real-world strategic decision-ma
 king problems depends on both the environmental configuration and exogeno
 us events. For these settings\, we introduce Contextual Bilevel Reinforce
 ment Learning (CB-RL)\, a stochastic bilevel decision-making model in whi
 ch the lower level consists of solving a contextual Markov Decision Proc
 ess (CMDP). CB-RL can be viewed as a Stackelberg game in which the leade
 r and a random context beyond the leader's control together decide the s
 etup of many MDPs\, to which potentially multiple followers best respond
 . This framework extends beyond traditional bilevel optimization and fin
 ds relevance in diverse fields such as RLHF\, tax design\, reward shapin
 g\, contract theory and mechanism design. We propose a stochastic Hyper
  Policy Gradient Descent (HPGD) algorithm to solve CB-RL and demonstrat
 e its convergence. Notably\, HPGD uses stochastic hypergradient estimate
 s based on observations of the followers' trajectories. It therefore all
 ows followers to use any training procedure and the leader to be agnosti
 c of the specific algorithm\, which aligns with various real-world scena
 rios. We further consider the setting in which the leader can influence
  the training of followers and propose an accelerated algorithm. We emp
 irically demonstrate the performance of our algorithm for reward shapin
 g and tax design.
LOCATION:Seminar Room 1\, Newton Institute
END:VEVENT
END:VCALENDAR
