Optimizing for the Long-Term Without Delay

Kelly Zhang (Imperial College London)
Monday 10 November 2025, 16:30-17:10
Seminar Room 1, Newton Institute.

SCLW01 - Bridging Stochastic Control And Reinforcement Learning: Theories and Applications

Increasingly, recommender systems are tasked with improving users’ long-term satisfaction. In this context, we study a content exploration task, which we formalize as a bandit problem with delayed rewards. There is an apparent trade-off in choosing the learning signal: waiting for the full reward to become available might take several weeks, slowing the rate of learning, whereas using short-term proxy rewards reflects the actual long-term goal only imperfectly. First, we develop a predictive model of delayed rewards that incorporates all information obtained to date. Rewards as well as shorter-term surrogate outcomes are combined through a Bayesian filter to obtain a probabilistic belief. Second, we devise a bandit algorithm that quickly learns to identify content aligned with long-term success using this new predictive model. We prove a regret bound for our algorithm that depends on the Value of Progressive Feedback, an information-theoretic metric that captures the quality of short-term leading indicators that are observed prior to the long-term reward. We apply our approach to a podcast recommendation problem, where we seek to recommend shows that users engage with repeatedly over two months. We empirically validate that our approach significantly outperforms methods that optimize for short-term proxies or rely solely on delayed rewards, as demonstrated by an A/B test in a recommendation system that serves hundreds of millions of users.
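As a rough illustration of the idea in the abstract, the sketch below folds an early proxy signal and a later full reward into one Gaussian belief per item, then picks an item by Thompson sampling. It is not the algorithm from the talk: the abstract does not specify the model, so the conjugate Gaussian update, the `ArmBelief` class, the `thompson_select` helper, and the noise levels `PROXY_VAR` and `REWARD_VAR` are all assumptions chosen for simplicity.

```python
import random
from dataclasses import dataclass


@dataclass
class ArmBelief:
    """Gaussian belief over one item's mean long-term reward (hypothetical model)."""
    mean: float = 0.0
    var: float = 1.0

    def update(self, obs: float, obs_var: float) -> None:
        """Conjugate Gaussian update with one noisy observation of the mean."""
        precision = 1.0 / self.var + 1.0 / obs_var
        self.mean = (self.mean / self.var + obs / obs_var) / precision
        self.var = 1.0 / precision


# Assumed noise levels: the early proxy is a much noisier view of the
# long-term outcome than the full delayed reward.
PROXY_VAR = 4.0
REWARD_VAR = 0.25


def thompson_select(beliefs, rng):
    """Draw one plausible mean per item and return the index of the largest draw."""
    samples = [rng.gauss(b.mean, b.var ** 0.5) for b in beliefs]
    return max(range(len(beliefs)), key=samples.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    beliefs = [ArmBelief(), ArmBelief()]
    # An early proxy signal arrives first and moves the belief a little.
    beliefs[0].update(1.0, PROXY_VAR)
    # The full reward arrives weeks later and moves the belief much more.
    beliefs[0].update(1.0, REWARD_VAR)
    print(beliefs[0], thompson_select(beliefs, rng))
```

In this toy model the proxy update shrinks the variance only slightly, while the delayed reward shrinks it sharply. This mirrors the trade-off the abstract describes: proxies arrive early but carry less information about the long-term outcome.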

This talk is part of the Isaac Newton Institute Seminar Series.
