BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Bayesian Best-Arm Identification - Rianne de Heide\, Machine Learn
 ing Group\, Centrum Wiskunde en Informatica (CWI)
DTSTART:20210312T130000Z
DTEND:20210312T140000Z
UID:TALK158104@talks.cam.ac.uk
CONTACT:96082
DESCRIPTION:In multi-armed bandits\, a learner repeatedly chooses an arm t
 o play\, and receives a reward from the associated unknown probability dis
 tribution. We study the task of best-arm identification (BAI)\, where the 
 learner is not only asked to sample an arm at each stage\, but is also ask
 ed to output a recommendation (i.e.\, a guess for the arm with the largest
  mean reward) after a certain period. Unlike in another well-studied bandi
 t setting\, the learner is not interested in maximising the sum of rewards
  gathered during the exploration (or minimising regret)\, but only cares a
 bout the quality of her recommendation. We investigate a Bayesian-flavoure
 d sampling rule called Top-Two Thompson sampling (TTTS). In particular\, w
 e justify its use for fixed-confidence BAI. We further propose a variant o
 f TTTS called Top-Two Transportation Cost (T3C)\, which disposes of the co
 mputational burden of TTTS. As our main contribution\, we provide the firs
 t sample complexity analysis of TTTS and T3C when coupled with a very natu
 ral Bayesian stopping rule\, for bandits with Gaussian rewards\, solving o
 ne of the open questions raised by Russo (2016). We also provide new poste
 rior convergence results for TTTS under two models that are commonly used 
 in practice: bandits with Gaussian and Bernoulli rewards and conjugate pri
 ors.
LOCATION:https://us02web.zoom.us/j/86285792868?pwd=UGJFeit5RVozOTdqUTdGeEF
 XNlk1Zz09
END:VEVENT
END:VCALENDAR