BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Bandits with Switching Costs: T^{2/3} Regret - Yuval Peres\, Micro
 soft Research Redmond
DTSTART:20140402T130000Z
DTEND:20140402T140000Z
UID:TALK51819@talks.cam.ac.uk
CONTACT:Microsoft Research Cambridge Talks Admins
DESCRIPTION:Consider the adversarial two-armed bandit problem in a setti
 ng where the player incurs a unit cost each time he switches actions. W
 e prove that the player's T-round regret in this setting (i.e.\, his ex
 cess loss compared to the better of the two actions) is T^{2/3} (up to
  a log term). In the corresponding full-information problem\, the minim
 ax regret is known to grow at the slower rate of T^{1/2}. The differenc
 e between these two rates indicates that learning with bandit feedback
  (i.e.\, knowing only the loss from the player's action\, not the alter
 native) can be significantly harder than learning with full-information
  feedback. It also shows that without switching costs\, any regret-min
 imizing algorithm for the bandit problem must sometimes switch actions
  very frequently. The proof is based on an information-theoretic analy
 sis of a loss process arising from a multi-scale random walk.\n\n(Join
 t work with Ofer Dekel\, Jian Ding and Tomer Koren\, to appear in STOC
  2014\, available at http://arxiv.org/abs/1310.2997)
LOCATION:Auditorium\, Microsoft Research Ltd\, 21 Station Road\, Cambridge
 \, CB1 2FB
END:VEVENT
END:VCALENDAR
