BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Regret Bounds for Gaussian Process Bandit Problems - Steffen Grune
 walder (University College London)
DTSTART:20101112T160000Z
DTEND:20101112T170000Z
UID:TALK26723@talks.cam.ac.uk
CONTACT:Richard Nickl
DESCRIPTION:\nBandit algorithms are concerned with trading off explorat
 ion against exploitation: a number of options are available\, but we c
 an only learn their quality by experimenting with them. I address the s
 cenario in which the reward distribution over arms is modeled by a Gaus
 sian process and there is no noise in the observed reward. My main resu
 lt is a bound on the regret experienced by algorithms relative to the a p
 osteriori optimal strategy of playing the best arm throughout\, based o
 n benign assumptions about the covariance function defining the Gaussia
 n process. One of the key tools used to derive the bounds is the Dudle
 y integral\, which allows one to bound the expected supremum of a Gauss
 ian process. The upper bounds are complemented with corresponding lowe
 r bounds for particular covariance functions\, demonstrating that in ge
 neral there is at most a logarithmic looseness in the upper bounds. Th
 e bounds are in a sense quite crude\, as they are based on an algorith
 m that samples the whole space uniformly for the maximum. A better app
 roach would be to consider an algorithm that pays more attention to "h
 igh reward plateaus" and to study its regret. I will discuss this appr
 oach and the technical problems associated with it in an outlook.\n\nh
 ttp://www.cs.ucl.ac.uk/people/S.Grunewalder.html
LOCATION:MR12\, CMS\, Wilberforce Road\, Cambridge\, CB3 0WB
END:VEVENT
END:VCALENDAR
