BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Policy Evaluation with Temporal Differences - Christoph Dann (Tech
 nische Universität Darmstadt)
DTSTART:20140328T110000Z
DTEND:20140328T120000Z
UID:TALK51736@talks.cam.ac.uk
CONTACT:Zoubin Ghahramani
DESCRIPTION:Value functions play an essential role in many reinforcement
 learning approaches. Research on policy evaluation\, the problem of
 estimating the value function from samples\, has been dominated since
 the late 1980s by temporal-difference (TD) methods due to their data
 efficiency. However\, core issues such as stability in off-policy
 estimation have only been tackled recently\, which has led to a large
 number of new approaches.\n\nI first present a short overview of TD
 methods from a unifying optimization perspective and the results of my
 experimental comparison highlighting the strengths and weaknesses of
 each approach. Furthermore\, I show a novel variant of the least-squares
 TD learning (LSTD) algorithm for off-policy estimation that outperforms
 all previous approaches.\n\nMost TD methods rely on a linear
 parametrization of the value function with a concise set of features\,
 which limits their use on large-scale problems. In the final part of
 the presentation\, I introduce my recent work on the incremental feature
 dependency discovery (iFDD) algorithm. This approach efficiently handles
 large-scale problems with discrete state spaces by automatically
 constructing features during estimation.
LOCATION:Engineering Department\, CBL Room BE-438
END:VEVENT
END:VCALENDAR
