BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:On-line Active Reward Learning for Policy Optimisation in Spoken D
 ialogue Systems - Pei-Hao Su (University of Cambridge)
DTSTART:20161125T120000Z
DTEND:20161125T130000Z
UID:TALK68783@talks.cam.ac.uk
CONTACT:Kris Cao
DESCRIPTION:The ability to compute an accurate reward function is essentia
 l for optimising a dialogue policy via reinforcement learning. In real-wo
 rld applications\, using explicit user feedback as the reward signal is o
 ften unreliable and costly to collect. This problem can be mitigated if t
 he user’s intent is known in advance or data is available to pre-train a
  task success predictor off-line. In practice\, neither of these applies
  to most real-world applications. In this talk\, a practical method for l
 earning a dialogue system on-line with human users will be presented\, wh
 ereby the dialogue policy is jointly trained alongside the reward model v
 ia active learning with a Gaussian process model. This Gaussian process o
 perates on a continuous-space dialogue representation generated in an uns
 upervised fashion using a recurrent neural network encoder-decoder. The e
 xperimental results demonstrate that the proposed framework significantly
  reduces data annotation costs and mitigates noisy user feedback\, achiev
 ing truly on-line policy learning.\n
LOCATION:FW26\, Computer Laboratory
END:VEVENT
END:VCALENDAR
