BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Bayesian Smoothing for Language Models - Yee Whye Teh\, University
  College London
DTSTART:20120127T120000Z
DTEND:20120127T130000Z
UID:TALK35059@talks.cam.ac.uk
CONTACT:Ekaterina Kochmar
DESCRIPTION:Smoothing is a central component of language modelling technol
 ogies. It attempts to improve probabilities estimated from language data b
 y shifting mass from high probability areas to low or zero probability are
 as\, thus "smoothing" the distribution.  Many smoothing techniques have be
 en proposed in the past based on a variety of principles and empirical obs
 ervations.\n\nIn this talk I will present a Bayesian statistical approach 
 to smoothing.  By using a hierarchical Bayesian methodology to effectively
  share information across the different parts of the language model\, and 
 by incorporating the prior knowledge that languages obey power-law behavio
 urs using Pitman-Yor processes\, we are able to construct language models 
 with state-of-the-art results.  Our approach also gives an interesting new
  interpretation of interpolated Kneser-Ney and why it works so well.  Fina
 lly\, we describe an extension of our model from finite n-grams to "infini
 te-grams" which we call the sequence memoizer.\n\nThis is joint work with 
 Frank Wood\, Jan Gasthaus\, Cedric Archambeau and Lancelot James\, and is 
 based on work most recently reported in the Communications of the ACM (Feb
  2011 issue).\n
LOCATION:FW26\, Computer Laboratory
END:VEVENT
END:VCALENDAR
