A Hierarchical Bayesian Language Model based on Pitman-Yor Processes
- 👤 Speaker: Hanna Wallach, University of Cambridge
- 📅 Date & Time: Thursday 24 August 2006, 10:00 - 11:00
- 📍 Venue: Room 911, Rutherford Building, Cavendish Laboratory, Department of Physics
Abstract
Paper (also Tech. report)
We propose a new hierarchical Bayesian n-gram model of natural languages. Our model makes use of Pitman-Yor processes, a generalization of the commonly used Dirichlet distributions, which produce power-law distributions more closely resembling those in natural languages. We show that an approximation to the hierarchical Pitman-Yor language model recovers the exact formulation of interpolated Kneser-Ney, one of the best smoothing methods for n-gram language models. Experiments verify that our model gives cross-entropy results superior to interpolated Kneser-Ney and comparable to modified Kneser-Ney.
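The power-law behaviour mentioned in the abstract can be illustrated with a minimal sketch of the Chinese restaurant representation of a single Pitman-Yor process. This is not the hierarchical language model from the talk, only a toy simulation; the function name `pitman_yor_crp` and the parameter names `discount` and `strength` are assumptions chosen for this example. With `discount = 0` the sampler reduces to the Dirichlet process case.

```python
import random


def pitman_yor_crp(n_customers, discount, strength, seed=0):
    """Simulate Pitman-Yor Chinese restaurant seating; return table sizes.

    Hypothetical helper for illustration only. Requires
    0 <= discount < 1 and strength > -discount.
    """
    if not (0.0 <= discount < 1.0) or strength <= -discount:
        raise ValueError("need 0 <= discount < 1 and strength > -discount")
    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers seated at table k
    for n in range(n_customers):
        # Unnormalised mass for opening a new table; the total mass over
        # all choices is n + strength.
        new_mass = strength + discount * len(tables)
        u = rng.random() * (n + strength)
        if not tables or u < new_mass:
            tables.append(1)
            continue
        u -= new_mass
        for k, count in enumerate(tables):
            # Existing table k has unnormalised mass (count - discount).
            u -= count - discount
            if u < 0:
                tables[k] += 1
                break
        else:
            # Guard against floating-point rounding at the upper boundary.
            tables[-1] += 1
    return tables


if __name__ == "__main__":
    for d in (0.0, 0.5, 0.8):
        sizes = pitman_yor_crp(2000, discount=d, strength=1.0)
        print(f"discount={d}: {len(sizes)} tables, largest has {max(sizes)}")
```

If tables are read as word types and customers as word tokens, a larger discount yields many more rare types for the same number of tokens, which is the heavy-tailed behaviour the abstract contrasts with the Dirichlet case.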
Series
This talk is part of the Machine Learning Journal Club series.
Included in Lists
- Cambridge talks
- Guy Emerson's list
- Hanchen DaDaDash
- Inference Group Journal Clubs
- Inference Group Summary
- Interested Talks
- Machine Learning Journal Club
- Machine Learning Summary
- ML
- Quantum Matter Journal Club
- Room 911, Rutherford Building, Cavendish Laboratory, Department of Physics
- rp587
- TQS Journal Clubs
- yk373's list