Contextual dependencies in unsupervised word segmentation
- Speaker: Keith Vertanen (University of Cambridge)
- Date & Time: Monday 20 August 2007, 11:00 - 12:00
- Venue: TCM Seminar Room, Cavendish Laboratory, Department of Physics
Abstract
We will be discussing the paper “Contextual dependencies in unsupervised word segmentation” by Sharon Goldwater, Thomas L. Griffiths and Mark Johnson.
Available from: http://cocosci.berkeley.edu/tom/papers/wordseg1.pdf
Abstract: Developing better methods for segmenting continuous text into words is important for improving the processing of Asian languages, and may shed light on how humans learn to segment speech. We propose two new Bayesian word segmentation methods that assume unigram and bigram models of word dependencies respectively. The bigram model greatly outperforms the unigram model (and previous probabilistic models), demonstrating the importance of such dependencies for word segmentation. We also show that previous probabilistic models rely crucially on sub-optimal search procedures.
Series
This talk is part of the Machine Learning Journal Club series.