Unsupervised Word Alignment and Part of Speech Induction with Undirected Models
- Speaker: Chris Dyer, Carnegie Mellon University
- Date & Time: Friday 28 October 2011, 12:00 - 13:00
- Venue: FW26, Computer Laboratory
Abstract
This talk explores unsupervised learning in undirected graphical models for two problems in natural language processing. Undirected models can incorporate arbitrary, non-independent features computed over random variables, thereby overcoming an inherent limitation of directed models, which require that features factor according to the conditional independencies of an acyclic generative process. Using word alignment (finding lexical correspondences in parallel texts) and bilingual part-of-speech induction (jointly learning syntactic categories for two languages from parallel data) as case studies, we show that relaxing the acyclicity requirement lets us formulate more succinct models that make fewer counterintuitive independence assumptions. Experiments confirm that our undirected alignment model consistently outperforms directed baselines on both intrinsic and extrinsic measures. For POS induction, the results are more tentative: analysis reveals that our parameter learner tends to get caught in shallow local optima corresponding to poor tagging solutions. Switching to an alternative learning objective (contrastive estimation; Smith and Eisner, 2005) improves both stability and performance, but the finding suggests that non-convex objectives may be a larger problem for undirected models than for directed ones.
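To make the idea of "arbitrary, non-independent features" concrete, here is a minimal Python sketch of a globally normalized log-linear model over word alignments for a toy sentence pair. It is an illustration only and not the model presented in the talk: the feature names (`pair`, `off_diagonal`, `monotone_step`), the weights, and the brute-force enumeration of alignments are all assumptions chosen for readability.

```python
import itertools
import math


def features(src, tgt, alignment):
    """Hypothetical feature function (not from the talk).

    alignment[j] is the index of the source word aligned to target word j.
    Returns a dict mapping feature names to real-valued counts.
    """
    feats = {}
    for j, i in enumerate(alignment):
        # Lexical feature: which source word is paired with which target word.
        key = ("pair", src[i], tgt[j])
        feats[key] = feats.get(key, 0.0) + 1.0
        # Overlapping positional feature: distance of the link from the diagonal.
        dist = abs(i / len(src) - j / len(tgt))
        feats["off_diagonal"] = feats.get("off_diagonal", 0.0) + dist
    # Feature over pairs of adjacent links; it couples neighbouring alignment
    # decisions, so the score does not factor into independent per-word terms.
    for j in range(1, len(alignment)):
        if alignment[j] == alignment[j - 1] + 1:
            feats["monotone_step"] = feats.get("monotone_step", 0.0) + 1.0
    return feats


def score(weights, feats):
    """Linear score: dot product of weights and feature values."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def alignment_posterior(weights, src, tgt):
    """Distribution over all alignments, normalized globally.

    Enumerates every alignment by brute force, which is only feasible for
    toy inputs; realistic models need approximate or structured inference.
    """
    candidates = list(itertools.product(range(len(src)), repeat=len(tgt)))
    scores = [score(weights, features(src, tgt, a)) for a in candidates]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return {a: math.exp(s - log_z) for a, s in zip(candidates, scores)}


# Illustrative weights, set by hand rather than learned.
weights = {"monotone_step": 2.0, "off_diagonal": -1.0}
posterior = alignment_posterior(weights, ["la", "maison"], ["the", "house"])
best = max(posterior, key=posterior.get)
```

Because the normalizer sums over whole alignments rather than over one local decision at a time, features such as `monotone_step` can overlap freely with the others. In this sketch the price is the cost of the sum itself, which grows exponentially with the length of the target sentence.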
Series
This talk is part of the NLIP Seminar Series.