Learning hierarchical structure: strong learning of PCFGs
- đ¤ Speaker: Alexander Clark, King's College London
- đ Date & Time: Friday 01 June 2018, 12:00 - 13:00
- đ Venue: FW26, Computer Laboratory
Abstract
A central problem in language acquisition and NLP is the acquisition of hierarchically structured syntactic representations. The simplest model of this is strong learning of context-free grammars from strings: where the learner is required to recover isomorphic labeled parse trees while observing only unstructured strings of symbols. Recently developed algorithms for learning context-free grammars from strings typically use a rather unrealistic learning model which only guarantees asymptotic weak convergence using membership queries (the ability to test whether a given string is in the language or not). However the classes of languages that can be learned are quite large. Probabilistic variants of these algorithms have also been developed with PAC style guarantees for some smaller classes.
Here we convert the simplest of these algorithms—a primal learning algorithm called a 1-finite kernel learner—into a practical probabilistic learning algorithm that can learn the structure of probabilistic context-free grammars from strings. This involves converting a logical notion of the validity of a production in the grammar to its probabilistic equivalent. We combine this approach with some standard machine learning techniques: non-negative matrix factorisation, and the Inside-Outside algorithm.
We experiment using synthetic data restricting ourselves to CFGs that satisfy three reasonable conditions: being in Chomsky normal form; having many more terminals (words) than nonterminals (syntactic categories) and not being excessively ambiguous. We observe that on nearly all grammars that satisfy these conditions the learner approaches the theoretically optimal performance using standard parsing metrics. This work suggests that syntactic structure can be learned just from strings and more generally that the unrealistic learning models used in recent theoretical work can be converted quite easily into practical algorithms.
Series This talk is part of the NLIP Seminar Series series.
Included in Lists
- All Talks (aka the CURE list)
- bld31
- Cambridge Centre for Data-Driven Discovery (C2D3)
- Cambridge Forum of Science and Humanities
- Cambridge Language Sciences
- Cambridge talks
- Chris Davis' list
- Computer Education Research
- Computing Education Research
- Department of Computer Science and Technology talks and seminars
- FW26, Computer Laboratory
- Graduate-Seminars
- Guy Emerson's list
- Interested Talks
- Language Sciences for Graduate Students
- ndk22's list
- NLIP Seminar Series
- ob366-ai4er
- PMRFPS's
- rp587
- School of Technology
- Simon Baker's List
- Trust & Technology Initiative - interesting events
- yk449
Note: Ex-directory lists are not shown.
![[Talks.cam]](/static/images/talkslogosmall.gif)

Alexander Clark, King's College London
Friday 01 June 2018, 12:00-13:00