A Polya Urn Document Language Model for Information Retrieval
- Speaker: Ronan Cummins, University of Cambridge
- Date & Time: Friday 21 November 2014, 12:00 - 13:00
- Venue: FW26, Computer Laboratory
Abstract
Although the multinomial language model has been one of the most effective unigram models for information retrieval for over a decade, it does not model one important linguistic phenomenon relating to term dependency: the tendency of a term to repeat itself within a document (i.e. word burstiness).
In this talk I will begin with a brief review of language modelling as applied to information retrieval. I will then present some work near completion in which we model document generation as a random process with reinforcement (a multivariate Polya process) and develop a Dirichlet compound multinomial language model that captures word burstiness. I will show that the new reinforced language model can be computed as efficiently as current retrieval models and that it significantly outperforms the multinomial model on a number of standard effectiveness metrics. I will conclude by presenting an analysis of the retrieval method which shows that it adheres to what is called the “verbosity hypothesis”, and will show that the method essentially combines the term and document event spaces, giving a theoretical justification for tf-idf type schemes.
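As a rough illustration of the reinforcement idea mentioned in the abstract, the following minimal Python sketch simulates a multivariate Polya urn and evaluates the corresponding Dirichlet compound multinomial probability of a term sequence. It is not the speaker's implementation or retrieval function; the function names `polya_urn_document` and `dcm_log_prob`, the toy vocabulary, and the parameter values are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def polya_urn_document(initial_mass, length, rng):
    """Draw `length` terms from a multivariate Polya urn (illustrative only).

    After each draw the chosen term gains one unit of mass, so terms
    that have already appeared become more likely to appear again.
    """
    urn = dict(initial_mass)  # term -> current mass in the urn
    doc = []
    for _ in range(length):
        terms = list(urn)
        weights = [urn[t] for t in terms]
        term = rng.choices(terms, weights=weights, k=1)[0]
        doc.append(term)
        urn[term] += 1.0  # reinforcement step
    return doc


def dcm_log_prob(doc, alpha):
    """Log-probability of an ordered term sequence under a Dirichlet
    compound multinomial with parameters `alpha` (term -> alpha_t > 0).

    This is the closed form of the urn's sequential draw probabilities.
    """
    counts = Counter(doc)
    alpha_sum = sum(alpha.values())
    logp = math.lgamma(alpha_sum) - math.lgamma(alpha_sum + len(doc))
    for term, count in counts.items():
        logp += math.lgamma(alpha[term] + count) - math.lgamma(alpha[term])
    return logp


# Hypothetical two-term vocabulary with equal initial mass.
alpha = {"a": 0.5, "b": 0.5}
p_repeat = math.exp(dcm_log_prob(["a", "a"], alpha))
p_switch = math.exp(dcm_log_prob(["a", "b"], alpha))
sample = polya_urn_document(alpha, 20, random.Random(0))
```

With these toy parameters the repeated sequence `a a` has probability 0.375 and `a b` has probability 0.125, whereas a multinomial with equal term probabilities would assign 0.25 to both. That gap is the burstiness effect the abstract refers to.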
Series
This talk is part of the NLIP Seminar Series.
Included in Lists
- All Talks (aka the CURE list)
- bld31
- Cambridge Centre for Data-Driven Discovery (C2D3)
- Cambridge Forum of Science and Humanities
- Cambridge Language Sciences
- Cambridge talks
- Chris Davis' list
- Computer Education Research
- Computing Education Research
- Department of Computer Science and Technology talks and seminars
- FW26, Computer Laboratory
- Graduate-Seminars
- Guy Emerson's list
- Interested Talks
- Language Sciences for Graduate Students
- ndk22's list
- NLIP Seminar Series
- ob366-ai4er
- PMRFPS's
- rp587
- School of Technology
- Simon Baker's List
- Trust & Technology Initiative - interesting events
- yk449