BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:A Polya Urn Document Language Model for Information Retrieval - Ro
 nan Cummins\, University of Cambridge
DTSTART:20141121T120000Z
DTEND:20141121T130000Z
UID:TALK56175@talks.cam.ac.uk
CONTACT:Tamara Polajnar
DESCRIPTION:Although the multinomial language model has been one of the mo
 st effective unigram models of information retrieval for over a decade\, i
 t does not model one important linguistic phenomenon relating to term-depe
 ndency\; namely the tendency of a term to repeat itself within a document 
 (i.e. word burstiness). \n\nIn this talk I will begin with a brief review 
 of language modelling as applied to information retrieval. I will then pre
 sent some work near completion in which we model document generation as a 
 random process with reinforcement (a multivariate Polya process) and devel
 op a Dirichlet compound multinomial language model that captures word burs
 tiness. I will show that the new reinforced language model can be computed
  as efficiently as current retrieval models and that it significantly outp
 erforms the multinomial model for a number of standard effectiveness metri
 cs. I will conclude with an analysis of the retrieval method showing th
 at it adheres to the "verbosity hypothesis" and that it essentially combi
 nes the term and document event spaces\, giving theoretical justificatio
 n to tf-idf type schemes.
LOCATION:FW26\, Computer Laboratory
END:VEVENT
END:VCALENDAR
