BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Statistical anaphora resolution in biomedical texts - Caroline Gas
 perin\, Computer Laboratory\, University of Cambridge
DTSTART:20081024T110000Z
DTEND:20081024T120000Z
UID:TALK14797@talks.cam.ac.uk
CONTACT:Johanna Geiss
DESCRIPTION:"I will present my PhD work on anaphora resolution in biomedic
 al texts. Biomedical literature has been the focus of relevant information
  extraction projects\, and resolving anaphora is an important step in the 
 identification of mentions of biomedical entities about which information 
 could be extracted.\n\nI propose a probabilistic model for the resolution 
 of anaphora in biomedical texts. The model results from a simple decomposi
 tion process applied to a conditional probability equation that involves s
 everal parameters (features). The decomposition makes use of Bayes' rule a
 nd independence assumptions\, and aims to decrease the impact of data spar
 seness on the model. The model seeks to find the antecedents of anaphoric 
 expressions\, both coreferent and associative ones\, and also to identify 
 discourse-new expressions. The model reaches state-of-the-art performance
  despite being trained on a small corpus\; it achieves 55-69% precision
  and 57-71% recall on coreferent cases\, and reasonable performance on
  different classes of associative cases.\n\nI have created a corpus o
 f 5 biomedical articles to train and evaluate the model. The corpus is ann
 otated with anaphoric links between noun phrases referring to the biomedic
 al entities of interest. Such noun phrases are typed according to a scheme
  that is based on the Sequence Ontology\; it distinguishes 7 types of enti
 ties: gene\, part of gene\, product of gene\, part of product\, subtype of
  gene\, supertype of gene and gene variant. This corpus is publicly availa
 ble." 
LOCATION:SW01\, Computer Laboratory
END:VEVENT
END:VCALENDAR
