Statistical anaphora resolution in biomedical texts
- Speaker: Caroline Gasperin, Computer Laboratory, University of Cambridge
- Date & Time: Friday 24 October 2008, 12:00 - 13:00
- Venue: SW01, Computer Laboratory
Abstract
“I will present my PhD work on anaphora resolution in biomedical texts. Biomedical literature has been the focus of relevant information extraction projects, and resolving anaphora is an important step in the identification of mentions of biomedical entities about which information could be extracted.
I propose a probabilistic model for the resolution of anaphora in biomedical texts. The model results from a simple decomposition process applied to a conditional probability equation that involves several parameters (features). The decomposition makes use of Bayes’ rule and independence assumptions, and aims to decrease the impact of data sparseness on the model. The model seeks to find the antecedents of anaphoric expressions, both coreferent and associative ones, and also to identify discourse-new expressions. The model is able to reach state-of-the-art performance despite being trained on a small corpus; it achieves 55-69% precision and 57-71% recall on coreferent cases, and reasonable performance on different classes of associative cases.
I have created a corpus of 5 biomedical articles to train and evaluate the model. The corpus is annotated with anaphoric links between noun phrases referring to the biomedical entities of interest. Such noun phrases are typed according to a scheme that is based on the Sequence Ontology; it distinguishes 7 types of entities: gene, part of gene, product of gene, part of product, subtype of gene, supertype of gene and gene variant. This corpus is publicly available.”
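As a rough illustration of the kind of decomposition the abstract describes (Bayes' rule plus independence assumptions over features), the sketch below scores candidate antecedents with a naive-Bayes-style model. It is not the speaker's actual model: the class labels, the feature names (`same_head`, `same_type`), the add-one smoothing, and the toy training data are all assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict

# Hypothetical class labels for an (anaphor, candidate antecedent) pair.
CLASSES = ("coreferent", "associative", "none")

# Invented toy data: (features of the pair, class label).
TOY_EXAMPLES = [
    ({"same_head": True, "same_type": True}, "coreferent"),
    ({"same_head": True, "same_type": True}, "coreferent"),
    ({"same_head": False, "same_type": False}, "none"),
    ({"same_head": False, "same_type": False}, "none"),
    ({"same_head": False, "same_type": True}, "associative"),
]


def train(examples):
    """Count classes and per-class feature values."""
    class_counts = Counter()
    feat_counts = defaultdict(Counter)  # (class, feature name) -> value counts
    values = defaultdict(set)           # feature name -> observed values
    for feats, label in examples:
        class_counts[label] += 1
        for name, value in feats.items():
            feat_counts[(label, name)][value] += 1
            values[name].add(value)
    return class_counts, feat_counts, values


def posterior(feats, model):
    """P(class | features), proportional to P(class) * prod_i P(f_i | class).

    The product form follows from Bayes' rule plus the assumption that
    features are independent given the class. Add-one smoothing softens
    the effect of sparse counts.
    """
    class_counts, feat_counts, values = model
    total = sum(class_counts.values())
    log_scores = {}
    for c in CLASSES:
        lp = math.log((class_counts[c] + 1) / (total + len(CLASSES)))
        for name, value in feats.items():
            num = feat_counts[(c, name)][value] + 1
            den = class_counts[c] + len(values[name] | {value})
            lp += math.log(num / den)
        log_scores[c] = lp
    # Normalise in a numerically stable way.
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


def resolve(candidates, model):
    """Pick the best antecedent, or report the anaphor as discourse-new.

    candidates: list of (candidate_id, features) pairs for one anaphor.
    """
    best = (None, "discourse-new", 0.0)
    for cand_id, feats in candidates:
        post = posterior(feats, model)
        label = max(post, key=post.get)
        if label != "none" and post[label] > best[2]:
            best = (cand_id, label, post[label])
    return best[0], best[1]
```

Calling `resolve` with the candidates for one anaphor returns either the chosen candidate and its relation class, or `(None, "discourse-new")` when every candidate is most likely unrelated. Treating each feature as conditionally independent keeps the number of parameters small, which is one common way to cope with a small training corpus.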
Series
This talk is part of the NLIP Seminar Series.