Annotating Genericity: How Do Humans Decide? A Case Study in Ontology Extraction
- Speaker: Aurelie Herbelot, Computer Laboratory, University of Cambridge
- Date & Time: Friday 25 January 2008, 12:00 - 13:00
- Venue: SW01 Computer Laboratory
Abstract
This talk deals with the identification of kind versus non-kind entities in natural language text for ontology extraction. The following two sentences illustrate why genericity annotations matter for the creation of ontologies: (1) ‘The whale is a mammal.’ (2) ‘The whale rescued the scuba diver.’ Given this input, an ontology extraction system would typically output the relationships ‘whale - is_a - mammal’ and ‘whale - rescue - scuba diver’. When inserted as such in a real-world ontology, these relations may give the user the false impression that ‘one general feature of whales is that they rescue scuba divers.’ To prevent this reading, it is necessary to tag the first whale with a generic label and the second with a specific label.
The task of genericity annotation using machine learning relies on a training corpus. Available corpora, however, are limited in the genres they cover and, more importantly, in the range of labels that they use to describe the genericity phenomenon. The public annotation schemes linked to those corpora are also often simplified and/or domain-specific. With a view to producing our own training corpus, we propose here an annotation scheme that covers the kind versus object distinction, the specificity phenomenon and reference resolution. The scheme is not domain-specific and produced, over a small test set from the British National Corpus, an inter-annotator agreement of Kappa = 0.74.
We will discuss the scheme, our choice of labels, and the various problems attached to the manual annotation of genericity. In particular, we will show the importance of reference resolution for accurate annotation.
Series
This talk is part of the NLIP Seminar Series.