BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Adapting a WSJ-trained Lexicalized-Grammar Parser to New Domains -
  Laura Rimell\, Oxford University
DTSTART:20081114T120000Z
DTEND:20081114T130000Z
UID:TALK14799@talks.cam.ac.uk
CONTACT:Johanna Geiss
DESCRIPTION:In this talk I will describe some experiments on adapting the 
 C&C CCG parser to new domains. The parser was originally developed using C
 CGbank\, the CCG version of the Penn Treebank\, and is therefore tuned to 
 newspaper text. The two new domains we consider are (1) biomedical abstrac
 ts and (2) questions for a QA system (using the term "domain" somewhat loo
 sely in the latter case).\n\nThe porting approach we use is to train the p
 arser at lower levels of representation than full syntactic derivations. T
 he lexicalized nature of CCG (in which words are assigned syntactic catego
 ries that include subcategorization information) makes it possible to use 
 a level of representation intermediate between POS tags and full derivatio
 ns. For the biomedical data\, we find that simply retraining the POS tagge
 r leads to a large improvement in performance\, and that using annotated d
 ata at the intermediate CCG lexical category level improves parsing accura
 cy further. A similar result is obtained for the question data\, but the i
 mpact of retraining at the CCG lexical category level is much greater. We 
 suggest that this is because the syntax of questions differs more from tha
 t of newspaper text than does the syntax of biomedical sentences\, and we 
 discuss some measures supporting this idea.\n\nThe parsing accuracies obta
 ined for both biomedical and question data are in the same range as those 
 reported for newspaper text\, and higher than those previously reported fo
 r the biomedical domain on the same evaluation resource. The conclusion i
 s that porting newspaper-trained parsers to new domains may not be as diff
 icult as first thought (at least for parsers which use lexicalized grammar
 s)\, but we note that different levels of representation may have differen
 t impacts on the porting process\, depending on the characteristics of the
  target domain.
LOCATION:SW01\, Computer Laboratory
END:VEVENT
END:VCALENDAR
