BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Surprisingly Efficient Parsing for a Wide-Coverage Lexicalised-Gra
 mmar Parser - Stephen Clark and Yue Zhang - University of Cambridge
DTSTART:20110304T120000Z
DTEND:20110304T130000Z
UID:TALK29966@talks.cam.ac.uk
CONTACT:Thomas Lippincott
DESCRIPTION:In this talk we will describe two approaches to improving the 
 efficiency of a wide-coverage CCG parser. The C&C parser (Clark and Curran
 \, 2007) is already surprisingly efficient\, parsing at around 30 sentence
 s per second on standard hardware. This is surprising because the parser d
 oes not do any pruning at the parsing stage\, but builds a complete packed
  chart. The efficiency comes from the use of a linear-time supertagger\, w
 hich greatly reduces the search space\, and highly optimised C++.\n\nDespi
 te the use of the supertagger\, there is still a huge amount of ambiguity 
 left in the chart. The first approach to improving speed is to prune the
  chart. We investigate standard beam search for chart
 -parsing -- removing low-scoring items -- as well as a novel\, more aggres
 sive technique which removes complete cells from the chart. Both technique
 s result in significant speed-ups.\n\nThe second approach is a novel techn
 ique which involves self-training the supertagger on large amounts of pars
 er output. The speed of the parser is directly related to the average
  number of supertags (CCG lexical categories) that the supertagger
  supplies per word. The insight behind this approach is that the
  supertagger can easily be trained to predict which supertags the parsing
  mode
 l will eventually choose\, resulting in a supertagger model which is much 
 more tightly integrated with the parser.
LOCATION:FW26\, Computer Laboratory
END:VEVENT
END:VCALENDAR