BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Machine Learning Applications / Challenges in Natural Language Par
 sing - Ted Briscoe\, Computer Laboratory
DTSTART:20071115T160000Z
DTEND:20071115T180000Z
UID:TALK8211@talks.cam.ac.uk
CONTACT:Zoubin Ghahramani
DESCRIPTION:A decade or so ago\, the consensus was that full syntactic par
 sing (i.e. recovering all the grammatical relations between words in a sen
 tence) was too brittle to be viable. Data-driven approaches building on la
 rge treebanks have changed this\, and today full parsers are being deploy
 ed in applications such as information extraction.\n\nI'll describe the pa
 rsing task\, a standard intrinsic evaluation scheme\, and two state-of-the
 -art contenders: our RASP system and Clark and Curran's CCG parser. The la
 tter relies heavily on fully supervised training to estimate both configur
 ational and (bi)lexical parameters to resolve syntactic ambiguity\, which 
 makes it more accurate on in-domain test data (financial news) but harder 
 to move to a new domain (e.g. biomedical scientific papers). I'll describe
  recent work on semi-supervised training / bootstrapping of RASP\, which
  relies far less on large in-domain treebanks\, and the consequent applic
 ations and challenges for machine learning.\n\nTo acquire configurational 
 parameter estimates for RASP\, we used self-training over partially bracke
 ted input\, bootstrapping an initial model from the unambiguous portion of
  the data and then using this to weight counts from the ambiguous data. T
 o acquire lexical subclasses\, we use unlexicalized RASP to parse data and
  then subclassify words according to the contexts in which they occur. The
 se subclassifications (e.g. of verbs into (in|di)transitive uses) are used
  to estimate parameters like P(subclass_i | verb_j)\, and these are then i
 ntegrated into parse ranking.\n\nIf there is time\, I'll talk about possi
 ble extensions of this work. Most parsers output a directed graph in which
  each node is labelled with a word token and each edge is labelled with a 
 grammatical relation. RASP can also output a weighted directed graph of al
 l relations hypothesised by the N best parses. To acquire bilexical colloc
 ational information to rank parses or to extract nuggets of information f
 rom documents\, we would like to develop domain-appropriate and efficient
  methods to compute (sub)graph (dis)similarity.\n\nBriscoe\, E.J.\, J. C
 arroll and R. Watson (2006) The Second Release of the RASP System\, acl.ld
 c.upenn.edu/P/P06/P06-4020.pdf\n\nClark\, S. and J. Curran (2007) Formalis
 m-Independent Parser Evaluation with CCG and DepBank\, acl.ldc.upenn.edu/P
 /P07/P07-1032.pdf\n
LOCATION:LR4\, Engineering\, Department of
END:VEVENT
END:VCALENDAR
