BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Applications of Lexicographic Semirings in Speech and Language Pro
 cessing - Brian Roark\, Oregon Health & Science University (OHSU)
DTSTART:20111003T120000Z
DTEND:20111003T133000Z
UID:TALK32942@talks.cam.ac.uk
CONTACT:Bill Byrne
DESCRIPTION:In this talk\, I'll present a couple of applications of lexico
 graphic semirings for encoding sequence models\, which yield useful algori
 thms based on weighted finite-state determinization. Lexicographic semirin
 gs involve an ordered set of dimensions\, each of which is itself a semiri
 ng. First\, I'll briefly introduce weighted finite-state automata and tran
 sducers\, semirings\, and lexicographic semirings\, followed by a presenta
 tion of two special cases. The first lexicographic semiring we examine inv
 olves a pair of tropical semirings\, which provides an exact automata enco
 ding of smoothed n-gram models using simple epsilon transitions rather tha
 n failure transitions. This allows for off-line optimization of exact mode
 ls represented as large weighted finite-state transducers in contrast to i
 mplicit (on-line) failure transition representations. The second lexicogra
 phic semiring is a pair of a tropical semiring and a new string semiring w
 hich we call a "categorial semiring". The categorial semiring is inspired 
 by categorial grammar and includes an operation of string division. This s
 emiring allows us to use weighted finite-state determinization on a weight
 ed transducer so that every input sequence has exactly one (minimum cost) 
 output sequence. For example\, a part-of-speech tagged word lattice can be
  determinized so that every word string in the original lattice has just o
 ne path in the tagged lattice\, corresponding to the Viterbi-best POS-tag 
 sequence for that word string. Tools based on both of these methods will b
 e available as part of the new ngram library from OpenGrm.org. (Joint
  work with Richard Sproat\, Izhak Shafran and Mahsa Yarmohammadi)\
 n\nBrian Roark is an Associate Professor in the Center for Spoken Language
  Understanding (CSLU) and Dept. of Biomedical Engineering at Oregon Health
  & Science University (OHSU).  He received his PhD from Brown University i
 n 2001 and spent 3 years in the Speech Algorithms Department at AT&T Labs 
 - Research before joining CSLU.  His research interests include natural la
 nguage processing\, language modeling for various applications\, assistive
  technology\, and spoken language understanding.
LOCATION:Cambridge University Engineering Department\, Room LR10
END:VEVENT
END:VCALENDAR