BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Factors Affecting ASR Model Self-Training - Scott Novotney (HLTCOE
  and BBN Technologies)
DTSTART:20090901T100000Z
DTEND:20090901T110000Z
UID:TALK19683@talks.cam.ac.uk
CONTACT:Dr Marcus Tomalin
DESCRIPTION:Low-resource ASR self-training seeks to minimize resource requ
 irements \nsuch as manual transcriptions or language modeling text. This i
 s \naccomplished by training on large quantities of audio automatically \n
 labeled by a small initial model. By analyzing our previous experiments \n
 with the conversational telephone English Fisher corpus\, we demonstrate \
 nwhere self-training succeeds and under what resource conditions it \nprov
 ides the most benefit. Additionally\, we will show success on Spanish \nan
 d Levantine conversational speech as well as the tougher English \nCallhom
 e set\, despite initial WER of more than 60%. Finally\, by digging \nbenea
 th average word error rate and analyzing individual word \nperformance\, w
 e show that self-trained models successfully learn new \nwords. More impor
 tantly\, self-training most benefits those words that appear \nin the unl
 abeled audio but do not appear in the manual transcriptions.
LOCATION:LR5\, Engineering Department\, Baker Building
END:VEVENT
END:VCALENDAR
