BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:A Comparison of VTLN and Gender-Dependent Models - Thomas Schaaf (
 Multimodal Technologies\, Inc)
DTSTART:20080626T110000Z
DTEND:20080626T120000Z
UID:TALK12576@talks.cam.ac.uk
CONTACT:Dr Marcus Tomalin
DESCRIPTION:After an introduction of Multimodal Technologies\, Inc\, Pitts
 burgh\, PA (MModal)\, I describe the current challenges in dictation-
 based health care documentation. This will be followed by an overvie
 w of MModal's contribution in this space: a unique blend of speech re
 cognition and natural language processing technologies for turning co
 nversational dictations of clinical encounters into structured and en
 coded clinical documents. Using a centralized\, hosted architecture b
 ased on a web services infrastructure allows us to collect vast amoun
 ts of audio and proof-read textual data\, enabling us to make use of hi
 ghly speaker-specific models. Rapid adaptation to new speakers with m
 inimal or no impact on physicians’ workflow is an important aspect w
 hich affects the acceptability of the solution. One difference betwee
 n speakers is the variation in the length of the vocal tract. It is w
 ell established that this can be partially compensated for with gende
 r-dependent or vocal-tract-normalized acoustic models. I will present se
 veral ways of building gender-dependent models\, either by splitting t
 he database along gender or by using a gender question in the contex
 t cluster tree. These are then compared with Vocal Tract Length Norma
 lized (VTLN) acoustic models using data from a radiology reporting do
 main. Although gender-dependent models result in considerable gains\, the
 y do not outperform VTLN. From a business point of view\, scalability i
 s an important issue\, and in addition to better performance\, practi
 cal constraints also favor VTLN. For example\, it is possible to esti
 mate the VTLN warping factor with a simple Gaussian Mixture Model dur
 ing frontend processing\, allowing single-pass decoding while still a
 dapting quickly if an unexpected speaker change occurs. I will end th
 e presentation with a selection of research topics that arise from ru
 nning an automatic transcription service.
LOCATION:LR4\, Engineering Department\, Baker Building
END:VEVENT
END:VCALENDAR
