BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:General teacher-student learning for automatic speech recognition 
 - Jeremy Wong\, University of Cambridge
DTSTART:20180731T110000Z
DTEND:20180731T120000Z
UID:TALK108493@talks.cam.ac.uk
CONTACT:Anton Ragni
DESCRIPTION:Teacher-student learning is a general framework that can be us
 ed to transfer knowledge from one or more models to another.  This has fou
 nd various applications in the field of automatic speech recognition\, to 
 perform tasks such as compressing a large model or ensemble of models\, an
 d domain adaptation.  In its standard form\, teacher-student learning prop
 agates information from one or more teacher models to a student model\, by
  minimising the KL-divergence between their per-frame state-cluster poster
 ior distributions\, at the Neural Network (NN) outputs.  This form of teac
 her-student learning is limited in two aspects.  First\, only frame-level 
 posterior information is propagated from the teachers to the student.  Thi
 s form of information may not effectively capture the sequential nature of
  speech data\, or the interactions between the acoustic\, alignment\, and 
 language models.  Second\, all models are required to use the same set of 
 state clusters.  This in turn requires that all models must also use the s
 ame set of sub-word units\, Hidden Markov Model (HMM) alignment model topo
 logy\, context-dependency\, and language model.  Furthermore\, all models 
 are required to use the NN-HMM topology.  This restricts the situations fo
 r which teacher-student learning may be applied.  In particular\, the allo
 wed forms of diversity are limited within an ensemble that can be compress
 ed using teacher-student learning.  This talk presents several proposals t
 o generalise the teacher-student learning framework to overcome these limi
 tations.  Different sets of state clusters can be allowed between the teach
 er and student models\, by minimising the KL-divergence between per-frame 
 logical context-dependent state posteriors.  The sequential nature of spee
 ch data can be taken into account by using sequence-level criteria.  These
  sequence-level criteria can potentially also remove all restrictions on t
 he required topological similarities between the teacher and student model
 s.
LOCATION:Department of Engineering - James Dyson Building Seminar Room
END:VEVENT
END:VCALENDAR
