General teacher-student learning for automatic speech recognition
- Speaker: Jeremy Wong, University of Cambridge
- Date & Time: Tuesday 31 July 2018, 12:00 - 13:00
- Venue: Department of Engineering - James Dyson Building Seminar Room
Abstract
Teacher-student learning is a general framework that can be used to transfer knowledge from one or more models to another. It has found various applications in automatic speech recognition, for tasks such as compressing a large model or an ensemble of models, and domain adaptation. In its standard form, teacher-student learning propagates information from one or more teacher models to a student model by minimising the KL-divergence between their per-frame state-cluster posterior distributions at the Neural Network (NN) outputs.

This form of teacher-student learning is limited in two respects. First, only frame-level posterior information is propagated from the teachers to the student. This form of information may not effectively capture the sequential nature of speech data, or the interactions between the acoustic, alignment, and language models. Second, all models are required to use the same set of state clusters. This in turn requires that all models also use the same set of sub-word units, Hidden Markov Model (HMM) alignment model topology, context-dependency, and language model. Furthermore, all models are required to use the NN-HMM topology. This restricts the situations in which teacher-student learning may be applied. In particular, it limits the forms of diversity allowed within an ensemble that is to be compressed using teacher-student learning.

This talk presents several proposals to generalise the teacher-student learning framework to overcome these limitations. The teacher and student models can be allowed to use different sets of state clusters by minimising the KL-divergence between per-frame logical context-dependent state posteriors. The sequential nature of speech data can be taken into account by using sequence-level criteria. These sequence-level criteria can potentially also remove all restrictions on the required topological similarities between the teacher and student models.
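The standard frame-level criterion described in the abstract can be illustrated with a minimal sketch. It is not taken from the talk: the function names are hypothetical, the teachers are assumed to be combined by a weighted average of their posteriors, and all models are assumed to share one set of state clusters, so posterior vectors have the same length and ordering.

```python
import math


def softmax(logits):
    """Convert one frame of NN output logits into a posterior distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_posterior(teacher_posteriors, weights=None):
    """Combine per-frame posteriors from several teachers.

    Assumption: a weighted average (uniform by default) is used to
    combine the teachers; other combination rules are possible.
    """
    n = len(teacher_posteriors)
    if weights is None:
        weights = [1.0 / n] * n
    num_states = len(teacher_posteriors[0])
    return [
        sum(w * p[s] for w, p in zip(weights, teacher_posteriors))
        for s in range(num_states)
    ]


def frame_kl(teacher, student, eps=1e-12):
    """KL(teacher || student) for a single frame over shared state clusters."""
    return sum(
        t * math.log((t + eps) / (s + eps))
        for t, s in zip(teacher, student)
        if t > 0.0
    )


def teacher_student_loss(teacher_frames, student_logit_frames):
    """Average per-frame KL-divergence between teacher and student posteriors."""
    total = 0.0
    for teacher, logits in zip(teacher_frames, student_logit_frames):
        total += frame_kl(teacher, softmax(logits))
    return total / len(teacher_frames)
```

Because each frame is scored independently, a loss of this shape carries no sequence-level information, and it is only defined when the teacher and student posterior vectors index the same state clusters. These are the two limitations the abstract identifies.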
Series
This talk is part of the CUED Speech Group Seminars series.