BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Interspeech practice session - Various
DTSTART:20130802T110000Z
DTEND:20130802T123000Z
UID:TALK45995@talks.cam.ac.uk
CONTACT:Rogier van Dalen
DESCRIPTION:12:00 - 12:40 Oral presentations\n\n* Shakti P. Rath (with Dan
 iel Povey\, Karel Vesely\, and Jan Cernocky)\, _Improved Feature Processin
 g for Deep Neural Networks_\n\n* Yongqiang (Eric) Wang (with Mark Gales)\,
 \n_An Explicit Independence Constraint for Factorised Adaptation in Speech
  Recognition_\n\n\n12:40 - 13:30 Posters and sandwiches:\n\n* Pierre Lanch
 antin\,\n_Improving Lightly Supervised Training for Broadcast Transcriptio
 n_\n\n* Jingzhou (Justin) Yang (with Rogier van Dalen and Mark Gales)\,\n_
 Infinite Support Vector Machines in Speech Recognition_\n\n\n*Abstracts*\n
 \nShakti P. Rath\, Daniel Povey\, Karel Vesely\, Jan Cernocky\n\n_Improved
  Feature Processing for Deep Neural Networks_\n\nIn this paper\, we invest
 igate alternative ways of processing MFCC-based features to use as the inp
 ut to Deep Neural Networks (DNNs). Our baseline is a conventional feature 
 pipeline that involves splicing the 13-dimensional front-end MFCCs across 
 9 frames\, followed by applying LDA to reduce the dimension to 40 and then
  further decorrelation using MLLT. Confirming the results of other groups\
 , we show that speaker adaptation applied on top of these features usi
 ng feature-space MLLR is helpful. The fact that the number of parameters o
 f a DNN is not strongly sensitive to the input feature dimension (unlike G
 MM-based systems) motivated us to investigate ways to increase the dimensi
 on of the features. In this paper\, we investigate several approaches to d
 erive higher-dimensional features and verify their performance with DNNs. O
 ur best result is obtained from splicing our baseline 40-dimensional speak
 er adapted features again across 9 frames\, followed by reducing the dimen
 sion to 200 or 300 using another LDA. Our final result is about 3% absolut
 e better than our best GMM system\, which is a discriminatively trained mo
 del. \n\nYongqiang Wang and Mark Gales\n\n_An Explicit Independence Constr
 aint for Factorised Adaptation in Speech Recognition_\n\nSpeech signals ar
 e usually affected by multiple acoustic factors\, such as speaker characte
 ristics and environment differences. Usually\, the combined effect of thes
 e factors is modelled by a single transform. Acoustic factorisation splits
  the transform into several factor transforms\, each modelling only one fa
 ctor. This allows\, for example\, estimating a speaker transform in a nois
 e condition and applying the same speaker transform in a different noise c
 ondition. To achieve this factorisation\, it is crucial to keep factor tra
 nsforms independent of each other. Previous work on acoustic factorisation
 relies on using different forms of factor transforms and/or attributes of
  the data to enforce this independence. In this work\, the independence is
  formulated mathematically\, and an explicit constraint is derived t
 o enforce the independence. Using factorised cluster adaptive training (fC
 AT) as an application\, experimental results demonstrate that the propose
 d explicit independence constraint helps factorisation when imbalanced ada
 ptation data is used. \n\n\nY. Long\, M.J.F. Gales\, P. Lanchantin\, X. Li
 u\, M.S. Seigel\, P.C. Woodland\n\n_Improving Lightly Supervised Training 
 for Broadcast Transcription_\n\nThis paper investigates improving lightly 
 supervised acoustic model training for an archive of broadcast data. Stand
 ard lightly supervised training uses decoding hypotheses automatically der
 ived with a biased language model. However\, as the actual speech can dev
 iate significantly from the original programme scripts that are supplied\,
  the quality of standard lightly supervised hypotheses can be poor. To add
 ress this issue\, word and segment level combination approaches are used b
 etween the lightly supervised transcripts and the original programme scrip
 ts\, yielding improved transcriptions. Experimental results show that sys
 tems trained using these improved transcriptions consistently outperform t
 hose trained using only the original lightly supervised decoding hypothese
 s. This is shown to be the case for both the maximum likelihood and minimu
 m phone error trained systems.\n\nJingzhou Yang\, Rogier van Dalen\, Mark 
 Gales\n\n_Infinite Support Vector Machines in Speech Recognition_\n\nGener
 ative feature spaces provide an elegant way to apply discriminative models
  in speech recognition\, and system performance has been improved by adapt
 ing this framework. However\, the classes in the feature space may not be 
 linearly separable. Applying a linear classifier then limits performance. 
 Instead of a single classifier\, this paper applies a mixture of experts. 
 This model trains different classifiers as experts focusing on different r
 egions of the feature space. However\, the number of experts is not known 
 in advance. This problem can be bypassed by employing a Bayesian non-param
 etric model. In this paper\, a specific mixture of experts based on the Di
 richlet process\, namely the infinite support vector machine\, is studied.
  Experiments conducted on the noise-corrupted continuous digit task AURORA
  2 show the advantages of this Bayesian non-parametric approach.
LOCATION:Department of Engineering - LR6
END:VEVENT
END:VCALENDAR
