BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Very deep convolutional neural networks for speech recognition - T
 om Sercu\, IBM Watson\, USA
DTSTART:20160819T110000Z
DTEND:20160819T120000Z
UID:TALK67100@talks.cam.ac.uk
CONTACT:Anton Ragni
DESCRIPTION:Convolutional Neural Networks are one of the main drivers of t
 he recent deep learning explosion\, with the "AlexNet" (2012) result on t
 he ImageNet competition\, and successive models like OverFeat (2013)\, V
 GG net (2014)\, GoogLeNet (2014)\, and residual networks (2015). In the s
 peech recognition domain\, CNNs with 2 convolutional layers were intro
 duced around 2012 and have not seen major updates since. We will presen
 t a number of recent architectural advances in CNNs for speech recognit
 ion. We introduce a very deep convolutional network architecture with u
 p to 14 weight layers. There are multiple convolutional layers before e
 ach pooling layer\, with small 3x3 kernels\, inspired by the VGG ImageN
 et 2014 architecture. We will discuss the design choice of strided pool
 ing and zero-padding along the time direction\, which renders convoluti
 onal evaluation of sequences highly inefficient. This can be phrased in t
 he computer vision terminology of classification vs. dense pixelwise p
 rediction. We define the architectural constraints that make efficient e
 valuation of full utterances possible. This allows batch normalization t
 o be adopted during full-utterance sequence training\, resulting in fast
 er training and improved performance. We show state-of-the-art results o
 n the benchmark Switchboard 2000-hour dataset (Hub5 eval). We also adapt
 ed our architecture to the multilingual setting and obtained strong resu
 lts on the Babel OP3 surprise language after multilingual training on 25 l
 anguages.
LOCATION:Department of Engineering - LR5
END:VEVENT
END:VCALENDAR
