BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:SoundStream: An End-to-End Neural Audio Codec - Neil Zeghidour (Go
 ogle)
DTSTART:20220404T110000Z
DTEND:20220404T120000Z
UID:TALK172100@talks.cam.ac.uk
CONTACT:Dr Jie Pu
DESCRIPTION:*Abstract*: Audio codecs (MP3\, Opus) are compression algorith
 ms used whenever one needs to transmit audio\, whether when streaming a 
 song or during a conference call. In this talk\, I will present SoundStrea
 m\, a novel neural audio codec that can efficiently compress speech\, musi
 c and general audio at bitrates normally targeted by speech-tailored codec
 s. SoundStream relies on a model architecture composed of a fully convolut
 ional encoder/decoder network and a residual vector quantizer\, which are 
 trained jointly end-to-end. Training leverages recent advances in text-to-
 speech and speech enhancement\, which combine adversarial and reconstructi
 on losses to allow the generation of high-quality audio content from quant
 ized embeddings. By training with structured dropout applied to quantizer 
 layers\, a single model can operate across variable bitrates from 3kbps to
  18kbps\, with a negligible quality loss when compared with models trained
  at fixed bitrates. In addition\, the model is amenable to a low latency i
 mplementation\, which supports streamable inference and runs in real time 
 on a smartphone CPU. In subjective evaluations using audio at 24kHz sampli
 ng rate\, SoundStream at 3kbps outperforms Opus at 12kbps and approaches E
 VS at 9.6kbps. Moreover\, we are able to perform joint compression and enh
 ancement either at the encoder or at the decoder side with no additional l
 atency\, which we demonstrate through background noise suppression for spe
 ech.\n\n*Bio*: Neil Zeghidour is a Senior Research Scientist at Google Bra
 in in Paris\, and teaches automatic speech processing at Ecole Normale Sup
 érieure. He graduated with a PhD in Machine Learning from Ecole Nor
 male Supérieure in Paris\, jointly with Facebook AI Research. His mai
 n research interest is to integrate signal processing and deep learning in
 to fully learnable architectures for audio understanding and generation.\n
LOCATION:Zoom: https://eng-cam.zoom.us/j/81927138251?pwd=TVd3MXliV003dUdYV
 lFwU2NDWGpmdz09
END:VEVENT
END:VCALENDAR
