BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Unsupervised Speech Disentanglement for Speech Style Transfer - Ka
 izhi Qian (IBM)
DTSTART:20211129T150000Z
DTEND:20211129T160000Z
UID:TALK166543@talks.cam.ac.uk
CONTACT:Dr Jie Pu
DESCRIPTION:*Abstract*: Speech information can be roughly decomposed into 
 four components: linguistic content\, timbre\, pitch\, and rhythm. Obtaini
 ng disentangled representations of these components is useful in speech an
 alysis and generation applications. Among them\, non-parallel many-to-many
  voice conversion can convert between many speakers without training on pa
 rallel data\, which is the most challenging speech style transfer paradigm
 . We carried out three works that address these challenges progressively.\nF
 irst\, we proposed AutoVC\, the first zero-shot non-parallel timbre conver
 sion framework that solves the over-smoothness problem of the VAE-based me
 thods and the unstable training problem of the GAN-based methods using a s
 imple autoencoder with a carefully designed bottleneck. The second work pr
 oposed SpeechSplit\, which can blindly decompose speech into its four comp
 onents by introducing three carefully designed bottlenecks. SpeechSplit is
  among the first algorithms to separately perform style transfer on timbre
 \, pitch\, and rhythm without text transcriptions. The third work proposed
  AutoPST\, which can disentangle global prosody style from speech without 
 relying on any text transcriptions. AutoPST is an Autoencoder-based Prosod
 y Style Transfer framework with a thorough rhythm removal module guided by
  self-expressive representation learning. AutoPST is among the first algor
 ithms to effectively convert prosody style in an unsupervised manner.\n\n*
 Bio*: Kaizhi Qian is a researcher at the MIT-IBM Watson AI Lab. He
  received his Ph.D. in Electrical and Computer Engineering from UIUC under
  the supervision of Prof. Mark Hasegawa-Johnson. His work focuses specific
 ally on applications of deep generative models for speech and time-series 
 processing. He has recently been working on unsupervised speech disentangl
 ement for low-resource language processing.
LOCATION:Zoom: https://eng-cam.zoom.us/j/81927138251?pwd=TVd3MXliV003dUdYV
 lFwU2NDWGpmdz09
END:VEVENT
END:VCALENDAR
