BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Low-resource expressive text-to-speech using data augmentation - D
 r Thomas Merritt\, Amazon
DTSTART:20210629T110000Z
DTEND:20210629T120000Z
UID:TALK157948@talks.cam.ac.uk
CONTACT:Dr Kate Knill
DESCRIPTION:*Abstract:* While recent neural text-to-speech (TTS) systems p
 erform remarkably well\, they typically require a substantial amount of re
 cordings from the target speaker reading in the desired speaking style. In
  this work\, we present a novel 3-step methodology to circumvent the costl
 y operation of recording large amounts of target data in order to build ex
 pressive style voices with as little as 15 minutes of such recordings. Fir
 st\, we augment data via voice conversion by leveraging recordings in the 
 desired speaking style from other speakers. Next\, we use that synthetic d
 ata on top of the available recordings to train a TTS model. Finally\, we 
 fine-tune that model to further increase quality. Our evaluations show tha
 t the proposed changes bring significant improvements over non-augmented m
 odels across many perceived aspects of synthesised speech. We demonstrate 
 the proposed approach on 2 styles (newscaster and conversational)\, on var
 ious speakers\, and on both single and multi-speaker models\, illustrating
  the robustness of our approach.\n\n*Bio:* Thomas Merritt is an applied s
 cientist at Amazon\, based in Cambridge. Thomas received his PhD from the 
 University of Edinburgh in 2016. The title of his thesis is: Overcoming th
 e limitations of statistical parametric speech synthesis. Since graduating
  he has been working on text-to-speech research at Amazon\, focusing on im
 provements to prosody and overall naturalness of synthesised speech.
LOCATION:Zoom: https://zoom.us/j/95352633552?pwd=RzJVK2UzOGZyNU5mVHd1Y1VPT
 2tDUT09
END:VEVENT
END:VCALENDAR
