Prosody transfer evaluation and temporal prosody control in speech synthesis
- Speaker: Papercup
- Date & Time: Tuesday 06 July 2021, 12:00 - 13:00
- Venue: Zoom: https://zoom.us/j/95352633552?pwd=RzJVK2UzOGZyNU5mVHd1Y1VPT2tDUT09
Abstract
Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis
Abstract: We propose a model that generates speech explicitly conditioned on the three primary acoustic correlates of prosody: F0, energy and duration. The model is flexible about how the values of these features are specified: they can be externally provided, or predicted from text, or predicted then subsequently modified. Compared to a model that employs a variational auto-encoder to learn unsupervised latent features, our model provides more interpretable, temporally-precise, and disentangled control.
ADEPT: A Dataset for Evaluating Prosody Transfer
Abstract: We introduce an English corpus of prosodically-varied reference natural speech samples for evaluating prosody transfer. The samples include global and local variations across utterances. The corpus only includes prosodic variations that listeners are able to distinguish with reasonable accuracy, and we report these figures as a benchmark against which text-to-speech prosody transfer can be compared. We also propose a subjective prosody transfer evaluation methodology.
Speaker bios:
Tian Huey Teh is a machine learning engineer at Papercup, based in London. She completed the MSc Computational Statistics and Machine Learning programme at University College London in 2018. Since graduating she has been working on TTS research and development, focusing on prosody modelling and scaling systems across languages.
Alexandra Torresquintero is a data engineer on the machine learning team at Papercup. She completed her MSc in Speech and Language Processing at the University of Edinburgh in 2019. Whilst at Papercup, she has worked on formalising the processing behind the TTS training data, including linguistic frontend optimisations, research into g2p modelling, and building a database to store the team's data.
Series
This talk is part of the CUED Speech Group Seminars series.
Included in Lists
- Cambridge Forum of Science and Humanities
- Cambridge Language Sciences
- Cambridge talks
- Chris Davis' list
- CUED Speech Group Seminars
- Guy Emerson's list
- Information Engineering Division seminar list
- PhD related