The applications of discrete speech tokens for robust and context-aware text-to-speech synthesis
- Speaker: Chenpeng Du
- Date & Time: Monday 11 December 2023, 12:00 - 13:00
- Venue: In-person for Cambridge University members only: JDB Teaching Room, Engineering Department
Abstract
A conventional neural text-to-speech (TTS) pipeline typically has two stages: an acoustic model first predicts a mel-spectrogram from text, and a vocoder then generates the waveform from that mel-spectrogram. However, such systems often suffer from suboptimal quality and are sensitive to the quality of the training data. We propose, for the first time, to leverage discrete speech tokens from self-supervised models as the intermediate feature of the TTS pipeline, leading to a significant improvement in robustness. Building on this novel pipeline, we extend it to context-aware TTS tasks, in which the coherence of the generated speech with its context is taken into account during generation.
Series
This talk is part of the CUED Speech Group Seminars series.
Included in Lists
- Cambridge Forum of Science and Humanities
- Cambridge Language Sciences
- Cambridge talks
- Chris Davis' list
- CUED Speech Group Seminars
- Guy Emerson's list
- Information Engineering Division seminar list
- In-person for Cambridge University members only: JDB Teaching Room, Engineering Department
- PhD related