Modelling trajectories in statistical speech synthesis
- Speaker: Matt Shannon (Cambridge) and Heiga Zen (Toshiba Research Europe Ltd.)
- Date & Time: Wednesday 26 January 2011, 13:00 - 15:00
- Venue: Cambridge University Engineering Department, Lecture Room 2
Abstract
In statistical speech synthesis we build a probabilistic model of (processed) speech given (processed) text. The processed speech is in the form of a sequence of acoustic feature vectors, and the sequence over time of each component of this feature vector forms a trajectory. In this talk we’ll discuss how to model these trajectories.
We will first review a few ways in which the standard HMM synthesis model is unsatisfactory. In particular, the standard model is unnormalized, and we’ll discuss the practical impact of this lack of normalization. We’ll then look at normalized approaches, including the trajectory HMM (a globally normalized model) and the autoregressive HMM (a locally normalized model). Finally, we’ll discuss other possible enhancements, including minimum generation error (MGE) training.
Series: This talk is part of the speech synthesis seminar series.
Included in Lists
- Cambridge Forum of Science and Humanities
- Cambridge Language Sciences
- Cambridge talks
- Cambridge University Engineering Department, Lecture Room 2
- Chris Davis' list
- CUED Speech Group Seminars
- Guy Emerson's list
- Information Engineering Division seminar list
- PhD related
- speech synthesis seminar series