Abstract:
Naturalness plays a major role in the acceptability of the output of text-to-speech (TTS) systems. Rhythm, intonation, stress pattern, pitch, and duration (timing) are the most important parameters that affect the naturalness of TTS output. The task of the timing component in a TTS system is to compute duration information for the sub-elements used in the synthesized output. Duration modelling is a very challenging part of a TTS system, since very little is known about the underlying process responsible for timing in human speech.

To analyze and model duration for Turkish TTS systems, high-quality digital recordings of an adult male speaker uttering single words and sentences are used. First, the coverage of Turkish by this spoken text corpus is investigated and found to be adequate. Next, the durations of Turkish phonemes are analyzed. The effects on duration of factors that can be computed from text are measured to determine which of them should be included in the duration models.

To model duration, four models have been implemented. The first two models use the mean durations of phonemes and the mean durations of triphones, respectively. The third model predicts duration from the mean durations at the nodes of trees built for triphones. The last model is an additive model in which the effects of the factors are estimated by regression analysis.
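
As an illustration of the first two models only, the following is a minimal Python sketch of mean-duration prediction. It is not the implementation used in this work. The function names (`train_mean_models`, `predict_durations`), the `#` boundary symbol, the fallback from an unseen triphone to the phoneme mean, and the toy durations in milliseconds are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

BOUNDARY = "#"  # hypothetical symbol for utterance start/end context


def train_mean_models(utterances):
    """Collect mean durations per phoneme and per triphone.

    utterances: list of utterances, each a list of (phoneme, duration_ms).
    A triphone is keyed as (left phoneme, phoneme, right phoneme).
    """
    phone_durs = defaultdict(list)
    tri_durs = defaultdict(list)
    for utt in utterances:
        padded = [BOUNDARY] + [p for p, _ in utt] + [BOUNDARY]
        for i, (phone, dur) in enumerate(utt):
            phone_durs[phone].append(dur)
            # padded[i] is the left neighbour, padded[i + 2] the right one
            tri_durs[(padded[i], phone, padded[i + 2])].append(dur)
    phone_mean = {k: mean(v) for k, v in phone_durs.items()}
    tri_mean = {k: mean(v) for k, v in tri_durs.items()}
    return phone_mean, tri_mean


def predict_durations(phonemes, phone_mean, tri_mean):
    """Predict one duration per phoneme.

    Uses the triphone mean when the triphone was seen in training and
    falls back to the phoneme mean otherwise (an assumption of this sketch).
    """
    padded = [BOUNDARY] + list(phonemes) + [BOUNDARY]
    predicted = []
    for i, phone in enumerate(phonemes):
        key = (padded[i], phone, padded[i + 2])
        predicted.append(tri_mean.get(key, phone_mean[phone]))
    return predicted


# Toy data with invented durations, for illustration only.
toy_corpus = [[("e", 80), ("v", 60)], [("e", 100)]]
phone_mean, tri_mean = train_mean_models(toy_corpus)
```

Calling `predict_durations(["e", "v"], phone_mean, tri_mean)` looks up each phoneme in its triphone context. A sequence whose triphones never occurred in the toy corpus is predicted from the phoneme means alone.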