Speaker adapted speech synthesis with deep neural networks

Öztürk, Miraç Göksu.

Boğaziçi University Theses, Institute for Graduate Studies in Science and Engineering, Computer Engineering, M.S. Theses

URI: http://digitalarchive.boun.edu.tr/handle/123456789/12369

Date: 2018

Abstract:

Text-to-speech (TTS) systems have served as an assistive technology since the 1970s. Although commercial use began decades ago, the quality of synthetic speech still falls short of recorded speech. This study focuses on one particular topic in this field: speaker adaptation in TTS systems. Speaker adaptation is the task of modifying a given TTS model so that the modified model synthesizes speech with the voice characteristics of a desired speaker. In this study, novel speaker adaptation techniques based on deep neural networks (DNNs) and incorporating transfer learning methods are presented. We replaced the high-dimensional speaker embeddings with low-dimensional vectors using clustering methods. Objective results indicate a significant improvement in adaptation performance compared to baseline techniques, in addition to a significant drop in the number of parameters. The second aspect of this study is speaker adaptation performed on DNN-based postfiltering methods. The subjective results show that adapting the postfilter increases the similarity of synthetic speech to the desired speaker's voice, although no significant improvement in quality is observed. The techniques proposed in this study are independent of the choice of DNN architecture and speaker embedding, and can therefore be extended and used in future experiments in related fields such as speech recognition.
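The abstract does not say which clustering method or embedding type the thesis uses. The following is only a minimal sketch, assuming plain k-means and soft cluster memberships, of how high-dimensional speaker embeddings might be replaced by low-dimensional vectors. The function names (`kmeans`, `compact_speaker_codes`), the dimensions, and the random data are all hypothetical and are not taken from the thesis.

```python
import numpy as np


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (an assumed choice of clustering method)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every point to every centroid: shape (n, k).
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids


def compact_speaker_codes(embeddings, k):
    """Map each embedding to a k-dimensional soft cluster-membership vector."""
    centroids = kmeans(embeddings, k)
    dists = np.linalg.norm(
        embeddings[:, None, :] - centroids[None, :, :], axis=2
    )
    # Softmax over negative distances: closer centroids get larger weights.
    logits = -dists
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


# Hypothetical data: 40 speakers, each with a 400-dimensional embedding.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(40, 400))

codes = compact_speaker_codes(embeddings, k=4)
print(codes.shape)
```

In a setup like this, the k-dimensional code would be fed to the acoustic model in place of the original embedding, so the speaker-dependent input layer needs far fewer weights. Whether the thesis uses hard assignments, soft memberships, or some other cluster-derived vector is not stated in the abstract.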
