dc.contributor |
Graduate Program in Computer Engineering. |
|
dc.contributor.advisor |
Gürgen, Fikret. |
|
dc.contributor.author |
Yeşilkanat, Ali. |
|
dc.date.accessioned |
2023-03-16T10:04:42Z |
|
dc.date.available |
2023-03-16T10:04:42Z |
|
dc.date.issued |
2020. |
|
dc.identifier.other |
CMPE 2020 Y47 |
|
dc.identifier.uri |
http://digitalarchive.boun.edu.tr/handle/123456789/12430 |
|
dc.description.abstract |
In this thesis, a new adversarial one-shot voice conversion (VC) method is introduced by enhancing one of the latest variational autoencoder based one-shot VC methods. The proposed method utilizes acoustic features as Mel-spectrograms and relies on disentangled representations by separating speaker and content representations of the spoken content. An adversarial loss and perceptual loss are combined in order to increase the quality of generated Mel-spectrograms. We train a speaker classifier by utilizing the architecture of a well-known model in the computer vision area, to be able to adapt perceptual loss during the training of the VC model. We conduct experiments on the Voice Cloning Toolkit dataset and evaluate the proposed approach in terms of Global Variance and MOSNet, a humanoid opinion score simulator. Experimental results indicate that our approach improves VC quality remarkably. |
|
dc.format.extent |
30 cm. |
|
dc.publisher |
Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2020. |
|
dc.subject.lcsh |
Voice output communication aids. |
|
dc.subject.lcsh |
Speech processing systems. |
|
dc.title |
Adversarial one-shot voice conversion using disentangled representations |
|
dc.format.pages |
xiv, 64 leaves ; |
|