Archives and Documentation Center
Digital Archives

Adversarial one-shot voice conversion using disentangled representations

dc.contributor Graduate Program in Computer Engineering.
dc.contributor.advisor Gürgen, Fikret.
dc.contributor.author Yeşilkanat, Ali.
dc.date.accessioned 2023-03-16T10:04:42Z
dc.date.available 2023-03-16T10:04:42Z
dc.date.issued 2020.
dc.identifier.other CMPE 2020 Y47
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/12430
dc.description.abstract In this thesis, a new adversarial one-shot voice conversion (VC) method is introduced by enhancing one of the latest variational-autoencoder-based one-shot VC methods. The proposed method uses Mel-spectrograms as acoustic features and relies on disentangled representations, separating the speaker representation from the content representation of the spoken utterance. An adversarial loss and a perceptual loss are combined to increase the quality of the generated Mel-spectrograms. To apply the perceptual loss during training of the VC model, we train a speaker classifier based on the architecture of a well-known computer vision model. We conduct experiments on the Voice Cloning Toolkit dataset and evaluate the proposed approach in terms of Global Variance and MOSNet, a human opinion score simulator. Experimental results indicate that our approach improves VC quality remarkably.
dc.format.extent 30 cm.
dc.publisher Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2020.
dc.subject.lcsh Voice output communication aids.
dc.subject.lcsh Speech processing systems.
dc.title Adversarial one-shot voice conversion using disentangled representations
dc.format.pages xiv, 64 leaves

