Adversarial one-shot voice conversion using disentangled representations

Yeşilkanat, Ali.

dc.contributor	Graduate Program in Computer Engineering.
dc.contributor.advisor	Gürgen, Fikret.
dc.contributor.author	Yeşilkanat, Ali.
dc.date.accessioned	2023-03-16T10:04:42Z
dc.date.available	2023-03-16T10:04:42Z
dc.date.issued	2020.
dc.identifier.other	CMPE 2020 Y47
dc.identifier.uri	http://digitalarchive.boun.edu.tr/handle/123456789/12430
dc.description.abstract	In this thesis, a new adversarial one-shot voice conversion (VC) method is introduced by enhancing one of the latest variational autoencoder based one-shot VC methods. The proposed method utilizes acoustic features as Mel-spectrograms and relies on disentangled representations by separating speaker and content representations of the spoken content. An adversarial loss and perceptual loss are combined in order to increase the quality of generated Mel-spectrograms. We train a speaker classifier by utilizing the architecture of a well-known model in the computer vision area, to be able to adapt perceptual loss during the training of the VC model. We conduct experiments on the Voice Cloning Toolkit dataset and evaluate the proposed approach in terms of Global Variance and MOSNet, a humanoid opinion score simulator. Experimental results indicate that our approach improves VC quality remarkably.
dc.format.extent	30 cm.
dc.publisher	Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2020.
dc.subject.lcsh	Voice output communication aids.
dc.subject.lcsh	Speech processing systems.
dc.title	Adversarial one-shot voice conversion using disentangled representations
dc.format.pages	xiv, 64 leaves ;