Single-channel speech-music separation for robust ASR with mixture of NMF models

Demir, Cemil.

Arşiv ve Dokümantasyon Merkezi Dijital Arşivi Ana Sayfası
→
Boğaziçi Üniversitesi Tezleri
→
Fen Bilimleri Enstitüsü
→
Elektrik- Elektronik Mühendisliği
→
Ph.D. Theses
→
Öğe Göster

dc.contributor	Ph.D. Program in Electrical and Electronic Engineering.
dc.contributor.advisor	Saraçlar, Murat.
dc.contributor.advisor	Cemgil, Ali Taylan.
dc.contributor.author	Demir, Cemil.
dc.date.accessioned	2023-03-16T10:25:09Z
dc.date.available	2023-03-16T10:25:09Z
dc.date.issued	2014.
dc.identifier.other	EE 2014 D46 PhD
dc.identifier.uri	http://digitalarchive.boun.edu.tr/handle/123456789/13116
dc.description.abstract	In this dissertation, we analyze the single-channel speech-music separation problem for automatic speech recognition (ASR). The motivation of the study is to increase the performance of the ASR systems by decreasing the effect of background music. We describe a single-channel speech-music separation method based on a mixture of nonnegative matrix factorization (NMF) model. Given a catalog of background music material, we propose a generative model for the superposed speech and music spectrograms. The background music signal is assumed to be generated by a jingle in the catalog and it is modeled by a scaled conditional mixture model representing the jingle. The speech signal is modeled by an NMF model that is estimated in a semi-supervised manner from the mixed signal. The approach is tested with Poisson and complex Gaussian observation models that correspond respectively to Kullback-Leibler (KL) and Itakura-Saito (IS) divergence measures. Our experiments show that the proposed mixture model outperforms a standard NMF method both in speech-music separation and automatic speech recognition (ASR) tasks. Moreover, we extend the mixture of NMF based single-channel speech-music separation method such that it incorporates prior speech information to enhance the separation performance of the method. Finally, we propose to use sub-word NMF-based speech models for the separation of speech and music signals. By applying such a strategy, it is demonstrated that the recognition accuracy can be improved as compared to using a general speech model.
dc.format.extent	30 cm.
dc.publisher	Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2014.
dc.subject.lcsh	Automatic speech recognition.
dc.title	Single-channel speech-music separation for robust ASR with mixture of NMF models
dc.format.pages	xx, 166 leaves ;