Archives and Documentation Center
Digital Archives

Single-channel speech-music separation for robust ASR with mixture of NMF models

dc.contributor Ph.D. Program in Electrical and Electronic Engineering.
dc.contributor.advisor Saraçlar, Murat.
dc.contributor.advisor Cemgil, Ali Taylan.
dc.contributor.author Demir, Cemil.
dc.date.accessioned 2023-03-16T10:25:09Z
dc.date.available 2023-03-16T10:25:09Z
dc.date.issued 2014.
dc.identifier.other EE 2014 D46 PhD
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/13116
dc.description.abstract In this dissertation, we analyze the single-channel speech-music separation problem for automatic speech recognition (ASR). The motivation of the study is to improve the performance of ASR systems by reducing the effect of background music. We describe a single-channel speech-music separation method based on a mixture of nonnegative matrix factorization (NMF) models. Given a catalog of background music material, we propose a generative model for the superposed speech and music spectrograms. The background music signal is assumed to be generated by a jingle in the catalog, and it is modeled by a scaled conditional mixture model representing that jingle. The speech signal is modeled by an NMF model that is estimated in a semi-supervised manner from the mixed signal. The approach is tested with Poisson and complex Gaussian observation models, which correspond to the Kullback-Leibler (KL) and Itakura-Saito (IS) divergence measures, respectively. Our experiments show that the proposed mixture model outperforms a standard NMF method in both speech-music separation and ASR tasks. Moreover, we extend the mixture-of-NMF-based single-channel speech-music separation method so that it incorporates prior speech information to enhance its separation performance. Finally, we propose using sub-word NMF-based speech models for the separation of speech and music signals. With this strategy, recognition accuracy is shown to improve over that obtained with a general speech model.
dc.format.extent 30 cm.
dc.publisher Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2014.
dc.subject.lcsh Automatic speech recognition.
dc.title Single-channel speech-music separation for robust ASR with mixture of NMF models
dc.format.pages xx, 166 leaves
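
As an illustration of the semi-supervised NMF idea mentioned in the abstract, the following is a minimal sketch of a standard KL-divergence NMF separator, i.e. the kind of baseline the abstract compares against, not the dissertation's mixture-of-NMF model. It assumes a magnitude spectrogram `V` and a music basis `W_music` that was learned beforehand from catalog material; the function names, parameter names, and default values are hypothetical and do not come from the thesis.

```python
import numpy as np

EPS = 1e-12  # small constant to avoid division by zero and log(0)


def kl_divergence(V, V_hat):
    """Generalized KL divergence D(V || V_hat), summed over all bins."""
    return float(np.sum(V * np.log((V + EPS) / (V_hat + EPS)) - V + V_hat))


def separate(V, W_music, n_speech=8, n_iter=100, seed=0):
    """Semi-supervised KL-NMF sketch (hypothetical helper, not the thesis code).

    V        : nonnegative spectrogram of the mixture, shape (freq, frames)
    W_music  : fixed nonnegative music basis, shape (freq, n_music)
    Returns speech and music spectrogram estimates and the KL history.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    n_music = W_music.shape[1]

    # The speech basis and all activations are unknown and start random.
    W_speech = rng.random((n_freq, n_speech)) + EPS
    H = rng.random((n_music + n_speech, n_frames)) + EPS
    ones = np.ones_like(V)
    history = []

    for _ in range(n_iter):
        # Multiplicative update of all activations (music and speech).
        W = np.hstack([W_music, W_speech])
        ratio = V / (W @ H + EPS)
        H *= (W.T @ ratio) / (W.T @ ones + EPS)

        # Multiplicative update of the speech basis only; W_music stays fixed.
        ratio = V / (W @ H + EPS)
        H_speech = H[n_music:]
        W_speech *= (ratio @ H_speech.T) / (ones @ H_speech.T + EPS)

        W = np.hstack([W_music, W_speech])
        history.append(kl_divergence(V, W @ H))

    # Split the mixture with soft masks built from the two partial models.
    V_music = W_music @ H[:n_music]
    V_speech = W_speech @ H[n_music:]
    total = V_music + V_speech + EPS
    return V * V_speech / total, V * V_music / total, history
```

Only the speech basis and the activations are adapted to the mixed signal, which is what makes the scheme semi-supervised. Replacing the KL updates with Itakura-Saito updates would correspond to the complex Gaussian observation model named in the abstract; that variant is not shown here.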

