dc.description.abstract |
Retrieval of spoken content is one key endeavor, not only for finding the speech parts of interest, but also for an automated and facilitated speech mining towards better automatic speech recognition (ASR) systems. In particular, keyword search (KWS) systems aims to address these goals, by locating the specific parts of speech where a user provided keyword uttered. The most intuitive and convenient method for keyword search is to obtain text transcriptions from speech using ASR systems, and then conduct text based search on this ASR output. However, for low resource languages, for which available labeled speech training data is not sufficient, reliable ASR systems cannot be built and, KWS systems that depend on them will fail. Furthermore, if the keyword of interest is not within the vocabulary of the ASR system, it can never be found in the word level transcriptions. In this thesis, we address the above mentioned issues of KWS for the low resource languages. We aim to build a KWS system, using a completely different approach, with ideas inspired by the similarity search techniques of the query by example retrieval tasks. For this, we utilize a subsequence dynamic time warping-based search, after artificially modeling “pseudo examples” for text queries. Furthermore, we investigate a joint learning of these query representations and a proper distance metric for use in dynamic time warping. We show that, this new KWS system, we propose, outperforms the state of the art KWS techniques for retrieval of out of-vocabulary terms, and provides significant improvements when combined with the conventional ASR-based KWS system due to its heterogeneity. |
|