Keyword search for low resource languages

Gündoğdu, M. Batuhan.

dc.contributor	Ph.D. Program in Electrical and Electronic Engineering.
dc.contributor.advisor	Saraçlar, Murat.
dc.contributor.author	Gündoğdu, M. Batuhan.
dc.date.accessioned	2023-03-16T10:25:20Z
dc.date.available	2023-03-16T10:25:20Z
dc.date.issued	2018.
dc.identifier.other	EE 2018 G86 PhD
dc.identifier.uri	http://digitalarchive.boun.edu.tr/handle/123456789/13145
dc.description.abstract	Retrieval of spoken content is a key endeavor, not only for finding the speech segments of interest, but also for automating and facilitating speech mining toward better automatic speech recognition (ASR) systems. In particular, keyword search (KWS) systems aim to address these goals by locating the specific segments of speech where a user-provided keyword is uttered. The most intuitive and convenient method for keyword search is to obtain text transcriptions from speech using ASR systems and then conduct a text-based search on this ASR output. However, for low-resource languages, for which the available labeled speech training data is insufficient, reliable ASR systems cannot be built, and KWS systems that depend on them will fail. Furthermore, if the keyword of interest is not within the vocabulary of the ASR system, it can never be found in the word-level transcriptions. In this thesis, we address the above-mentioned issues of KWS for low-resource languages. We aim to build a KWS system using a completely different approach, inspired by the similarity search techniques of query-by-example retrieval tasks. For this, we use a subsequence dynamic time warping-based search after artificially modeling “pseudo examples” for text queries. Furthermore, we investigate joint learning of these query representations and a proper distance metric for use in dynamic time warping. We show that the proposed KWS system outperforms state-of-the-art KWS techniques for the retrieval of out-of-vocabulary terms and, owing to its heterogeneity, provides significant improvements when combined with the conventional ASR-based KWS system.
dc.format.extent	30 cm.
dc.publisher	Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2018.
dc.subject.lcsh	Keyword searching.
dc.subject.lcsh	Automatic speech recognition.
dc.title	Keyword search for low resource languages
dc.format.pages	xx, 115 leaves ;
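The subsequence dynamic time warping search named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes that both the query (for instance, a pseudo example standing in for a text query) and the utterance being searched are already sequences of frame-level feature vectors, and it uses plain Euclidean distance as a placeholder where the thesis learns a distance metric. The names `subsequence_dtw`, `euclidean_cost`, and the `cost_fn` hook are hypothetical.

```python
import numpy as np


def euclidean_cost(query, search):
    """Pairwise Euclidean distances between query frames (M, D) and search frames (N, D).

    Placeholder only: the thesis learns a distance metric instead.
    """
    diff = query[:, None, :] - search[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def subsequence_dtw(query, search, cost_fn=euclidean_cost):
    """Find the region of `search` that best aligns with the whole of `query`.

    Returns (cost, start, end), where start/end are inclusive frame indices in `search`.
    `cost_fn` is a hypothetical hook for swapping in another (e.g. learned) metric.
    """
    cost = cost_fn(query, search)      # (M, N) local distance matrix
    m, n = cost.shape
    acc = np.empty((m, n))
    acc[0, :] = cost[0, :]             # free start: the match may begin at any search frame
    for i in range(1, m):
        acc[i, 0] = acc[i - 1, 0] + cost[i, 0]
        for j in range(1, n):
            acc[i, j] = cost[i, j] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    end = int(np.argmin(acc[-1, :]))   # free end: best cell anywhere in the last query row

    # Backtrack from (m - 1, end) to the first query row to recover the start frame.
    i, j = m - 1, end
    while i > 0:
        if j == 0:
            i -= 1
            continue
        preds = {
            (i - 1, j - 1): acc[i - 1, j - 1],
            (i - 1, j): acc[i - 1, j],
            (i, j - 1): acc[i, j - 1],
        }
        i, j = min(preds, key=preds.get)
    return float(acc[-1, end]), j, end


if __name__ == "__main__":
    query = np.array([[1.0], [2.0], [3.0]])
    utterance = np.array([[0.0], [5.0], [1.0], [2.0], [3.0], [7.0]])
    print(subsequence_dtw(query, utterance))  # (0.0, 2, 4)
```

Unlike whole-sequence DTW, the first query row is initialized directly from the local costs and the minimum is taken over the entire last row, so the match may start and end anywhere in the utterance. One simple way to turn this into a detection decision is to compare the returned cost against a threshold; how the thesis scores and normalizes detections is not described here.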