Investigation of automatically derived subword units for Turkish LVCSR

Aksungurlu, Tuncay.

dc.contributor	Graduate Program in Electrical and Electronic Engineering.
dc.contributor.advisor	Saraçlar, Murat.
dc.contributor.author	Aksungurlu, Tuncay.
dc.date.accessioned	2023-03-16T10:17:10Z
dc.date.available	2023-03-16T10:17:10Z
dc.date.issued	2008.
dc.identifier.other	EE 2008 A37
dc.identifier.uri	http://digitalarchive.boun.edu.tr/handle/123456789/12716
dc.description.abstract	In this thesis, we performed large vocabulary continuous speech recognition (LVCSR) experiments using language models built on different recognition units, in order to find a suitable and successful language modeling scheme for Turkish. Since Turkish is an agglutinative language, the way the language model is built drastically affects recognition performance. Whereas traditional word-based language models give satisfactory results for English, they do not work well for Turkish due to its productive morphology. Different language modeling strategies, mainly based on sub-word units such as morphemes and stem-endings, have been proposed to overcome this problem. In this work, sub-words derived in an unsupervised manner are investigated. Segmentations obtained using different approaches are compared according to their performance in speech recognition. The best word error rate (WER) obtained is 25.24%, compared with 26.90% for the word-based language models.
dc.format.extent	30cm.
dc.publisher	Thesis (M.S.)-Bogazici University. Institute for Graduate Studies in Science and Engineering, 2008.
dc.subject.lcsh	Automatic speech recognition.
dc.subject.lcsh	Turkish language -- Morphology.
dc.title	Investigation of automatically derived subword units for Turkish LVCSR
dc.format.pages	xi, 47 leaves;