Archives and Documentation Center
Digital Archives

Investigation of automatically derived subword units for Turkish LVCSR

dc.contributor Graduate Program in Electrical and Electronic Engineering.
dc.contributor.advisor Saraçlar, Murat.
dc.contributor.author Aksungurlu, Tuncay.
dc.date.accessioned 2023-03-16T10:17:10Z
dc.date.available 2023-03-16T10:17:10Z
dc.date.issued 2008.
dc.identifier.other EE 2008 A37
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/12716
dc.description.abstract In this thesis, we performed large vocabulary continuous speech recognition (LVCSR) experiments using language models built upon different recognition units in order to create a suitable and successful language modeling scheme for Turkish. Since Turkish is an agglutinative language, the way the language model is built drastically affects recognition performance. Whereas traditional word-based language models give satisfactory results for English, they do not work well for Turkish due to the inductive morphology. Different language modeling strategies, mainly based on sub-word units such as morphemes and stem-endings, have been proposed to overcome this problem. In this work, sub-words derived in an unsupervised manner are investigated. Segmentations obtained using different approaches are compared according to their performance in speech recognition. The best word error rate (WER) obtained with sub-word units is 25.24%, compared with 26.90% for the word-based language models.
dc.format.extent 30cm.
dc.publisher Thesis (M.S.)-Bogazici University. Institute for Graduate Studies in Science and Engineering, 2008.
dc.subject.lcsh Automatic speech recognition.
dc.subject.lcsh Turkish language -- Morphology.
dc.title Investigation of automatically derived subword units for Turkish LVCSR
dc.format.pages xi, 47 leaves;

