Statistical language models for large vocabulary Turkish speech recognition

dc.contributor Graduate Program in Electrical and Electronic Engineering.
dc.contributor.advisor Arslan, Levent M.
dc.contributor.author Dutağacı, Helin.
dc.date.accessioned 2023-03-16T10:16:43Z
dc.date.available 2023-03-16T10:16:43Z
dc.date.issued 2002.
dc.identifier.other EE 2002 D88
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/12644
dc.description.abstract This thesis compares four statistical language models for large vocabulary Turkish speech recognition. Turkish is an agglutinative language with productive morphotactics. In speech recognition system design, this property causes vocabulary explosion and poor estimation of N-gram probabilities. The solution is to parse words into smaller base units that can cover the language with a relatively small vocabulary. Three ways of decomposing words into base units are described: a morpheme-based model, a stem-ending-based model, and a syllable-based model. These models are compared with the word-based model with respect to vocabulary size, text coverage, bigram perplexity, and speech recognition performance. We constructed a Turkish text corpus of 10 million words from various texts collected from the Web. These texts were parsed into morphemes, stems, endings, and syllables, and statistics of these base units were estimated. Finally, we performed speech recognition experiments with models built from these base units.
dc.format.extent 30 cm.
dc.publisher Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2002.
dc.relation Includes appendices.
dc.subject.lcsh Automatic speech recognition -- Statistical methods.
dc.subject.lcsh Turkish language -- Morphology.
dc.subject.lcsh Turkish language -- Word formation.
dc.subject.lcsh Turkish language -- Data processing.
dc.title Statistical language models for large vocabulary Turkish speech recognition
dc.format.pages xv, 89 leaves ;

