Abstract:
In this thesis, we performed large vocabulary continuous speech recognition (LVCSR) experiments using language models built upon different recognition units, in order to find a suitable and successful language modeling scheme for Turkish. Since Turkish is an agglutinative language, the way the language model is built drastically affects recognition performance. Whereas traditional word-based language models give satisfactory results for English, they do not work well for Turkish due to its productive morphology. Different language modeling strategies, mainly based on sub-word units such as morphemes and stem-endings, have been proposed to overcome this problem. In this work, sub-words derived in an unsupervised manner are investigated. Segmentations obtained using different approaches are compared according to their performance in speech recognition. The best word error rate (WER) obtained is 25.24%, compared with 26.90% for the word-based language models.