
dc.contributor Graduate Program in Electrical and Electronic Engineering.
dc.contributor.advisor Saraçlar, Murat.
dc.contributor.author Parlak, Sıddıka.
dc.date.accessioned 2023-03-16T10:17:08Z
dc.date.available 2023-03-16T10:17:08Z
dc.date.issued 2008.
dc.identifier.other EE 2008 P37
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/12709
dc.description.abstract Speech retrieval is an emerging field of information retrieval in which the information is spoken rather than written. So far, spoken information retrieval has been studied in several languages. In this thesis, we concentrate on the retrieval of Turkish Broadcast News. We implement two tasks: Spoken Term Detection (STD) and Spoken Document Retrieval (SDR). Although both combine Automatic Speech Recognition (ASR) and Information Retrieval (IR) techniques to retrieve spoken data, their main goals differ. STD retrieves specific occurrences and requires an exact match, while SDR retrieves related documents and cares more about context. Automatic transcription and retrieval of speech is more complicated in agglutinative languages because a standard-size recognition vocabulary can cover only a limited portion of the language. A common solution is to segment words into subwords and use subword units in recognition. We employed grammatical and statistical subword units in recognition and indexing for STD. The best scores are obtained by combining word-based and statistical subword-based approaches. Word segmentation algorithms are also useful in SDR, since stems bear the meaning and provide a better representation of context. Experiments showed that stemming improves SDR performance, but the segmentation methods do not differ significantly from one another. We also studied language-independent ASR errors. Indexing the alternative ASR hypotheses, as well as the best one, was shown to be effective on the STD task. Results are presented on our Turkish Broadcast News Corpus.
dc.format.extent 30cm.
dc.publisher Thesis (M.S.)-Bogazici University. Institute for Graduate Studies in Science and Engineering, 2008.
dc.relation Includes appendices.
dc.subject.lcsh Speech perception.
dc.subject.lcsh Information retrieval.
dc.title Speech retrieval for Turkish broadcast news
dc.format.pages xix, 100 leaves;

