Speech retrieval for Turkish broadcast news

Parlak, Sıddıka.

dc.contributor	Graduate Program in Electrical and Electronic Engineering.
dc.contributor.advisor	Saraçlar, Murat.
dc.contributor.author	Parlak, Sıddıka.
dc.date.accessioned	2023-03-16T10:17:08Z
dc.date.available	2023-03-16T10:17:08Z
dc.date.issued	2008.
dc.identifier.other	EE 2008 P37
dc.identifier.uri	http://digitalarchive.boun.edu.tr/handle/123456789/12709
dc.description.abstract	Speech retrieval is an emerging field of information retrieval in which the information is spoken rather than written. So far, spoken information retrieval has been studied in several languages. In this thesis, we concentrate on the retrieval of Turkish Broadcast News. We implement two tasks: Spoken Term Detection (STD) and Spoken Document Retrieval (SDR). Although both combine Automatic Speech Recognition (ASR) and Information Retrieval (IR) techniques to retrieve spoken data, their main goals differ. STD retrieves specific occurrences of a term and requires an exact match, while SDR retrieves related documents and depends more on context. Automatic transcription and retrieval of speech are more complicated in agglutinative languages because a standard-size recognition vocabulary covers only a limited portion of the language. A common solution is to segment words into subwords and use subword units in recognition. We employed grammatical and statistical subword units in recognition and indexing for STD. The best scores are obtained by combining word-based and statistical subword-based approaches. Word segmentation algorithms are also useful in SDR, since stems carry the meaning and provide a better representation of context. Experiments showed that stemming improves SDR performance, but the segmentation methods do not differ significantly from one another. We also studied language-independent ASR errors. Indexing the alternative ASR hypotheses, in addition to the best one, was shown to be effective on the STD task. Results are presented on our Turkish Broadcast News Corpus.
dc.format.extent	30cm.
dc.publisher	Thesis (M.S.)-Bogazici University. Institute for Graduate Studies in Science and Engineering, 2008.
dc.relation	Includes appendices.
dc.subject.lcsh	Speech perception.
dc.subject.lcsh	Information retrieval.
dc.title	Speech retrieval for Turkish broadcast news
dc.format.pages	xix, 100 leaves;