Intrinsic and extrinsic evaluation of word embedding models

Yeşiltaş, Gökçe.

dc.contributor	Graduate Program in Computer Engineering.
dc.contributor.advisor	Güngör, Tunga.
dc.contributor.author	Yeşiltaş, Gökçe.
dc.date.accessioned	2023-03-16T10:04:36Z
dc.date.available	2023-03-16T10:04:36Z
dc.date.issued	2019.
dc.identifier.other	CMPE 2019 Y47
dc.identifier.uri	http://digitalarchive.boun.edu.tr/handle/123456789/12424
dc.description.abstract	In natural language processing tasks, how a word is represented is an important issue. After Bengio et al. introduced a simple neural network language model that learns word vector representations in 2003, representing words in a continuous vector space became more popular. In 2013, Mikolov et al. introduced a method named word2vec and showed that word embeddings can capture meaningful syntactic and semantic similarities. Many methods and implementations have been proposed for English since then. However, there are only a few studies on word representations in Turkish. In this study, we aimed to understand and analyze how word embedding models work on both Turkish and English. We focused on the word2vec word embedding model and tried to modify it to improve the quality of word representations. Additionally, we trained many models with different window sizes and dimensions. The impact of different configurations on the quality of word representations was analyzed both intrinsically and extrinsically. We reported the accuracy on word analogy tasks for intrinsic evaluation and word similarity tasks for extrinsic evaluation. Our results show that our proposed models perform better on most of the word analogy task categories for Turkish. We also showed that increasing window sizes and dimensions does not always improve accuracy; for some analogy and word similarity tasks, it reduces accuracy.
dc.format.extent	30 cm.
dc.publisher	Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2019.
dc.subject.lcsh	Natural language processing (Computer science)
dc.subject.lcsh	Word games.
dc.title	Intrinsic and extrinsic evaluation of word embedding models
dc.format.pages	xiv, 71 leaves