Archive and Documentation Center
Digital Archive

Intrinsic and extrinsic evaluation of word embedding models


dc.contributor Graduate Program in Computer Engineering.
dc.contributor.advisor Güngör, Tunga.
dc.contributor.author Yeşiltaş, Gökçe.
dc.date.accessioned 2023-03-16T10:04:36Z
dc.date.available 2023-03-16T10:04:36Z
dc.date.issued 2019.
dc.identifier.other CMPE 2019 Y47
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/12424
dc.description.abstract In natural language processing tasks, how words are represented is an important issue. After Bengio et al. introduced a simple neural network language model that learns word vector representations in 2003, representing words in a continuous vector space became increasingly popular. In 2013, Mikolov et al. introduced a method named word2vec and showed that word embeddings can capture meaningful syntactic and semantic similarities. Many methods and implementations have since been proposed for English; however, there are only a few studies on word representations in Turkish. In this study, we aimed to understand and analyze how word embedding models work on both Turkish and English. We focused on the word2vec model and modified it to improve the quality of word representations. Additionally, we trained many models with different window sizes and dimensions and analyzed the impact of these configurations on the quality of word representations both intrinsically and extrinsically. We report accuracy on word analogy tasks for intrinsic evaluation and on word similarity tasks for extrinsic evaluation. Our results show that our proposed models perform better on most of the word analogy task categories for Turkish. We also show that increasing the window size and dimension does not always improve accuracy; for some analogy and word similarity tasks, it reduces accuracy.
dc.format.extent 30 cm.
dc.publisher Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2019.
dc.subject.lcsh Natural language processing (Computer science)
dc.subject.lcsh Word games.
dc.title Intrinsic and extrinsic evaluation of word embedding models
dc.format.pages xiv, 71 leaves ;
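The abstract names two evaluation protocols, word analogy and word similarity, without showing how they are scored. Below is a minimal Python sketch assuming the common conventions. Analogy questions "a is to b as c is to ?" are answered by the vector offset b - a + c with a cosine nearest-neighbour search that excludes the three question words (3CosAdd). Similarity tasks are scored by Spearman correlation between human ratings and model cosine similarities. This record does not state the thesis's exact procedure (for example, its handling of out-of-vocabulary words). All function names and the toy vectors are hypothetical and only illustrate the idea.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def solve_analogy(emb: Dict[str, Vector], a: str, b: str, c: str) -> str:
    """Answer 'a is to b as c is to ?' via the offset b - a + c (3CosAdd)."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    # Assumption: the question words themselves are excluded as candidates.
    candidates = (w for w in emb if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(emb[w], target))


def analogy_accuracy(emb: Dict[str, Vector],
                     questions: List[Tuple[str, str, str, str]]) -> float:
    """Fraction of covered (a, b, c, d) questions answered with d.

    Assumption: questions containing out-of-vocabulary words are skipped.
    """
    covered = [q for q in questions if all(w in emb for w in q)]
    if not covered:
        return 0.0
    correct = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in covered)
    return correct / len(covered)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def similarity_score(emb: Dict[str, Vector],
                     pairs: List[Tuple[str, str, float]]) -> float:
    """Spearman correlation between human ratings and model cosine similarities."""
    covered = [(w1, w2, h) for w1, w2, h in pairs if w1 in emb and w2 in emb]
    human = [h for _, _, h in covered]
    model = [cosine(emb[w1], emb[w2]) for w1, w2, _ in covered]
    return spearman(human, model)


# Hypothetical 3-dimensional toy embeddings, for illustration only.
emb = {
    "man": [1.0, 0.0, 0.1],
    "woman": [1.0, 1.0, 0.1],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.1, -1.0],
}
print(solve_analogy(emb, "man", "woman", "king"))  # queen
print(analogy_accuracy(emb, [("man", "woman", "king", "queen")]))
print(similarity_score(emb, [("king", "queen", 9.0), ("man", "woman", 8.0),
                             ("king", "apple", 1.0)]))
```

Spearman correlation is used here because similarity benchmarks usually compare rankings rather than raw scores. A real evaluation would load pretrained vectors and published test sets instead of the toy dictionary above.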
