Abstract:
In this study, we analyze the effect of different word embedding methods, namely word2vec, fastText, and ELMo, on representing Turkish texts. Word embeddings represent words in a high-dimensional vector space such that similar words are placed nearby. This helps in various tasks, such as document classification and machine translation. We conduct experiments on Turkish corpora of different sizes using word2vec, fastText, and ELMo, and compare them with bag-of-words (BOW). Word2vec works at the word level; fastText works at the character (subword) level, computing the representation of a word by combining the representations of its subwords. ELMo is context-dependent, that is, the representation of a word depends on the other words in the sentence, whereas word2vec and fastText are context-independent. The learned word embeddings are evaluated on noun and verb inflections and semantic analogy tests, as well as on topic classification of news documents. Our experiments indicate that fastText vectors perform better on classification tasks, while word2vec vectors are more useful on semantic analogies.