Abstract:
This study presents an approach that incorporates word embeddings into Turkish text classification and evaluates its applicability and performance on Turkish music mood detection. The methodology consists of two main parts. In the first part, word embeddings are trained with the Word2Vec and GloVe algorithms on a large text collection of more than 2.5 million Turkish documents gathered from the Internet. These embeddings are then used to generate lyrics vectors for the pre-processed lyrics selected for mood detection. In the second part, the lyrics vectors are used as features for music mood detection with various machine learning techniques. For comparison, Turkish music mood detection is also performed with the traditional bag-of-words approach, using the TF-IDF term weighting scheme, and with the Doc2Vec algorithm. The effects of stemming words to their roots and of filtering out a precompiled list of stop-words are investigated as well. The results show that incorporating word embeddings trained on a large text collection into the Turkish text classification process is effective, as illustrated by the improved classification performance.
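
To make the step from word embeddings to lyrics vectors concrete, the following is a minimal sketch of one common way to build a document vector: averaging the embeddings of the words in the lyrics. The abstract does not state which aggregation scheme the study uses, so the averaging itself is an assumption. The function name `lyrics_vector`, the toy dictionary `TOY_EMBEDDINGS`, and its 3-dimensional values are hypothetical and serve only as an illustration; real vectors would come from the trained Word2Vec or GloVe models.

```python
from typing import Dict, List

# Toy 3-dimensional embeddings with made-up values (assumption).
# In practice these would be looked up from a trained Word2Vec or GloVe model.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "aşk": [1.0, 0.0, 2.0],
    "hüzün": [3.0, 2.0, 0.0],
}


def lyrics_vector(tokens: List[str],
                  embeddings: Dict[str, List[float]],
                  dim: int) -> List[float]:
    """Average the embeddings of the tokens found in the vocabulary.

    Averaging is an assumed aggregation scheme, not necessarily the one
    used in the study. Out-of-vocabulary tokens are skipped; if no token
    is known, a zero vector is returned.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Pre-processed (tokenized) lyrics; "bilinmeyen" is out of vocabulary here.
features = lyrics_vector(["aşk", "hüzün", "bilinmeyen"], TOY_EMBEDDINGS, dim=3)
```

The resulting fixed-length vector can then be passed as a feature vector to any standard classifier, which is the role the lyrics vectors play in the second part of the methodology.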