Predicting stock movements with machine learning using textual data

Özdemir, Meryem.

dc.contributor	Graduate Program in Management Information Systems.
dc.contributor.advisor	Durahim, Ahmet Onur.
dc.contributor.author	Özdemir, Meryem.
dc.date.accessioned	2023-03-16T12:51:33Z
dc.date.available	2023-03-16T12:51:33Z
dc.date.issued	2020.
dc.identifier.other	MIS 2020 O84
dc.identifier.uri	http://digitalarchive.boun.edu.tr/handle/123456789/18105
dc.description.abstract	Economic events receive considerable attention from the information retrieval community. One popular practice, applying language models to economy-related textual data, has proven useful for anticipating economic events. However, studies of the Turkish stock market that use textual sources remain limited, because language models have focused on widely used languages. The Transformer architecture marked a significant step for language models, and its methodology, combined with transfer learning, extended Natural Language Processing (NLP) research to over 100 languages. This study therefore combines both the latest advances and the traditional methods of NLP with machine learning classifiers to predict the stock movements of companies publicly traded on the BIST market from their official disclosures. To this end, 69,806 material event disclosures of BIST companies were fetched from the Public Disclosure Platform (KAP) and labeled with stock movement directions. In the experiments, announcements were represented as Term Frequency-Inverse Document Frequency (TFIDF) vectors and as Bidirectional Encoder Representations from Transformers (BERT) embeddings, and classified with six learners: Multinomial Naïve Bayes, Logistic Regression, Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Categorical Boosting (CatBoost), and the classification layer of BERTurk, the pre-trained Turkish version of BERT. While all setups yielded promising results, the best performance, a macro-averaged F1 score of 39.7%, was delivered by LightGBM on TFIDF vectors.
dc.format.extent	30 cm.
dc.publisher	Thesis (M.A.) - Bogazici University. Institute for Graduate Studies in the Social Sciences, 2020.
dc.subject.lcsh	Stock exchanges -- Computer simulation.
dc.subject.lcsh	Stock price forecasting.
dc.subject.lcsh	Machine learning -- Mathematical models.
dc.title	Predicting stock movements with machine learning using textual data
dc.format.pages	x, 86 leaves ;
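The abstract names two technical ingredients without defining them: TFIDF document vectors and the macro-averaged F1 score behind the reported 39.7% figure. The following is a minimal, dependency-free Python sketch of both, under stated assumptions. The thesis abstract does not say which TFIDF variant was used; this sketch assumes raw term counts, the smoothed IDF ln((1 + n) / (1 + df)) + 1, and L2 normalisation, which mirror scikit-learn's `TfidfVectorizer` defaults. The whitespace tokenizer, the function names, and the `up`/`down`/`neutral` labels are hypothetical illustrations, not taken from the thesis.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical naive tokenizer: lowercase + whitespace split.
    # Not Turkish-aware (str.lower() maps "I" to "i", not dotless "ı").
    return text.lower().split()


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse, L2-normalised TF-IDF vector (term -> weight) per document.

    Assumed variant: raw term counts, smoothed IDF, L2 norm (scikit-learn's defaults).
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, so terms in every document keep weight 1.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)  # raw term counts within this document
        vec = {t: count * idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm > 0:
            vec = {t: w / norm for t, w in vec.items()}
        vectors.append(vec)
    return vectors


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall;
        # it is 0 for a class that is never predicted correctly.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, `macro_f1(["up", "down", "neutral", "up"], ["up", "neutral", "neutral", "down"])` returns 4/9 (about 0.444): `up` and `neutral` each score an F1 of 2/3, `down` scores 0, and the three classes are averaged with equal weight. Because macro-F1 does not weight classes by frequency, a classifier that only ever predicts the majority movement direction scores poorly on it, which is why it is a stricter yardstick than plain accuracy for imbalanced movement labels.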