Archives and Documentation Center
Digital Archives

A deep learning-based extractive text summarization system for Turkish news articles

dc.contributor Graduate Program in Management Information Systems.
dc.contributor.advisor Durahim, Ahmet Onur.
dc.contributor.author Gündeş, Özcan.
dc.date.accessioned 2023-03-16T12:51:33Z
dc.date.available 2023-03-16T12:51:33Z
dc.date.issued 2020.
dc.identifier.other MIS 2020 G86
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/18106
dc.description.abstract The goal of this study is to develop an automated extractive summarization system for Turkish news using pre-trained language models. Pre-trained language models have been applied to a wide range of Natural Language Processing tasks and achieve state-of-the-art results. In this thesis, pre-trained language models for Turkish are applied to the extractive summarization task. The proposed model consists of a pre-trained language model with Transformer layers added on top to capture document-level features and semantic relationships between the sentences in a news article. Each sentence is then scored with a sigmoid function, which outputs a real value between 0 and 1. To train the model, 2076 news articles were collected from a well-known Turkish news website. After data collection, each sentence in the articles was labelled as 0 or 1 by a heuristic algorithm, and an extractive model was trained on these labels. At test time, the five highest-scoring sentences (Top-5) are combined to generate the final summary. In addition, to investigate the effects of hyperparameters, 241 different models with different architectures and hyperparameter sets were run. The best model achieved a Rouge-1 F score of 38.38, a Rouge-2 F score of 26.8 and a Rouge-L F score of 38.04. These scores are promising since they are significantly greater than those of the LEAD-5 baseline, which has Rouge-1, Rouge-2 and Rouge-L F scores of 37.49, 26.4 and 37.12, respectively. For this study, LEAD-5 is a very strong baseline, since the most significant sentences are placed at the beginning of news articles to capture the readers' attention. Therefore, the proposed model shows good performance on the Turkish news dataset.
dc.format.extent 30 cm.
dc.publisher Thesis (M.A.) - Bogazici University. Institute for Graduate Studies in the Social Sciences, 2020.
dc.subject.lcsh Natural language processing (Computer science)
dc.title A deep learning-based extractive text summarization system for Turkish news articles
dc.format.pages xi, 102 leaves
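
The selection step described in the abstract (sigmoid scoring followed by Top-5 extraction) and the LEAD-5 baseline can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the use of raw per-sentence logits as input, and the choice to return the selected sentences in their original document order are all assumptions made for illustration.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Map a raw score to a real value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def top_k_summary(sentences: List[str], logits: List[float], k: int = 5) -> List[str]:
    """Pick the k highest-scoring sentences (hypothetical helper).

    Assumption: the selected sentences are returned in document order;
    the abstract does not say how they are ordered in the final summary.
    """
    scores = [sigmoid(z) for z in logits]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen]


def lead_k_summary(sentences: List[str], k: int = 5) -> List[str]:
    """LEAD-k baseline: take the first k sentences of the article."""
    return sentences[:k]


# Toy article with made-up model outputs (illustrative values only).
article = ["s0", "s1", "s2", "s3", "s4", "s5", "s6"]
raw_scores = [2.0, -1.0, 0.5, 3.0, -2.0, 1.0, 0.0]

model_summary = top_k_summary(article, raw_scores)
baseline_summary = lead_k_summary(article)
```

With these toy inputs the two summaries differ only where the model ranks a later sentence above an early one, which is consistent with the abstract's remark that LEAD-5 is a strong baseline for news text.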

