Archives and Documentation Center
Digital Archives

Identifying event nuggets in Turkish news texts using natural language processing and machine learning methods

dc.contributor Graduate Program in Computer Engineering.
dc.contributor.advisor Özgür, Arzucan.
dc.contributor.author Durna, Mehmet.
dc.date.accessioned 2023-03-16T10:04:09Z
dc.date.available 2023-03-16T10:04:09Z
dc.date.issued 2019.
dc.identifier.other CMPE 2019 D87
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/12395
dc.description.abstract An event nugget is the smallest textual instance that marks the existence of an event. Detecting event nuggets in a given text opens the door to further research and to many practical applications, such as automatic classification of the events within that text. The task has therefore been studied extensively for some languages, including English, Spanish and Chinese. In this thesis, event nugget detection and event type classification are studied for Turkish for the first time. Because no annotated data existed for event nugget detection in Turkish, we developed a new annotated data set for this task. We describe how we manually annotated the data set and present our system for identifying event nuggets in Turkish news texts. The data set consists of words from Turkish news texts. Each word is manually annotated with its sequence type, nugget type, realis value, and whether the event nugget is the main event, which enables analysis of event nugget detection, event type classification, realis classification and main event detection on the same data. We used language-specific features for Turkish, such as morphological features and dependency parser features, alongside some other features, with the aim of measuring the effect of language-specific features on these tasks. We also experimented with different machine learning algorithms to find the best-fitting model for each task. Our experiments show that Turkish-specific morphological features, dependency tree features and word embeddings enabled us to achieve better results.
dc.format.extent 30 cm.
dc.publisher Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2019.
dc.subject.lcsh Natural language processing (Computer science)
dc.subject.lcsh Machine learning.
dc.subject.lcsh News Web sites -- Turkey.
dc.title Identifying event nuggets in Turkish news texts using natural language processing and machine learning methods
dc.format.pages xiii, 56 leaves ;

