Archives and Documentation Center
Digital Archives

Analyzing stemming and sentence simplification methodologies for Turkish multi-document text summarization

dc.contributor Graduate Program in Computer Engineering.
dc.contributor.advisor Özgür, Arzucan.
dc.contributor.author Nuzumlalı, Muhammed Yavuz.
dc.date.accessioned 2023-03-16T10:02:04Z
dc.date.available 2023-03-16T10:02:04Z
dc.date.issued 2015.
dc.identifier.other CMPE 2015 N88
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/12292
dc.description.abstract Automatic text summarization is the task of generating a compact and coherent version of a given text document or a set of text documents. Although there is a vast number of studies for automatic document summarization on English, there is only a limited number of studies for other languages, especially for Turkish. Text simplification aims to reduce the grammatical or lexical complexities of the sentences. Automatic text simplification systems can be an important part of any NLP task to improve system performance. In this thesis, we analyzed the effects of applying different levels of stemming approaches such as fixed-length word truncation and morphological analysis and the effects of applying text simplification techniques for multi-document summarization (MDS) on Turkish, which is an agglutinative and morphologically rich language. We constructed a manually annotated MDS data set, and to the best of our knowledge, reported the first results on Turkish MDS. Additionally, we developed a rule-based text simplification system for Turkish that utilizes the syntactic features of the sentences to identify simplification patterns. Our results show that a simple fixed-length word truncation approach performs slightly better than no stemming, whereas applying complex morphological analysis does not improve Turkish MDS in terms of ROUGE scores. Applying simplification rules that split complex sentences to individual simpler sentences as a preprocessing step slightly improves summarization performance, whereas applying a compression-based simplification approach relying solely on rule matching decreases the obtained ROUGE scores.
dc.format.extent 30 cm.
dc.publisher Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2015.
dc.subject.lcsh Automatic abstracting.
dc.subject.lcsh Computational linguistics.
dc.title Analyzing stemming and sentence simplification methodologies for Turkish multi-document text summarization
dc.format.pages xii, 63 leaves ;
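
The abstract names fixed-length word truncation as the simplest stemming approach it evaluates. As a rough illustration only, and not the thesis's implementation, the idea can be sketched in Python as keeping the first few characters of each word. The function names and the prefix length of 5 are assumptions made for this example; the record does not state which lengths were tested.

```python
def truncate_stem(word: str, length: int = 5) -> str:
    """Fixed-length truncation: keep only the first `length` characters.

    `length` = 5 is an illustrative assumption, not a value from the thesis.
    Words shorter than `length` are returned unchanged.
    """
    return word[:length]


def stem_tokens(tokens, length: int = 5):
    """Apply fixed-length truncation to every token in a list."""
    return [truncate_stem(token, length) for token in tokens]


# Inflected forms of the same Turkish root collapse to one prefix.
print(stem_tokens(["kitaplar", "kitaplarımızdan", "ev"]))
```

Because Turkish is agglutinative, suffixes stack at the end of a word, so a short prefix often coincides with the root. Case folding is left out on purpose: Python's default `str.lower()` does not apply the Turkish dotted/dotless i rules.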

