Developing a concept extraction system for Turkish


dc.contributor Graduate Program in Computer Engineering.
dc.contributor.advisor Güngör, Tunga.
dc.contributor.author Uzun-Per, Meryem.
dc.date.accessioned 2023-03-16T10:00:27Z
dc.date.available 2023-03-16T10:00:27Z
dc.date.issued 2011.
dc.identifier.other CMPE 2011 U88
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/12188
dc.description.abstract In recent years, due to the rapidly growing amount of available electronic media and data, the need to analyze electronic documents automatically has increased. To assess whether a document contains valuable information, its concepts, key phrases, or main idea must be known. There are some studies on extracting key phrases or main ideas of documents for Turkish. However, to the best of our knowledge, there is no concept extraction system for Turkish, although some studies exist for other languages. In this thesis, a concept extraction system is proposed for Turkish. Since Turkish characters are not always handled correctly by computer systems, and Turkish is an agglutinative and complex language, a pre-processing step is needed. After pre-processing, only the nouns of the corpus, stripped of their inflectional morphemes, are used, because most concepts are defined by nouns or noun phrases. Clustering these nouns is considered useful for describing documents by concepts. By applying statistical and NLP methods, documents are then identified by concepts. Several tests were performed on the corpus, evaluating the system on the basis of words, clusters, and concepts. As a result, the system generates concepts with 51 per cent success, but it generates more concepts than it should. Since concepts are abstract entities, that is, they need not appear verbatim in the texts, assigning concepts is a very difficult task. Considering also the complexity of the Turkish language, this result can be seen as quite satisfactory.
dc.format.extent 30cm.
dc.publisher Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2011.
dc.relation Includes appendices.
dc.subject.lcsh Programming languages (Electronic computers) -- Turkey.
dc.subject.lcsh Textbooks -- Turkey.
dc.title Developing a concept extraction system for Turkish
dc.format.pages xi, 59 leaves ;
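The abstract notes that Turkish characters need special handling during pre-processing. As a hypothetical illustration only (not the method described in the thesis), the Python sketch below shows one well-known issue: default case conversion does not follow Turkish rules for the dotted and dotless I. The function name `turkish_lower` is an assumption introduced here.

```python
def turkish_lower(text: str) -> str:
    """Lowercase text using Turkish casing rules for dotted and dotless I."""
    # Python's default str.lower() applies locale-independent Unicode rules:
    # "I" -> "i" and "İ" -> "i" followed by U+0307 (combining dot above).
    # Turkish expects "I" -> "ı" (dotless) and "İ" -> "i" (dotted),
    # so map those two capitals explicitly before lowercasing the rest.
    return text.replace("I", "ı").replace("İ", "i").lower()


if __name__ == "__main__":
    print("ISPARTA".lower())          # default rules: isparta
    print(turkish_lower("ISPARTA"))   # Turkish rules: ısparta
    print(turkish_lower("İSTANBUL"))  # Turkish rules: istanbul
```

Normalizing case this way before morphological analysis keeps forms such as "ISPARTA" and "ısparta" from being treated as different words; a full pre-processing step would also need to handle the other Turkish letters and the input encoding.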