An approach for dictionary-based concept mining in Turkish

dc.contributor Graduate Program in Computer Engineering.
dc.contributor.advisor Güngör, Tunga.
dc.contributor.author Aydın, Cem Rıfkı.
dc.date.accessioned 2023-03-16T10:01:45Z
dc.date.available 2023-03-16T10:01:45Z
dc.date.issued 2014.
dc.identifier.other CMPE 2014 A84
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/12262
dc.description.abstract Concept mining is a field of natural language processing (NLP) in which documents, be they simple text files, emails, papers, journals, or any other textual material, are scanned and the most comprehensive concepts describing them are identified. Here, concepts can be thought of as general ideas extracted from the documents. Concepts can also be extracted from visual or audio material, but this thesis focuses on extracting concepts from textual material only, efficiently in terms of time, quality, and accuracy. In NLP, keywords and concepts must be distinguished: a keyword has to be present in the material being scanned, whereas a concept does not. This is a considerable challenge, which may call for NLP or statistical methods that are beneficial for extracting expressive concepts. The field has been studied especially for Western languages such as English, French, German, and Spanish, among many others, and quite successful results have been achieved. For Turkish, however, the topic is still quite immature compared with these languages. Because Turkish is an agglutinative language, the documents first need to be preprocessed to obtain the word stems. Among these words, only nouns are taken into account, since concepts are generally considered to be nouns. This thesis makes use of statistical methods and the Turkish Dictionary: the statistical method counts word frequencies, whereas the dictionary may suggest probable concept words that do not appear in the documents. The concept extraction method proposed in this thesis achieves a success rate (precision) of 63.97%.
dc.format.extent 30 cm.
dc.publisher Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2014.
dc.subject.lcsh Computational linguistics -- Methodology.
dc.subject.lcsh Text processing (Computer science)
dc.title An approach for dictionary-based concept mining in Turkish
dc.format.pages ix, 50 leaves ;
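The abstract describes the method only at a high level: noun stems are counted, and a dictionary suggests concept words that may be absent from the document. The Python sketch below illustrates that idea under stated assumptions; it is not the thesis's implementation. Morphological analysis is assumed to have been done already, so tokens arrive as (stem, part_of_speech) pairs. `dictionary` is a hypothetical toy mapping from a noun stem to the noun stems in its definition. The scoring rule, where a definition word inherits the frequency of the noun it defines, is an illustrative choice. `precision` computes the metric the abstract reports.

```python
from collections import Counter

# Assumed input format: (stem, part_of_speech) pairs produced beforehand by a
# Turkish morphological analyzer (the analyzer itself is not shown here).
Token = tuple[str, str]


def noun_frequencies(tokens: list[Token]) -> Counter:
    """Statistical step: count each noun stem; non-nouns are ignored."""
    return Counter(stem for stem, pos in tokens if pos == "Noun")


def suggest_concepts(tokens: list[Token],
                     dictionary: dict[str, list[str]],
                     seeds: int = 3,
                     k: int = 3) -> list[str]:
    """Rank candidate concepts from document frequencies plus dictionary glosses.

    `dictionary` is a hypothetical mapping from a noun stem to the noun stems in
    its definition. Words reached only through it may not occur in the document,
    which is what separates concepts from keywords.
    """
    freq = noun_frequencies(tokens)
    scores = Counter(freq)  # start from the raw noun frequencies
    for stem, count in freq.most_common(seeds):
        for related in dictionary.get(stem, []):
            # Illustrative weighting (an assumption): a definition word
            # inherits the frequency of the noun it defines.
            scores[related] += count
    return [stem for stem, _ in scores.most_common(k)]


def precision(predicted: list[str], gold: set[str]) -> float:
    """Fraction of predicted concepts that appear in the reference set."""
    if not predicted:
        return 0.0
    return sum(1 for c in predicted if c in gold) / len(predicted)
```

For example, with the tokens futbol ×3, maç ×2, takım ×1 (plus a verb that is ignored) and the toy entries `{"futbol": ["spor", "takım"], "maç": ["spor", "karşılaşma"]}`, `suggest_concepts` returns `["spor", "takım", "futbol"]`. Here "spor" (sport) ranks first even though it never occurs in the document.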