An approach for dictionary-based concept mining in Turkish

Aydın, Cem Rıfkı.

dc.contributor	Graduate Program in Computer Engineering.
dc.contributor.advisor	Güngör, Tunga.
dc.contributor.author	Aydın, Cem Rıfkı.
dc.date.accessioned	2023-03-16T10:01:45Z
dc.date.available	2023-03-16T10:01:45Z
dc.date.issued	2014.
dc.identifier.other	CMPE 2014 A84
dc.identifier.uri	http://digitalarchive.boun.edu.tr/handle/123456789/12262
dc.description.abstract	Concept mining is a field of NLP in which documents, be they simple text files, emails, papers, journals, or any other textual materials, are scanned and the most comprehensive concepts concerning them are extracted. Here, concepts can be thought of as general ideas extracted from the documents. Concepts can also be extracted from visual or audio materials, but this thesis focuses on extracting concepts from textual materials only, in a way that is efficient in terms of time, quality, and accuracy. In NLP, the difference between a keyword and a concept is that a keyword has to be present in the material being scanned, whereas a concept does not. This is a considerable challenge, which may call for NLP or statistical methods that can help extract expressive concepts. The field has been studied especially for Western languages such as English, French, German, and Spanish, among many others, and quite successful results have been achieved. For Turkish, the topic is still quite immature compared with the languages mentioned above. Because Turkish is an agglutinative language, the documents first need to be pre-processed so that word stems can be processed. Among the resulting words, only nouns are taken into account, since concepts are generally considered to be nouns. This thesis makes use of statistical methods and a Turkish dictionary. The statistical method counts the frequency of words, whereas the dictionary may suggest probable concept words that are not present in the documents. The success rate (precision) of the concept extraction method in this thesis is 63.97%.
dc.format.extent	30 cm.
dc.publisher	Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2014.
dc.subject.lcsh	Computational linguistics -- Methodology.
dc.subject.lcsh	Text processing (Computer science)
dc.title	An approach for dictionary-based concept mining in Turkish
dc.format.pages	ix, 50 leaves ;
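
The abstract describes combining word-frequency counts over noun stems with a dictionary that can suggest concept words absent from the document. The following is a minimal sketch of that general idea only, not the thesis's actual implementation. The function name `extract_concepts`, the toy dictionary `TOY_DICTIONARY`, its entries, and the scoring rule (summing the frequencies of the document nouns whose definitions mention a candidate) are all assumptions made for illustration. Stemming and part-of-speech filtering are assumed to have been done already.

```python
from collections import Counter

# Hypothetical toy dictionary: noun stem -> nouns appearing in its definition.
# A real system would load entries from a Turkish dictionary resource.
TOY_DICTIONARY = {
    "kedi": ["hayvan", "memeli"],   # cat   -> animal, mammal
    "köpek": ["hayvan", "memeli"],  # dog   -> animal, mammal
    "elma": ["meyve"],              # apple -> fruit
}


def extract_concepts(noun_stems, dictionary, top_k=2):
    """Rank candidate concepts for a document given its noun stems.

    Each candidate's score is the summed frequency of the document nouns
    whose dictionary definition mentions it (an assumed scoring rule).
    """
    frequencies = Counter(noun_stems)  # statistical step: count noun stems
    scores = Counter()
    for stem, count in frequencies.items():
        # Dictionary step: definition words become candidate concepts,
        # even if they never occur in the document itself.
        for candidate in dictionary.get(stem, []):
            scores[candidate] += count
    return [word for word, _ in scores.most_common(top_k)]


# Noun stems of an already pre-processed (stemmed, noun-filtered) document.
document_stems = ["kedi", "köpek", "kedi", "elma"]
concepts = extract_concepts(document_stems, TOY_DICTIONARY)
```

In this toy input, the highest-ranked candidates are words that never appear among the document's stems, which mirrors the keyword versus concept distinction drawn in the abstract.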