A comprehensive analysis of using WordNet, part-of-speech tagging, and word sense disambiguation in text categorization

dc.contributor Graduate Program in Computer Engineering.
dc.contributor.advisor Güngör, Tunga.
dc.contributor.author Çelik, Kerem.
dc.date.accessioned 2023-03-16T10:01:05Z
dc.date.available 2023-03-16T10:01:05Z
dc.date.issued 2012.
dc.identifier.other CMPE 2012 C47
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/12219
dc.description.abstract With the huge increase in the volume of data in digital environments and the development of machine learning techniques, studies on the automatic categorization of text documents have increased. Text categorization is the task of assigning predefined labels to unseen documents by using a learning model. Traditional text categorization relies on statistical analysis of documents, representing each document as a vector; one of the machine learning techniques is then used to categorize the documents. In addition to the traditional text categorization techniques, in this thesis we group words by their part-of-speech tags and investigate the effect of each part of speech, individually and jointly, on classification accuracy. Furthermore, we incorporate semantic features such as synonyms, hypernyms, hyponyms, meronyms and topics into the documents by using WordNet, thereby adding the meanings of terms. One of the problems faced in this study is that not all of the semantic features are really related to the document; in other words, synsets introduce ambiguity. To solve this problem, we introduce a new method to eliminate the ambiguity. The main objective of this thesis is to investigate the contribution of semantic features. By incorporating semantic features, we add meaning to the documents, and the classification accuracy increased as a result.
dc.format.extent 30 cm.
dc.publisher Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2012.
dc.relation Includes appendices.
dc.subject.lcsh Computational linguistics.
dc.subject.lcsh Text processing (Computer science)
dc.title A comprehensive analysis of using WordNet, part-of-speech tagging, and word sense disambiguation in text categorization
dc.format.pages xxi, 73 leaves ;
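The abstract describes documents being turned into term vectors, optionally enriched with WordNet-derived terms, and then classified. The sketch below illustrates only that general idea under stated assumptions: a small hypothetical relation table (TOY_RELATIONS) stands in for WordNet, and a nearest-centroid classifier using cosine similarity stands in for whichever learning model the thesis actually used. It performs no part-of-speech tagging or word sense disambiguation and is not the author's method.

```python
import math
from collections import Counter

# HYPOTHETICAL toy relation table standing in for WordNet lookups
# (synonyms/hypernyms). Entries are illustrative, not real WordNet data.
TOY_RELATIONS = {
    "car": ["automobile", "vehicle"],
    "truck": ["vehicle"],
    "goal": ["score"],
    "match": ["game"],
}


def tokenize(text):
    """Lowercase whitespace tokenizer (no stemming, no stop-word removal)."""
    return text.lower().split()


def expand(tokens, relations):
    """Append each token's related terms, mimicking semantic feature augmentation."""
    out = list(tokens)
    for tok in tokens:
        out.extend(relations.get(tok, []))
    return out


def features(text, relations=None):
    """Term-frequency vector, optionally enriched with related terms."""
    tokens = tokenize(text)
    if relations is not None:
        tokens = expand(tokens, relations)
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train(docs, labels, relations=None):
    """Sum TF vectors per class; cosine ignores scale, so the sums act as centroids."""
    centroids = {}
    for doc, label in zip(docs, labels):
        centroids.setdefault(label, Counter()).update(features(doc, relations))
    return centroids


def scores(doc, centroids, relations=None):
    """Cosine similarity of a document to every class centroid."""
    vec = features(doc, relations)
    return {label: cosine(vec, c) for label, c in centroids.items()}


def classify(doc, centroids, relations=None):
    """Pick the class whose centroid is most similar to the document."""
    s = scores(doc, centroids, relations)
    return max(s, key=s.get)


if __name__ == "__main__":
    docs = ["car engine road", "goal match team"]
    labels = ["autos", "sports"]

    plain = train(docs, labels)
    print(scores("truck", plain))  # {'autos': 0.0, 'sports': 0.0}

    enriched = train(docs, labels, TOY_RELATIONS)
    print(classify("truck", enriched, TOY_RELATIONS))  # autos
```

Without enrichment, "truck" shares no term with either class; with the toy relations, the shared hypernym "vehicle" links it to the autos class. The same mechanism shows why the ambiguity mentioned in the abstract matters: a related term attached to the wrong sense of a word would pull a document toward the wrong class.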
