Improving text categorization performance by combining feature selection methods

Özbilen, Ece.

dc.contributor	Graduate Program in Computer Engineering.
dc.contributor.advisor	Güngör, Tunga.
dc.contributor.author	Özbilen, Ece.
dc.date.accessioned	2023-03-16T10:00:36Z
dc.date.available	2023-03-16T10:00:36Z
dc.date.issued	2011.
dc.identifier.other	CMPE 2011 O83
dc.identifier.uri	http://digitalarchive.boun.edu.tr/handle/123456789/12203
dc.description.abstract	Although the arrival of machine learning methods is one of the essential factors that has improved the effectiveness of text categorization, high dimensionality remains a challenge for classification performance. There are several ways to reduce the dimension of the input vector in classification, and feature selection is one of the most popular and effective. Much research has sought to improve feature selection for text categorization, but it mostly addresses the performance of individual feature selection methods, even though combining the outputs of multiple algorithms or classifiers is a promising strategy that has been studied extensively in information retrieval. With this motivation, we present a comprehensive comparative analysis of feature selection methods and their various pairwise combinations for text categorization. We analyze the performance of five common feature selection methods and their combinations on five standard datasets with varied skewness, under both global and local policies, using SVM. Comparing the individual methods with the combined methods shows that combining two feature selection methods significantly improves on the performance of the individual methods. In addition, rank combination achieves better performance under the global policy, whereas score combination achieves significantly better performance under the local policy. The main concern of this thesis is to investigate the effect of combining individual metrics on text categorization performance. We therefore also propose new combination methods, some of which clearly outperform the score and rank combinations.
dc.format.extent	30 cm.
dc.publisher	Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2011.
dc.relation	Includes appendices.
dc.subject.lcsh	Information storage and retrieval systems.
dc.subject.lcsh	Computational linguistics.
dc.subject.lcsh	Documentation -- Data processing.
dc.title	Improving text categorization performance by combining feature selection methods
dc.format.pages	xxiii, 188 leaves ;
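
The abstract contrasts score combination and rank combination of two feature selection methods but does not define them. The following is a minimal sketch of one common reading, not the thesis's actual procedure: score combination is assumed to sum min-max normalized scores, and rank combination is assumed to sum rank positions. All function names, the normalization choice, the tie-breaking rule, and the toy scores are hypothetical.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so two metrics become comparable (assumed choice)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {term: 0.0 for term in scores}
    return {term: (s - lo) / (hi - lo) for term, s in scores.items()}


def to_ranks(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank 1 is the highest score; ties are broken alphabetically (assumed rule)."""
    ordered = sorted(scores, key=lambda t: (-scores[t], t))
    return {term: i + 1 for i, term in enumerate(ordered)}


def score_combination(a: Dict[str, float], b: Dict[str, float], k: int) -> List[str]:
    """Select the k terms with the largest sum of normalized scores.

    Both dictionaries are assumed to cover the same set of terms.
    """
    na, nb = min_max_normalize(a), min_max_normalize(b)
    combined = {t: na[t] + nb[t] for t in a}
    return sorted(combined, key=lambda t: (-combined[t], t))[:k]


def rank_combination(a: Dict[str, float], b: Dict[str, float], k: int) -> List[str]:
    """Select the k terms with the smallest sum of rank positions.

    Both dictionaries are assumed to cover the same set of terms.
    """
    ra, rb = to_ranks(a), to_ranks(b)
    combined = {t: ra[t] + rb[t] for t in a}
    return sorted(combined, key=lambda t: (combined[t], t))[:k]


# Toy scores from two hypothetical feature selection metrics.
metric_a = {"ball": 8.0, "game": 6.0, "team": 1.0, "the": 0.0}
metric_b = {"ball": 0.2, "game": 0.9, "team": 0.8, "the": 0.0}

top_by_score = score_combination(metric_a, metric_b, k=2)
top_by_rank = rank_combination(metric_a, metric_b, k=2)
```

Under a global policy such a selection would be run once over scores aggregated across all categories, and under a local policy once per category; how the thesis aggregates scores is not stated in this record.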