Abstract:
Classification algorithms are the most commonly used data mining models and are widely applied to extract valuable knowledge from large amounts of data. Comparing classification algorithms has interested the data mining community for many years. Classifiers are mostly evaluated on their accuracy, complexity, robustness, scalability, integration, comprehensibility, stability and interestingness. This thesis is concerned with the accuracy, complexity and robustness of classifiers. The data miner mostly selects a model with respect to its classification accuracy; therefore, the performance of each classifier plays a crucial role. Complexity is taken in this study to mean the CPU time consumed by each classifier. The study first discusses the application of several classification models to multiple datasets in three stages: first, running the algorithms on the original (unprocessed) datasets; second, running them on the same datasets with continuous numerical variables discretised; and third, running them on the same datasets after Principal Component Analysis has been applied. The resulting accuracies and complexities are then compared. The relationship of dataset characteristics and implementation attributes to accuracy and complexity is also discussed, and a regression model is introduced for predicting classifier accuracy and complexity under given dataset and implementation conditions. Finally, the study examines the robustness of the classifiers, measured by repeated experiments on noisy and cleaned datasets.
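As a rough illustration of how accuracy and CPU time might be recorded for the first two stages (original data, then discretised data), the following minimal Python sketch may help. It is not the thesis's implementation: the nearest-centroid classifier is a stand-in chosen only for brevity, the equal-width binning is one of several possible discretisation schemes, the toy dataset is invented for the example, and all function names are hypothetical. The PCA stage is omitted to keep the sketch dependency-free.

```python
import time

def equal_width_discretise(column, bins=3):
    """Map continuous values to integer bin indices using equal-width bins."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant column
    return [min(int((v - lo) / width), bins - 1) for v in column]

def discretise_dataset(X, bins=3):
    """Discretise every column of a row-major dataset."""
    columns = [equal_width_discretise(list(col), bins) for col in zip(*X)]
    return [list(row) for row in zip(*columns)]

def fit_centroids(X, y):
    """Stand-in classifier (assumption): one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, row):
    """Return the label of the closest centroid (squared Euclidean distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))

def evaluate(X_train, y_train, X_test, y_test):
    """Return (accuracy, CPU seconds) for one train/test run."""
    start = time.process_time()  # CPU time, not wall-clock time
    centroids = fit_centroids(X_train, y_train)
    predictions = [predict(centroids, row) for row in X_test]
    cpu_seconds = time.process_time() - start
    accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
    return accuracy, cpu_seconds

# Invented toy data, for illustration only.
X = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.2], [5.0, 6.0], [5.2, 5.9], [4.8, 6.1]]
y = ["a", "a", "a", "b", "b", "b"]
train_idx, test_idx = [0, 1, 3, 4], [2, 5]

def split(data, idx):
    return [data[i] for i in idx]

results = {}
for stage, data in (("original", X), ("discretised", discretise_dataset(X))):
    results[stage] = evaluate(
        split(data, train_idx), split(y, train_idx),
        split(data, test_idx), split(y, test_idx),
    )
    print(stage, results[stage])
```

Using `time.process_time()` rather than a wall-clock timer keeps the complexity measure closer to the CPU time the abstract refers to, since it excludes time the process spends sleeping.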