Statistical comparison of classifiers using receiver operating characteristics information

Aslan, Özlem.

dc.contributor	Graduate Program in Computer Engineering.
dc.contributor.advisor	Alpaydın, Ethem.
dc.contributor.author	Aslan, Özlem.
dc.date.accessioned	2023-03-16T10:00:04Z
dc.date.available	2023-03-16T10:00:04Z
dc.date.issued	2009.
dc.identifier.other	CMPE 2009 A85
dc.identifier.uri	http://digitalarchive.boun.edu.tr/handle/123456789/12137
dc.description.abstract	Statistical tests in the literature mainly use error rate for comparison and assume equal loss for false positives and false negatives. Receiver Operating Characteristics (ROC) curves and/or the Area Under the ROC Curve (AUC) can also be used to compare classifier performance under a spectrum of loss values. A ROC curve, and hence an AUC value, is typically calculated from a single training/test pair. To average over the randomness in folds, we propose to use k-fold cross-validation to generate a set of ROC curves and AUC values, to which we can fit a distribution and on which we can test hypotheses. Experimental results on 15 datasets using 5 different classification algorithms show that our proposed test using AUC values is to be preferred over the usual paired t test on error rate, because it can detect equivalences and differences which the error test cannot. The approach we use for ROC curves can also be applied to Precision-Recall curves, which are used mostly in information retrieval, by applying the k-fold cross-validated test to the area under the Precision-Recall curve. When multiple classifiers are to be compared over one dataset or multiple datasets, we can use Analysis of Variance (ANOVA). When we use more than one performance metric, we use the multivariate ANOVA, that is, MANOVA. The performance metric for ANOVA is error or AUC. The performance metrics for MANOVA are the true positive, false positive, true negative and false negative rates. We also perform the nonparametric version of ANOVA, which is called the Friedman test. We apply the sign test when we compare multiple classifiers over multiple datasets. We observe that using more than one performance metric takes their correlation into account in the statistical test and therefore produces more accurate results.
dc.format.extent	30cm.
dc.publisher	Thesis (M.S.)-Bogazici University. Institute for Graduate Studies in Science and Engineering, 2009.
dc.relation	Includes appendices.
dc.subject.lcsh	Receiver operating characteristic curves.
dc.title	Statistical comparison of classifiers using receiver operating characteristics information
dc.format.pages	xiv, 81 leaves;
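
The abstract describes computing one AUC value per cross-validation fold for each classifier and then testing hypotheses on those values. The sketch below illustrates that general idea with a plain paired t statistic on per-fold AUCs. It is only an illustration under stated assumptions, not the test defined in the thesis: the function names (`auc`, `fold_aucs`, `paired_t_statistic`) and the input layout (one `(labels, scores)` pair per test fold) are hypothetical, and the rank-based AUC formula and the paired t statistic are standard textbook choices that may differ from the procedure the author actually uses.

```python
import math
from statistics import mean, stdev


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Labels are assumed to be 1 (positive) or 0 (negative).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("each fold needs at least one example of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def fold_aucs(folds):
    """Compute one AUC per test fold; folds is a list of (labels, scores)."""
    return [auc(labels, scores) for labels, scores in folds]


def paired_t_statistic(a, b):
    """t statistic of the paired differences a[i] - b[i].

    Under the null hypothesis of equal mean AUC, this is compared against
    a t distribution with len(a) - 1 degrees of freedom.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long lists with at least 2 folds")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))
```

In use, each classifier would be scored on the same k test folds, `fold_aucs` would be applied to each classifier's folds, and the two resulting lists would be passed to `paired_t_statistic`. Using the same folds for both classifiers is what makes the comparison paired.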