Archives and Documentation Center
Digital Archives

Statistical comparison of classifiers using receiver operating characteristics information


dc.contributor Graduate Program in Computer Engineering.
dc.contributor.advisor Alpaydın, Ethem.
dc.contributor.author Aslan, Özlem.
dc.date.accessioned 2023-03-16T10:00:04Z
dc.date.available 2023-03-16T10:00:04Z
dc.date.issued 2009.
dc.identifier.other CMPE 2009 A85
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/12137
dc.description.abstract Statistical tests in the literature mainly use error rate for comparison and assume equal loss for false positives and false negatives. Receiver Operating Characteristics (ROC) curves and/or the Area Under the ROC Curve (AUC) can also be used to compare classifier performance under a spectrum of loss values. A ROC curve, and hence an AUC value, is typically calculated from a single training/test pair. To average over the randomness in folds, we propose using k-fold cross-validation to generate a set of ROC curves and AUC values, to which we can fit a distribution and on which we can test hypotheses. Experimental results on 15 datasets using 5 different classification algorithms show that our proposed test on AUC values is to be preferred over the usual paired t test on error rate, because it can detect equivalences and differences that the error test cannot. The approach we use for ROC curves can also be applied to Precision-Recall curves, used mostly in information retrieval, by applying the k-fold cross-validated test to the area under the Precision-Recall curve. When multiple classifiers are to be compared over one dataset or over multiple datasets, we can use Analysis of Variance (ANOVA). When we use more than one performance metric, we use multivariate ANOVA, that is, MANOVA. The performance metric for ANOVA is error or AUC. The performance metrics for MANOVA are the true positive, false positive, true negative and false negative rates. We also perform the Friedman test, which is the nonparametric version of ANOVA. We apply the Sign test when we compare multiple classifiers over multiple datasets. We observe that using more than one performance metric brings the correlation between the metrics into the statistical test and therefore produces more accurate results.
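The k-fold comparison described in the abstract can be illustrated with a minimal sketch. This is not the thesis code: it assumes the test is an ordinary paired t test on per-fold AUC differences, and the function names (`auc`, `paired_t_statistic`), the toy fold data and the choice of k = 5 are all hypothetical. AUC is computed here through its rank interpretation (the probability that a random positive is scored above a random negative) rather than by integrating a ROC curve.

```python
import math
from statistics import mean, stdev


def auc(labels, scores):
    """AUC of one test fold: fraction of (positive, negative) pairs in which
    the positive example gets the higher score; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("a fold needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_t_statistic(values_a, values_b):
    """Paired t statistic over per-fold differences (k - 1 degrees of freedom)."""
    diffs = [a - b for a, b in zip(values_a, values_b)]
    k = len(diffs)
    if k < 2:
        raise ValueError("need at least two folds")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all fold differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(k))


if __name__ == "__main__":
    # Hypothetical test-fold outputs: (true labels, scores of A, scores of B).
    folds = [
        ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], [0.2, 0.3, 0.6, 0.7]),
        ([0, 1, 0, 1], [0.3, 0.9, 0.2, 0.6], [0.5, 0.4, 0.1, 0.8]),
        ([1, 0, 1, 0], [0.7, 0.6, 0.4, 0.1], [0.9, 0.2, 0.8, 0.3]),
        ([0, 1, 1, 0], [0.5, 0.6, 0.3, 0.2], [0.1, 0.7, 0.9, 0.4]),
        ([1, 1, 0, 0], [0.8, 0.3, 0.5, 0.1], [0.6, 0.9, 0.2, 0.3]),
    ]
    auc_a = [auc(y, sa) for y, sa, _ in folds]
    auc_b = [auc(y, sb) for y, _, sb in folds]
    t = paired_t_statistic(auc_a, auc_b)
    # Two-sided critical value of Student's t at alpha = 0.05 with 4 degrees
    # of freedom (k = 5 folds); look up another value for a different k.
    critical = 2.776
    print("t =", t, "reject equal AUC:", abs(t) > critical)
```

The same `paired_t_statistic` helper could be applied to per-fold error rates, or to per-fold areas under the Precision-Recall curve, so that the tests can be compared on identical folds.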
dc.format.extent 30cm.
dc.publisher Thesis (M.S.)-Bogazici University. Institute for Graduate Studies in Science and Engineering, 2009.
dc.relation Includes appendices.
dc.subject.lcsh Receiver operating characteristic curves.
dc.title Statistical comparison of classifiers using receiver operating characteristics information
dc.format.pages xiv, 81 leaves;

