Abstract:
Each classification algorithm makes its own underlying assumptions and therefore misclassifies different patterns; overall accuracy can be increased by a suitable fusion of multiple classifiers. The combination is performed over the classifiers' scores, which are typically posterior probabilities. Although the aim of classifier fusion is improved accuracy, there is no guarantee that accuracy will actually improve. In this study, we propose a new combination scheme that uses a subset of the classifier scores instead of all of them. We experiment with three different methods for discriminant selection and combination, using decision trees and feature selection. We find that decision trees are better at choosing the best subset of features, and that accuracy improves especially when the selected discriminant outputs are combined with a trained linear model. To understand the behavior of the fixed fusion rules, we apply the idea of decomposing a loss function into bias, variance, and noise. This study gives a brief survey of the bias-variance-noise decompositions proposed in the literature for squared and 0/1 loss. We show that they are unable to explain the error behavior of fusion rules, especially the minimum and maximum rules. We explain why some fusion strategies work better than others under assumptions of uniform or Gaussian noise. Instead, we propose a measure based on the area of intersection to explain the behavior of the fixed rules.
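For reference, the decomposition mentioned above is, for squared loss, the standard bias-variance-noise decomposition; the sketch below uses generic notation (an estimator $f(x)$ of a target $y$ with conditional mean $g(x) = \mathbb{E}[y \mid x]$ and irreducible noise variance $\sigma^2$), which is an illustrative assumption and not necessarily the notation used in the body of this study:

\[
  \mathbb{E}\big[(y - f(x))^2\big]
  \;=\;
  \underbrace{\sigma^2}_{\text{noise}}
  \;+\;
  \underbrace{\big(\mathbb{E}[f(x)] - g(x)\big)^2}_{\text{bias}^2}
  \;+\;
  \underbrace{\mathbb{E}\Big[\big(f(x) - \mathbb{E}[f(x)]\big)^2\Big]}_{\text{variance}}
\]

Here the expectation over $f(x)$ is taken with respect to the randomness in the training sample; the corresponding decompositions for 0/1 loss surveyed in this study do not add up this cleanly, which is part of why they fail to explain the fixed fusion rules.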