Archives and Documentation Center
Digital Archives

Tuning model complexity using cross-validation for supervised learning

dc.contributor Ph.D. Program in Computer Engineering.
dc.contributor.advisor Alpaydın, Ethem.
dc.contributor.author Yıldız, Olcay Taner.
dc.date.accessioned 2023-03-16T10:13:42Z
dc.date.available 2023-03-16T10:13:42Z
dc.date.issued 2005.
dc.identifier.other CMPE 2005 Y55 PhD
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/12596
dc.description.abstract In this thesis, we review the use of cross-validation for model selection and propose the MultiTest method, which solves the problem of choosing the best of multiple candidate supervised models. The MultiTest algorithm orders supervised learning algorithms (for classification and regression) taking into account both the results of pairwise statistical tests on expected error and our prior preferences, such as the complexity of the algorithm. To validate the MultiTest method, we compare it with the ANOVA and Newman-Keuls procedures, which check whether multiple methods have the same expected error. Although the ANOVA and Newman-Keuls results can be extended to find a "best" algorithm, this does not always succeed; our proposed method, in contrast, is always able to designate one algorithm as the "best." Using the MultiTest method, we address the problem of optimizing model complexity: either we compare all possible models with MultiTest and select the best, or, if the model space is very large, we search the model space efficiently via MultiTest. When all possible models can be searched, MultiTest-based model selection always selects the simplest model whose expected error is not significantly worse than that of any other model. We also propose a hybrid, omnivariate architecture for decision tree induction and rule induction, which places different models at different nodes, matching the complexity of the model to the complexity of the data reaching that node. We compare our proposed MultiTest-based omnivariate architecture with well-known model selection techniques on standard datasets.
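The selection rule the abstract describes (prefer the simplest candidate whose cross-validated error is not significantly worse than any other's) can be sketched in Python. This is an illustrative sketch, not the thesis's actual implementation: the names `multitest_select` and `paired_t` are invented here, and a one-sided paired t-test on per-fold errors against the lowest-error candidate stands in for the full set of pairwise tests that MultiTest performs.

```python
import math

def paired_t(errors_a, errors_b):
    """One-sided paired t statistic for H1: model A has higher expected error than B."""
    d = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)
    if var == 0.0:  # identical per-fold differences: sign of the mean decides
        return math.inf if mean > 0 else (-math.inf if mean < 0 else 0.0)
    return mean / math.sqrt(var / n)

def multitest_select(candidates, t_crit=1.833):
    """candidates: list of (name, per-fold CV errors), ordered simplest-first;
    this ordering encodes the prior preference for less complex models.
    t_crit: one-sided critical value, e.g. t(0.05, df=9) = 1.833 for 10 folds.
    Returns the simplest candidate whose error is not significantly worse
    than that of the lowest-mean-error candidate."""
    best = min(candidates, key=lambda c: sum(c[1]) / len(c[1]))
    for name, errs in candidates:              # walk from simplest to most complex
        if paired_t(errs, best[1]) < t_crit:   # not significantly worse than best
            return name
    return best[0]
```

Note the collapse to comparisons against the single lowest-error model; the actual MultiTest procedure orders all candidates using every pairwise test, which is what guarantees a unique "best" even when ANOVA-style analyses are inconclusive.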
dc.format.extent 30cm.
dc.publisher Thesis (Ph.D.) - Boğaziçi University. Institute for Graduate Studies in Science and Engineering, 2005.
dc.relation Includes appendices.
dc.subject.lcsh Supervised learning (Machine learning)
dc.subject.lcsh Computational learning theory.
dc.title Tuning model complexity using cross-validation for supervised learning
dc.format.pages xxiii, 186 leaves;

