Archives and Documentation Center
Digital Archives

Softly semi-supervised learning for bioinformatics applications

dc.contributor Graduate Program in Computer Engineering.
dc.contributor.advisor Özgür, Arzucan.
dc.contributor.author Çetinkaya Demir, Melis Özgür.
dc.date.accessioned 2023-03-16T10:01:45Z
dc.date.available 2023-03-16T10:01:45Z
dc.date.issued 2014.
dc.identifier.other CMPE 2014 C47
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/12263
dc.description.abstract Binary classification of biological data is an important research problem in both the Bioinformatics and Machine Learning fields. This problem is particularly challenging when the number of labeled instances is very small. There are three main machine learning approaches for classification: supervised methods, which use only labeled data; unsupervised methods, which use only unlabeled data; and semi-supervised methods, which use both labeled and unlabeled data. In this study, we compare supervised methods with several semi-supervised methods that we developed, based on k-NN (k-Nearest Neighbor), SVM (Support Vector Machine) with a linear kernel, and SVM with an RBF (Radial Basis Function) kernel, on two different Bioinformatics problems: predicting recurrence in colorectal cancer from microarray data and predicting HIV-1-human protein-protein interactions. In contrast to traditional semi-supervised learning approaches, we introduce the definition of 'softly labeled' data: unlabeled data that carry additional information about their highly expected labels. We also evaluate our algorithms on a well-known optical digit dataset, classifying the digits '5' and '6', by generating synthetic noise and using it as softly labeled data to better understand the behavior of our algorithms. For all datasets, we conclude that softly labeled data are informative and improve the evaluation results. Our semi-supervised methods SS-kNN (Semi-supervised kNN) and SS-SVM (Semi-supervised SVM) perform better than the other algorithms in terms of accuracy on the colorectal cancer and optical digit data, and in terms of area under the precision-recall curve on the HIV-1-human protein-protein interaction data. Furthermore, in general, our semi-supervised methods outperform the supervised ones.
dc.format.extent 30 cm.
dc.publisher Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2014.
dc.subject.lcsh Bioinformatics.
dc.subject.lcsh Machine learning.
dc.title Softly semi-supervised learning for bioinformatics applications
dc.format.pages xiv, 99 leaves ;

