Abstract:
Drug discovery is the process of designing and developing new medicines. The research and clinical experiments required for a new drug proposal are costly and time-consuming. Although many drug-like molecules have been proposed, the number of drugs that are approved by regulatory bodies and released to the market is very low, because most drug candidate molecules have poor pharmacokinetic properties. Early assessment of absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties has therefore become critically important for the pharmaceutical industry as a way to avoid costly failures. Our aim is to develop an approach that reliably predicts the druggability features of drug candidate molecules and to identify the relations between ADMET properties and molecular descriptors. In this thesis, we examine and compare four molecular representations for predicting the druggability features of molecules, using three machine learning algorithms, namely k-nearest neighbor (kNN), support vector machine (SVM) classifier, and random forest (RF), on nine ADMET property datasets. We conclude that, among all molecular representations, the Morgan fingerprint performs best in terms of accuracy and F-measure; however, the run time for parameter tuning and training is longer with fingerprint representations. Among the machine learning algorithms, the SVM classifier with the Morgan fingerprint achieves higher accuracy and F-measure than the other two. With the descriptor vector representation, we examine a set of molecular descriptors and, using the RF classifier, identify the most influential molecular descriptor for each ADMET property. For some datasets, we add the most influential descriptor to the Morgan fingerprint representation and report an improvement in the evaluation metrics.
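To make the evaluation setting concrete, the following is a minimal sketch of a kNN classifier over binary fingerprints, scored with accuracy and F-measure. It is an illustration under stated assumptions, not the pipeline used in the thesis: the fingerprints are hypothetical toy sets of on-bit indices rather than real Morgan fingerprints, the labels are invented, and Tanimoto similarity is assumed as the neighbor metric (the abstract does not say which metric was used).

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def knn_predict(train, query, k=3):
    """Majority vote among the k training fingerprints most similar to query."""
    ranked = sorted(train, key=lambda item: tanimoto(item[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: each fingerprint is the set of bits that are "on";
# label 1 stands for "has the ADMET property", label 0 for "does not".
train = [
    ({1, 2, 3, 4}, 1),
    ({1, 2, 3, 5}, 1),
    ({2, 3, 4, 6}, 1),
    ({10, 11, 12}, 0),
    ({10, 11, 13}, 0),
    ({11, 12, 14}, 0),
]
test = [
    ({1, 2, 4}, 1),
    ({10, 12, 13}, 0),
    ({3, 4, 6, 10}, 1),
    ({1, 11, 12}, 1),
]

y_true = [label for _, label in test]
y_pred = [knn_predict(train, fp, k=3) for fp, _ in test]

print("accuracy :", accuracy(y_true, y_pred))
print("F-measure:", f_measure(y_true, y_pred))
```

In practice the toy sets would be replaced by fingerprints computed with a cheminformatics toolkit, and the same two metrics would be computed for each representation and classifier pair so that the results can be compared on equal terms.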