Determination of protein-protein binding sites using machine learning tools

Sümbül, Fidan.

Collection: Boğaziçi University Theses → Institute for Graduate Studies in Science and Engineering (Fen Bilimleri Enstitüsü) → Chemical Engineering → M.S. Theses

URI: http://digitalarchive.boun.edu.tr/handle/123456789/14830

Date: 2008

Abstract:

Protein-protein interactions are involved in almost all biological processes, so understanding the principles that underlie them is of great significance, chiefly for identifying the functional sites in proteins and studying how proteins function. The whole surface of a protein is not available for interaction with other proteins; distinctive properties differentiate binding residues from the rest of the surface residues. To explore and then predict the binding interfaces, the present work is composed of two parts.

The first part identifies the differentiating properties of three main groups of residues in a protein, namely core, binding, and non-binding surface residues, on a database of 263 proteins. These properties are sequence- and structure-related characteristics, as well as dynamic features, of the residues: residue propensity, hydrophobicity, side-chain polarity and charge, conservation, accessible surface area, and fluctuations. Some residues prefer the interface or the core to the non-interface surface. Hydrophobic residues are favored at the interface or in the core of the protein. Positively charged polar residues are abundant at the interface, while non-polar residues and polar but neutral residues are mostly found in the core. Interface and core residues also have higher conservation scores. Residues that have higher fluctuations with respect to the rest of the residues in the fastest and the slowest modes of the Gaussian Network Model (GNM) are mainly located at the interfaces of proteins. These properties are also analyzed in terms of the type of interaction, namely homogeneous versus heterogeneous complexes and transient versus permanent complexes, for a further understanding of the interaction sites.

In the second part, these properties are used to predict the binding residues of proteins with support vector machines (SVM) and multiple kernel learning (MKL), both of which are supervised classifiers. The maximum accuracy obtained by SVM is 81.3%, which is reported to be the highest accuracy observed for binding site prediction in the literature. The contributions of the grouped properties to the final results are determined by MKL. The largest contributions come from the type of amino acid, the conservation score, the accessible surface area and state of the amino acid (core or surface), the relative correlations between fluctuations in both the fast and slow modes, and the packing of the residue.
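To make the GNM-based fluctuation features mentioned in the abstract more concrete, the following is a minimal sketch of the standard Gaussian Network Model calculation. It is not the thesis's own implementation. The 7 Å contact cutoff, the choice of two modes per group, the function names `kirchhoff_matrix` and `mode_fluctuations`, and the toy helix coordinates are all assumptions made for illustration.

```python
import numpy as np


def kirchhoff_matrix(coords, cutoff=7.0):
    """Build the GNM Kirchhoff (connectivity) matrix from C-alpha coordinates.

    The 7.0 Angstrom cutoff is an assumed, commonly used value.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise distances between all residues.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Off-diagonal entries are -1 for residue pairs within the cutoff.
    gamma = -(dist <= cutoff).astype(float)
    np.fill_diagonal(gamma, 0.0)
    # Each diagonal entry is the number of contacts of that residue.
    np.fill_diagonal(gamma, -gamma.sum(axis=1))
    return gamma


def mode_fluctuations(gamma, n_modes=2):
    """Return normalized per-residue fluctuations in the slowest and fastest modes."""
    eigvals, eigvecs = np.linalg.eigh(gamma)  # eigenvalues in ascending order
    # Drop the single zero mode of a connected network.
    eigvals, eigvecs = eigvals[1:], eigvecs[:, 1:]
    # Contribution of mode k to the fluctuation of residue i: u_k[i]^2 / lambda_k.
    weighted = eigvecs**2 / eigvals
    slow = weighted[:, :n_modes].sum(axis=1)   # smallest nonzero eigenvalues
    fast = weighted[:, -n_modes:].sum(axis=1)  # largest eigenvalues
    # Normalize so each profile sums to 1 and residues can be compared.
    return slow / slow.sum(), fast / fast.sum()


# Hypothetical toy input: 12 C-alpha positions along an idealized helix.
idx = np.arange(12)
angle = np.deg2rad(100.0) * idx
toy_coords = np.column_stack(
    [2.3 * np.cos(angle), 2.3 * np.sin(angle), 1.5 * idx]
)

gamma = kirchhoff_matrix(toy_coords)
slow, fast = mode_fluctuations(gamma)
print(np.round(slow, 3))
print(np.round(fast, 3))
```

In a real analysis the coordinates would come from a protein structure file rather than a synthetic helix, and the per-residue `slow` and `fast` profiles could serve as two of the input features for a classifier such as an SVM.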
