Text-based machine learning methodologies for modelling drug-target interactions

Öztürk, Hakime.


dc.contributor	Ph.D. Program in Computer Engineering.
dc.contributor.advisor	Özgür, Arzucan.
dc.contributor.advisor	Özkırımlı, Elif.
dc.contributor.author	Öztürk, Hakime.
dc.date.accessioned	2023-03-16T10:14:00Z
dc.date.available	2023-03-16T10:14:00Z
dc.date.issued	2019.
dc.identifier.other	CMPE 2019 O88 PhD
dc.identifier.uri	http://digitalarchive.boun.edu.tr/handle/123456789/12631
dc.description.abstract	The identification of novel interactions between proteins and drugs with computational methodologies constitutes a significant area of research. Most often, a drug can be re-purposed to target a novel protein which enables machine learning algorithms to learn from existing interactions to predict unknown interactions. The main goal of this thesis is to model the interactions between proteins and ligands (drug candidates) using their textual representations via machine/deep learning techniques. With that aim, we introduce a novel ligand representation approach and a novel protein representation approach as well as two prediction systems for identifying the strengths of the interactions between proteins and compounds (i.e., their binding affinities). The common theme of these studies is the use of textual representations of proteins (i.e., amino-acid sequences) and compounds (i.e., SMILES). A major advantage of text-based representations is that they are experimentally easier to obtain compared to the three-dimensional (3D) representations and therefore there are more protein/ligand text-based representations available than 3D representations. Furthermore, processing text-based representations is computationally less expensive compared to processing two-dimensional (2D) and 3D representations. We hypothesize that, much like natural languages, bio-chemical sequences have their own languages and processing these languages might reveal important insights about their characteristics. The application of Natural Language Processing (NLP) based approaches in tasks such as protein family/super-family clustering and protein-ligand binding affinity prediction achieved state-of-the-art performance. These results indicate that the textual forms of proteins and ligands can be used to formulate effective solutions to address different bioinformatics and cheminformatics problems.
dc.format.extent	30 cm.
dc.publisher	Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2019.
dc.subject.lcsh	Drugs -- Analysis.
dc.subject.lcsh	Proteins -- Analysis.
dc.subject.lcsh	Machine learning.
dc.title	Text-based machine learning methodologies for modelling drug-target interactions
dc.format.pages	xix, 138 leaves ;