Archives and Documentation Center
Digital Archives

Biomolecular language processing for drug-target affinity prediction

dc.contributor Graduate Program in Computer Engineering.
dc.contributor.advisor Özgür, Arzucan.
dc.contributor.advisor Özkırımlı, Elif.
dc.contributor.author Özçelik, Rıza.
dc.date.accessioned 2023-10-15T06:54:29Z
dc.date.available 2023-10-15T06:54:29Z
dc.date.issued 2022
dc.identifier.other CMPE 2022 O83
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/19713
dc.description.abstract Finding high-affinity protein-chemical pairs is a prominent stage of the drug discovery pipeline. However, the number of available proteins and chemicals forms a combination space that is too large to explore experimentally, which necessitates computational approaches. Drug-target affinity prediction models come into play here and rapidly highlight the high-affinity pairs. This thesis introduces state-of-the-art drug-target affinity prediction models and training strategies to facilitate drug discovery studies. The introduced approaches leverage biomolecular language processing techniques, which interpret chemicals and proteins as documents written in biomolecular languages. The units of biomolecular languages, named biomolecular words, are discovered in large corpora and pharmacologically verified as meaningful substructures. The biomolecular words are used to develop a novel drug-target affinity prediction framework: ChemBoost. ChemBoost models leverage biomolecular word-driven representations and achieve state-of-the-art prediction performance. The experiments also demonstrate that unseen biomolecules challenge all drug-target affinity prediction models and reveal a generalizability problem. A language-inspired model training framework, DebiasedDTA, is introduced to target the problem. The evaluations indicate that DebiasedDTA boosts models on seen and unseen biomolecules, especially when the target pair is dissimilar to the training biomolecules. ChemBoost and DebiasedDTA are published as an open-source Python package, pydta.
dc.publisher Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2022.
dc.subject.lcsh Drug targeting.
dc.subject.lcsh Molecular biology -- Data processing.
dc.subject.lcsh Bioinformatics -- Data processing.
dc.title Biomolecular language processing for drug-target affinity prediction
dc.format.pages xvi, 105 leaves

