Archives and Documentation Center
Digital Archives

Extracting protein-ligand interactions from the biomedical literature using deep learning approaches

dc.contributor Graduate Program in Computer Engineering.
dc.contributor.advisor Özgür, Arzucan.
dc.contributor.advisor Özkırımlı, Elif.
dc.contributor.author Yüksel, Atakan.
dc.date.accessioned 2023-03-16T10:04:32Z
dc.date.available 2023-03-16T10:04:32Z
dc.date.issued 2019.
dc.identifier.other CMPE 2019 Y85
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/12419
dc.description.abstract Protein-ligand interactions play crucial roles in living organisms and therefore attract researchers from many disciplines. Protein-ligand interaction databases provide this information to researchers in a suitable format. These databases are built by manually extracting interactions from the biomedical literature, but manual extraction is becoming harder each day as the number of biomedical publications grows, so the need for an automated extraction system has arisen. The aim of this thesis is to meet this need with deep learning models. The thesis analyzes the performance of Convolutional Neural Network (CNN) and Bidirectional Long Short-Term Memory (BiLSTM) networks on the task of protein-ligand interaction extraction, and compares features in terms of their effect on model performance. The gold standard corpus created for the BioCreative VI ChemProt task is used as the dataset for training and evaluating the models. Word embeddings, distance embeddings, part-of-speech (POS) tags, and inside-outside-beginning (IOB) chunk tags are used as features. Grid search is applied to find the optimal hyperparameters for each model in the experiments. The best models and input representations are selected on the development set and then evaluated on the test set. Based on the test set results, we conclude that BiLSTM performs better than CNN for each evaluated feature setting.
dc.format.extent 30 cm.
dc.publisher Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2019.
dc.subject.lcsh Machine learning.
dc.subject.lcsh Proteins.
dc.subject.lcsh Ligands (Biochemistry)
dc.subject.lcsh Ligand binding (Biochemistry)
dc.title Extracting protein-ligand interactions from the biomedical literature using deep learning approaches
dc.format.pages xiv, 65 leaves ;

