Archives and Documentation Center
Digital Archives

Relation extraction for chemical and protein interactions from biomedical documents

dc.contributor Graduate Program in Computer Engineering.
dc.contributor.advisor Özgür, Arzucan.
dc.contributor.advisor Özkırımlı, Elif.
dc.contributor.author Dönmez, Hilal.
dc.date.accessioned 2023-03-16T10:05:25Z
dc.date.available 2023-03-16T10:05:25Z
dc.date.issued 2021.
dc.identifier.other CMPE 2021 D66
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/12454
dc.description.abstract The sharing of chemical-protein interactions (CPI) with the scientific communities plays a crucial role in understanding the mechanisms of diseases, as well as in facilitating drug discovery and drug repurposing studies. A significant amount of knowledge on CPI is published in unstructured documents. The goal of this thesis is to extract relations between chemicals and proteins from the information provided in sentences. For this purpose, we focus on two tasks: (i) binary relation extraction and (ii) multi-class relation extraction from biomedical documents. The aim of the first task is to identify whether or not a sentence states a relation between a pair of biochemicals. The second task extends the first by also identifying the type of the relation between the pair of biochemicals. For both tasks, we develop transformer-based models by utilising the BioBERT and SciBERT architectures. Furthermore, we investigate the effectiveness of different input representation approaches, such as sentence-based and dependency tree-based representations. Our results demonstrate that the BioBERT-based model with whole-sentence input representation achieves the best performance for both tasks on the benchmark ChemProt test data set, with an F1-score of 77.8% for binary relation extraction and a micro-averaged F1-score of 76.1% for multi-class relation extraction. Interestingly, the significantly shorter dependency tree-based input representations achieve F1-scores close to those of the whole-sentence input representation. Finally, we introduce Vapur, a search engine for protein-chemical interactions extracted from COVID-19 related scientific publications. Vapur shows that our relation extraction models can be effectively used in real-world biomedical applications.
dc.format.extent 30 cm.
dc.publisher Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2021.
dc.subject.lcsh Chemical agents (Munitions)
dc.subject.lcsh Protein-protein interactions.
dc.subject.lcsh Biomedical engineering.
dc.title Relation extraction for chemical and protein interactions from biomedical documents
dc.format.pages xxi, 159 leaves

