Archives and Documentation Center
Digital Archives

A semantic sentence similarity estimation approach for the biomedical domain

dc.contributor Graduate Program in Computer Engineering.
dc.contributor.advisor Özgür, Arzucan.
dc.contributor.author Soğancıoğlu, Gizem.
dc.date.accessioned 2023-03-16T10:02:36Z
dc.date.available 2023-03-16T10:02:36Z
dc.date.issued 2016.
dc.identifier.other CMPE 2016 S74
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/12330
dc.description.abstract Over the last decades, semantic text similarity has been adopted as a major component in many Natural Language Processing tasks, including text retrieval, summarization, and document categorization. Integration of semantic information acts as a powerful tool for better understanding and structuring of text. Among the many domains that benefit from text mining studies, biomedical literature is one of the most challenging because of its domain-specific language. Given the complex nature of the biomedical literature, domain-specific adaptations are crucial. Several semantic text similarity approaches have been applied at the word level. However, to the best of our knowledge, there has not been any research on sentence-level semantic similarity in the biomedical domain. Furthermore, our experimental results revealed that domain-independent state-of-the-art approaches to sentence-level semantic similarity do not effectively cover biomedical knowledge and produce poor results. In this study, we propose several approaches for domain-specific sentence-level semantic similarity computation, including measures utilizing distributional vector representations of sentences, methods combining general and domain-specific ontologies, and a supervised approach exploiting high-level features. Our proposed methods are evaluated using a manually annotated data set consisting of 100 sentence pairs from biomedical literature. The experiments showed that the supervised semantic similarity computation approach obtained the best performance and improved over the previous domain-independent systems by up to 42.6% in terms of the Pearson correlation metric.
dc.format.extent 30 cm.
dc.publisher Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2016.
dc.subject.lcsh Semantic integration (Computer systems)
dc.subject.lcsh Natural language processing (Computer science)
dc.title A semantic sentence similarity estimation approach for the biomedical domain
dc.format.pages xiv, 80 leaves

