A semantic sentence similarity estimation approach for the biomedical domain

Soğancıoğlu, Gizem.

Archives and Documentation Center Digital Archives Home
→
Boğaziçi Üniversitesi Tezleri
→
Fen Bilimleri Enstitüsü
→
Bilgisayar Mühendisliği
→
M.S. Theses
→
View Item

dc.contributor	Graduate Program in Computer Engineering.
dc.contributor.advisor	Özgür, Arzucan.
dc.contributor.author	Soğancıoğlu, Gizem.
dc.date.accessioned	2023-03-16T10:02:36Z
dc.date.available	2023-03-16T10:02:36Z
dc.date.issued	2016.
dc.identifier.other	CMPE 2016 S74
dc.identifier.uri	http://digitalarchive.boun.edu.tr/handle/123456789/12330
dc.description.abstract	During the last decades, the use of semantic text similarity has been adopted as a major component in many Natural Language Processing tasks, including text retrieval, summarization, and document categorization. Integration of semantic information acts as a powerful tool for a better understanding and structuring of text. Among the many domains that benefit from text mining studies, biomedical literature is one of the most challenging areas because of its domain-specific language. As an inevitable result of the complex nature of the biomedical literature, domain-specific adaptations are crucial requirements. There are several semantic text similarity approaches that have been applied on the word-level. However, and to the best of our knowledge, there has not been any research on sentence-level semantic similarity in the biomedical domain. Furthermore, our experimental results revealed that domain-independent state-of-theart approaches in sentence-level semantic similarity do not effectively cover biomedical knowledge and produce poor results. In this study, we propose several different approaches for domain-specific semantic sentence-level similarity computation, including measures utilizing distributional vector representations of sentences, methods combining general and domain specific ontologies, as well as a supervised approach exploiting high-level features. Our proposed methods are evaluated using a manually annotated data set which consists of 100 sentence pairs from biomedical literature. The experiments showed that the supervised semantic similarity computation approach obtained the best performance and improved over the previous domain-independent systems up to 42.6% in terms of the Pearson correlation metric.
dc.format.extent	30 cm.
dc.publisher	Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2016.
dc.subject.lcsh	Semantic integration (Computer systems)
dc.subject.lcsh	Natural language processing (Computer science)
dc.title	A semantic sentence similarity estimation approach for the biomedical domain
dc.format.pages	xiv, 80 leaves ;