Abstract:
One of the challenges for scientists in the biomedical domain is the huge amount and rapid growth of information buried in the text of electronic resources. Developing text mining methods that automatically extract biomedical entities from these resources and identify the relations between the extracted entities is crucial for facilitating research in many areas of the biomedical domain. Two main problems must be solved to accomplish this goal: the extraction and normalization of entities, and the identification of the relations between them in a given text. In this thesis, we propose two approaches with different perspectives for the extraction and normalization of biomedical named entities. The first approach uses shallow linguistic knowledge to extract entities and normalizes them through an ontology. The second approach uses word embeddings, which convey semantic information, to normalize the entities in a given text. The word-embedding-based approach obtained state-of-the-art results on the BioNLP Shared Task 2016 Bacteria Biotope data set. Both proposed methods are unsupervised and can be adapted to different domains. We also developed two applications. The first is a pipeline for extracting bacteria biotope information from scientific abstracts, composed of modules based on the approaches proposed in this thesis. The second extracts data relevant to Brucella-host interactions from the biomedical literature; its results reveal the importance of using a wider context than a single sentence for biomedical relation extraction.