Deep learning-based dependency parsing for Turkish

Bilgin, Şaziye Betül.

dc.contributor	Ph.D. Program in Computer Engineering.
dc.contributor.advisor	Özgür, Arzucan
dc.contributor.advisor	Güngör, Tunga.
dc.contributor.author	Bilgin, Şaziye Betül.
dc.date.accessioned	2023-10-15T07:06:21Z
dc.date.available	2023-10-15T07:06:21Z
dc.date.issued	2022
dc.identifier.other	CMPE 2022 B56 PhD
dc.identifier.uri	http://digitalarchive.boun.edu.tr/handle/123456789/19723
dc.description.abstract	Dependency parsing is an important step for many natural language processing (NLP) systems such as question answering and machine translation. Turkish, being a morphologically rich language and having a complex grammar, is challenging for automatic processing. Limited NLP tools and resources for Turkish make the task even more challenging. Data-driven deep learning models show promising performance in dependency parsing. Yet, the amount of data to train a data-driven dependency parser directly affects performance, and deep learning-based systems require extensive data to achieve good performance. In this thesis, we focused on Turkish dependency parsing and proposed two solutions to the challenges this task poses. First, we increased the size and quality of labeled data for Turkish dependency parsing. In this respect, we created the BOUN Treebank by annotating 9,761 sentences. In addition, we re-annotated the IMST and PUD treebanks using the same annotation scheme. As a result, we presented the largest collection of Turkish treebanks with consistent annotation. Second, we developed novel state-of-the-art dependency parsing models for Turkish as well as other low-resource languages. As our first parsing approach, we introduced a hybrid dependency parser where Turkish grammar rules and morphological features of words are integrated into the deep learning model. Despite the limited training data, the hybrid parser achieved higher success than the current methods for Turkish dependency parsing. As our second parsing approach, we proposed a deep dependency parser with semi-supervised enhancement. By conducting experiments on a number of low-resource languages besides Turkish, we achieved state-of-the-art results on all datasets. We have shown that deep learning-based models can be improved not only by additional training data, but also by integrating intelligently extracted information.
dc.publisher	Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2022.
dc.subject.lcsh	Natural language processing (Computer science)
dc.title	Deep learning-based dependency parsing for Turkish
dc.format.pages	xx, 165 leaves