MorphLaz: a finite-state morphological analyzer for Laz

Önal, Esra.

dc.contributor	Graduate Program in Cognitive Science.
dc.contributor.advisor	Özgür, Arzucan.
dc.contributor.advisor	Öztürk, Balkız.
dc.contributor.author	Önal, Esra.
dc.date.accessioned	2023-03-16T11:36:56Z
dc.date.available	2023-03-16T11:36:56Z
dc.date.issued	2021.
dc.identifier.other	COGS 2021 O63
dc.identifier.uri	http://digitalarchive.boun.edu.tr/handle/123456789/15713
dc.description.abstract	This thesis is part of the documentation and revitalization efforts for the endangered Laz language, a member of the South Caucasian language family spoken mainly on the northeastern coastline of Turkey. It introduces the first automatic language analysis tool for Laz, specifically for the Pazar dialect: a rule-based morphological analyzer developed with two-level morphology using finite-state networks. Additional language resources, namely a lexicon and a corpus, were collected to increase the coverage of the analyzer and to evaluate its performance. Morphologically rich languages pose many challenges for natural language processing (NLP) tasks. In any NLP pipeline, whether for low-level or high-level systems such as lemmatization, part-of-speech tagging, spelling correction, and machine translation, the first step is usually some form of morphological analysis of text or speech. Among the different approaches to the computational study of morphology, and given the scarcity of language and computational resources, I chose a rule-based approach that is widely accepted and used for formalizing morphotactics and morphophonemics, namely two-level morphology and finite-state transducers. The evaluation is based on the naïve coverage of the analyzer over text data and on error analysis. The results show 78.2% coverage over the unique tokens in the Pazar Laz corpus (PLC), 92.1% coverage over the Laz Treebank, and 74.3% over the Fındıklı Laz corpus (FLC). Error analysis of the PLC results indicates that most of the word forms that could not be analyzed are due to missing word stems.
dc.format.extent	30 cm.
dc.publisher	Thesis (M.A.) - Bogazici University. Institute for Graduate Studies in the Social Sciences, 2021.
dc.subject.lcsh	Laz language.
dc.title	MorphLaz: a finite-state morphological analyzer for Laz
dc.format.pages	xii, 94 leaves