Archives and Documentation Center
Digital Archives

MorphLaz: a finite-state morphological analyzer for Laz

dc.contributor Graduate Program in Cognitive Science.
dc.contributor.advisor Özgür, Arzucan.
dc.contributor.advisor Öztürk, Balkız.
dc.contributor.author Önal, Esra.
dc.date.accessioned 2023-03-16T11:36:56Z
dc.date.available 2023-03-16T11:36:56Z
dc.date.issued 2021.
dc.identifier.other COGS 2021 O63
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/15713
dc.description.abstract This thesis is part of the documentation and revitalization efforts for the endangered Laz language, a member of the South Caucasian language family spoken mainly on the northeastern coastline of Turkey. It introduces the first automatic language analysis tool for Laz, specifically for the Pazar dialect: a rule-based morphological analyzer developed with two-level morphology using finite-state networks. Additional language resources such as a lexicon and a corpus were collected to increase the coverage of the analyzer and to evaluate its performance. Morphologically rich languages pose many challenges for natural language processing (NLP) tasks. In developing high- or low-level NLP systems such as lemmatization, part-of-speech tagging, spelling correction and machine translation, the first step in an NLP pipeline is usually some form of morphological analysis of text or speech. Among the different approaches to the computational study of morphology, and because of the scarcity of language and computational resources, I chose for this study a rule-based approach that is widely accepted and used for formalizing morphotactics and morphophonemics, namely two-level morphology and finite-state transducers. The evaluation is based on the naïve coverage of the analyzer over text data and on error analysis. The results show 78.2% coverage over the unique tokens in the Pazar Laz corpus (PLC), 92.1% coverage over the Laz Treebank and 74.3% over the Fındıklı Laz corpus (FLC). Error analysis of the PLC results indicates that most of the word forms that could not be analyzed are due to missing word stems.
dc.format.extent 30 cm.
dc.publisher Thesis (M.A.) - Bogazici University. Institute for Graduate Studies in the Social Sciences, 2021.
dc.subject.lcsh Laz language.
dc.title MorphLaz: a finite-state morphological analyzer for Laz
dc.format.pages xii, 94 leaves

