Archives and Documentation Center
Digital Archives

Morphologically motivated input variations in Turkish - English neural machine translation

dc.contributor Graduate Program in Computer Engineering.
dc.contributor.advisor Güngör, Tunga.
dc.contributor.author Yirmibeşoğlu, Zeynep.
dc.date.accessioned 2023-03-16T10:05:31Z
dc.date.available 2023-03-16T10:05:31Z
dc.date.issued 2021.
dc.identifier.other CMPE 2021 Y57
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/12459
dc.description.abstract The success of neural networks in natural language processing has paved the way for neural machine translation (NMT), which rapidly became the mainstream approach in machine translation. Tremendous improvements in translation performance have been achieved with breakthroughs such as encoder-decoder networks, the attention mechanism and the Transformer architecture. However, the need for large amounts of parallel data to train an NMT system and the presence of rare words in translation corpora are issues yet to be overcome. This study addresses neural machine translation of the low-resource Turkish-English language pair. State-of-the-art NMT architectures are employed, and data augmentation methods that exploit monolingual corpora are used. The importance of input representation for the morphologically rich Turkish language is pointed out, and a comprehensive analysis of linguistically and non-linguistically motivated input segmentation approaches is carried out. Experiments on different input variations demonstrate the importance of morphologically motivated input segmentation for Turkish, a language with rich morphology. Moreover, the superiority of the Transformer architecture over attentional encoder-decoder models is shown for the Turkish-English language pair. Among the data augmentation approaches employed, back-translation proves to be the most effective, and the benefit of increasing the amount of parallel data on translation quality is confirmed. This thesis presents a comprehensive analysis of NMT architectures with different hyperparameters, data augmentation methods and input representation techniques, and proposes ways of tackling the low-resource setting of Turkish-English NMT.
dc.format.extent 30 cm.
dc.publisher Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2021.
dc.subject.lcsh Neural networks (Computer science)
dc.subject.lcsh Machine learning.
dc.subject.lcsh Natural language processing (Computer science)
dc.title Morphologically motivated input variations in Turkish - English neural machine translation
dc.format.pages xiv, 86 leaves ;

