Archive and Documentation Center
Digital Archive

Multilingual identification of verbal multiword expressions using bidirectional long short-term memory based architectures

dc.contributor Graduate Program in Computer Engineering.
dc.contributor.advisor Güngör, Tunga.
dc.contributor.author Berk, Gözde.
dc.date.accessioned 2023-03-16T10:04:29Z
dc.date.available 2023-03-16T10:04:29Z
dc.date.issued 2019.
dc.identifier.other CMPE 2019 B47
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/12416
dc.description.abstract Verbal multiword expression (VMWE) identification is a challenging task for many natural language processing studies. In this study, a sequence tagging approach combined with stochastic models and variants of the IOB tagging scheme is used for VMWE identification. In the scope of this thesis, a VMWE-annotated Turkish corpus is constructed for the first part of the PARSEME shared task 1.1, which consists of constructing VMWE-annotated corpora in many languages. Additionally, a multilingual system called Deep-BGT is developed for the second part of the shared task, which consists of developing language-independent VMWE identification systems using the corpora constructed in the first part. The Turkish corpus is one of the biggest corpora in the shared task. The training and test corpora that were published in the PARSEME shared task 1.0 are updated according to the new guidelines and serve as the PARSEME shared task 1.1 training and development corpora. A new test corpus is constructed from scratch. Deep-BGT uses the bidirectional Long Short-Term Memory model with a Conditional Random Field layer on top (BiLSTM-CRF). To the best of our knowledge, this study is the first one that employs the BiLSTM-CRF model for VMWE identification. Deep-BGT ranked second in the open track in terms of the general ranking metric. Moreover, a novel tagging scheme called bigappy-unicrossy is introduced to address the challenge of overlapping VMWEs. Finally, the VMWE identification system is improved by evaluating a subset of hyperparameters consisting of tagging scheme, number of units, number of BiLSTM layers, and classifier. A comprehensive analysis of BiLSTM-based architectures for multilingual identification of VMWEs is presented accordingly.
dc.format.extent 30 cm.
dc.publisher Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2019.
dc.subject.lcsh Natural language processing (Computer science)
dc.title Multilingual identification of verbal multiword expressions using bidirectional long short-term memory based architectures
dc.format.pages xiii, 68 leaves ;

