Multilingual identification of verbal multiword expressions using bidirectional long short-term memory based architectures

Berk, Gözde.

dc.contributor	Graduate Program in Computer Engineering.
dc.contributor.advisor	Güngör, Tunga.
dc.contributor.author	Berk, Gözde.
dc.date.accessioned	2023-03-16T10:04:29Z
dc.date.available	2023-03-16T10:04:29Z
dc.date.issued	2019.
dc.identifier.other	CMPE 2019 B47
dc.identifier.uri	http://digitalarchive.boun.edu.tr/handle/123456789/12416
dc.description.abstract	Verbal multiword expression (VMWE) identification is a challenging task for many natural language processing studies. In this study, a sequence tagging approach combined with stochastic models and variants of the IOB tagging scheme is used for VMWE identification. In the scope of this thesis, a VMWE-annotated Turkish corpus is constructed as the first part of the PARSEME shared task 1.1, which consists of constructing VMWE-annotated corpora in many languages. Additionally, a multilingual system called Deep-BGT is developed as the second part of the shared task, which consists of developing language-independent VMWE identification systems using the corpora constructed in the first part. The Turkish corpus is one of the biggest corpora in the shared task. The training and test corpora that were published in the PARSEME shared task 1.0 are updated according to the new guidelines and serve as the PARSEME shared task 1.1 training and development corpora. A new test corpus is constructed from scratch. Deep-BGT uses the bidirectional Long Short-Term Memory model with a Conditional Random Field layer on top (BiLSTM-CRF). To the best of our knowledge, this study is the first one that employs the BiLSTM-CRF model for VMWE identification. Deep-BGT was ranked second in the open track in terms of the general ranking metric. Moreover, a novel tagging scheme called bigappy-unicrossy is introduced to address the challenge of overlapping VMWEs. Finally, the VMWE identification system is improved by evaluating a subset of hyperparameters consisting of the tagging scheme, the number of units, the number of BiLSTM layers, and the classifier. A comprehensive analysis of BiLSTM-based architectures for multilingual identification of VMWEs is presented accordingly.
dc.format.extent	30 cm.
dc.publisher	Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2019.
dc.subject.lcsh	Natural language processing (Computer science)
dc.title	Multilingual identification of verbal multiword expressions using bidirectional long short-term memory based architectures
dc.format.pages	xiii, 68 leaves ;