Identification of verbal multiword expressions using deep learning architectures and representation learning methods

Erden, Berna.

dc.contributor	Graduate Program in Computer Engineering.
dc.contributor.advisor	Güngör, Tunga.
dc.contributor.author	Erden, Berna.
dc.date.accessioned	2023-03-16T10:04:27Z
dc.date.available	2023-03-16T10:04:27Z
dc.date.issued	2019.
dc.identifier.other	CMPE 2019 E75
dc.identifier.uri	http://digitalarchive.boun.edu.tr/handle/123456789/12414
dc.description.abstract	Understanding multiword expressions (MWEs) plays an instrumental role in Natural Language Processing applications such as parsing and machine translation. MWE identification is a task that automatically detects and classifies MWEs in running text. As with the basic characteristics of MWEs, significant challenges exist in MWE identification. Considering the recent attempts of the PARSEME network on verbal multiword expressions (VMWEs), we focus on the identification of VMWEs. We update the PARSEME Turkish train and test corpora 1.0 (2017) as the PARSEME Turkish train and development corpora 1.1 (2018). We construct the PARSEME Turkish test corpus 1.1. In addition, we develop a multilingual VMWE identification system based on bidirectional long short-term memory with conditional random fields networks accompanied by the gappy 1-level tagging scheme. To extend our study, we examine the impact of data representation format on the VMWE identification task. We introduce the bigappy-unicrossy tagging scheme to recognize overlaps in sequence labelling tasks. Our results show that data representation format is important to identify discontinuous VMWEs. Moreover, we enhance our neural VMWE identification model with automatically learned embeddings by neural networks to respond to the variability challenge. We compare character-level convolutional neural networks and character-level bidirectional long short-term memory (BiLSTM) networks. We analyze two different schemes to represent morphological information using BiLSTM networks. Our results demonstrate that character embeddings and morphological embeddings improve performance in general. The choice of representation learning method depends on language.
dc.format.extent	30 cm.
dc.publisher	Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2019.
dc.subject.lcsh	Natural language processing (Computer science)
dc.title	Identification of verbal multiword expressions using deep learning architectures and representation learning methods
dc.format.pages	xv, 77 leaves ;