Identification of verbal multiword expressions using deep learning architectures and representation learning methods

Erden, Berna.

URI: http://digitalarchive.boun.edu.tr/handle/123456789/12414

Date: 2019.

Abstract:

Understanding multiword expressions (MWEs) plays an instrumental role in Natural Language Processing applications such as parsing and machine translation. MWE identification is the task of automatically detecting and classifying MWEs in running text. Owing to the basic characteristics of MWEs, significant challenges exist in MWE identification. Considering the recent efforts of the PARSEME network on verbal multiword expressions (VMWEs), we focus on the identification of VMWEs. We update the PARSEME Turkish train and test corpora 1.0 (2017) to produce the PARSEME Turkish train and development corpora 1.1 (2018), and we construct the PARSEME Turkish test corpus 1.1. In addition, we develop a multilingual VMWE identification system based on bidirectional long short-term memory networks with conditional random fields (BiLSTM-CRF), combined with the gappy 1-level tagging scheme. To extend our study, we examine the impact of the data representation format on the VMWE identification task. We introduce the bigappy-unicrossy tagging scheme to recognize overlaps in sequence labelling tasks. Our results show that the data representation format is important for identifying discontinuous VMWEs. Moreover, to address the variability challenge, we enhance our neural VMWE identification model with embeddings learned automatically by neural networks. We compare character-level convolutional neural networks and character-level bidirectional long short-term memory (BiLSTM) networks. We analyze two different schemes to represent morphological information using BiLSTM networks. Our results demonstrate that character embeddings and morphological embeddings improve performance in general. The choice of representation learning method depends on the language.
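
To illustrate what a gappy tagging scheme does for a discontinuous VMWE, the following is a minimal sketch and not the thesis implementation. It assumes a simplified tag inventory in which B and I mark the first and subsequent tokens of an expression, lowercase o marks tokens inside a gap, and O marks tokens outside any expression. The function name gappy_tags, this exact tag inventory, and the example sentence are assumptions made for illustration; the abstract does not specify them.

```python
from typing import List, Sequence


def gappy_tags(n_tokens: int, mwe_positions: Sequence[int],
               category: str = "VMWE") -> List[str]:
    """Tag one sentence that contains a single, possibly discontinuous MWE.

    Hypothetical helper: B-/I- mark MWE tokens, lowercase 'o' marks tokens
    that fall inside the gap of the MWE, and 'O' marks everything else.
    """
    positions = sorted(set(mwe_positions))
    tags = ["O"] * n_tokens
    if not positions:
        return tags  # no MWE in this sentence
    first, last = positions[0], positions[-1]
    for i in range(first, last + 1):
        if i == first:
            tags[i] = f"B-{category}"
        elif i in positions:
            tags[i] = f"I-{category}"
        else:
            tags[i] = "o"  # token sits between two parts of the MWE
    return tags


# Illustrative sentence: "made ... decision" treated as a light-verb construction.
tokens = ["He", "made", "an", "important", "decision"]
tags = gappy_tags(len(tokens), [1, 4], category="LVC.full")
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

In this sketch the gap tokens "an" and "important" receive the tag o rather than O, which lets a sequence labeller distinguish material interrupting an expression from material outside it. Overlapping or nested expressions are not handled here; that is the case the bigappy-unicrossy scheme mentioned in the abstract is designed to address.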
