Archive and Documentation Center
Digital Archive

Hierarchical multitask learning for language modeling with transformers

dc.contributor Graduate Program in Computer Engineering.
dc.contributor.advisor Güngör, Tunga.
dc.contributor.author Aksoy, Çağla.
dc.date.accessioned 2023-03-16T10:04:52Z
dc.date.available 2023-03-16T10:04:52Z
dc.date.issued 2020.
dc.identifier.other CMPE 2020 A57
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/12440
dc.description.abstract Recent works show that learning contextualized embeddings for words is beneficial for natural language processing (NLP) tasks. Bidirectional Encoder Representations from Transformers (BERT) is one successful example of this approach. It learns embeddings by solving two tasks, which are masked language model (masked LM) and the next sentence prediction (NSP). This procedure is known as pre-training. The pre-training of BERT can also be framed as a multitask learning problem. In this thesis, we adopt hierarchical multitask learning approaches for BERT pre-training. Pre-training tasks are solved at different layers instead of the last layer, and information from the NSP task is transferred to the masked LM task. Also, we propose a new pre-training task, bigram shift, to encode word order information. To evaluate the effectiveness of our proposed models, we choose two downstream tasks, one of which requires sentence-level embeddings (textual entailment), and the other requires contextualized embeddings of words (question answering). Due to computational restrictions, we use the downstream task data instead of a large dataset for the pre-training to see the performance of proposed models when given a restricted dataset. We test their performance on several probing tasks to analyze learned embeddings. Our results show that imposing a task hierarchy in pre-training improves the performance of embeddings.
dc.format.extent 30 cm.
dc.publisher Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2020.
dc.subject.lcsh Multilevel models (Statistics)
dc.subject.lcsh Natural language processing (Computer science)
dc.title Hierarchical multitask learning for language modeling with transformers
dc.format.pages xvi, 64 leaves ;
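
The abstract above describes placing pre-training heads at different encoder layers, with the sentence-level (NSP) signal solved below the masked LM so its hidden states flow upward. The following is a minimal PyTorch sketch of that layering idea only; the model sizes, the layer index chosen for NSP, the bigram-shift head, and all names are illustrative assumptions, not the thesis implementation.

import torch
import torch.nn as nn

class HierarchicalPretrainingModel(nn.Module):
    """Toy encoder with NSP/bigram-shift heads at an intermediate layer
    and a masked-LM head at the top layer (illustrative sketch)."""

    def __init__(self, vocab_size=30522, d_model=256, n_heads=4,
                 n_layers=6, nsp_layer=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)])
        self.nsp_layer = nsp_layer                       # lower layer solves NSP
        self.nsp_head = nn.Linear(d_model, 2)            # is-next / not-next
        self.bigram_head = nn.Linear(d_model, 2)         # shifted / not shifted
        self.mlm_head = nn.Linear(d_model, vocab_size)   # masked LM at the top

    def forward(self, token_ids):
        h = self.embed(token_ids)
        nsp_logits = bigram_logits = None
        for i, layer in enumerate(self.layers):
            h = layer(h)
            if i + 1 == self.nsp_layer:
                # sentence-level tasks are read off an intermediate layer;
                # their hidden states continue upward to the masked-LM layers
                nsp_logits = self.nsp_head(h[:, 0])       # [CLS]-style token
                bigram_logits = self.bigram_head(h[:, 0])
        mlm_logits = self.mlm_head(h)                     # per-token vocabulary scores
        return mlm_logits, nsp_logits, bigram_logits

# toy usage: a batch of 2 sequences of length 16
model = HierarchicalPretrainingModel()
ids = torch.randint(0, 30522, (2, 16))
mlm, nsp, bigram = model(ids)
print(mlm.shape, nsp.shape, bigram.shape)  # (2, 16, 30522) (2, 2) (2, 2)

In this sketch the hierarchy is expressed simply by where each head is attached: the NSP and bigram-shift losses would be computed from the intermediate layer's output, and the masked-LM loss from the final layer, so the lower-level task's representation feeds the higher-level one.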

