Self-trained discriminative constituency parser with hierarchical joint learning approach

Çelebi, Arda.

dc.contributor	Graduate Program in Computer Engineering.
dc.contributor.advisor	Özgür, Arzucan.
dc.contributor.author	Çelebi, Arda.
dc.date.accessioned	2023-03-16T10:01:19Z
dc.date.available	2023-03-16T10:01:19Z
dc.date.issued	2012.
dc.identifier.other	CMPE 2012 C45
dc.identifier.uri	http://digitalarchive.boun.edu.tr/handle/123456789/12235
dc.description.abstract	Determining the syntactic structure of a sentence is a fundamental step towards understanding what is conveyed in that sentence. The syntactic parse tree of a sentence can be used in several tasks such as information extraction, machine translation, summarization and question answering. Therefore, syntactic parsing has been one of the most studied topics in the literature. Today's top-performing parsers employ statistical approaches and achieve over 90% accuracy. While statistical approaches reach their peak in supervised settings, semi-supervised approaches such as self-training of parsers are starting to emerge as the next challenge. Such parsers train on their own outputs with the goal of achieving better results by learning on their own. However, only a small number of self-trained parsers have met this goal so far. In this thesis, we tackle the problem of self-training a feature-rich discriminative constituency parser, which to our knowledge has never been studied before. We approach the self-training problem with the assumption that we cannot expect the whole parse tree given by a parser to be completely correct; rather, some parts of it are more likely to be correct than others. We hypothesize that instead of feeding the parser its own guessed parse trees in full, we can break them down into smaller ones, namely n-gram trees, and perform self-training on them. We thus have an n-gram parser and transfer the distinct expertise of the n-gram parser to the full sentence parser by using the Hierarchical Joint Learning (HJL) approach. The resulting parser is called a jointly self-trained parser. We first study joint learning in a completely supervised setting and observe a slight improvement of the jointly trained parser over the baseline. When the real n-gram trees are replaced with guessed ones, the resulting jointly self-trained parser performs no differently from the baseline.
dc.format.extent	30 cm.
dc.publisher	Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2012.
dc.subject.lcsh	Natural language processing (Computer science)
dc.subject.lcsh	Parsing (Computer grammar)
dc.title	Self-trained discriminative constituency parser with hierarchical joint learning approach
dc.format.pages	xv, 75 leaves ;