Improving image captioning with language modeling regularizations

Ulusoy, Okan.

dc.contributor	Graduate Program in Electrical and Electronic Engineering.
dc.contributor.advisor	Anarım, Emin.
dc.contributor.advisor	Akgül, Ceyhun Burak.
dc.contributor.author	Ulusoy, Okan.
dc.date.accessioned	2023-03-16T10:20:22Z
dc.date.available	2023-03-16T10:20:22Z
dc.date.issued	2019.
dc.identifier.other	EE 2019 U68
dc.identifier.uri	http://digitalarchive.boun.edu.tr/handle/123456789/12969
dc.description.abstract	Inspired by recent work in language modeling, we investigate the effects of a set of regularization techniques on the performance of a recurrent neural network-based image captioning model. Using these techniques, we achieve an improvement of 13 BLEU-4 points over a model trained without regularization. We show that our model does not suffer from loss-evaluation mismatch, and we connect model performance to dataset properties by running experiments on the MSCOCO dataset. Further, we propose two different applications for our image captioning model, namely a human-in-the-loop system and zero-shot object detection. The former application improves the CIDEr score of our best model by a further 30 points using only the first two tokens of a reference sentence for an image. In the latter, we train our image captioning model as an object detector that classifies each object in an image without finding its location. The main advantage of this detector is that it does not require object locations during the training phase.
dc.format.extent	30 cm.
dc.publisher	Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2019.
dc.subject.lcsh	Automatic speech recognition.
dc.title	Improving image captioning with language modeling regularizations
dc.format.pages	xvi, 99 leaves ;