Archives and Documentation Center
Digital Archives

Identifying image related sentences in news articles

dc.contributor Graduate Program in Computer Engineering.
dc.contributor.advisor Akarun, Lale.
dc.contributor.advisor Özgür, Arzucan.
dc.contributor.author İlter, Melike Esma.
dc.date.accessioned 2023-03-16T10:04:20Z
dc.date.available 2023-03-16T10:04:20Z
dc.date.issued 2019.
dc.identifier.other CMPE 2019 I67
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/12407
dc.description.abstract With the increasing availability of images on the web, identifying image-related sentences has become an important problem. This research area also matters to the news publishing community, where it supports automatic captioning of news images and summarization. Although a large body of research has been devoted to image captioning, it remains a challenging problem. Previous work on image captioning mostly focuses on generating new captions for images. This thesis discusses the problem of identifying image-related sentences in news articles for the first time. Our approach is novel because we do not try to generate a caption from scratch; instead, we select the most appropriate set of sentences for the image from the news text itself. This technique preserves the relationship between the news article and the image caption. We used the CNN news dataset, which contains only the text of news articles, as a basis and augmented it by collecting the images of the news articles. We generated a two-class ground truth for the images and sentences of the news articles using Tf-Idf and Word2Vec vectors together with cosine and SEMILAR sentence-to-sentence similarity methods. We used HOG and BOVW image descriptors and Word2Vec text feature extraction. We implemented Naive Bayes, k-NN and Random Forest classifiers to measure the performance of the proposed system. We also applied PCA dimensionality reduction to the image features to evaluate giving image and text features equal weight. We also conducted experiments to address the unbalanced distribution of the two classes. The experimental results show that the Naive Bayes classifier with HOG features gives better results.
dc.format.extent 30 cm.
dc.publisher Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2019.
dc.subject.lcsh Information storage and retrieval systems.
dc.subject.lcsh Multimedia systems.
dc.title Identifying image related sentences in news articles
dc.format.pages xiii, 92 leaves ;
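
The abstract mentions building a two-class ground truth from Tf-Idf vectors and cosine similarity. The following is a minimal sketch of that general idea, not the thesis implementation. The reference text (assumed here to be something like an existing caption; the abstract does not say what the sentences are compared against), the `threshold` value, the smoothed idf formula, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def tfidf_vectors(docs):
    # Build one sparse Tf-Idf vector (a dict of term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    # Smoothed idf (an assumption): log((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({t: f * idf[t] for t, f in term_freq.items()})
    return vectors


def cosine(a, b):
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_sentences(reference, sentences, threshold=0.2):
    # Label each sentence 1 (image related) or 0 (not related) by comparing
    # its Tf-Idf vector with the reference text. The threshold is hypothetical.
    vectors = tfidf_vectors([reference] + list(sentences))
    ref_vec = vectors[0]
    results = []
    for sentence, vec in zip(sentences, vectors[1:]):
        score = cosine(ref_vec, vec)
        results.append((sentence, score, 1 if score >= threshold else 0))
    return results


if __name__ == "__main__":
    reference = "The president speaks at the White House"
    sentences = [
        "The president delivered a speech at the White House on Monday.",
        "Stock markets fell sharply in Asia.",
    ]
    for sentence, score, label in label_sentences(reference, sentences):
        print(label, round(score, 3), sentence)
```

A sentence that shares no words with the reference receives a similarity of zero and is labeled as not related. In practice the threshold would need to be tuned, and the abstract indicates that Word2Vec vectors and SEMILAR similarity were used alongside this kind of measure.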

