Archives and Documentation Center
Digital Archives

Topic identification within microblog post collections


dc.contributor Ph.D. Program in Computer Engineering.
dc.contributor.advisor Bingöl, Haluk.
dc.contributor.advisor Üsküdarlı, Suzan.
dc.contributor.author Yıldırım, Ahmet.
dc.date.accessioned 2023-03-16T10:13:52Z
dc.date.available 2023-03-16T10:13:52Z
dc.date.issued 2017.
dc.identifier.other CMPE 2017 Y55 PhD
dc.identifier.uri http://digitalarchive.boun.edu.tr/handle/123456789/12624
dc.description.abstract This thesis aims to identify topics in collections of microblog posts, where a topic corresponds to a set of related topic elements. The first approach, Boun-TI, examines the use of Wikipedia (well-written, cross-domain articles) to capture topics within microblog posts that are messy, unstructured, and fragmented. The topic elements are identified based on their tf-idf scores, where the microblog post set is considered as a single document for tf computation. For idf computation, a public stream post set is used where each post is considered as a document. The tf-idf vectors of Wikipedia articles are computed, and the cosine similarity of the tf-idf vectors determines the topics. This approach was evaluated with more than 1 million tweets gathered during the 2012 US presidential election, resulting in a precision of 0.96 and F1 = 1. The second approach, S-Boun-TI, examines the generation of semantically structured topics, so that they can be further processed to yield more information. S-Boun-TI considers distinguishing elements of a post set as linked entities. Co-occurrence of two elements in the same post is considered as a relation. The related element sets which form topics are maximal cliques of the graph of elements and relations. To express topics, an ontology for microblog topics is introduced. The topics can be utilized in conjunction with LOD. Over 1M posts from the 2016 U.S. presidential election debates and from other events, such as the death of Carrie Fisher and the Dakota Access Pipeline demonstrations, were considered for evaluation. Quantitative and qualitative observations are provided, and example SPARQL queries and their results are presented to show the utilization of the topics. Both approaches gave promising results and are suitable for future research and development. S-Boun-TI has been found to represent related elements better than Boun-TI.
dc.format.extent 30 cm.
dc.publisher Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2017.
dc.subject.lcsh Information technology.
dc.subject.lcsh Software engineering.
dc.title Topic identification within microblog post collections
dc.format.pages xviii, 146 leaves ;
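
The two approaches summarized in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The whitespace tokenizer, the smoothed idf formula, the function names, and the toy data are all assumptions made for illustration, and plain strings stand in for the linked entities that S-Boun-TI uses. The first part scores hypothetical Wikipedia article texts against a post set treated as one document, with idf taken from a background post set where each post is a document. The second part builds a co-occurrence graph of elements and lists its maximal cliques using the Bron-Kerbosch algorithm, which the abstract does not name.

```python
import math
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenizer.
    return text.lower().split()


def make_idf(background_posts):
    # Each background post counts as one document for idf.
    n = len(background_posts)
    df = Counter()
    for post in background_posts:
        df.update(set(tokenize(post)))
    # Assumption: smoothed idf, so unseen terms still get a finite weight.
    return lambda term: math.log((1 + n) / (1 + df[term])) + 1.0


def tfidf_vector(text, idf):
    # The whole text is one document for tf.
    tf = Counter(tokenize(text))
    return {term: count * idf(term) for term, count in tf.items()}


def cosine(u, v):
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cooccurrence_graph(posts_elements):
    # Two elements are related when they appear in the same post.
    graph = {}
    for elements in posts_elements:
        unique = set(elements)
        for e in unique:
            graph.setdefault(e, set())
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph):
    # Basic Bron-Kerbosch enumeration without pivoting.
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


# Toy data (hypothetical, for illustration only).
background = ["good morning everyone", "watching the debate tonight", "new recipe today"]
post_set = ["president debate tonight", "vote in the election", "election debate live"]
articles = {
    "Election": "election vote president debate candidate",
    "Cooking": "recipe oven bake flour",
}

idf = make_idf(background)
post_vec = tfidf_vector(" ".join(post_set), idf)
scores = {title: cosine(post_vec, tfidf_vector(text, idf)) for title, text in articles.items()}
best_topic = max(scores, key=scores.get)

graph = cooccurrence_graph([["obama", "romney", "debate"], ["obama", "economy"]])
topics = set(maximal_cliques(graph))
```

In this toy run the post set shares terms only with the hypothetical "Election" article, and the co-occurrence graph yields two candidate topics, one per post. How elements are selected and which similarity thresholds apply are not stated in the abstract, so the sketch leaves them out.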

