Abstract:
Topic models are probabilistic generative models used to analyze collections of documents, and they have long been used to uncover hidden thematic structure in text. However, classical topic models such as Latent Dirichlet Allocation (LDA) struggle with the short texts typical of user-generated social media content. Because short texts provide limited context, bag-of-words representations capture them poorly, and such models fail to learn interpretable topics over large vocabularies. This thesis proposes a topic model based on Graph Neural Networks (GNNs), in which documents are represented as graphs whose entity-specific relations are drawn from the Wikidata knowledge graph. A graph attention network learns embeddings of these documents, and its outputs are passed as probability distribution parameters to the probabilistic generative topic model, Entity Embedded Topic Modeling (EETM), to yield the topics. We evaluate our model on several short-text collections fetched from Twitter, covering politics, sports, pandemics, and trending news events. We provide a detailed discussion of the learned embeddings and the quality of the topics produced by our model.
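The following is a minimal sketch, not the thesis implementation, of the pipeline the abstract describes: a graph attention network encodes a per-document entity graph, and the pooled document embedding parameterizes a logistic-normal document-topic distribution in the style of embedded topic models. The layer sizes, mean-pooling choice, and the reparameterized topic head are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv, global_mean_pool

class GraphTopicEncoder(torch.nn.Module):
    """Sketch: GAT over an entity graph -> document-topic proportions."""
    def __init__(self, in_dim, hid_dim, n_topics, heads=4):
        super().__init__()
        self.gat1 = GATConv(in_dim, hid_dim, heads=heads)       # attention over entity relations
        self.gat2 = GATConv(hid_dim * heads, hid_dim, heads=1)  # collapse attention heads
        self.mu = torch.nn.Linear(hid_dim, n_topics)            # mean of topic logits
        self.logvar = torch.nn.Linear(hid_dim, n_topics)        # log-variance of topic logits

    def forward(self, x, edge_index, batch):
        h = F.elu(self.gat1(x, edge_index))
        h = F.elu(self.gat2(h, edge_index))
        d = global_mean_pool(h, batch)                          # one embedding per document graph
        mu, logvar = self.mu(d), self.logvar(d)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()    # reparameterized sample
        theta = torch.softmax(z, dim=-1)                        # document-topic proportions
        return theta, mu, logvar
```

In this sketch, `theta` plays the role of the per-document topic distribution that a generative model such as EETM would combine with topic-word embeddings to reconstruct the document; the exact parameterization used in the thesis may differ.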