Abstract:
Sentiment Analysis (SA) is a Natural Language Processing (NLP) task that aims to extract subjective information from a piece of text. Growing access to the Internet, and with it to social media, has led people to produce an enormous amount of textual data that holds valuable information waiting to be extracted. Targeted Sentiment Analysis (TSA) specifically aims to extract the sentiment expressed towards a particular target in a given text. Sentiment analysis of English texts is a well-studied area, and it mainly requires human-annotated data for training. For languages such as Turkish, such annotated data is lacking. In this study, we introduce an annotated dataset of Turkish Twitter data. It contains almost 4K sentences labeled for both sentiment analysis and targeted sentiment analysis. The proposed dataset allows us to train a TSA model on Turkish texts. We propose and evaluate several BERT-based models with different architectures: one serves as our baseline for TSA, and the others aim to improve on this baseline. We observe that the performance of conventional SA models degrades when they are applied to TSA data. Our best-performing model, which uses target markers and a max-pooling layer, outperforms conventional BERT-based SA models by 13% in F1-score.
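
The two ingredients named for the best-performing model, target markers and max-pooling, can be illustrated with a minimal sketch. It is not the implementation used in this study: the marker strings `[TAR]` and `[/TAR]`, the function names, the company name in the example sentence, and the toy vectors standing in for BERT token outputs are all assumptions made for illustration.

```python
from typing import List


def mark_target(text: str, target: str,
                open_tag: str = "[TAR]", close_tag: str = "[/TAR]") -> str:
    """Wrap the first occurrence of `target` in marker tokens.

    The marker strings are hypothetical; the abstract does not name them.
    """
    start = text.find(target)
    if start == -1:
        raise ValueError("target not found in text")
    end = start + len(target)
    return f"{text[:start]}{open_tag} {target} {close_tag}{text[end:]}"


def max_pool(token_vectors: List[List[float]]) -> List[float]:
    """Element-wise maximum over token vectors, giving one fixed-size vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    return [max(dim) for dim in zip(*token_vectors)]


# Mark the target in a made-up Turkish sentence ("Acme offers a great service").
marked = mark_target("Acme harika bir hizmet sunuyor", "Acme")

# Toy 3-dimensional vectors standing in for per-token encoder outputs.
pooled = max_pool([[0.1, -0.3, 0.5],
                   [0.7, 0.2, -0.1],
                   [0.0, 0.9, 0.4]])
```

In a real model, the marked sentence would be tokenized and encoded by BERT, and the pooled vector would feed a classification layer; which tokens are pooled over is a design choice this sketch does not settle.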