Abstract:
It is well known that prejudiced and discriminatory language is being widely used and spread through several channels such as printed or social media. The discrimina tory language, in particular hate speech as its more aggressive, degrading and openly targeting form, which poses a threat to the values of democracy and human rights is a global problem that needs an immediate solution. Since we find the detection of hate speech important in the fight against hate speech, we have developed a model to detect it. For this purpose, we created a dataset by retrieving printed media news that the Hrant Dink Foundation systematically annotated in the context of hate speech from the website of the PRNet media monitoring company. To the best of our knowledge, with this study, the first model developed for Turkish language that runs on a labeled dataset is produced. In particular, the fact that most of the hate speech in printed media is based on context and implications requires a system that can detect changing discursive cues and understand the context around these discourses. With different word representations, we have examined the Hierarchical Attention Network (HAN) model, which aims to capture the changing meanings of expressions by using the hi erarchical structure of the text. We studied the compatibility of our model with the problem by comparing it with Convolution Neural Network (CNN), which provided important results in text processing, and with machine learning models. In order to improve our study, we developed linguistic features for the problem based on critical discourse analysis techniques. We enhanced the HAN model using these features. Our results show that performance increases with a set of features that point out the use of ‘othering language’. We believe that these feature sets created for the Turkish language will encourage new studies in the quantitative analysis of hate speech.