dc.description.abstract |
Hate speech directed at ethnicities, nationalities, religious identities, and specific groups has increased not only in social media but also in print media. This creates a need for automated hate speech detection systems that can quickly review print media content and filter out articles containing hate speech before they reach readers. However, most existing automatic hate speech detection models only detect hate speech, without considering the target-group-specific discourse that is often used in news articles. Moreover, few datasets in the hate speech domain include Turkish print media articles. In this study, a new BERT-based model enriched with a set of target-oriented linguistic features is proposed for hate speech detection. The effects of weighting different BERT hidden vectors are also investigated, instead of using only the first hidden vector of the BERT encoder, which is the classical approach. New BERT-based models that integrate different attention techniques are proposed for combining hidden vectors. A new preprocessed Turkish dataset for hate speech is also published, in which the target group of every hate speech article is annotated. Experiments on a comprehensive Turkish dataset of news articles labeled for hate speech show that competitive performance in terms of accuracy and F1-score is achieved compared to previous approaches. |
|