Hate speech detection in Turkish news using a transformer-based model enhanced with linguistic features

Yüksel, Atıf Emre.

dc.contributor	Graduate Program in Computer Engineering.
dc.contributor.advisor	Özgür, Arzucan.
dc.contributor.author	Yüksel, Atıf Emre.
dc.date.accessioned	2023-10-15T06:40:59Z
dc.date.available	2023-10-15T06:40:59Z
dc.date.issued	2022
dc.identifier.other	CMPE 2022 Y85
dc.identifier.uri	http://digitalarchive.boun.edu.tr/handle/123456789/19690
dc.description.abstract	Hate speech directed at ethnicities, nationalities, religious identities, and specific groups has increased not only in social media, but also in print media. This creates a need for automated hate speech detection systems that can quickly review and filter print media content before it is provided to readers if it contains hate speech. However, most of the existing automatic hate speech detection models are limited to detecting hate speech without considering the hate speech target group-specific discourse that is often used in news articles. Moreover, there are few datasets that include Turkish print media articles in the hate speech domain. In this study, a new BERT based model enriched with a set of target-oriented linguistic features for hate speech detection is proposed. The effects of weighting different BERT hidden vectors are also investigated, instead of using only the first hidden vector of the BERT encoder, which is the classical approach. New BERT based models that integrate different attention techniques are proposed for combining hidden vectors. A new preprocessed Turkish dataset for hate speech is also published, in which the target group for all hate speech articles is annotated. Experiments on a comprehensive Turkish dataset of news articles labeled for hate speech show that competitive performance in terms of accuracy and F1-score is achieved compared to previous approaches.
dc.publisher	Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2022.
dc.subject.lcsh	Linguistic geography.
dc.subject.lcsh	Hate speech -- Social aspects -- Turkey.
dc.subject.lcsh	Turkish newspapers.
dc.title	Hate speech detection in Turkish news using a transformer-based model enhanced with linguistic features
dc.format.pages	xiii, 54 leaves
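
The abstract contrasts the classical approach of using only the first hidden vector of the BERT encoder with weighting and combining several hidden vectors through attention. The record does not say which attention techniques the thesis uses, so the following is only a minimal sketch of one common option, dot-product attention pooling, written in plain Python on toy data. The function names, the `query_vector` parameter, and the toy hidden states are all hypothetical and are not taken from the thesis.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cls_pooling(hidden: Sequence[Vector]) -> Vector:
    """Classical approach: keep only the first hidden vector ([CLS] position)."""
    return list(hidden[0])


def attention_pooling(hidden: Sequence[Vector], query: Vector) -> Vector:
    """Weighted sum of all hidden vectors.

    Each weight is softmax(h_i . query), where `query` stands in for a
    learned parameter (an assumption made for this sketch).
    """
    scores = [sum(h_j * q_j for h_j, q_j in zip(h, query)) for h in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    return [sum(w * h[j] for w, h in zip(weights, hidden)) for j in range(dim)]


# Toy data: 3 token positions with hidden size 2 (BERT-base uses 768).
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical query; all zeros makes every position equally weighted.
query_vector = [0.0, 0.0]

pooled_cls = cls_pooling(hidden_states)
pooled_att = attention_pooling(hidden_states, query_vector)
```

In a setup like the one the abstract describes, the pooled vector could then be concatenated with a linguistic feature vector before classification, but that wiring is likewise an assumption rather than something this record documents.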