Fine-grained hate speech detection in Arabic using transformer-based models

Rajae Bensoltane, Taher Zaki

Abstract


With the proliferation of social media platforms, characterized by anonymity, ease of access, and support for online community building and discourse, detecting and monitoring hate speech has become an increasingly formidable challenge for society, individuals, and researchers. Despite the crucial importance of the hate speech detection task, most work in this field has been conducted on English, with insufficient attention paid to other languages, particularly Arabic. Furthermore, most existing studies on Arabic hate speech detection have treated the task as a binary classification problem, which overlooks the distinct types of hate speech and is therefore of limited reliability. This study therefore aims to provide an enhanced model for fine-grained hate speech detection in Arabic. To this end, three transformer-based models were evaluated for generating contextualized word embeddings from the input sequence. These models were then combined with a bidirectional gated recurrent unit (BiGRU) layer to further refine the extracted semantic and contextual features. Experiments were conducted on the Arabic benchmark dataset provided by the Open-Source Arabic Corpora and Processing Tools (OSACT-5) shared task. A comparative analysis shows that the proposed model outperforms the baseline and related models, achieving a macro F1-score of 61.68%.
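The architecture described above, a pretrained transformer encoder whose contextualized embeddings feed a BiGRU layer before classification, can be sketched as follows. This is a minimal illustration in PyTorch with Hugging Face Transformers; the choice of MARBERT as the encoder, the GRU hidden size, and the seven-way label set (six hate speech types plus a non-hate class) are illustrative assumptions, not details specified in the abstract.

```python
# Minimal sketch of a transformer + BiGRU fine-grained classifier.
# Encoder name, hidden sizes, and class count are assumptions for illustration.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class TransformerBiGRU(nn.Module):
    def __init__(self, model_name="UBC-NLP/MARBERT", gru_hidden=128, num_classes=7):
        super().__init__()
        # Pretrained transformer produces contextualized token embeddings.
        self.encoder = AutoModel.from_pretrained(model_name)
        # BiGRU refines the sequence of contextual features in both directions.
        self.bigru = nn.GRU(
            input_size=self.encoder.config.hidden_size,
            hidden_size=gru_hidden,
            batch_first=True,
            bidirectional=True,
        )
        # Classifier over the concatenated final forward/backward GRU states.
        self.classifier = nn.Linear(2 * gru_hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        token_embeddings = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state                           # (batch, seq_len, hidden)
        _, h_n = self.bigru(token_embeddings)         # h_n: (2, batch, gru_hidden)
        pooled = torch.cat((h_n[0], h_n[1]), dim=-1)  # (batch, 2 * gru_hidden)
        return self.classifier(pooled)                # raw logits per class

tokenizer = AutoTokenizer.from_pretrained("UBC-NLP/MARBERT")
model = TransformerBiGRU()
batch = tokenizer(["نص تجريبي"], return_tensors="pt", padding=True)
logits = model(batch["input_ids"], batch["attention_mask"])
```

Evaluation with a macro-averaged F1-score (e.g., sklearn.metrics.f1_score with average="macro") weights all classes equally, which matters here because fine-grained hate speech categories are typically highly imbalanced.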

Keywords


Arabic; Bidirectional gated recurrent unit; Fine-grained hate speech; Natural language processing; Transformer



DOI: http://doi.org/10.11591/ijece.v14i3.pp2927-2936

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578

This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).