Arabic offensive text classification using emojis: Including emoji data in Arabic natural language processing

Amal Albalawi, Wael M. S. Yafooz

Abstract


Controlling offensive language on social media requires advanced algorithmic tools. This study examines how translating emojis into text during preprocessing affects the classification of offensive Arabic text. A novel dataset of 10,000 Arabic tweets was developed and annotated as offensive or non-offensive, with annotation consistency validated using Cohen's kappa (CK) and Krippendorff's alpha (α). The dataset was evaluated with common text classification models: seven machine learning (ML) classifiers and three deep learning (DL) models. Two sets of experiments were conducted, one with emoji translation in preprocessing to enrich the text input and one without, to directly assess the impact of emojis on classification accuracy. The findings indicate that emojis significantly affect text classification models, with advanced DL models showing higher sensitivity to the contextual nuances conveyed by emojis than traditional ML classifiers. This research highlights the dual role of emojis, which are often linked to both positive emotions and offensive contexts, adding complexity to digital communication. It contributes to the development of more accurate and context-sensitive natural language processing (NLP) tools.
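
The two preprocessing conditions described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the lookup table `EMOJI_TO_ARABIC`, its entries, and the helper functions `translate_emojis` and `strip_emojis` are hypothetical, and the abstract does not specify the translation resource or method the study actually used.

```python
# Hypothetical emoji-to-Arabic lookup table (illustrative entries only;
# the study's actual translation resource is not given in the abstract).
EMOJI_TO_ARABIC = {
    "😂": "ضحك",    # laughter
    "😡": "غضب",    # anger
    "🐷": "خنزير",  # pig
}


def translate_emojis(text, table=EMOJI_TO_ARABIC):
    """Condition 1: replace each known emoji with an Arabic word."""
    for emoji, word in table.items():
        # Pad with spaces so the inserted word becomes a separate token.
        text = text.replace(emoji, f" {word} ")
    # Collapse the extra whitespace introduced by the padding.
    return " ".join(text.split())


def strip_emojis(text, table=EMOJI_TO_ARABIC):
    """Condition 2: drop known emojis without translating them."""
    for emoji in table:
        text = text.replace(emoji, " ")
    return " ".join(text.split())


if __name__ == "__main__":
    tweet = "انت 🐷"
    print(translate_emojis(tweet))  # emoji becomes an Arabic token
    print(strip_emojis(tweet))      # emoji is removed
```

In this sketch the first function keeps the signal an emoji carries by turning it into a word that a classifier can tokenize, while the second discards it, which mirrors the with-translation and without-translation comparison in the experiments.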

Keywords


Arabic text classification; Deep learning emojis analysis; Machine learning; Natural language processing; Offensive language detection


DOI: http://doi.org/10.11591/ijece.v15i3.pp3332-3345

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578

This journal is published by the Institute of Advanced Engineering and Science (IAES).