Arabic fake news detection using hybrid contextual features

Hussain Mohammed Turki, Essam Al Daoud, Ghassan Samara, Raed Alazaidah, Mais Haj Qasem, Mohammad Aljaidi, Suhaila Abuowaida, Nawaf Alshdaifat

Abstract


Over the last decade, technology has advanced and the number of social media users has grown dramatically. Because social media makes information easily accessible, some people and organizations distribute false news for political or commercial gain, and such news may influence elections and public attitudes. Although English fake news is widely detected and contained, Arabic fake news remains hard to recognize owing to a lack of research and data collection. This study proposes Wara Arabic bidirectional encoder representations from transformers (WaraBERT), a hybrid feature extraction approach that combines word-level tokenization with two Arabic bidirectional encoder representations from transformers (AraBERT) variants to provide varied features. The study also discusses removing stopwords, punctuation, and tanween markings from Arabic data. Two datasets were used: the Arabic fake news dataset (AFND), with 606,912 records, and the Arabic news dataset (AraNews), with 123,219 entries. WaraBERT-V1 obtained 93.83% accuracy on AFND using the bidirectional long short-term memory (BiLSTM) model. The WaraBERT-V2 technique obtained 81.25% on AraNews, exceeding the detection accuracy reported by previous researchers for that dataset. These findings show that WaraBERT outperforms word list techniques (WLT), term frequency-inverse document frequency (TF-IDF), and AraBERT on both datasets.
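
The preprocessing mentioned in the abstract (removing stopwords, punctuation, and tanween markings) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the function name `preprocess`, the tiny stopword list, and the choice of punctuation characters are all hypothetical, and the abstract does not say which stopword list or tokenizer was actually used.

```python
import re
import string

# Tanween marks: fathatan (U+064B), dammatan (U+064C), kasratan (U+064D).
TANWEEN = re.compile("[\u064B\u064C\u064D]")

# ASCII punctuation plus the Arabic comma, semicolon, and question mark.
# Assumption: the paper's exact punctuation set is not given in the abstract.
PUNCTUATION = string.punctuation + "\u060C\u061B\u061F"
PUNCT_TABLE = str.maketrans({ch: " " for ch in PUNCTUATION})

# Tiny illustrative stopword list (hypothetical; a real list would be much larger).
STOPWORDS = {"في", "من", "على", "إلى", "عن", "أن", "هذا", "هذه"}


def preprocess(text, stopwords=STOPWORDS):
    """Return whitespace-separated tokens with tanween, punctuation, and stopwords removed."""
    text = TANWEEN.sub("", text)          # drop tanween diacritics only
    text = text.translate(PUNCT_TABLE)    # replace punctuation with spaces
    return [tok for tok in text.split() if tok not in stopwords]


if __name__ == "__main__":
    print(preprocess("في البيت، كتاب!"))
```

Only the three tanween marks are stripped here; other diacritics such as fatha or shadda are left untouched, since the abstract mentions tanween specifically.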

Keywords


AraBERT; AraNews; Fake news; Tokenization; WaraBERT

DOI: http://doi.org/10.11591/ijece.v15i1.pp836-845

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578

This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).