Predicting personality traits from Arabic text: an investigation of textual and demographic features with feature selection analysis
Abstract
Automatic personality recognition (APR) utilizes machine learning to predict personality traits from various data sources. This study aims to predict the big five personality traits from modern standard Arabic (MSA) texts, using both textual and demographic features. The “MSAPersonality” dataset is employed to conduct a comprehensive analysis of features and feature selection methods to evaluate their impact on APR model performance. We compared feature selection algorithms from the filter, wrapper, and embedded-based categories through a systematic experimental design that consisted of feature engineering, feature selection, and regression. This study showed that each trait was more accurately predicted using a distinct set of features. However, age and study level were the most common features among the five traits. Moreover, although there were no statistically significant differences in performance between the feature selection techniques, embedded-based methods offered the best compromise between performance, time, and interpretability. These findings contribute to the understanding of APR in general and among Arabic speakers.
Keywords
Automatic personality recognition; Big five; Demographic features; Feature selection; Machine learning; Modern standard Arabic
Full Text:
PDFDOI: http://doi.org/10.11591/ijece.v15i1.pp970-979
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).