Predicting personality traits from Arabic text: an investigation of textual and demographic features with feature selection analysis

Khaoula Chraibi, Ilham Chaker, Azeddine Zahi

Abstract


Automatic personality recognition (APR) uses machine learning to predict personality traits from various data sources. This study aims to predict the Big Five personality traits from Modern Standard Arabic (MSA) texts using both textual and demographic features. The “MSAPersonality” dataset was used to conduct a comprehensive analysis of features and feature selection methods and to evaluate their impact on APR model performance. We compared feature selection algorithms from the filter, wrapper, and embedded categories through a systematic experimental design consisting of feature engineering, feature selection, and regression. The results showed that each trait was predicted most accurately with a distinct set of features; however, age and study level were the most common features across the five traits. Moreover, although the differences in performance between the feature selection techniques were not statistically significant, embedded methods offered the best compromise between performance, time, and interpretability. These findings contribute to the understanding of APR in general and among Arabic speakers in particular.
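As an illustration of the experimental design summarized above (feature selection followed by regression, with one selector from each of the filter, wrapper, and embedded categories), the following is a minimal sketch in Python using scikit-learn. It is not the authors' implementation: the synthetic data stands in for the MSAPersonality features, and the specific selectors (`SelectKBest`, `RFE`, `SelectFromModel` with Lasso), the Ridge regressor, the value of `K`, and the R^2 metric are all assumptions chosen only to show the general shape of such a comparison.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for a feature matrix (textual + demographic columns)
# and a single trait score. This is NOT the MSAPersonality data.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

K = 3  # hypothetical number of features to keep

# One representative selector per category (an assumption, not the paper's list).
selectors = {
    # Filter: rank features by a univariate statistic, independent of any model.
    "filter": SelectKBest(score_func=f_regression, k=K),
    # Wrapper: repeatedly refit a model and drop the weakest feature.
    "wrapper": RFE(estimator=LinearRegression(), n_features_to_select=K),
    # Embedded: selection falls out of the model's own (L1-penalized) coefficients.
    "embedded": SelectFromModel(Lasso(alpha=1.0), max_features=K, threshold=-np.inf),
}

results = {}
for name, selector in selectors.items():
    # Selection sits inside the pipeline so it is refit on each training fold.
    model = make_pipeline(selector, Ridge(alpha=1.0))
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    # Fit once on all data only to report which feature indices were chosen.
    chosen = selector.fit(X, y).get_support(indices=True)
    results[name] = (scores.mean(), chosen)
    print(f"{name:8s} mean R^2 = {scores.mean():.3f}, features = {chosen.tolist()}")
```

Placing each selector inside the pipeline keeps feature selection within the cross-validation folds, so the reported scores are not inflated by selecting features on held-out data. In a study with several traits, a loop of this kind would be run once per trait.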

Keywords


Automatic personality recognition; Big Five; Demographic features; Feature selection; Machine learning; Modern Standard Arabic

DOI: http://doi.org/10.11591/ijece.v15i1.pp970-979

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578

This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).