MSAPersonality: a modern standard Arabic dataset for personality recognition
Abstract
Automatic personality recognition is the task of inferring personality traits from a variety of data sources, including text. Our words, whether spoken or written, reveal a lot about who we are. Because people speak different languages, each with its own characteristics and level of complexity, automatic personality recognition may be language-dependent. The task requires a text corpus annotated with personality traits, but the lack of such corpora for languages other than English makes it extremely challenging. In this paper we focus on Arabic, which is understudied and lacks such a corpus despite being one of the most widely spoken languages in the world. Our primary goal was to construct the “MSAPersonality” dataset, which consists of 267 texts in modern standard Arabic annotated with the Big Five personality traits. To evaluate the dataset and its potential for classification and regression, we applied text preprocessing techniques, feature extraction, and machine learning algorithms. The experimental results are promising and suggest that the dataset can support further research into predicting personality from Arabic text.
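The abstract does not specify which preprocessing or feature extraction steps were used. As a minimal sketch only, the following Python shows one common way such a pipeline could begin for Arabic text: removing diacritics and tatweel, unifying alef variants, tokenizing, and counting words. The function names and the choice of normalization rules are assumptions made for illustration, not the authors' method.

```python
import re
from collections import Counter

# Assumed normalization rules (illustrative, not taken from the paper).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")    # tashkeel marks and superscript alef
_TATWEEL = "\u0640"                                   # elongation character
_ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")  # alef with madda / hamza above / hamza below
_TOKEN = re.compile(r"[\u0621-\u064A]+")              # runs of Arabic letters


def normalize_arabic(text):
    """Strip diacritics and tatweel, and map alef variants to bare alef."""
    text = _DIACRITICS.sub("", text)
    text = text.replace(_TATWEEL, "")
    return _ALEF_VARIANTS.sub("\u0627", text)


def tokenize(text):
    """Return the Arabic-letter tokens of the normalized text."""
    return _TOKEN.findall(normalize_arabic(text))


def bag_of_words(text):
    """Hypothetical feature extractor: token counts for one text."""
    return Counter(tokenize(text))
```

The resulting counts could then be passed to any classifier or regressor, one model per Big Five trait; that choice is likewise left open by the abstract.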
Keywords
Affective computing; Arabic text analysis; Automatic personality recognition; Machine learning; Natural language processing
DOI: http://doi.org/10.11591/ijece.v14i4.pp4498-4507
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).