Emotion recognition based on the energy distribution of plosive syllables

Abdellah Agrima, Ilham Mounir, Abdelmajid Farchi, Laila Elmazouzi, Badia Mounir


We usually encounter two problems during speech emotion recognition (SER): expression and perception problems, which vary considerably between speakers, languages, and sentence pronunciation. In fact, finding an optimal system that characterizes the emotions overcoming all these differences is a promising prospect. In this perspective, we considered two emotional databases: Moroccan Arabic dialect emotional database (MADED), and Ryerson audio-visual database on emotional speech and song (RAVDESS) which present notable differences in terms of type (natural/acted), and language (Arabic/English). We proposed a detection process based on 27 acoustic features extracted from consonant-vowel (CV) syllabic units: \ba, \du, \ki, \ta common to both databases. We tested two classification strategies: multiclass (all emotions combined: joy, sadness, neutral, anger) and binary (neutral vs. others, positive emotions (joy) vs. negative emotions (sadness, anger), sadness vs. anger). These strategies were tested three times: i) on MADED, ii) on RAVDESS, iii) on MADED and RAVDESS. The proposed method gave better recognition accuracy in the case of binary classification. The rates reach an average of 78% for the multi-class classification, 100% for neutral vs. other cases, 100% for the negative emotions (i.e. anger vs. sadness), and 96% for the positive vs. negative emotions.


Consonant-vowel; Neural architecture search; Speech emotion recognition energy distribution

Full Text:


DOI: http://doi.org/10.11591/ijece.v12i6.pp6159-6171

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578