Handling class imbalance in education using data-level and deep learning methods

Rithesh Kannan, Hu Ng, Timothy Tzen Vun Yap, Lai Kuan Wong, Fang Fang Chua, Vik Tor Goh, Yee Lien Lee, Hwee Ling Wong

Abstract


In the current field of education, universities must be highly competitive to thrive and grow. Educational data mining has helped universities bring in new students and retain existing ones. However, a major issue in this task is the class imbalance between successful students and at-risk students, which causes inaccurate predictions. To address this issue, 12 data-level sampling methods and 2 deep learning synthesizers were compared against each other, and an ideal class balancing method for the dataset was identified. The evaluation was done using the light gradient boosting machine ensemble model, and the metrics included the receiver operating characteristic curve, precision, recall and F1 score. The two best methods were Tomek links and the neighbourhood cleaning rule, both undersampling techniques, with F1 scores of 0.72 and 0.71, respectively. The results identify the best class balancing method across the two approaches and highlight the limitations of the deep learning approach.
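As an illustration of the best-performing method named above, the following is a minimal sketch of Tomek-link undersampling in plain Python. A Tomek link is a pair of samples from different classes that are each other's nearest neighbour; undersampling removes the majority-class member of each link. The abstract does not state which implementation the authors used, so the function names (`tomek_links`, `undersample_tomek`), the labels `"pass"` and `"at_risk"`, and the toy data are all assumptions made for this example, not the paper's code or data.

```python
from math import dist  # Euclidean distance, Python 3.8+


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that are mutual nearest
    neighbours and carry different class labels (hypothetical helper)."""
    def nearest(i):
        # Brute-force nearest neighbour of sample i, excluding itself.
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: dist(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def undersample_tomek(X, y, majority_label):
    """Drop the majority-class sample of every Tomek link."""
    drop = {k for pair in tomek_links(X, y)
            for k in pair if y[k] == majority_label}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (invented for illustration): four "pass" and two "at_risk" samples.
X = [(0.0, 0.0), (0.3, 0.0), (0.5, 0.0), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0)]
y = ["pass", "pass", "pass", "pass", "at_risk", "at_risk"]

X_res, y_res = undersample_tomek(X, y, majority_label="pass")
```

In this toy set, the only Tomek link is the pair at indices 3 and 4, so the `"pass"` sample at index 3 is removed and the borderline between the classes is cleaned. The brute-force neighbour search is quadratic in the number of samples and is meant only to show the idea; a library implementation would be the practical choice for a real dataset.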

Keywords


Academic at-risk; Class balancing; Educational data mining; Multi-classification; Resampling techniques; Synthetic datasets


DOI: http://doi.org/10.11591/ijece.v15i1.pp741-754

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578

This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).