Optimized stacking ensemble for early-stage diabetes mellitus prediction
Abstract
This paper presents an optimized stacking-based hybrid machine learning approach for predicting early-stage diabetes mellitus (DM) using the PIMA Indian diabetes (PID) dataset and the early-stage diabetes risk prediction (ESDRP) dataset. The methodology handles missing values through mean imputation, balances the classes using the synthetic minority over-sampling technique (SMOTE), normalizes the features, and applies a stratified train-test split. Logistic regression (LR), naïve Bayes (NB), AdaBoost with support vector machines (AdaBoost+SVM), an artificial neural network (ANN), and k-nearest neighbors (k-NN) are used as base learners (level 0), while a random forest (RF) meta-classifier serves as the level 1 model that combines their predictions. The proposed model achieves accuracies of 99.7222% on the ESDRP dataset and 94.2085% on the PID dataset, exceeding results reported in the existing literature by absolute differences of 10.2085 to 16.7222 percentage points. The stacking-based hybrid model benefits early-stage DM prediction by combining the complementary strengths of multiple base learners through a meta-classifier. SMOTE addresses class imbalance, while feature normalization places features on a comparable scale so that none dominates training. The findings suggest that the proposed approach holds promise for early-stage DM prediction, enabling timely interventions and preventive measures.
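The preprocessing steps named in the abstract (mean imputation, SMOTE, feature normalization) can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation. The toy data, the function names, the choice of min-max scaling, and the value of `k` are all assumptions made for illustration, and the stratified split and the stacking model itself are left out.

```python
import math
import random


def mean_impute(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def min_max_normalize(rows):
    """Rescale each column to [0, 1] (assumed scaling; constant columns map to 0.0)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(r, lo, hi)] for r in rows]


# Made-up toy data: two features, label 1 is the minority class.
X = [[1.0, 10.0], [2.0, None], [3.0, 30.0], [4.0, 40.0], [9.0, 90.0], [10.0, None]]
y = [0, 0, 0, 0, 1, 1]

X = mean_impute(X)                                # step 1: fill missing values
minority = [x for x, label in zip(X, y) if label == 1]
n_new = y.count(0) - y.count(1)                   # samples needed to balance
X_bal = X + smote(minority, n_new, k=1)           # step 2: oversample minority
y_bal = y + [1] * n_new
X_norm = min_max_normalize(X_bal)                 # step 3: normalize features
```

After these steps both classes have the same number of samples and every feature lies in [0, 1]. The order of the steps follows the order in which the abstract lists them, which is itself an assumption about the actual pipeline.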
Keywords
Artificial neural network; diabetes mellitus; feature normalization; random forest; stacking
DOI: http://doi.org/10.11591/ijece.v13i6.pp7048-7055
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
International Journal of Electrical and Computer Engineering (IJECE)
p-ISSN 2088-8708, e-ISSN 2722-2578
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).