Optimized stacking ensemble for early-stage diabetes mellitus prediction

ABSTRACT


INTRODUCTION
Diabetes mellitus is a persistent metabolic ailment characterized by elevated blood glucose levels. As per the International Diabetes Federation, approximately 463 million adults were affected by diabetes in 2019, with projections indicating a surge to 700 million individuals by 2045 [1]. In the Indian context, the prevalence of diabetes is substantial, with approximately 77 million adults affected by the condition in 2019 [2]. Furthermore, there is a significant association between type 2 diabetes mellitus (T2DM) and depression among individuals in India. In a study conducted in Hue City, Vietnam, it was observed that approximately 23.2% of patients diagnosed with T2DM experienced symptoms of depression [3].
ISSN: 2088-8708  Optimized stacking ensemble for early-stage diabetes mellitus prediction (Aman)
Diabetes is often regarded as a systemic disease with far-reaching consequences, as it exerts detrimental effects on vital organs. Individuals diagnosed with diabetes face an increased susceptibility to various complications, including but not limited to miscarriage, renal failure, myocardial infarction, vision impairment, and other chronic and potentially life-threatening conditions [4]. Therefore, it is essential to diagnose diabetes mellitus (DM) early to prevent or delay the onset of these complications. Machine learning (ML) algorithms (such as support vector machine (SVM) [5], k-nearest neighbors (k-NN) [6], random forest (RF) [7], and artificial neural network (ANN) [8]) can help in the early diagnosis and accurate prediction of DM by analyzing various health indicators such as plasma glucose concentration, serum insulin resistance, and blood pressure [9]-[11]. Timely identification and precise prognostication of DM hold paramount importance in facilitating efficacious interventions and optimal disease management. Leveraging the capabilities of ML algorithms, which possess the ability to process vast volumes of data, enables the detection of intricate patterns that might elude human experts [12]-[14]. However, there is a need for further research and improved methodologies to enhance diagnostic accuracy and personalized treatment strategies.
To address this problem, this study proposes a novel stacking-based hybrid ML approach for the prediction of early-stage DM. By integrating multiple base classifiers through a stacked classifier, the proposed approach can capture complex relationships and patterns within the data, leading to improved predictive performance. The use of ML algorithms offers a comprehensive understanding of DM and aids in disease management, including the identification of individuals at risk of developing complications [15]-[18]. This approach has the potential to improve health outcomes and enhance the quality of life for individuals with DM [19], [20].
In recent years, there has been a growing emphasis on the early detection and prediction of DM, prompting extensive research in this field. Doğru et al. [21] introduced a hybrid super ensemble learning model that integrated multiple algorithms, yielding accuracy rates of 99.6%, 92%, and 98% for the prediction of early-stage DM across diverse datasets. Krishnamoorthi et al. [22] devised a healthcare disease prediction framework for DM, leveraging RF and SVM models, attaining an accuracy of 86% in DM prediction. In the study [23], a stacking-based model was employed to predict the presence of DM in individuals. Compared to other existing models such as logistic regression (LR), naïve Bayes (NB), and linear discriminant analysis (LDA), the stacking-based model predicted DM with 93.1% accuracy. This demonstrates the effectiveness of the stacked ensemble method for enhancing DM prediction results. In addition, Chakravarthy and Rajaguru [24] proposed a voting-based approach for the early diagnosis of DM. To enhance DM prediction, they applied a mixture of three ML algorithms: LR, RF, and XGBoost classifiers. After evaluating the efficacy of each algorithm separately, it was found that the ensemble method with weighted voting provided the best results for binary classification in terms of accuracy, precision, and F1-score. Mushtaq et al. [25] studied the effectiveness of voting-based models and hyperparameter-tuned ML algorithms for predicting DM. The study focused on addressing the challenge of imbalanced datasets through the implementation of methods such as Tomek links and the synthetic minority over-sampling technique (SMOTE). A two-stage model selection approach was employed, where LR, SVM, k-NN, gradient boost, NB, and RF algorithms were evaluated. RF emerged as the top-performing algorithm, achieving an accuracy of 80.7% after dataset balancing using SMOTE. Subsequently, a voting algorithm was applied to combine the three best-performing models, resulting in an accuracy of 82.0%. These studies have demonstrated the effectiveness of various ML algorithms in enhancing DM prediction results.
The specific aim of this work is to develop a novel stacking-based hybrid ML approach for the prediction of early-stage DM. The integration of a stacked classifier in this research enables the combination of multiple base classifiers, leveraging their collective decision-making capabilities to enhance prediction accuracy. By employing the stacked classifier, the proposed approach can effectively capture complex relationships and patterns within the data, leading to improved predictive performance for early-stage DM detection. Through a comprehensive review of existing literature, we compare our proposed approach with the currently available models and methodologies. Section 2 describes the detailed methodology of the proposed model containing data pre-processing, handling missing values, balancing the dataset, normalizing features, and hyperparameter tuning of base and meta classifiers. Section 3 presents the results of the experiments conducted, including accuracy rates and performance metrics of the stacking-based hybrid model. Finally, section 4 compares the results and performance of the proposed stacking-based hybrid model with existing literature on early-stage DM prediction, and highlights the improvements achieved by the proposed approach.

MATERIAL AND METHOD
The methodology employed in this study aims to develop a precise and reliable model for predicting early-stage DM. The process, as illustrated in Figure 1, follows a systematic approach involving data acquisition, data preprocessing, model formulation, model training, and model evaluation. The initial step involves gathering relevant datasets for analysis. Subsequently, the collected data undergoes preprocessing to ensure its quality and eliminate inconsistencies. Once the data is appropriately prepared, the model formulation stage entails selecting suitable algorithms and techniques for constructing the predictive model. The model is then trained using the preprocessed data, optimizing its parameters and refining its performance. Finally, the model's predictive accuracy and performance are evaluated using appropriate evaluation metrics. To support reproducibility, all experiments were conducted on the Waikato Environment for Knowledge Analysis (WEKA) platform [26]. The experimental setup utilized a system equipped with a 3.2 GHz Intel Core i5 CPU and 16 GB RAM. WEKA is a comprehensive and user-friendly environment for data analysis and machine learning tasks, providing a diverse range of algorithms and evaluation methods. By utilizing the WEKA platform, researchers can replicate the procedures described in this study and achieve comparable outcomes.

Data pre-processing
The data pre-processing phase involves handling missing values through mean imputation, addressing class imbalance through SMOTE, and normalizing the datasets. These steps prepare the datasets for subsequent modelling and analysis, ensuring a reliable and standardized foundation for accurate early-stage DM prediction. In this study, both the Pima Indians Diabetes (PID) dataset [27] and the early-stage diabetes risk prediction (ESDRP) dataset [28] were collected from the OpenML website, a platform for sharing datasets and machine learning experiments. The PID dataset initially consists of 768 instances and 9 attributes, while the ESDRP dataset comprises 520 instances and 17 attributes, as shown in Table 1. To handle missing values in the datasets, the mean imputation technique is employed. This can be implemented using the "ReplaceMissingValues" filter available in the WEKA platform (weka.filters.unsupervised.attribute.ReplaceMissingValues). This filter replaces each missing numeric value with the mean of the corresponding attribute (and each missing nominal value with the mode), ensuring that the datasets are complete and ready for analysis.
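The mean imputation step can be illustrated outside WEKA with a minimal Python sketch. The function name `impute_column_means`, the use of `None` as the missing-value marker, and the toy rows are assumptions made for illustration; they are not taken from the study's datasets or from WEKA's implementation.

```python
from statistics import mean

def impute_column_means(rows, missing=None):
    """Replace missing entries in each numeric column with that column's mean.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not missing]
        means.append(mean(observed))
    return [
        [means[j] if row[j] is missing else row[j] for j in range(n_cols)]
        for row in rows
    ]

# Hypothetical toy rows with two numeric attributes; None marks a missing value.
toy = [[148.0, 72.0], [85.0, None], [None, 64.0], [89.0, 66.0]]
filled = impute_column_means(toy)
```

Each column mean is computed from the observed values only, so imputed entries do not shift the mean of the attribute they fill.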
Before preprocessing, the ESDRP dataset contains 320 instances labelled "Positive" and 200 labelled "Negative", while the PID dataset contains 268 instances labelled "Positive" and 500 labelled "Negative". The SMOTE algorithm is used to address this class imbalance by generating synthetic examples of the minority class. This can be achieved using the "SMOTE" filter in WEKA (weka.filters.supervised.instance.SMOTE), with parameters such as the percentage of synthetic instances to generate (P), the number of nearest neighbors (K), and the class index (C). Applying this filter alleviates the class imbalance and yields more balanced datasets. Following the application of SMOTE, the number of "Negative" instances in the ESDRP dataset, its minority class, rises from 200 to 400, while the number of "Positive" instances remains 320. Similarly, the number of "Positive" instances in the PID dataset, its minority class, rises from 268 to 536, while the number of "Negative" instances remains 500.
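The core idea of SMOTE, interpolating between a minority instance and one of its nearest minority-class neighbours, can be sketched in a few lines of Python. This is a simplified illustration, not WEKA's filter: the function name `smote`, its defaults, and the toy points are assumptions, and the percentage is handled only in whole multiples of 100. With a percentage of 100 the minority class doubles, which matches a rise from 268 to 536 instances.

```python
import math
import random

def smote(minority, percentage=100, k=5, seed=0):
    """Create synthetic minority samples by interpolating toward near neighbours."""
    rng = random.Random(seed)
    per_instance = percentage // 100  # synthetic samples per original instance
    synthetic = []
    for i, x in enumerate(minority):
        # k nearest minority neighbours of x (excluding x), by Euclidean distance
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        for _ in range(per_instance):
            nb = rng.choice(neighbours)
            gap = rng.random()  # position along the segment from x to nb
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic

# Hypothetical minority-class points with two already-numeric features.
minority = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.5], [3.0, 3.0]]
new_points = smote(minority, percentage=100, k=2)
```

Because every synthetic point lies on a segment between two real minority instances, the new samples stay inside the region already occupied by that class.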
Normalization is performed on the preprocessed datasets to ensure consistency and comparability. The "Normalize" filter in WEKA (weka.filters.unsupervised.attribute.Normalize) can be applied to scale all numeric attribute values to the range between 0 and 1. This step reduces the bias introduced by differing attribute scales, which benefits the subsequent modeling and analysis.
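Scaling to the range 0 to 1 corresponds to min-max normalization, where each value x of an attribute becomes (x - min) / (max - min). A minimal Python sketch follows; the function name `min_max_normalize` and the toy rows are illustrative assumptions, and mapping a constant column to 0.0 is a choice made here, not a documented WEKA behaviour.

```python
def min_max_normalize(rows):
    """Scale every numeric column to [0, 1] using that column's min and max."""
    n_cols = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_cols)]
    highs = [max(r[j] for r in rows) for j in range(n_cols)]
    scaled = []
    for r in rows:
        scaled.append([
            # A constant column has no spread; map it to 0.0 (assumption).
            (r[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
            for j in range(n_cols)
        ])
    return scaled

# Hypothetical toy rows with two numeric attributes on different scales.
toy = [[85.0, 66.0], [148.0, 72.0], [116.5, 64.0]]
scaled = min_max_normalize(toy)
```

After scaling, both attributes span the same range, so distance-based learners such as k-NN are not dominated by the attribute with the larger raw values.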

Classifiers and hyperparameter tuning
This section encompasses the process of learner selection and hyperparameter tuning for the stacking-based hybrid ML approach. In this approach, base learners are utilized as level 0 models within the stacking ensemble, while hyperparameter tuning is employed to enhance their performance. A meta-classifier is employed as the level 1 model in the stacking ensemble, which combines the predictions from the base learners and generates the final prediction. This integration of diverse outputs from the base learners enables improved overall predictive performance [29]. In this study, the meta-classifier used is RF. To optimize the performance of the base learners, hyperparameter tuning was conducted. Table 2 provides an overview of the hyperparameter settings for each classifier used in this study, specifically for the PID dataset and the ESDRP dataset. The hyperparameters were fine-tuned using cross-validation with the CVParameterSelection technique. This approach enabled the selection of the most suitable hyperparameter values for each classifier within the stacking ensemble.
Naïve Bayes (NB) is a probabilistic classifier that assumes independence between features and calculates the probability of a sample belonging to a specific class using Bayes' theorem. It is computationally efficient and works well with high-dimensional data but may oversimplify complex relationships between features [30]. To apply naïve Bayes in WEKA, researchers can utilize the NaiveBayes classifier available in the platform's library.
Logistic regression (LR) is a linear classifier that models the relationship between the input features and the sample's likelihood of belonging to a certain class. It is interpretable and supports categorical and continuous data, but presupposes a linear relationship between the features and the log odds of the target variable [31]. In WEKA, the Logistic classifier can be used to apply logistic regression.
The k-nearest neighbors (k-NN) classifier is a non-parametric algorithm that assigns a sample to the predominant class among its k closest neighbors in the feature space. It is easy to comprehend and does not require model training, but it can be sensitive to the selection of k and may be subject to the curse of dimensionality for high-dimensional data [32]. In WEKA, the IBk (instance-based k-NN) classifier can be used for k-NN classification.
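The sensitivity of k-NN to the choice of k, and the idea of choosing a parameter by cross-validation, can be sketched together in Python. This is an illustrative stand-in for a cross-validated parameter search, not WEKA's CVParameterSelection or IBk: the function names, the interleaved fold assignment, and the toy data (which contain one deliberately mislabelled point) are all assumptions.

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x, k):
    """Majority label among the k training points closest to x."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    return Counter(y_train[i] for i in order[:k]).most_common(1)[0][0]

def cv_accuracy(X, y, k, folds=5):
    """Accuracy of k-NN under a simple interleaved k-fold cross-validation."""
    correct = 0
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        test = [i for i in range(len(X)) if i % folds == f]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        correct += sum(knn_predict(X_tr, y_tr, X[i], k) == y[i] for i in test)
    return correct / len(X)

def select_k(X, y, candidates, folds=5):
    """Return the first candidate k with the highest cross-validated accuracy."""
    scores = {k: cv_accuracy(X, y, k, folds) for k in candidates}
    return max(candidates, key=lambda k: scores[k]), scores

# Hypothetical toy data: two clusters, with the point [0.5, 0.5] mislabelled as 1.
X = [[0, 0], [10, 10], [1, 0], [10, 11], [0, 1],
     [11, 10], [1, 1], [11, 11], [0.5, 0.5], [10.5, 10.5]]
y = [0, 1, 0, 1, 0, 1, 0, 1, 1, 1]
best_k, scores = select_k(X, y, candidates=[1, 3, 5])
```

On this toy set, k = 1 lets the single mislabelled point mislead all of its neighbours, while k = 3 outvotes it, so the search prefers the larger neighbourhood.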
Artificial neural networks (ANNs) are a class of mathematical models that simulate the behavior of biological neural networks. Composed of interconnected nodes called neurons, organized in layered architectures, ANNs aim to unravel intricate relationships between input features and target variables. ANNs can capture non-linear relationships but require careful architecture design and sufficient training data, and can be computationally intensive [33], [34]. In WEKA, researchers can utilize the MultilayerPerceptron classifier to implement ANNs.
AdaBoost with support vector machine is a boosting-based classifier that combines multiple weak classifiers to create a strong classifier. In this case, SVMs are used as the weak classifiers. AdaBoost iteratively adjusts the weights of misclassified samples to focus on difficult-to-classify instances. SVMs provide robust classification boundaries but may be sensitive to the choice of kernel and hyperparameters [35]. In WEKA, researchers can apply AdaBoost with SVM using the AdaBoostM1 classifier with an SVM as its base classifier.
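The reweighting that makes boosting focus on difficult instances can be shown with one round of the textbook AdaBoost.M1 update. The sketch below is generic and independent of the weak learner (an SVM in this study); the function name `adaboost_m1_round` and the toy weights are illustrative assumptions and do not reproduce WEKA's internal code.

```python
import math

def adaboost_m1_round(weights, correct):
    """One AdaBoost.M1 reweighting step.

    weights: current instance weights, summing to 1
    correct: booleans, True where the weak learner classified the instance correctly
    Returns the renormalized weights and the weak learner's voting weight.
    """
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    if not 0 < error < 0.5:
        raise ValueError("AdaBoost.M1 needs a weak learner with 0 < error < 0.5")
    beta = error / (1 - error)
    # Shrink the weights of correctly classified instances, then renormalize,
    # so misclassified instances carry relatively more weight in the next round.
    scaled = [w * beta if ok else w for w, ok in zip(weights, correct)]
    total = sum(scaled)
    return [w / total for w in scaled], math.log(1 / beta)

# Hypothetical round: five equally weighted instances, the last one misclassified.
weights = [0.2] * 5
new_weights, vote = adaboost_m1_round(weights, [True, True, True, True, False])
```

In this toy round the misclassified instance ends up with half of the total weight, and the weak learner receives a vote of ln 4 in the final weighted majority.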
Random forest (RF) is a classifier that uses a bagging approach to aggregate the predictions of many decision trees (DTs). It utilizes bootstrapping and feature randomization to reduce overfitting and improve generalization. It handles both categorical and continuous features and provides feature importance measures, but can be computationally expensive for large datasets [36]. In WEKA, the RandomForest classifier can be used to apply random forest classification.
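The two-level arrangement described in this section, with base learners at level 0 and a meta-classifier at level 1, can be sketched in plain Python. To keep the example short, single-feature threshold rules stand in for the tuned base classifiers and a majority lookup table stands in for the RF meta-classifier; these substitutions, the function names, and the toy data are assumptions for illustration only. The meta-learner is trained on out-of-fold base predictions, a common way to build stacking ensembles.

```python
from collections import Counter, defaultdict

def fit_stump(X, y, feature):
    """Level-0 learner: threshold one feature at the midpoint of the class means.

    Assumes both classes (0 and 1) are present in the training data.
    """
    pos = [x[feature] for x, t in zip(X, y) if t == 1]
    neg = [x[feature] for x, t in zip(X, y) if t == 0]
    pos_mean, neg_mean = sum(pos) / len(pos), sum(neg) / len(neg)
    thr = (pos_mean + neg_mean) / 2
    sign = 1 if pos_mean >= neg_mean else -1
    return lambda x: int(sign * (x[feature] - thr) > 0)

def out_of_fold_predictions(X, y, features, folds=3):
    """Base predictions for each row, made by learners that never saw that row."""
    meta_X = [None] * len(X)
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        test = [i for i in range(len(X)) if i % folds == f]
        models = [fit_stump([X[i] for i in train], [y[i] for i in train], j)
                  for j in features]
        for i in test:
            meta_X[i] = tuple(m(X[i]) for m in models)
    return meta_X

def fit_meta(meta_X, y):
    """Level-1 learner: majority label for each pattern of base predictions."""
    votes = defaultdict(Counter)
    for row, t in zip(meta_X, y):
        votes[row][t] += 1
    default = Counter(y).most_common(1)[0][0]
    table = {row: c.most_common(1)[0][0] for row, c in votes.items()}
    return lambda row: table.get(row, default)

def fit_stacking(X, y, features):
    """Train the meta-learner on out-of-fold outputs, then refit bases on all data."""
    meta = fit_meta(out_of_fold_predictions(X, y, features), y)
    bases = [fit_stump(X, y, j) for j in features]
    return lambda x: meta(tuple(b(x) for b in bases))

# Hypothetical, cleanly separable toy data with two numeric features.
X = [[1, 5], [2, 4], [1.5, 6], [2.5, 5.5], [1, 4.5], [2, 5],
     [7, 1], [8, 2], [7.5, 1.5], [6.5, 2.5], [8, 1], [7, 2]]
y = [0] * 6 + [1] * 6
model = fit_stacking(X, y, features=[0, 1])
```

Training the level-1 learner on out-of-fold predictions keeps it from simply trusting base learners that have memorized the training rows.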

RESULTS AND DISCUSSION
After the evaluation in WEKA, the performance metrics for each hyper-tuned model were obtained. These metrics include accuracy, precision, recall, F-measure, mean absolute error (MAE), and area under the curve (AUC). For the ESDRP dataset, the hyper-tuned models were individually optimized using the CVParameterSelection technique in WEKA to maximize their performance before being combined in the proposed stacking-based hybrid model. The evaluation results showcased promising outcomes, with each model demonstrating its strengths and areas for improvement. Table 3 presents a comprehensive performance analysis, highlighting the accuracy achieved by each hyper-tuned model. Notably, the proposed stacking-based hybrid model stands out with an accuracy of 99.7222%. This result surpasses the other hyper-tuned models, including NB (92.5926%), LR (93.5185%), RF (99.0741%), k-NN (98.6111%), ANN (96.2963%), and AdaBoost with SVM (93.9815%). These findings emphasize the effectiveness of the hyperparameter tuning process and the potential of the stacking ensemble approach.
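For reference, the reported metrics can be computed for a binary problem as in the following Python sketch. The function name `binary_metrics`, the 0.5 decision threshold, and the toy labels and probabilities are assumptions for illustration. MAE is simplified here to the mean absolute difference between the 0/1 label and the predicted positive-class probability, and AUC is computed as the fraction of positive-negative pairs ranked correctly; WEKA's exact definitions may differ in detail.

```python
def binary_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall, F-measure, MAE and AUC for binary labels."""
    y_pred = [int(p >= threshold) for p in y_prob]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    # AUC as the probability that a random positive outranks a random negative
    # (ties count one half). Assumes both classes are present.
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_prob)) / len(y_true),
        "auc": wins / (len(pos) * len(neg)),
    }

# Hypothetical labels and predicted positive-class probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
metrics = binary_metrics(y_true, y_prob)
```

Accuracy alone can hide differences between models on imbalanced data, which is why precision, recall, and AUC are reported alongside it.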
Moving on to the PID dataset, similar efforts were made to optimize the hyper-tuned models using the CVParameterSelection technique in WEKA. The performance analysis in Table 4 provides valuable insights into the performance of each model on this specific dataset. The proposed stacking-based hybrid model demonstrates remarkable accuracy, achieving a score of 94.2085%. This accuracy outperforms other hyper-tuned models such as ANN (87.4598%), NB (77.8135%), LR (82.9582%), RF (90.9968%), k-NN (81.0289%), and AdaBoost with SVM (81.672%). These results underscore the importance of the hyperparameter tuning process and its impact on the models' performance.
The outstanding performance of the proposed model can be attributed to its stacking-based hybrid approach, which combines the predictions of the hyper-tuned base learners, effectively leveraging their strengths. It is important to note that each base learner was hyper-tuned independently to optimize its performance, ensuring that the model benefits from the best possible configurations for each classifier. The use of the RF meta-classifier in the stacking ensemble further enhances the model's predictive capability.

COMPARISON WITH THE EXISTING LITERATURE
The comparison with existing literature reveals a diverse range of models and methodologies proposed for the prediction of early-stage DM. However, the proposed model, which utilizes a stacking ensemble approach, demonstrates superior performance in DM prediction when compared to the existing models. This highlights the effectiveness and potential of the novel approach in enhancing the accuracy and reliability of early-stage DM prediction. According to Table 5, the proposed model achieves an accuracy of 94.2085% on the PID dataset and 99.7222% on the ESDRP dataset. This reflects a substantial improvement over the existing models, with the proposed model outperforming them by absolute differences ranging from 10.2085% to 16.7222% in terms of accuracy.

CONCLUSION
In conclusion, this research work introduces a novel stacking-based hybrid machine learning approach for accurately predicting early-stage DM. The approach combines multiple base learners at level 0 and utilizes an RF meta-classifier at level 1 to effectively aggregate their predictions. The high accuracy rates obtained on the ESDRP and PID datasets highlight the effectiveness of the proposed model in predicting early-stage DM. The proposed approach demonstrates significant improvements over existing literature, outperforming the existing models by absolute differences ranging from 10.2085% to 16.7222% in terms of accuracy. This substantial enhancement in accuracy showcases the superiority of the proposed model. The findings of this study have several implications. Firstly, the high accuracy rates achieved by the proposed model indicate its potential as a valuable tool in aiding early intervention and prevention strategies for DM. Timely identification of individuals at risk can enable proactive healthcare management and improve patient outcomes. Secondly, the stacking-based hybrid approach proves to be an effective methodology for integrating the predictions of multiple base learners, leveraging their strengths and enhancing overall performance. Future research endeavors should focus on further validating the generalizability of the proposed model on larger and more diverse datasets. Additionally, exploring feature importance analysis techniques can enhance the interpretability of the model, enabling better insights into the factors influencing early-stage DM.

Figure 1. Workflow of the proposed stacking-based hybrid model for early-stage DM prediction

Table 1. Characteristics of diabetes datasets

Table 2. Hyperparameters used for classifier tuning with CVParameterSelection

Table 3. Comparative analysis of models performance on the ESDRP dataset

Table 4. Comparative analysis of models performance on the PID dataset

Table 5. Comparison of models for early-stage DM prediction in existing literature