An Ensemble Multi-Model Technique for Predicting Chronic Kidney Disease

Chronic Kidney Disease (CKD) is a lifelong condition that leads to the gradual loss of kidney function over time. The main function of the kidney is to filter waste from the human body; when the kidney malfunctions, waste accumulates in the body and can lead to complete kidney failure. Machine learning algorithms can be used to predict kidney disease at an early stage by analyzing the symptoms. The aim of this paper is to propose an ensemble learning technique for predicting CKD. We propose a new hybrid classifier, called ABC4.5, which applies ensemble learning to the prediction of CKD. The work comprises two stages: in the first stage, weak decision tree classifiers are obtained from C4.5; in the second stage, the weak classifiers are combined in a weighted sum that represents the final output, improving the performance of the classifier.


INTRODUCTION
Chronic Kidney Disease (CKD) comprises decreased renal function ranging from mild impairment to acute kidney failure. CKD occurs mostly in the elderly population, with progressive loss of function in the younger population; over 30% of the population aged 65 years has this stable disease [1]. CKD is also associated with a high risk of cardiovascular disease (CVD). Kidney malfunction is assessed through the glomerular filtration rate (GFR) [2], the rate at which the kidney filters waste from the blood. CKD is classified into five stages. The first stage is kidney damage with normal GFR (>90 mL/min/1.73 m²); the second stage is a mild reduction in GFR (60-89 mL/min/1.73 m²); the third stage contains two phases of moderate reduction in GFR (45-59 mL/min/1.73 m² and 30-44 mL/min/1.73 m²); the fourth stage is a severe reduction in GFR (15-29 mL/min/1.73 m²); and the fifth stage is kidney failure, where the GFR is <15 mL/min/1.73 m². In stages 1 and 2, the GFR alone does not establish the diagnosis, since it is at normal or near-normal levels, but these stages can be identified from the albumin excretion ratio, a history of kidney transplantation, and other abnormalities. In stages 1 to 3 there will be no signs of CKD, but when the disease progresses to stages 4 and 5 the body experiences significant changes, including metabolic acidosis, alterations, anemia, and electrolyte imbalance. Signs of metabolic acidosis include protein malnutrition, muscle weakness, and loss of body mass; the alterations include hypertension, pulmonary edema, and peripheral edema. Anemia associated with CKD is marked by fatigue and impaired immune function, and it increases CVD mortality. CKD can be diagnosed by urinalysis, lipid profile, complete blood count, and serum albumin level.
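The GFR bands described above can be expressed as a small lookup. The following is a minimal sketch, not part of the proposed method: the function name ckd_stage and the labels "3a" and "3b" for the two phases of stage 3 are assumptions made for illustration, and a GFR of exactly 90 is assigned to stage 1 because the text leaves that boundary unspecified. As noted above, GFR alone does not establish stages 1 and 2, so the function returns only the GFR band.

```python
def ckd_stage(gfr: float) -> str:
    """Map a GFR value (mL/min/1.73 m^2) to the GFR band of a CKD stage.

    Hypothetical helper: "3a"/"3b" name the two phases of stage 3, and a
    GFR of exactly 90 is treated as stage 1 (an assumption).
    """
    if gfr < 0:
        raise ValueError("GFR must be non-negative")
    if gfr >= 90:
        return "1"   # normal GFR; needs other evidence of kidney damage
    if gfr >= 60:
        return "2"   # mild reduction
    if gfr >= 45:
        return "3a"  # moderate reduction, first phase
    if gfr >= 30:
        return "3b"  # moderate reduction, second phase
    if gfr >= 15:
        return "4"   # severe reduction
    return "5"       # kidney failure


# Example values, chosen only to exercise each band.
stages = [ckd_stage(g) for g in (95, 72, 50, 35, 20, 10)]
```

Fractional values that fall between the integer ranges given in the text (for example 59.5) are assigned to the lower band, which is a design choice of this sketch.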
With the rapid development of medical diagnosis, ensemble classifiers can play a crucial role in predicting and diagnosing the disease at early stages.
The rest of the paper is organized as follows. Section II presents the related work, Section III describes the dataset and its attributes, Section IV presents the proposed framework, Section V presents the experimental analysis and findings, and Section VI gives the conclusion and future outlook.

RELATED WORKS
This section presents work from past years related to the analysis, diagnosis, and prediction of Chronic Kidney Disease.
Michelle M [3] identified the role of variants of the APOL1 gene in causing various forms of kidney disease progression in people of African descent. Jessica K [4] proposed management strategies that include finding new potential biomarkers for monitoring diabetes mellitus and new potential therapies for slowing the progression of diabetes mellitus and chronic kidney disease. The treatment of Chronic Kidney Disease associated with dyslipidemia is presented in [5], where statins have a moderate effect on CKD and the addition of ezetimibe to a statin may prevent CKD. Daniel L. Galvan [6] gave an overview of the role of mitochondrial dysfunction in chronic kidney disease and in the development of diabetic nephropathy; the analysis identified mitochondrial targets that may improve the treatment of chronic kidney disease. Michelle A [7] discusses the factors in the progression of renal dysfunction. The work in [8] estimates the predictive ability of two common risk scores for cardiovascular events in patients with chronic kidney disease. The results draw conclusions about the probability of atherosclerotic cardiovascular events in patients with CKD regardless of renal function, albuminuria, and previous cardiovascular events. Claudia Pontillo et al. [9] investigated whether CKD273, a urinary biomarker, can predict the glomerular filtration rate. In the investigation, the urinary biomarker CKD273 predicted stage 3 CKD. The work in [10] proposed a prediction of autonomic neuropathy in stage 5 chronic kidney disease, carried out in two stages: the first stage consisted of a set of questionnaires, followed by testing for postural hypotension. The results concluded that the distribution of autonomic neuropathy found with the questionnaire method was higher than with postural hypotension testing. Austin G Stack [11] proposed measures of CKD that improve cardiovascular disease prediction; the study concluded that both GFR and ACR are causal factors of cardiovascular disease. G. Bilancio [12] proposed a predictive model for cardiovascular disease in kidney transplant cases; in the analysis, 34 variables were investigated and the confidence interval and hazard ratio were calculated. Diabetic nephropathy predicted 91.2% of cardiovascular disease. The patterns, prediction and progression of the chronic kidney

DATASET DESCRIPTION
We evaluated the proposed ABC4.5 technique on a chronic kidney disease dataset collected from a private hospital in Tamil Nadu. The dataset contains 400 observations with eleven numeric and fourteen nominal attributes and two classes, ckd and notckd. The detailed attribute information is presented in Table 1.

PROPOSED METHODOLOGY
The objective of this paper is to propose an ensemble technique for Chronic Kidney Disease. The work contains two stages: in the first stage, the dataset is preprocessed and subjected to the C4.5 classifier; in the second stage, the resulting classifiers are optimized using the adaptive boosting technique. The framework is shown in Figure 1. The main steps in the ABC4.5 framework for Chronic Kidney Disease (CKD) are as follows:
Step 1. The attribute values in the dataset are preprocessed to remove any unwanted and missing values.
Step 2. The preprocessed data is subjected to 10-fold cross-validation and a percentage split of 60%.
Step 3. With the percentage split, the preprocessed dataset is divided into training and test datasets. The test dataset is subjected to the C4.5 classifier, and the C4.5 tree classifier produces a hypothesis.
Step 4. The produced hypothesis forms a new dataset called D_new, which is also subjected to the percentage split of 60%. The new test dataset is given as input to adaptive boosting, where the weak learners from the C4.5 classifier are tweaked in favor of misclassified instances and combined in a weighted sum.
Step 5. The final outcome is the boosted classifier for Chronic Kidney Disease.
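The boosting stage in the steps above can be illustrated with a minimal sketch of adaptive boosting. It is not the authors' implementation and rests on several assumptions: one-feature threshold rules (decision stumps) stand in for the C4.5 trees, labels are encoded as -1 and +1, the standard AdaBoost weight update is used, and all function names and the toy data are invented for illustration. The percentage_split helper shows a 60% split in the conventional form, without reproducing the exact data flow of Steps 3 and 4.

```python
import math
import random


def percentage_split(rows, train_fraction=0.6, seed=0):
    """Shuffle the rows and split them into (train, test) parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def stump_predict(stump, x):
    """Predict +1 or -1 with a one-feature threshold rule."""
    feature, threshold, sign = stump
    return sign if x[feature] >= threshold else -sign


def fit_stump(X, y, weights):
    """Return (weighted_error, stump) for the best threshold rule."""
    best_error, best_stump = None, None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for sign in (1, -1):
                stump = (feature, threshold, sign)
                error = sum(w for row, label, w in zip(X, y, weights)
                            if stump_predict(stump, row) != label)
                if best_error is None or error < best_error:
                    best_error, best_stump = error, stump
    return best_error, best_stump


def adaboost_fit(X, y, rounds=10):
    """Fit AdaBoost on labels in {-1, +1}; return a list of (alpha, stump)."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, stump = fit_stump(X, y, weights)
        clipped = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - clipped) / clipped)
        ensemble.append((alpha, stump))
        if error <= 1e-10:
            break  # a perfect weak learner leaves nothing to reweight
        # Increase the weight of misclassified rows, decrease the rest.
        weights = [w * math.exp(-alpha * label * stump_predict(stump, row))
                   for row, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble, x):
    """Classify x by the sign of the weighted sum of weak-learner votes."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy data, invented for illustration: one numeric feature, labels -1 / +1.
X = [[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]]
y = [-1, -1, -1, 1, 1, 1]
model = adaboost_fit(X, y)
predictions = [adaboost_predict(model, row) for row in X]
```

Each round fits the weak learner on reweighted data, so later learners concentrate on the instances that earlier ones misclassified; the final prediction is the sign of the weighted sum of their votes. Replacing the stumps with full decision trees would bring the sketch closer to the C4.5-based ensemble described in the steps.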

EXPERIMENTAL ANALYSIS AND FINDINGS
This section contains the experimental analysis and findings of the proposed ABC4.5 technique for Chronic Kidney Disease (CKD). Confusion matrices containing true positive (TP), true negative (TN), false positive (FP), and false negative (FN) values were obtained for the classifiers under analysis: SVM, C4.5, PSO-MLP, DT, and the proposed ABC4.5. These values were used to calculate performance metrics such as True Positive Rate (TPR), False Positive Rate (FPR), accuracy, precision, recall, and receiver operating characteristic (ROC). The confusion matrices of the classifiers are shown in Figure 2, the performance metric comparison in Figure 3, accuracy and inaccuracy in Figure 4, and the average execution time of the classifiers under analysis in Figure 5.
TP  115  96  105  84  119
TN   19  37   23  65   14
FP   68  74   75  48   88
FN   38  33   37  43
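The performance metrics named in this section follow directly from the four confusion-matrix counts. The sketch below shows the standard definitions; the function names are hypothetical, the zero-denominator fallback to 0.0 is a design choice of this sketch, and the example counts are illustrative only and are not taken from the paper's results.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def confusion_metrics(tp, tn, fp, fn):
    """Compute common metrics from binary confusion-matrix counts."""
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),  # recall is the same quantity as TPR
        "fpr": safe_div(fp, fp + tn),
    }


# Illustrative counts, not taken from the paper's results.
metrics = confusion_metrics(tp=45, tn=40, fp=10, fn=5)
```

Recall and TPR are the same quantity, so the dictionary reports it once; an ROC curve is obtained by plotting TPR against FPR as the decision threshold of a classifier varies.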

CONCLUSION AND FUTURE OUTLOOK
In this analysis, the SVM, C4.5, PSO-MLP, DT, and ABC4.5 classifiers were implemented on a Chronic Kidney Disease (CKD) dataset. The proposed ABC4.5 achieved an accuracy of 92.76% in detecting CKD, which is higher than that of the other classifiers, with an execution time of 0.12 sec. As future work, hybrid ensemble learning algorithms can be applied to a Chronic Kidney Disease dataset for improved performance.

[Figure: execution time of the classifiers; axis label: Exec time (in sec)]