Hybrid deep learning model using recurrent neural network and gated recurrent unit for heart disease prediction

ABSTRACT


INTRODUCTION
Heart disease is the leading cause of death in the United States and globally [1]. The most common type is coronary artery disease, which can cause a heart attack [2]. Health factors such as cholesterol, blood pressure, and glucose control contribute to cardiovascular health [3]. The main factors that can lead to heart disease include family genetics, high blood pressure, high cholesterol, sex, age, diet, calcium levels, the elasticity of the blood vessels, and lifestyle [2], [4]. This paper aims to increase the accuracy of heart disease prediction and to outperform the existing RNN deep learning model [5].
Artificial intelligence (AI), machine learning, and deep learning are technologies that have been infusing the healthcare system and its industries for the past decades [6]-[12]. Deep learning gives the healthcare industry the ability to analyse data at exceptional speed without compromising accuracy. Deep learning works similarly to the human brain, using statistical data in a mathematical model [11]. It consists of multiple neurons and layers that rely on a neural network and allow micro-analytics on the data to generate the desired outcome [13]. Deep learning is therefore essential for making precise and meticulous predictions of heart disease as mitigation through early detection for patients. An efficient medical disease prediction model named grey wolf optimization (GWO) + RNN uses the GWO algorithm for feature selection, removing unrelated and redundant attributes; combined with an autoencoder (AE)-based RNN that avoids feature dimensionality problems, it achieved 98.2% accuracy [14]. However, removing attributes from the dataset degrades the quality of the prediction, and the original version of the GWO algorithm has several disadvantages, such as low solving accuracy, an unsatisfactory local search ability, and a slow convergence rate [15], [16]. An advanced RNN model with multiple GRUs produced 98.4% accuracy, yet it consumes considerable processing time and tends to crash with out-of-memory errors caused by the high number of neurons in the model [5]. Machine learning algorithms and techniques have been very accurate in predicting heart-related diseases, but there is still a need to sharpen the scope of research, especially on how to handle high-dimensional data and the overfitting problem.
One of the issues in the predictive performance of such systems is low data quality, most commonly missing and irrelevant values. Imputation of missing values can increase data quality by filling gaps in the training data [17]. Much research can also be done on the correct ensemble of algorithms to use for a particular type of data [18]. The Cleveland heart disease dataset, taken from the University of California, Irvine (UCI) machine learning repository, is the primary source for this proposed model. A deep learning ensemble model with LSTM and GRU achieved 85.71% accuracy; future work on that study could explore different mixtures of ML and DL models for better prediction [19].

LITERATURE REVIEW
Cardiovascular disease is the leading cause of death worldwide and a major public health concern [20]. Another common issue in healthcare analysis, particularly in heart disease prediction, is the imbalance of the dataset distribution [1]. This paper overcomes the problem of imbalanced datasets and insufficient real data using the synthetic minority oversampling technique (SMOTe), which synthesizes additional samples for the small class so that the dataset becomes larger and balanced [21]. Future work should be done with larger datasets to ensure that the accuracy of deep learning models is maintained and sustained. The next sections explain further the studies in artificial intelligence, machine learning, and deep learning.

Artificial intelligence (AI)
Artificial intelligence (AI) refers to man-made intelligence developed in machines and computers so that they can think like humans and mimic their actions. It is able to learn data patterns, analyse information, and solve problems in a way similar to the human brain [5], [22]. AI is powerful and efficient in healthcare, especially in crucial fields such as early detection of illness, treatment management, prediction of sickness and critical events, and evaluation of diseases. AI provides cognitive abilities for managing healthcare and facilitates the automatic updating of medical information across healthcare platforms. Its exceptional ability to provide more accurate analysis results, minimize clinical data errors, notify on real-time health risks, and make robust predictions has become the reason to utilize AI in healthcare, eventually improving healthcare quality [6].

Machine learning (ML)
Machine learning (ML) is a subset of AI that uses statistical computations and numerical calculations to perform analysis [23]. ML requires algorithms that cover the learning process of the data in order to process it and produce the end results. It focuses on building applications that learn from data and improve their accuracy over time without being explicitly programmed. An algorithm is a sequence of statistical processing steps; it is executed repeatedly, in a phase known as training, to find patterns and features in massive amounts of data. Eventually, the final output is produced based on the strength of the algorithms and the available data.

Deep learning (DL)
Deep learning is a subfield of machine learning in which the algorithms are supported by neurons and layers arranged in a structure modelled on the brain, called an artificial neural network. Deep learning mimics the functions of the human brain in processing and analysing data in order to make a decision [10], [11]. It uses a hierarchical arrangement of artificial neural networks to learn a dataset very rigorously. The learning process runs, much like in a human brain, from input to final output, with a number of hidden layers in between, each containing neurons. A neuron, also known as a node, holds and deeply analyses the specific data assigned to it by the neural network's algorithm and forwards the partially processed data to the next layer of the network. Deep learning processes data using a non-linear approach, in which all inputs are connected and related in order to produce the best output. The first layer of the neural network collects the input data, processes it, and sends it to the next layer as output; subsequent layers process the previous layer's information before making a decision and producing the final results. Artificial neural networks (ANNs) are a class of non-linear statistical models that have been remarkably successful under the deep learning framework. When composed in many layers, ANNs are feasible to fit and often obtain high predictive accuracy [24], [25].

RNN
RNN is a type of deep learning algorithm that follows a sequential approach. Its uniqueness comes from the connections between its hidden layers: the same hidden layer is replicated over time, with the same weights and biases applied to the inputs at each time step. The network runs the process in a loop, modifying its hidden state and updating and storing data in its built-in memory. The RNN builds its model by training on the respective data, and the model is updated and rebuilt each time data passes through the RNN chain. RNNs are simple to interpret during training, can handle both numerical and categorical data with little data preparation, can be validated using statistical tests, and perform well with large datasets [5], [26].
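The recurrence described above can be sketched in a few lines of NumPy: at every time step the same weights combine the current input with the previous hidden state. The layer sizes, random weights, and tanh activation here are illustrative assumptions, not values from the paper.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One recurrent step: combine the current input with the previous
    hidden state using the shared weights, then squash with tanh."""
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

rng = np.random.default_rng(0)
n_features, n_hidden = 13, 8              # 13 Cleveland features (illustrative)
W_x = rng.normal(size=(n_features, n_hidden)) * 0.1
W_h = rng.normal(size=(n_hidden, n_hidden)) * 0.1
b = np.zeros(n_hidden)

h = np.zeros(n_hidden)                    # initial hidden state
for x_t in rng.normal(size=(5, n_features)):   # a toy 5-step input sequence
    h = rnn_step(x_t, h, W_x, W_h, b)          # same weights at every step
```

Note that `W_x`, `W_h`, and `b` are reused at every step; this weight sharing across time is what distinguishes the RNN from a plain feed-forward network.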
RNN clusters points of data into functional groups within a statistical algorithm. Larger data is complex and harder to classify and cluster because of the variety of variable types [26]. Besides that, RNN has memory capabilities: it captures updates and records them in its neuron units. This is an advantage in processing time-sequence data and gives RNN a unique quality among deep learning models. RNN is able to collect data from arbitrarily long sequences, capturing the sequential information present in the input and the dependencies between its elements while making predictions. Figure 1 shows the RNN and feed-forward neural network processing of a data sequence. RNN also has backpropagation capabilities, a strong advantage for data processing and deep analysis in each hidden layer of the neural network.

GRU
A gated recurrent unit (GRU) is an enhanced variant of the LSTM and RNN that efficiently keeps the essential data and relations of input sequences and purges the less important data, reducing the memory use and processing time of the unit. Due to this uniqueness, GRUs are widely used in sequential data prediction [5], [27]. The GRU was derived from the LSTM to reduce lag and processing delay in the neural network [27]. Its structure is simplified from the LSTM, with two gates but no separate memory cell: a single update gate, which replaces the LSTM's input and forget gates, rapidly determines the current output state, and a reset gate purges unnecessary data from the previous hidden state [28].
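The two gates can be sketched as a single NumPy forward step. The update gate z decides how much of the previous state to keep, and the reset gate r purges parts of the previous state when forming the candidate; sizes and random weights are illustrative assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W):
    """One GRU step: update gate z blends the old state with a candidate,
    reset gate r purges parts of the previous hidden state."""
    z = sigmoid(x_t @ W["xz"] + h_prev @ W["hz"])              # update gate
    r = sigmoid(x_t @ W["xr"] + h_prev @ W["hr"])              # reset gate
    h_tilde = np.tanh(x_t @ W["xh"] + (r * h_prev) @ W["hh"])  # candidate state
    return (1 - z) * h_prev + z * h_tilde                      # new hidden state

rng = np.random.default_rng(1)
n_in, n_hid = 13, 8   # illustrative sizes
W = {k: rng.normal(size=s) * 0.1
     for k, s in [("xz", (n_in, n_hid)), ("hz", (n_hid, n_hid)),
                  ("xr", (n_in, n_hid)), ("hr", (n_hid, n_hid)),
                  ("xh", (n_in, n_hid)), ("hh", (n_hid, n_hid))]}
h = gru_step(rng.normal(size=n_in), np.zeros(n_hid), W)
```

Compared with the LSTM, there is no separate memory cell: the single vector h carries all state, which is what makes the GRU lighter in memory and faster to process.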

Keras
Keras is an open source machine learning library written in Python and widely used for rapidly building deep learning models. It is a high-level application programming interface (API) that runs on top of TensorFlow, Theano, or CNTK and wraps up extensive, complex numerical computation. Keras provides a convenient solution to deep learning problems and removes the effort of building a complex network, turning complex neural network construction into a much simpler, TensorFlow-backed workflow [29]. The Keras module is the official frontend of TensorFlow and the most popular API among deep learning libraries [30].

Tensorflow
TensorFlow is an end-to-end open source platform for machine learning [5]. It is an open source library for numerical computation and large-scale machine learning that combines machine learning and deep learning algorithms to make them more efficient. TensorFlow executes the process in neural networks to learn the behaviour patterns of the available data and incorporates them with the TensorFlow database, which is enormous, with libraries for neural networks, statistical computations, and numerical calculations. TensorFlow operates at large scale and in heterogeneous environments, using dataflow graphs to represent computation, shared state, and the operations that mutate that state [31]. Table 1 shows the analysis of the existing RNN deep learning models for heart disease prediction. Ali et al. [32] produced the highest accuracy, 98.5%. Most deep learning models suffer from learning redundancy: their neural networks must train on the particular data and memorise the behaviour pattern of that dataset, and this redundancy is the main cause of inefficiency in data prediction. Babu et al. [14] achieved 98.23% using GWO optimization with RNN; the original grey wolf optimization (GWO) algorithm has several disadvantages, such as low solving accuracy, poor local search ability, and a slow convergence rate [15], [16]. Krishnan et al. [5] developed an improved RNN with multiple GRUs that reached 98.4%, but it suffers from data redundancy and inefficient deep analysis of each parameter throughout the hidden layers. This paper enhances that model to increase the accuracy by using an optimum number of GRUs and the best learning rate.

Critical analysis
The presence of multiple gated recurrent units (GRUs) improved the RNN model's performance to 98.4% accuracy [5], yet the performance of the model can still be improved.

METHOD
This paper proposes a hybrid deep learning model designed with RNN and GRU, supported by Keras with TensorFlow as the backend. The dataset has 303 samples, a small dataset that was also found to be imbalanced. Imbalanced datasets are relevant primarily in the context of supervised machine learning involving two or more classes, and imbalanced data classification is one of the issues that has arisen as machine learning evolved from science fiction to an applied technology [21]. The complications of imbalanced data classification occur because disproportion in datasets is very common [37]. Therefore, the dataset is synthesized through oversampling with the SMOTe technique. SMOTe's main purpose is to balance an imbalanced dataset by randomly regenerating the minority class: it synthesizes new instances from the minority class to equalise the data [21], [37]. It performs linear interpolation between minority-class instances, randomly selecting from the k-nearest neighbours (KNN) of each example in the minority class, to regenerate a balanced dataset [38].
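The interpolation step of SMOTe can be sketched in NumPy as follows, assuming a feature matrix for the minority class; the neighbour count `k` and the random seed are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def smote_oversample(X_minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by linear interpolation
    between a random minority sample and one of its k nearest neighbours."""
    X = np.asarray(X_minority, dtype=float)
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X))
        dist = np.linalg.norm(X - X[i], axis=1)  # distances to all samples
        dist[i] = np.inf                         # exclude the sample itself
        j = rng.choice(np.argsort(dist)[:k])     # pick a random near neighbour
        gap = rng.random()                       # interpolation factor in [0, 1)
        synthetic.append(X[i] + gap * (X[j] - X[i]))
    return np.vstack(synthetic)

X_min = np.array([[63.0, 145.0], [41.0, 130.0], [57.0, 120.0],
                  [60.0, 140.0], [44.0, 118.0], [52.0, 125.0],
                  [66.0, 150.0], [48.0, 135.0], [55.0, 128.0], [59.0, 142.0]])
X_new = smote_oversample(X_min, n_new=20)        # grow the minority class
```

In practice, a library implementation such as `imblearn.over_sampling.SMOTE` could equally be used; the sketch above only illustrates why each synthetic point stays on a line segment between two genuine minority samples.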

Development of proposed model
The RNN is developed together with GRUs to form a robust hybrid neural network. Figure 2 shows the five main steps used to develop the proposed model. The preliminary task is crucial to ensure the collection of the data and its quality. The dataset used in this study was obtained from the UCI repository and consists of 303 patients who underwent angiography. The Cleveland data is processed to minimise noisy data by handling the 6 rows in the dataset with missing feature values. The processed data should follow the standards before it passes through the proposed neural network. Various studies and hybrid deep learning models were studied thoroughly before this proposed model was designed.
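The Cleveland file marks missing feature values with a "?" symbol; a minimal cleaning sketch that drops such rows might look like the following (the sample records are made up for illustration and are not real patient data):

```python
def drop_incomplete(rows):
    """Keep only rows containing no UCI '?' missing-value markers."""
    return [r for r in rows if "?" not in r]

raw = [
    ["63", "1", "145", "233", "0"],   # complete record -> kept
    ["67", "1", "?", "286", "2"],     # missing a value -> dropped
    ["41", "0", "130", "204", "0"],   # complete record -> kept
]
clean = drop_incomplete(raw)
```

Dropping the 6 incomplete rows leaves 297 usable records, which is one common way the Cleveland data is prepared; imputation, as mentioned in the introduction, is the alternative when discarding rows is too costly.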

Proposed model
The proposed model can be divided into three components: data processing, data analytics, and data visualization. The first component performs the extract, transform, load (ETL) process using SMOTe, which overcomes the imbalanced nature of the Cleveland dataset. This technique generates synthetic data for the minority class by oversampling, making the Cleveland dataset balanced, larger, and of higher quality.
The second component is the essential part, where the predictive analytics of the data analytics is performed. The predictive analytics for this component is designed with an RNN and 7 GRUs. As more layers using certain activation functions are added to a neural network, the gradients of the loss function approach zero, making the network hard to train; therefore, batch normalization layers refine the learning rate and training of the model, together with Adam optimization. The trained model, with the best epochs, batch size, and verbosity, is saved to json and h5 formats respectively. The learning rate is varied to select the best accuracy for the trained model. The third component visualises the behaviour pattern of the proposed model in graphs in TensorBoard. The file with the final prediction results is generated and saved at the end of this process, and the entire process is displayed to depict the overall performance, deep analysis, and sub-processes of the proposed model. The proposed model produces higher accuracy than other RNN-based deep learning models. In concordance with the ultimate goal of this paper to further improve prediction accuracy, this hybrid deep learning prediction model merges the best prediction records obtained from the previous model trainings. The RNN optimization is able to refine and choose the best learning rate to increase the accuracy of the model and generates optimal weights for the proposed model. This proposed hybrid prediction model is able to outperform the RNN with the presence of multiple GRUs in the hidden layers of the model, as shown in Figure 3.
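A minimal Keras sketch of such an architecture is given below. The layer widths, single-step input shape, and learning rate are illustrative assumptions, not the paper's exact configuration; only the overall shape (7 stacked GRUs with batch normalization, Adam, and a sigmoid output) follows the description above.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_hybrid_model(n_features, n_gru=7, units=32, lr=1e-3):
    """Stack of GRU layers with batch normalization, compiled with Adam."""
    model = models.Sequential([layers.Input(shape=(1, n_features))])
    for _ in range(n_gru - 1):
        model.add(layers.GRU(units, return_sequences=True))
        model.add(layers.BatchNormalization())
    model.add(layers.GRU(units))              # final GRU collapses the sequence
    model.add(layers.BatchNormalization())
    model.add(layers.Dense(1, activation="sigmoid"))  # disease / no disease
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

model = build_hybrid_model(n_features=13)
# the architecture and weights could then be saved, e.g.:
# open("model.json", "w").write(model.to_json())
# model.save_weights("model.weights.h5")
```

The learning-rate search described above would amount to rebuilding this model with different `lr` values and keeping the run with the best validation accuracy.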

RESULTS AND DISCUSSION
The results are generated once the proposed model has completed training and testing. The data used for training is separated from that used for testing, in a 70:30 split. The proposed model trains best with a small batch size, which suits the Cleveland dataset and allows the RNN and GRU to make better predictions during testing. Figure 4 shows an accuracy of 97.9869% before the proposed model went through almost 3,000,000 epochs; the results were generated after several trainings. The trained model and network are saved in .json and .h5 files, and subsequent testing on new datasets uses this prior trained model to generate the end results. Previously, the proposed model was found to crash each time it went through a high number of trainings: the Ubuntu and PyCharm environment halted the training process due to excessive memory use on the machine.
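The 70:30 separation can be sketched as a simple shuffled split in pure Python; the fixed seed is an illustrative assumption to make the split reproducible.

```python
import random

def split_70_30(samples, seed=42):
    """Shuffle, then hold out 30% of the samples for testing."""
    data = list(samples)
    random.Random(seed).shuffle(data)   # shuffle before splitting
    cut = int(len(data) * 0.7)          # 70% boundary
    return data[:cut], data[cut:]

train, test = split_70_30(range(100))   # e.g. 100 samples -> 70 / 30
```

Shuffling before splitting matters here: without it, a class-ordered file such as the Cleveland data could leave one class over-represented in the test portion.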
The proposed model has proven to perform better than the other existing RNN models, as shown in Table 3. The complex RNN, with its higher number of hidden layers and neurons, has been simplified and works faster with GRUs. The combination of RNN and GRU previously achieved 98.4% [5], while the proposed model achieves 98.6876%; this enhancement comes from implementing the correct number of GRUs with the RNN. The proposed model also works faster thanks to its Ubuntu environment and the presence of a GPU.

CONCLUSION
This paper applied the SMOTe method to solve the data imbalance problem. The hybrid model designed with RNN and GRU has successfully increased the prediction accuracy. The RNN with GRU selects the attributes for classification and predicts the disease based on priority during model training. The RNN has the advantage of processing instances of data independently of previous instances, while the GRU memorises the data behaviour during training faster than the RNN by purging unnecessary data during learning. The proposed model retained its high performance on a far larger dataset, consisting of more than 100,000 samples, than in the existing studies. Future work on the hybrid model can extend it with other deep learning models for better attribute selection, which could increase the efficiency of the data classification and the accuracy. This hybrid deep learning model has enhanced the quality of deep analysis and decision-making in heart disease prediction, and could be one of the AI tools that potentially transform the quality of life for the billions of people around the world who suffer from heart disease.