A deep learning framework to detect Covid-19 disease via chest X-ray and CT scan images

Received Jun 22, 2020 Revised Jul 29, 2020 Accepted Aug 10, 2019

COVID-19 spread rapidly all over the world at the beginning of this year. Hospital reports have indicated that RT-PCR tests have low sensitivity in the early stage of infection. A rapid and accurate diagnostic technique is therefore needed to detect Covid-19. CT has been demonstrated to be a successful tool in the diagnosis of the disease. A deep learning framework can be developed to aid in evaluating CT exams to provide a diagnosis, thus saving time for disease control. In this work, a deep learning model was modified for Covid-19 detection via feature extraction from chest X-ray and CT images. Initially, several transfer-learning models were applied and compared; then a VGG-19 model was tuned to obtain the best results that can be adopted in disease diagnosis. The diagnostic performance of all models was assessed on a dataset of 1000 images. The VGG-19 model achieved the highest accuracy of 99%, sensitivity of 97.4%, and specificity of 99.4%. Deep learning and image processing demonstrated high performance in early Covid-19 detection. The approach shows promise as an auxiliary detection method for clinicians and could thus contribute to the control of the pandemic.


INTRODUCTION
Most countries in the world have been affected by coronavirus disease (Covid-19), with 2.5 million confirmed cases [1]. The outbreak was declared a "public health emergency of international concern" (PHEIC) by the World Health Organization (WHO). Covid-19 has spread widely across the world since January 30, 2020 [2]. It is highly contagious, transmissible from person to person, and causes pneumonia [3]. According to WHO reports, the mortality rate of the virus is 2-3%. In the absence of a preventive vaccine for Covid-19, early diagnostic testing based on criteria such as clinical symptoms and reverse-transcription polymerase chain reaction (RT-PCR) is essential so that infected people can be isolated immediately [4]. However, reports show that the RT-PCR test might not be sensitive enough for early detection [5,6]. Computed tomography (CT) has therefore emerged as a noninvasive imaging approach that can detect specific lung lesions associated with Covid-19 [7]. Chest CT is a diagnostic tool for pneumonia and Covid-19 that is easy to perform and can yield an accurate diagnosis. Covid-19 images show characteristic radiographic features such as ground-glass opacities and multifocal patchy consolidation [8]. It has been noted that several patients had a negative RT-PCR test but positive chest CT findings [9].
Artificial intelligence (AI), including machine learning (ML) and deep learning (DL), has shown considerable success in medical image understanding owing to its strength in classification and feature extraction [10,11]. Convolutional neural networks (CNNs) have been widely applied to detect viral pneumonia and differentiate it from bacterial pneumonia in chest radiographs. CNNs are powerful feature extractors that use spatial filters to collect structural information [12]. Doctors usually use X-rays to diagnose lung inflammation and pneumonia. Since hospitals generally have X-ray imaging, X-rays could be used to analyze the lungs of Covid-19 patients. However, X-ray analysis takes significant time and requires a radiology expert [13].
Li et al. [14] developed a neural network model called COVNet to detect Covid-19 from chest CT images. The data used consisted of 4356 images from 3322 patients aged around 49±15 years. They built the algorithm on the ResNet50 model to provide a robust diagnosis of Covid-19. The reported sensitivity and specificity were 90% and 96%, respectively. Bhandary et al. [15] proposed a modified AlexNet model using a support vector machine and compared it against Softmax. They applied it to 1018 chest X-ray images from the LIDC-IDRI database to detect pneumonia and cancer. The algorithm's performance was evaluated, yielding 97.27% accuracy, 98.09% sensitivity, and 95.63% specificity. Wang et al. [16]
The motivation of this work is the development of an automated diagnosis system that can analyze lesions in radiology images and aid in making a rapid and accurate diagnosis. The rest of this study is organized as follows: Section 2 presents the methodology of the proposed deep learning framework. Dataset information, performance evaluation metrics, results, and discussion are presented in Section 3. Finally, the conclusion is given in Section 4.

CNN model
In deep learning, a CNN is a class of deep neural network that attempts to simulate how the visual cortex of the brain analyzes images [19]. In the past, most computer vision researchers relied on hand-crafted features to improve classification results [20]. Nowadays, a CNN performs feature extraction automatically during training, using convolutional and pooling layers [21]. Convolutional layers consist of filters that are trained according to the classification goal. Pooling layers reduce the spatial dimensions of the extracted feature maps but retain the salient features of the image. Many CNN models are popular because of their efficiency and robustness in pattern recognition [22], and they are used in many fields, especially the classification of medical images [23]. VGG-19, one of the models used in our work, is illustrated in Figure 1. The VGG-19 architecture consists of convolutional, pooling, and fully connected layers, 25 layers in total. The input image size is 224×224 pixels. The convolutional layers use 3×3 filters with ReLU activation. Max-pooling layers are used to reduce the computational cost and the size of the data. The final part of the architecture consists of a fully connected block (Flatten and ReLU) with a dropout of 0.5, a method for reducing overfitting, and an output layer with softmax activation.
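As a rough illustration of how the pooling layers shrink the data, the sketch below traces the feature-map size through the convolutional part of the standard VGG-19 configuration (five blocks of 3×3 convolutions, each followed by 2×2 max pooling). The block layout is taken from the published VGG-19 design and is assumed, not confirmed, to match Figure 1; the names `VGG19_BLOCKS` and `trace_feature_maps` are illustrative only.

```python
# Standard VGG-19 convolutional blocks: (number of 3x3 conv layers, output channels).
# Assumed layout; Figure 1 of the paper is not reproduced here.
VGG19_BLOCKS = [(2, 64), (2, 128), (4, 256), (4, 512), (4, 512)]


def trace_feature_maps(height=224, width=224):
    """Return (conv_layer_count, shapes); shapes holds (H, W, C) after each block."""
    conv_layers = 0
    shapes = []
    for n_convs, channels in VGG19_BLOCKS:
        # 3x3 convolutions with stride 1 and "same" padding keep H and W unchanged.
        conv_layers += n_convs
        # 2x2 max pooling with stride 2 halves H and W.
        height, width = height // 2, width // 2
        shapes.append((height, width, channels))
    return conv_layers, shapes


conv_layers, shapes = trace_feature_maps()
final_h, final_w, final_c = shapes[-1]
flattened = final_h * final_w * final_c

print(conv_layers)  # 16 convolutional layers
print(shapes[-1])   # (7, 7, 512)
print(flattened)    # 25088 values enter the fully connected head
```

Under these assumptions, a 224×224 input leaves the last pooling layer as a 7×7×512 volume, which is what the Flatten step turns into a vector for the fully connected layers.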

Performance evaluation metrics
A variety of metrics accepted by the scientific community were used to evaluate the performance of the classification system in detecting lung disease [24]. Performance in this study is evaluated with the confusion matrix, based on four essential parameters: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). From these parameters, validity metrics such as accuracy, sensitivity, specificity, F1 score, and precision can be calculated. In addition, the false-negative rate (FNR), false-positive rate (FPR), false discovery rate (FDR), false omission rate (FOR), Matthews correlation coefficient (MCC), bookmaker informedness (BM), and markedness (MK) are computed. The mathematical formulae of these measures are given in [25].
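In terms of the confusion-matrix counts, these measures have the following standard definitions. They are stated here from common usage rather than quoted from [25], and the negative predictive value (NPV) is introduced only as an auxiliary symbol for the markedness formula.

```latex
\begin{align*}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} &
\text{Sensitivity} &= \frac{TP}{TP + FN} \\
\text{Specificity} &= \frac{TN}{TN + FP} &
\text{Precision} &= \frac{TP}{TP + FP} \\
F_1 &= \frac{2\,TP}{2\,TP + FP + FN} &
\text{NPV} &= \frac{TN}{TN + FN} \\
\text{FNR} &= \frac{FN}{FN + TP} &
\text{FPR} &= \frac{FP}{FP + TN} \\
\text{FDR} &= \frac{FP}{FP + TP} &
\text{FOR} &= \frac{FN}{FN + TN} \\
\text{BM} &= \text{Sensitivity} + \text{Specificity} - 1 &
\text{MK} &= \text{Precision} + \text{NPV} - 1
\end{align*}
\[
\text{MCC} = \frac{TP \cdot TN - FP \cdot FN}
{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
\]
```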

RESULTS AND DISCUSSIONS
This section presents the results for lung image classification. First, the dataset and its details are described; then the metrics used for performance evaluation are shown, the results of the implemented models are detailed, and a comparison with related works is made.

Dataset information
In this work, a database of lung disease comprising chest X-ray and CT images is used, which is publicly available in Ref. [26] and Ref. [23]. The dataset contains 1000 images: 805 normal images and 195 Covid-19 images. The normal images are chest X-rays, while the Covid-19 images consist of 172 chest X-rays and 23 lung CT images, as shown in Table 1. All Covid-19 images, whether chest X-ray or CT, are available as 24-bit RGB JPEG files of varying size. The chest X-ray normal (CXRN) images were selected from Guangzhou Women and Children's Medical Center. All CXRN imaging was performed as part of the patients' routine care. Before the system was trained, the true diagnoses for the images were evaluated by three expert radiologists. The CXRN images are available in JPEG format, with sizes ranging from about 1088×824 to 2022×2129. Sample images are shown in Figure 2.

Fine-tuning the VGG19 model
The deep learning system was implemented on a personal computer with an Intel Core i7-7700HQ CPU @ 2.81 GHz, an Nvidia GeForce GTX 1050 Ti graphics card, and 16 GB of RAM, running Windows 10 (64-bit). It was implemented entirely in Python via the Keras library, with OpenCV and TensorFlow as the back end. The VGG19 model shown in Figure 1 was trained on 80% of the available images and validated on the remaining 20%, so that 800 images formed the training set and 200 the validation set. The full dataset comprised 195 Covid-19 and 805 normal images, so the ratio of Covid-19 to normal images was around 24%.
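The 80/20 partition can be sketched in plain Python as below. The paper does not say how the images were assigned to each set; the sketch assumes a stratified split (each class divided 80/20 separately), and the helper name `stratified_split`, the label strings, and the seed are all hypothetical.

```python
import random


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Split sample indices into train/validation lists, class by class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Hold out the same fraction of every class for validation.
        n_val = round(len(indices) * val_fraction)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Class counts taken from the dataset description: 195 Covid-19 and 805 normal images.
labels = ["covid"] * 195 + ["normal"] * 805
train_idx, val_idx = stratified_split(labels)

n_val_covid = sum(labels[i] == "covid" for i in val_idx)
print(len(train_idx), len(val_idx))             # 800 200
print(n_val_covid, len(val_idx) - n_val_covid)  # 39 161
print(round(195 / 805, 3))                      # 0.242, the "around 24%" ratio
```

With a stratified split, the validation set would hold 39 Covid-19 and 161 normal images, preserving the class imbalance of the full dataset.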
The experimental work in this study is divided into three stages. In the first stage, all images were preprocessed by converting them to RGB and resizing them to 224×224 pixels so that they were ready as input to the CNN model. The image intensities were then normalized to the range 0 to 1. In the second stage, because the number of training images is not sufficient, and to ensure that the model generalizes, data augmentation was performed by randomly rotating the images by up to 15 degrees clockwise. The third stage is transfer learning with fine-tuning. Transfer learning is the process of taking a network pre-trained on one dataset and using it to recognize image or object categories it was not trained on. Fine-tuning requires retraining the head of the CNN architecture to recognize new object classes it was not originally trained for. In this work, fine-tuning was carried out in Keras as a multi-step process. First, all layers below the head are frozen so that the backward pass of backpropagation does not update them. Second, the fully connected nodes at the end of the network are removed and replaced with newly initialized ones. Then, training is started for the fully connected head layers only.
Figure 2 illustrates sample test images of chest X-ray and lung CT for normal cases and Covid-19 disease. Because the dataset contains images of various dimensions, the images are resized to 224×224×3 pixels. Image augmentation is then applied to increase the number of training images. Initially, pre-trained VGG16, VGG19, Xception, ResNet50V2, MobileNetV2, NASNetMobile, ResNet101V2, and InceptionV3 models were used to analyze the dataset and compared with one another, as in Table 2. The performance of all DL models was then validated with the other predictive analytics parameters, as shown in Table 3.
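The three stages can be sketched with the `tf.keras` API roughly as follows. This is a minimal sketch under stated assumptions, not the authors' code: the head width (`head_units=64`), the Adam optimizer and learning rate, the sparse cross-entropy loss, and the function names are all hypothetical, and `RandomRotation(15 / 360)` rotates by up to 15 degrees in either direction, which only approximates the rotation described above.

```python
import tensorflow as tf


def preprocess(image):
    """Resize an 8-bit RGB image array to 224x224 and scale intensities to [0, 1]."""
    image = tf.image.resize(tf.cast(image, tf.float32), (224, 224))
    return image / 255.0


def build_finetune_model(weights="imagenet", num_classes=2, head_units=64):
    """VGG19 base with frozen convolutional layers and a newly initialized head."""
    # Load the convolutional base without its original fully connected layers.
    base = tf.keras.applications.VGG19(
        weights=weights, include_top=False, input_shape=(224, 224, 3)
    )
    base.trainable = False  # freeze every layer below the head

    inputs = tf.keras.Input(shape=(224, 224, 3))
    # Augmentation: random rotation, given as a fraction of a full turn.
    # The layer is only active during training.
    x = tf.keras.layers.RandomRotation(15 / 360)(inputs)
    x = base(x, training=False)
    # New head: Flatten -> Dense(ReLU) -> Dropout(0.5) -> softmax output.
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(head_units, activation="relu")(x)
    x = tf.keras.layers.Dropout(0.5)(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # assumed settings
        loss="sparse_categorical_crossentropy",  # expects integer class labels
        metrics=["accuracy"],
    )
    return model
```

Calling `build_finetune_model()` downloads the ImageNet weights on first use; passing `weights=None` builds the same architecture with random weights, which is useful for checking shapes. Because the base is frozen, only the two dense layers of the head receive gradient updates when `model.fit` is called on the preprocessed training images.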
Figure 3 shows the accuracy during training of the pre-trained VGG19 model, which was fine-tuned according to the procedure explained above. It obtained the highest accuracy of all the models (see Table 2). The results show that pre-trained models can achieve high accuracy, as shown in Table 2; among them, VGG19 achieved the top average validation accuracy of 99% over 100 epochs. Figure 4 shows the loss attained for the VGG19 model. Figure 5 shows the prediction time for all models used. It is observed from Figures 5 and 6
Despite the novelty of Covid-19, state-of-the-art methods have already been applied to Covid-19 images. Table 4 compares other methods with our work. The comparison is confined to the type of model and the number of images used for training and testing, together with the most prominent metrics: area under the curve (AUC), accuracy (ACC), sensitivity (SEN), and specificity (SPE). The highest accuracy in the other literature is 97.27%, reported by Bhandary et al. [15]. Compared with the results of other authors, our tuned model produced a maximum accuracy of 99%. Based on these results, our VGG19 model outperformed the other models in coronavirus disease diagnosis.
Different authors have used multiple CNN models for disease diagnosis from medical imaging, and these methods have some limitations. The most important limitation of this work is the small number of images on which the model was trained, owing to the difficulty of obtaining publicly available images at present.
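The reported validation figures can be cross-checked against the standard confusion-matrix definitions of the metrics. The counts below are hypothetical: they are not taken from the paper but are one confusion matrix consistent with the reported 99% accuracy, 97.4% sensitivity, and 99.4% specificity, assuming the 200 validation images contain 39 Covid-19 and 161 normal cases (a stratified 20% split). The function name `classification_metrics` is illustrative.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute common validity metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


# Hypothetical counts: 38 of 39 Covid-19 and 160 of 161 normal images classified correctly.
metrics = classification_metrics(tp=38, tn=160, fp=1, fn=1)

print(f"{metrics['accuracy']:.3f}")     # 0.990
print(f"{metrics['sensitivity']:.3f}")  # 0.974
print(f"{metrics['specificity']:.3f}")  # 0.994
```

Under these assumptions, the reported figures correspond to two misclassified validation images, one false negative and one false positive, which shows how sensitive the percentages are to individual cases on a validation set of this size.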

CONCLUSION
Early diagnosis of Covid-19 is considered challenging because of the consequences of the disease's spread for society. Deep learning techniques and soft computing would aid the accuracy and speed of the diagnostic process. In this study, we presented a tuned VGG19 model that could help diagnose Covid-19 automatically. All implemented models gave good results on chest radiography, but the tuned VGG19 model produced better accuracy than the other methods and outperformed the present literature. Therefore, fine-tuned models could serve as a dependable computer-aided diagnostic system for clinicians and contribute to the control of the pandemic.