Multi-kernel CNN block-based detection for COVID-19 with imbalance dataset

Received Jul 20, 2020; Revised Sep 14, 2020; Accepted Oct 6, 2020

COVID-19, which originated in Wuhan, spread rapidly throughout the world and became a public health crisis. Recognizing positive cases at the earliest stage was crucial to restrain the spread of the virus and to treat affected patients quickly. However, the limited supply of RT-PCR as a diagnostic tool caused long delays in obtaining examination results for suspected patients. Previous research has stated that radiologic images can be used to detect COVID-19 before symptoms appear. With the rapid development of artificial intelligence in medical imaging in recent years, deep learning, the core of this technology, can achieve human-level diagnostic accuracy. In this paper, deep learning is implemented to detect COVID-19 using a chest X-ray dataset. The proposed model employs a multi-kernel convolutional neural network (CNN) block combined with a pre-trained ResNet34 to overcome an imbalanced dataset. The block adopts four kernel sizes: 1x1, 3x3, 5x5, and 7x7. The findings show that the proposed model can perform binary and three-class classification with accuracies of 100% and 93.51% in the validation phase and 95% and 83% in the test phase, respectively.


INTRODUCTION
Coronavirus disease 2019 (COVID-19) first emerged in Wuhan, Hubei province, China, in December 2019, with numerous reports of pneumonia of unknown cause. Since then, the outbreak has spread rapidly and has been declared a pandemic by the World Health Organization (WHO), posing a severe threat to human health [1][2][3]. According to WHO data, as of May 30, 2020, the coronavirus had affected 5,525,307 individuals worldwide and caused 347,114 deaths. Through human-to-human transmission, this novel coronavirus has spread to more than 227 countries [4]. The International Committee on Taxonomy of Viruses (ICTV) named the new coronavirus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), while coronavirus disease 2019 (COVID-19) is the name WHO gave to the infectious disease caused by SARS-CoV-2. COVID-19 may cause serious respiratory infections or even deadly acute respiratory distress syndrome (ARDS) [5]. Its common clinical symptoms include sore throat, headache, fever, cough, shortness of breath, muscle pain, and fatigue [6]. Reverse transcription-polymerase chain reaction (RT-PCR) has been considered one of the diagnostic tools for suspected COVID-19 patients at an early stage. However, the limited supply of PCR test kits causes significant delays in obtaining test results for affected patients and in preventing the spread of the virus, especially at the center of the pandemic area. In addition to the RT-PCR test, chest radiological imaging, including computed tomography (CT) and X-ray imaging, plays a key role in the early detection and prevention of this infectious disease [7]. One advantage of radiological images is that COVID-19 symptoms can be detected even when the RT-PCR test result is negative [8,9].
Combining the results of chest radiological imaging with a patient's initial symptoms, travel history, recent direct contacts, and laboratory tests can speed up the diagnosis of COVID-19 in medical practice. This procedure helps identify infected people promptly and control the spread of the disease, especially within the pandemic area. In short, the radiological image is a crucial element of the examination of suspected COVID-19 patients, and its usefulness has been confirmed in various previous studies [2,10-12].
Radiological images acquired from COVID-19 patients contain valuable diagnostic information. Several studies have identified changes in chest X-ray and CT images, especially in the early stages of COVID-19 [13]. Researchers have also reported significant findings in radiological imaging studies of COVID-19 that confirm changes in the lungs [14][15][16]. Thus, accurate and prompt diagnosis of suspected cases at an early stage plays a pivotal role in quarantine and proper medical care. However, the need for medical imaging creates another problem when a large number of suspected patients must undergo chest X-ray or CT scans at a medical facility. This places an immense burden on medical workers, because the severe shortage of radiologists is a significant challenge, especially in the current situation. In addition, visual fatigue from working overtime increases radiologists' risk of missed diagnoses.
The COVID-19 pandemic has highlighted the need for radiologists in this sector and has consequently drawn attention to the development of automated detection methods based on artificial intelligence (AI) algorithms. In 2015, deep learning, AI's core technology, exceeded human-level performance on the ImageNet image classification task, which uses roughly one million training images [17], and it has since been applied in several disciplines. Deep learning innovations in radiology could help obtain accurate diagnoses. Owing to recent advances in convolutional neural networks (CNNs) for medical image analysis [18,19], automatic detection algorithms based on deep learning have become feasible methods for medical image classification and segmentation. Deep learning algorithms have been widely applied to many medical problems, such as lung disease detection, breast cancer detection, skin cancer classification, brain disease classification, fundus image segmentation, retinal disease detection, and pneumonia detection from chest X-ray images [20,21]. In addition, deep learning technologies could help overcome drawbacks of RT-PCR testing, including the low supply of testing kits, the cost of testing, and the waiting time for lab results.
This research presents a deep learning architecture for the automatic detection of COVID-19. The proposed model combines a pre-trained ResNet-34 with multi-kernel CNN blocks to address the imbalanced dataset. The model is trained on a dataset of raw chest X-ray images to detect COVID-19. The dataset contains chest X-ray images of 125 COVID-19 patients, 500 normal patients, and 500 pneumonia patients. The paper is organized as follows: Part two describes the architecture of the proposed model and the chest X-ray dataset in detail. Part three presents the experimental results and discussion. The final part concludes the research.

DATASET AND METHOD
This section gives a detailed description of the dataset, the architecture of the deep learning network, the multi-kernel CNN block, and the experimental setup. Figure 1 shows the flowchart of the experiment conducted in this research.

X-ray images dataset
The availability of data is the first step in building any deep-learning-based diagnostic tool. Because COVID-19 infects the epithelial cells lining the respiratory tract, X-ray images can be used to inspect the condition of a patient's lungs. In this research, chest X-ray images were collected from two primary datasets to perform COVID-19 detection [22]. The first image source was established by Joseph Paul Cohen (available at https://github.com/ieee8023/covid-chestxray-dataset) from chest X-ray images of COVID-19 patients collected from various open-access sources, such as online publications, Figure1.com, Radiopaedia.org, and the Italian Society of Medical & Interventional Radiology. According to [21], the aim of this collection process was to maintain image quality. The dataset is regularly updated with images contributed by researchers from various countries. It currently contains 125 chest X-rays identified as COVID-19 positive, 82 from male and 43 from female patients. However, the information about the COVID-19 patients in this dataset is incomplete; for instance, age, intubation status, temperature, and survival status were not reported for the majority of the COVID-19 positive patients. The normal and pneumonia X-ray images were taken from a second source, the chest X-ray dataset provided by Wang et al. [23]. To train the proposed method on an imbalanced dataset, 125 COVID-19, 500 normal, and 500 pneumonia images were selected at random. The dataset was split randomly with the train_test_split function from the scikit-learn library, while the model itself was implemented in PyTorch. Figure 2 shows COVID-19, normal, and pneumonia cases obtained from these two sources.

The proposed network and multi-kernel CNN block
The rapid development of artificial intelligence is closely related to innovation in deep learning technology. Deep learning has achieved human-level performance on various image datasets, particularly in computer vision tasks such as object detection, image classification, and image and video recognition [24]. Its primary algorithm for building deep neural networks (DNNs) for images is the convolutional neural network (CNN). A CNN aims to reduce the input images to a form that is easier to process without losing significant features. CNNs are therefore well suited to architectures that not only learn features well but also scale to large datasets. A CNN model is created by stacking hidden layers whose parameters are then updated to perform a specific task. In this research, rather than building a new deep learning model from scratch, the model is created from a pre-trained model with a proven architecture. Accordingly, the core idea of this study is to combine a pre-trained ResNet-34 with the multi-kernel CNN block architecture [25,26]. The major challenge of detecting COVID-19 in X-ray images is that the model has to classify images based on fine details. Very deep networks with many hidden layers are generally important for feature extraction in image segmentation and object detection tasks. A classification task such as this one, however, calls for a design that can learn and capture slight changes in features rather than one that is very deep. To address this issue, the proposed method uses transfer learning with ResNet, one of the most popular architectures for developing deep learning models.
In transfer learning, the deep learning model is trained in two stages: i) pre-training, where the network is trained on a large-scale benchmark database representing a wide diversity of categories, and ii) fine-tuning, where the pre-trained model is trained on the particular task of interest, which usually has far fewer labeled samples than the pre-training database. Pre-training helps the model learn common features that can be reused on the target task. This two-stage approach has been employed in many applications, particularly in medical imaging. Here, the ResNet-34 architecture and its pre-trained weights are adapted to a medical task, namely the detection of COVID-19. Transfer learning has also been exploited to train deep learning models on imbalanced datasets [27]. Figure 3 shows an overview of the proposed model architecture. It consists of three main components: a pre-trained ResNet-34, four multi-kernel CNN blocks, and a linear output layer. Features are first extracted from the input chest X-ray image by the pre-trained ResNet-34. However, of the four main blocks that make up the ResNet-34 architecture, only the first is retained as a feature extraction module, followed directly by an average pooling layer. This aims to capture different levels of features while preventing the network from becoming too deep. ResNet is also chosen for its skip connections, which mitigate vanishing gradients and accelerate network convergence.
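The paper does not list the feature-map sizes produced by the truncated backbone. The sketch below traces them under stated assumptions: a 224×224 RGB input (the resize used in the experimental setup), torchvision's standard ResNet-34 layout, and the stem (conv1, bn1, relu, maxpool in torchvision) being kept together with the first block (layer1). In that layout the stem is a 7×7, stride-2 convolution followed by 3×3, stride-2 max pooling, and layer1 consists of three BasicBlocks of two 3×3, stride-1 convolutions each. The code only applies the standard output-size formula; the helper names are hypothetical, and the average pooling layer that follows is not modeled because its size is not stated.

```python
def conv_out(n: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer:
    floor((n + 2 * padding - kernel) / stride) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


def trace_truncated_resnet34(size: int = 224) -> list[tuple[str, int, int]]:
    """Return (stage, channels, height = width) for the retained stages,
    assuming torchvision's ResNet-34 stem plus layer1 only."""
    trace = [("input", 3, size)]
    # Stem: 7x7 convolution with 64 filters, stride 2, padding 3
    size = conv_out(size, kernel=7, stride=2, padding=3)
    trace.append(("conv1 7x7/2", 64, size))
    # 3x3 max pooling, stride 2, padding 1
    size = conv_out(size, kernel=3, stride=2, padding=1)
    trace.append(("maxpool 3x3/2", 64, size))
    # layer1: 3 BasicBlocks x 2 convolutions, all 3x3, stride 1, padding 1,
    # so neither the spatial size nor the 64 channels change
    for _ in range(3 * 2):
        size = conv_out(size, kernel=3, stride=1, padding=1)
    trace.append(("layer1 (3 BasicBlocks)", 64, size))
    return trace


for stage, channels, hw in trace_truncated_resnet34():
    print(f"{stage:24s} {channels:3d} x {hw} x {hw}")
```

Under these assumptions the retained block hands 64 feature maps of 56×56 to the rest of the network, which is far less downsampling than the full ResNet-34 (whose last block works at 7×7). This is consistent with the stated goal of keeping finer spatial detail instead of going deeper.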
The multi-kernel building blocks then process the output of the pre-trained stage. These blocks are adopted from a glaucoma detection task, where they use an attention mechanism to focus on the region of interest (ROI) [26]. Focusing on small ROIs in chest X-ray images makes the network work more effectively by reducing redundant processing of the same region. The attention mechanism can thus be refined and used to identify the most important regions for COVID-19 detection. In [24], the network input was a mask image in which the pathological area had already been located, and the network output positive or negative glaucoma labels. In this research, raw chest X-ray images are used as input and forwarded to the pre-trained ResNet-34 to extract features, and the multi-kernel CNN blocks then highlight the salient regions of the images. With this structure, the proposed model is designed to cope with the imbalanced dataset.
Figure 4 shows the components of each multi-kernel CNN block. Each block contains six convolutional layers (C1, C2, C3, C4, C5, and C6) with four different kernel sizes: 1x1, 3x3, 5x5, and 7x7. Concatenating the outputs of the four convolutional layers C1, C2, C3, and C4 extracts multiscale features for identifying the salient areas of the chest X-ray images. All convolutional operations within the multi-kernel CNN blocks are followed by batch normalization and a ReLU activation, which together increase the non-linearity of the network and accelerate convergence. Finally, two linear layers are added to produce the classification result.
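The multiscale step can be illustrated without a deep learning framework. The sketch below is a minimal, single-channel illustration of the concatenation of C1 to C4 only: one convolution per kernel size (1x1, 3x3, 5x5, 7x7), each padded by k // 2 so that all outputs keep the input's size, followed by ReLU and stacked as channels. The roles of C5 and C6 and the channel counts are not described in the text, so they are not modeled; batch normalization is omitted; and the fixed averaging kernels are hypothetical stand-ins for learned weights. The "same" padding is also an assumption, although concatenation requires the branch outputs to share a spatial size.

```python
Image = list[list[float]]


def conv2d_same(image: Image, kernel: Image) -> Image:
    """Naive single-channel 2-D convolution (cross-correlation, as CNN
    libraries compute it) with zero padding k // 2, so an odd k x k kernel
    keeps the output the same size as the input."""
    k = len(kernel)
    pad = k // 2  # 1x1 -> 0, 3x3 -> 1, 5x5 -> 2, 7x7 -> 3
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:  # pixels outside count as zero
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def relu(fmap: Image) -> Image:
    return [[max(0.0, v) for v in row] for row in fmap]


def multi_kernel_concat(image: Image, sizes=(1, 3, 5, 7)) -> list[Image]:
    """Run one convolution per kernel size in parallel and stack the outputs
    as channels (the C1-C4 concatenation). Batch normalization is omitted."""
    channels = []
    for k in sizes:
        # Hypothetical fixed averaging kernel; in the real block weights are learned.
        kernel = [[1.0 / (k * k)] * k for _ in range(k)]
        channels.append(relu(conv2d_same(image, kernel)))
    return channels


# Usage: a 9x9 "image" with a single bright pixel in the centre.
image = [[0.0] * 9 for _ in range(9)]
image[4][4] = 1.0
for k, fmap in zip((1, 3, 5, 7), multi_kernel_concat(image)):
    active = sum(1 for row in fmap for v in row if v > 0)
    print(f"{k}x{k} branch: {len(fmap)}x{len(fmap[0])} map, {active} active pixels")
```

Every branch returns a 9×9 map, but the single bright pixel influences 1, 9, 25, and 49 output positions respectively. Each channel of the concatenated result therefore sees the image at a different scale. In PyTorch the same idea is usually written as parallel `nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)` layers whose outputs are joined with `torch.cat(..., dim=1)`.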

Experimental setup
Two scenarios are used to test how well the proposed model detects and classifies COVID-19 in chest X-ray images. In the first, the proposed architecture is trained to classify chest X-ray images into two classes: COVID-19 and normal. For this scenario, 625 chest X-ray images (500 normal and 125 COVID-19) are randomly divided into 80% training, 10% validation, and 10% test sets. In the second, the architecture is trained to classify three categories (COVID-19, normal, and pneumonia) using the same partition on a total of 1,125 images (500 normal, 500 pneumonia, and 125 COVID-19). Before training, all chest X-ray images are resized to 224×224. Because the number of training images is limited, data augmentation, including random rotation and affine transformation, is applied.
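The paper states that the images were split randomly into 80/10/10 with scikit-learn's train_test_split, but it does not give the exact calls. The dependency-free sketch below performs an equivalent seeded random split for both scenarios. The function name, the seed, and the rounding rule (validation and test each get floor(n / 10) images, and the remainder goes to training) are assumptions, so the resulting counts can differ by one from those reported in the paper.

```python
import random


def split_80_10_10(items, seed=0):
    """Randomly split `items` into about 80% train, 10% validation, 10% test.

    Hypothetical helper: validation and test each get floor(n / 10) items and
    the remainder goes to training.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_val = n_test = len(shuffled) // 10
    n_train = len(shuffled) - n_val - n_test
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Labels stand in for image paths; class sizes follow the two scenarios.
binary = ["normal"] * 500 + ["covid"] * 125
three_class = binary + ["pneumonia"] * 500
for name, labels in (("binary", binary), ("three-class", three_class)):
    train, val, test = split_80_10_10(labels)
    print(name, len(train), len(val), len(test), "COVID-19 in val:", val.count("covid"))
```

Because the split is not stratified, the number of COVID-19 images that lands in the validation and test sets depends on the seed. This matters for a minority class of only 125 images. Augmentation would then be applied to the training subset only, since the validation and test phases did not use it.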
Experiments are performed with the public PyTorch framework on Windows 10, using a computer with an Intel(R) Core(TM) i7-7700K CPU @ 4.20 GHz, 16 GB of memory, and a single Nvidia GeForce GTX 1080 GPU with 8 GB of VRAM. The GPU handles the large volume of computation and memory traffic in training far better than the CPU, which shortens training time considerably. During training, the proposed model uses stochastic gradient descent (SGD) with a batch size of 16, because SGD optimization was reported to achieve better performance in [28]. To avoid overfitting, early stopping regularization is used. Cross-entropy is chosen as the loss function, and the learning rate (LR) is set to 0.001; a comparison of different LRs is given in the next section. The code and dataset links are publicly available at: https://github.com/naimji/Multikernel-CNN-Block-Based-Detection-for-COVID-19-with-Imbalance-Dataset.
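The paper does not state which quantity early stopping monitors or how many epochs it waits. The sketch below assumes that validation loss is monitored with a hypothetical patience value and that the weights from the best epoch are kept as the checkpoint; the class name and parameters are illustrative. In a PyTorch training loop it would be called once per epoch after validation, alongside an optimizer such as `torch.optim.SGD(model.parameters(), lr=0.001)`, `nn.CrossEntropyLoss()`, and a `DataLoader` with `batch_size=16`. Momentum and weight decay are not reported, so they are left out.

```python
class EarlyStopping:
    """Stop when the validation loss has not improved by more than `min_delta`
    for `patience` consecutive epochs (both values are hypothetical)."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True to stop training."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch  # a real loop would save the model weights here
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Usage with made-up validation losses (not results from the paper).
stopper = EarlyStopping(patience=3)
for epoch, loss in enumerate([0.90, 0.60, 0.45, 0.47, 0.46, 0.50, 0.52]):
    if stopper.step(epoch, loss):
        print(f"stop at epoch {epoch}; restore weights from epoch {stopper.best_epoch}")
        break
```

With these illustrative losses, training stops at epoch 5 and the checkpoint from epoch 2 is restored. A larger learning rate that converges in fewer epochs benefits most from this, because the loop ends soon after the minimum instead of running a fixed epoch budget.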

RESULTS AND DISCUSSION
In this section, the performance of the trained model is evaluated using the confusion matrix (CM) and metrics derived from it. With TP, TN, FP, and FN denoting the numbers of true positives, true negatives, false positives, and false negatives, the sensitivity, specificity, precision, F1-score, and accuracy are defined as:

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
$$\text{Specificity} = \frac{TN}{TN + FP}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$F_1 = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}$$
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

As mentioned in the previous section, the dataset was divided into 80% training, 10% validation, and 10% test sets for both scenarios. Data augmentation was not applied in the validation and test phases. The proposed model was assessed in the validation phase with all five metrics, whereas only accuracy was used to evaluate the test phase on the remaining 10% of the images.

Figure 5 shows the training loss, validation loss, and accuracy curves for binary classification at two learning rates. Figure 5(a) shows sharp spikes in the training loss for LR 0.001 in some epochs. These spikes are most likely related to the COVID-19 category containing far fewer images than the normal category (the imbalanced dataset). With LR 0.0001, no such spikes occur, but the accuracy is lower than with LR 0.001. In general, a large learning rate lets the model learn faster, while a smaller learning rate allows a finer approach to the optimum at the cost of a considerably longer training time. Early stopping is therefore particularly beneficial for LR 0.001, which needs fewer epochs to converge than LR 0.0001. The confusion matrices in Figure 6 show that the proposed model classifies the two classes very well at both learning rates. With LR 0.001, the model achieves 100% accuracy on the 63 validation images (52 normal and 11 COVID-19). With LR 0.0001, one image is misclassified, which again reflects the slower convergence at the smaller learning rate. Early stopping determines how many iterations the training can run before the model begins to overfit.
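For three classes, the paper does not say how per-class metrics are combined into the single values reported in the tables. The sketch below assumes that each class is scored one-vs-rest from the confusion matrix using the standard definitions: sensitivity TP/(TP+FN), specificity TN/(TN+FP), precision TP/(TP+FP), and F1 as the harmonic mean of precision and sensitivity. The class scores are then averaged without weighting (macro average). Both the aggregation and the function names are assumptions. The usage example reproduces the binary LR 0.001 validation result (52 normal and 11 COVID-19 images, all correct).

```python
def per_class_metrics(cm: list[list[int]]) -> list[dict[str, float]]:
    """Score each class one-vs-rest. cm[i][j] counts images of true class i
    that were predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class c predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp  # other classes predicted as c
        tn = total - tp - fn - fp
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        prec = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
        results.append({"sensitivity": sens, "specificity": spec,
                        "precision": prec, "f1": f1})
    return results


def accuracy(cm: list[list[int]]) -> float:
    """Fraction of all images on the diagonal (correct predictions)."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(sum(row) for row in cm)


def macro_average(results: list[dict[str, float]], key: str) -> float:
    """Unweighted mean of one metric over all classes (assumed aggregation)."""
    return sum(r[key] for r in results) / len(results)


# Binary validation result reported for LR 0.001: rows/cols = (normal, COVID-19).
binary_cm = [[52, 0], [0, 11]]
scores = per_class_metrics(binary_cm)
for key in ("sensitivity", "specificity", "precision", "f1"):
    print(key, round(macro_average(scores, key), 2))
print("accuracy", accuracy(binary_cm))
```

A macro average gives the 125-image COVID-19 class the same weight as the larger classes. This makes it a stricter summary on an imbalanced dataset than plain accuracy, which is dominated by the majority classes.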
Figure 7 presents the three-class classification curves, which show the same problem as the two-class curves. Spikes in the training loss appear for both LR 0.001 and LR 0.0001. For LR 0.001 in particular, the sharp increases occur more frequently than in the two-class case, which is again attributable to the COVID-19 category containing far fewer training images than the normal and pneumonia categories. In contrast, the validation losses for both LRs decrease consistently, which indicates that the proposed model mitigates the effects of the imbalanced dataset.
The classification performance at the two LRs differs slightly, as the confusion matrices in Figure 8 show. With LR 0.001, the misclassified images are in the normal and pneumonia categories, 10 in total, while only one COVID-19 image is wrongly predicted as pneumonia. With LR 0.0001, 11 images are incorrectly predicted as pneumonia or normal, and a further 3 images are misclassified as either pneumonia predicted as COVID-19 or COVID-19 predicted as pneumonia. Tables 1 and 2 show the performance metrics of the two scenarios derived from the confusion matrices. The best two-class performance is obtained with LR 0.001, with sensitivity, specificity, precision, F1-score, and accuracy of 1.00, 1.00, 1.00, 1.00, and 100%, respectively. The best three-class performance is 0.91 sensitivity, 0.94 specificity, 0.93 precision, 0.87 F1-score, and 93.51% accuracy. Tables 3 and 4 report the test-phase results obtained with the best parameters from the training and validation phases. In binary classification, the model correctly predicts 46 test images, for an average accuracy of 95%. In three-class classification, it correctly predicts 94 images, for an average accuracy of 83%. Overall, the proposed model performs well in both scenarios, which indicates that it can cope with an imbalanced dataset. However, a larger database is required to make the model more robust in three-class or, more generally, multiclass classification. Furthermore, the model needs to be validated by external health organizations or professional radiologists before it can be used in real clinical conditions.

CONCLUSION
In this paper, a combination of a pre-trained ResNet-34 and multi-kernel CNN blocks is introduced to detect and classify COVID-19 cases using an imbalanced dataset. The pre-trained ResNet-34 extracts the main features of the input chest X-ray images, and the multi-kernel CNN blocks then highlight the salient regions of the images. With this structure, the proposed model copes with the imbalanced dataset. The proposed model performs two- and three-class classification with validation accuracies of 100% and 93.51% and test accuracies of 95% and 83%, respectively. However, the effectiveness of the proposed model still needs to be examined by professional radiologists and evaluated on a larger dataset. In future research, we will address these issues to create a more robust and accurate model, particularly for multiclass classification.