Calculating the area of white spots on the lungs of patients with COVID-19 using the Sauvola thresholding method

Coronavirus disease 2019 (COVID-19) is a pandemic that has affected the world since 2019. Researchers have addressed this disease in various ways, from the screening stage through to treatment and therapy for COVID-19 patients. As the gateway to managing COVID-19, screening plays an essential role in reaching a diagnosis that leads to appropriate treatment. In this paper, we focus on the screening stage using digital image processing techniques, namely calculating the area of white spots in the lungs of COVID-19 patients. The white spots are an early indication of how severely COVID-19 is attacking the patient. We use thorax X-ray images as research data. Although the current experimental results show a success rate of only 71.11%, the method is promising for further development because of its simplicity.


INTRODUCTION
At the end of December 2019, there was an outbreak of pneumonia with no known cause in Wuhan, Hubei Province, China. A group of patients was admitted with an initial diagnosis of pneumonia of unknown etiology; these patients were epidemiologically associated with a seafood and wet animal wholesale market in Wuhan [1], [2]. In early January 2020, the virus causing this pneumonia was identified as a new type of coronavirus (nCoV) named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and the disease was named coronavirus disease 2019 (COVID-19). COVID-19 spread rapidly across the globe, and in March 2020 it was officially declared a pandemic by the World Health Organization (WHO) [3]. As of November 2021, Indonesia had recorded 4,251,423 confirmed positive cases, 4,099,399 recovered patients, and 143,685 deaths [4].
SARS-CoV-2 can be transmitted through physical contact and respiratory droplets. The virus attacks the human respiratory system, especially the lungs. At the beginning of infection, the patient experiences general symptoms such as fever, cough, fatigue, difficulty breathing, sputum production, dyspnoea, hemoptysis, headache, diarrhea, and lymphopenia. These symptoms appear after an incubation period of about 5.2 days, depending on the condition of the immune system and the patient's age [2]. Two methods can be used to detect SARS-CoV-2 in the body: a rapid test, which uses a blood sample to check whether immunoglobulin G (IgG) and immunoglobulin M (IgM) antibodies have formed, and a polymerase chain reaction (PCR) swab test, which examines a sample of mucus taken from the nose and throat [5]. In addition to clinical examination, lung disease can be diagnosed through a chest X-ray; the thorax X-ray image is examined to determine the patient's lung condition. On X-ray results, normal lungs look like black shadows. In patients infected with the coronavirus, however, white spots appear that indicate the presence of fluid in the lung cavity, known as ground-glass opacity (GGO). GGO in patients with COVID-19 is located in the periphery or posterior, especially in the lower lobe; GGO with inter/intra-lobular septal thickening or bilateral, peripheral, and basal consolidation can also be found. This accumulation of fluid can cause difficulty breathing and even death [6].
Previous research on GGO includes the following. Kang et al. [7] conducted preliminary research on optimizing computed tomography parameters for detecting CGNs in lung cancer screening. Shao et al. [8] performed feature extraction on PET images and CT scans to monitor the growth of invasive adenocarcinoma (IAC) in early-stage lung cancer. Huang et al. [9] sought to clarify the relationship between pure GGO nodules and prognosis by studying patients who had pure GGO. They also reviewed 404 lung cancer patients who had received cancer resection from July 2014 to March 2015 to verify their conclusions. Ichikawa et al. [10] investigated the relationship between GGO visibility and a signal-to-noise-based physical detection index in a low-dose computed tomography (LDCT) model by analyzing a set of images obtained from 12 types of multidetector row computed tomography (MDCT). Hotta et al. [11] reviewed 34 adenocarcinoma patients with multiple ground-glass nodules in the Southeast Asian population in order to obtain individual characteristics. Xue et al. [12] studied the relationship between the neutrophil-lymphocyte ratio (NLR) and the growth of GGO in lung cancer. Their method included all patients with acute renal failure (ARF) in the study, who were monitored and followed up based on the recorded variation in ARF growth. The parameters used were age, sex, smoking history, histology, tumor size, and cancer stage. They then calculated signal-to-noise using SPSS software. Chen et al. [13] introduced a GGO model that orders GGOs by size so that doctors can set the priority of surgical operations based on CT scan results. Li et al. [14] extracted CT features associated with ground-glass nodule pathology in order to provide an accurate diagnosis.
Their method took patients with ground-glass nodules from March 2016 to October 2019 who had undergone surgery and monitored their GGO progress based on the extracted CT features. Cheng et al. [15] reviewed synchronous multiple primary lung cancer (SMPLC) cases in patients who had undergone surgery and then received epidermal growth factor receptor tyrosine kinase inhibitors (EGFR-TKI) within 12 months of surgery. Wang et al. [16] built an objective and accurate predictor for assessing the pathology of GGO by extracting parameter features of p53 expression. Ye et al. [17] used deep learning methods to identify GGO, working with images from the lung image database consortium and image database resource initiative (LIDC-IDRI). Qu et al. [18], motivated by the increasing number of GGO cases, investigated and evaluated surgical resection procedures related to ARF cases in the hospital where they worked. Firmino et al. [19] highlighted the importance of reviewing the use of computer-aided diagnosis (CAD) in identifying lung cancer, particularly the identification of GGO. Chillakuru et al. [20] developed a deep learning model to evaluate computer vision in identifying axial slices of the lung, with the aim of reducing surgical resection. Pizzi et al. [21] extracted radiomic features from CT images of GGO using machine learning for the early diagnosis of acute lung disease. Toledo et al. [22] developed a small optical depth sensor (ODS) instrument that collects the daily average aerosol optical depth (AOD) and detects cloud characteristics both on Earth and on Mars from their observatory. Yi-Feng et al. [23] evaluated the diagnostic performance and safety of lung biopsies under CT fluoroscopy control by performing automated biopsies on several patients. Wang et al. [24] analyzed high-resolution computed tomography (HRCT) features of pure ground-glass nodules (GGN) to treat patients with adenocarcinoma. Peng et al. [25] used the lung inflammation index to score the level of lung inflammation associated with the severity of COVID-19.
From the research above, we conclude that GGO can serve as a reference for predicting the presence of abnormalities in the lungs. Various methods have been used to analyze the relationship between GGO and lung-related diseases. We have previously conducted image-processing research on several types of medical image modalities [26]-[29]. However, to the best of our knowledge, there has been no research on the relationship between lung infection and COVID-19 based on X-ray image segmentation of the lungs, especially the presence of GGO. On the other hand, GGO in the lungs can be analyzed using digital image processing, a rapidly growing field that can be used in medicine to analyze X-ray images and so assist medical personnel in identifying an abnormality or disease. Therefore, this paper develops a system for determining the degree of lung infection due to COVID-19 using the Sauvola thresholding method. The system also calculates the area and number of white spots found in the lungs of COVID-19 patients. The purpose of this research is to transfer the knowledge of medical personnel who are experienced in detecting white patches on lung X-rays into a system, so that less experienced medical personnel can detect them more quickly and accurately.

PROPOSED METHOD
In this research, our algorithm first pre-processes a lung X-ray image using the tuned tri-threshold fuzzy intensification operator method; we then segment the lungs and the white patches in the thorax X-ray image using the Sauvola thresholding method. After segmenting the lungs, we calculate the lung area and the white spot area in pixel units. Finally, we evaluate the performance of the Sauvola thresholding method on X-ray images of normal lungs and on X-ray images of the lungs of patients with COVID-19. The proposed algorithm is described in Figure 1.

Data
In this experiment, we used secondary data in the form of lung X-ray images drawn from databases that are standard in research on lung images. The data are divided into two groups: X-ray images of the lungs of patients with COVID-19, obtained from the Italian Society of Medical and Interventional Radiology [30], and X-ray images of normal lungs, obtained from www.kaggle.com [31] and radiopaedia.org [32]. The study uses 95 images in total, of which 45 are X-ray images of the lungs of COVID-19 patients and 50 are X-ray images of normal patients. Table 1 shows an example of the research data used in this experiment. We use only 95 lung X-ray images because we consider this number sufficient to represent the object's condition in practice. We hope that this pilot data set of 95 images can serve as a template when the system is applied to actual data under relevant conditions.

Tuned tri-threshold fuzzy intensification operator
This method modifies the histogram values using fuzzy techniques to increase the sharpness of the image. It uses a simple fuzzy membership function that maps the pixel values of a given channel to a range between zero and one, depending on the threshold value. The method is applied to each color channel of the image to obtain an image with smooth and precise color quality. This stage begins with cropping the input image. Cropping uses the polygonal crop method, in which the image is cut along a polygon whose vertices are mapped using points with X and Y coordinates. In this experiment, cropping separates the lung region of the X-ray image from other areas: the points form a polygonal shape that selects the lungs from the X-ray image, as shown in Figure 2.
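The fuzzification and intensification idea described above can be sketched as follows. This is a minimal illustration using a single-threshold generalization of the classic fuzzy intensification (INT) operator, not the tuned tri-threshold variant itself, whose exact formula is not given in this paper. The function names and the crossover parameter `tau` are assumptions introduced for the example.

```python
def fuzzify(channel, max_value=255):
    """Map integer pixel values of one channel to memberships in [0, 1]."""
    return [[p / max_value for p in row] for row in channel]


def intensify(mu, tau=0.5):
    """Single-threshold intensification (assumed form, 0 < tau < 1).

    Memberships below tau are pushed toward 0 and memberships above tau
    toward 1. With tau = 0.5 this reduces to the classic INT operator:
    2*mu^2 for mu <= 0.5, and 1 - 2*(1 - mu)^2 otherwise.
    """
    if mu <= tau:
        return mu * mu / tau
    return 1.0 - (1.0 - mu) ** 2 / (1.0 - tau)


def enhance_channel(channel, tau=0.5, max_value=255):
    """Fuzzify, intensify, and map back to integer pixel values."""
    return [
        [round(intensify(mu, tau) * max_value) for mu in row]
        for row in fuzzify(channel, max_value)
    ]


# Dark pixels become darker and bright pixels brighter, which stretches contrast.
sample = [[0, 64, 191, 255]]
enhanced = enhance_channel(sample)
```

For a color image, the same function would be applied to each channel in turn, as the text describes.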

Sauvola thresholding
Segmentation is a process that obtains the area of the desired object in an image by separating the object from its background. This separation makes the classification and area calculation processes more precise and accurate [33]. Thresholding is an image segmentation method that separates the object from its background based on differences in the image's gray level. The distribution of pixel intensity values in an image, or in a specific part of it, can be presented as a histogram so that the histogram can be properly partitioned and the threshold value determined [34]. Sauvola thresholding is a development of the Niblack algorithm. It is a valuable local thresholding technique for images with non-uniform backgrounds, especially for text recognition. The method calculates a separate threshold for each pixel using a formula that considers the mean and standard deviation of the local neighborhood [35]. In the Sauvola method, the threshold value $T(x, y)$ is calculated using (1):

$$T(x, y) = m(x, y)\left[1 + k\left(\frac{s(x, y)}{R} - 1\right)\right] \qquad (1)$$

where $T(x, y)$ is the threshold, $m(x, y)$ is the local mean, $k$ is a positive parameter in the range [0.2, 0.5], $R$ is the maximum value of the standard deviation, and $s(x, y)$ is the local standard deviation.
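The per-pixel threshold can be sketched in plain Python as follows. This is a minimal illustration of the standard Sauvola formula, T = m(1 + k(s/R - 1)), under stated assumptions: a square window clipped at the image border, k = 0.2, and R = 128 (a common choice for 8-bit grayscale images). The function names and the small window size are hypothetical and are not taken from the system described in this paper.

```python
import math


def sauvola_threshold(image, window=3, k=0.2, R=128.0):
    """Return a per-pixel threshold map T = m * (1 + k * (s / R - 1))."""
    height, width = len(image), len(image[0])
    half = window // 2
    thresholds = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Gather the local neighborhood, clipped at the image border.
            values = [
                image[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            ]
            mean = sum(values) / len(values)
            variance = sum((v - mean) ** 2 for v in values) / len(values)
            std = math.sqrt(variance)
            thresholds[y][x] = mean * (1.0 + k * (std / R - 1.0))
    return thresholds


def binarize(image, thresholds):
    """Mark pixels brighter than their local threshold as foreground (1)."""
    return [
        [1 if p > t else 0 for p, t in zip(row, t_row)]
        for row, t_row in zip(image, thresholds)
    ]


# A single bright pixel on a dark background is kept as foreground.
image = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]
mask = binarize(image, sauvola_threshold(image))
```

In practice a library routine such as `skimage.filters.threshold_sauvola` from scikit-image would normally replace the nested loops; the loop version is shown only to make each term of the formula visible.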

RESULTS AND DISCUSSION
Pre-processing is the initial process in digital image processing. It aims to improve image quality by removing noise, increasing contrast and brightness, sharpening object edges, and removing blur. In this research, pre-processing is carried out because the input images contain noise, which affects the experimental results. All input images are resized to 500×500 pixels. As described above, we used the tuned tri-threshold fuzzy intensification operator method in pre-processing to improve image quality. Figure 3 shows example histograms for an X-ray image of the lungs of a COVID-19 patient. A histogram expresses the distribution of the pixel intensities of an image. Figure 3(a) is the original image, Figure 3(b) is the histogram of the original image, and Figure 3(c) is the image histogram after processing with the tuned tri-threshold fuzzy intensification operator. Comparing Figure 3(b) with Figure 3(c) reveals differences in the distribution of pixel intensities. In Figure 3(b), the pixel intensities are confined to a limited range of values, whereas in Figure 3(c) they are distributed evenly over the entire range. This indicates that processing with the tuned tri-threshold fuzzy intensification operator method yields an image of better quality than the original. Not all parts of the lung X-ray image are used in this research; only the lungs are needed, so the other parts are discarded. Therefore, cropping with the polygonal crop method is used to determine precisely which part of the image contains the desired object area and to separate it from the parts that are not needed. This helps detect the desired part, namely the lungs. As described above, cropping uses the polygonal crop method; an example is shown in Figure 1.
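The polygonal crop described above can be sketched as a point-in-polygon mask. This is a minimal illustration assuming an even-odd ray casting test on pixel centers; the function names and the choice to fill discarded pixels with 0 are assumptions for the example, not details taken from the system.

```python
def point_in_polygon(x, y, vertices):
    """Even-odd ray casting test for the point (x, y) against a polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_crop(image, vertices, fill=0):
    """Keep pixels whose centers lie inside the polygon; set the rest to fill."""
    return [
        [
            p if point_in_polygon(x + 0.5, y + 0.5, vertices) else fill
            for x, p in enumerate(row)
        ]
        for y, row in enumerate(image)
    ]


# Vertices are given as (X, Y) points, as in the text; here a simple square.
image = [[9] * 4 for _ in range(4)]
cropped = polygon_crop(image, [(1, 1), (3, 1), (3, 3), (1, 3)])
```

A real lung outline would use many more vertices, but the test applied to each pixel is the same.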
Segmentation in this research uses the Sauvola thresholding method, a local thresholding technique modified from the Niblack method [35]. We selected the Sauvola thresholding method because it computes the threshold for every n-th pixel very quickly. In addition, the Sauvola method can segment images with non-uniform and blurred backgrounds. It identifies image pixels using a contrast-based approach at the edges of the image to minimize background variations. Table 2 shows examples of the X-ray image segmentation results for the lungs of patients with COVID-19. The image used in this segmentation process is the cropped image.
Post-processing is the final stage in image processing, in which the system recognizes the processed image. In this research, post-processing marks the white spot objects in the lung X-ray image using the labeling method. The labeled objects are the white spots contained in the image produced by white spot segmentation. After segmentation, the areas of the lung object and the white spots are calculated, and the white spots are counted. The system determines the coordinates and the number of white spots detected, and then derives parameters such as centroid, area, perimeter, and rectangle coordinates. Using these parameters, the system draws a red rectangle on each white spot object and calculates the lung area and the white spot area. The percentage of white spots is calculated using (2). Table 3 shows an example of labeling results on X-ray image segmentation of the lungs using the Sauvola thresholding method.
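The labeling and area calculation steps can be sketched as follows. This is a minimal illustration that assumes 4-connected labeling of a binary white spot mask and assumes that the percentage of white spots is the ratio of white spot area to lung area, both in pixels. The function names are hypothetical, and perimeter is omitted for brevity.

```python
from collections import deque


def label_spots(mask):
    """Label 4-connected regions of 1-pixels and return their properties."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    regions = []
    for y0 in range(height):
        for x0 in range(width):
            if mask[y0][x0] != 1 or seen[y0][x0]:
                continue
            # Breadth-first search collects every pixel of this spot.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            pixels = []
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            xs = [p[0] for p in pixels]
            ys = [p[1] for p in pixels]
            regions.append({
                "area": len(pixels),
                "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
                # Bounding rectangle as (x_min, y_min, x_max, y_max).
                "bbox": (min(xs), min(ys), max(xs), max(ys)),
            })
    return regions


def white_spot_percentage(spot_mask, lung_mask):
    """Assumed form: white spot area divided by lung area, times 100."""
    spot_area = sum(sum(row) for row in spot_mask)
    lung_area = sum(sum(row) for row in lung_mask)
    return 100.0 * spot_area / lung_area if lung_area else 0.0
```

The number of white spots is then `len(label_spots(mask))`, and each `bbox` gives the rectangle to draw around a spot. Library routines such as `skimage.measure.label` and `skimage.measure.regionprops` offer the same properties, including perimeter.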
$$\text{Percentage of white spots} = \frac{\text{white spot area}}{\text{lung area}} \times 100\% \qquad (2)$$

Referring to Table 3, some data samples gave test results that do not agree with the initial hypothesis. Under this hypothesis, X-ray images of patients with COVID-19 should yield a smaller lung area, a larger white spot area, and more white spots than X-ray images of normal lungs, and vice versa. Several anomalous test results, for lung X-ray images of both COVID-19 patients and normal patients, can be seen in Table 4. The disagreement with the initial hypothesis can have various causes, one of which is image quality. Image quality depends on several parameters, including brightness, contrast, sharpness, and resolution. In this experiment, we use pre-processing to improve image quality, but for some data the quality does not improve after pre-processing. This can happen when the quality parameters of the image are so poor that the pre-processing method in the system cannot improve them. In addition to the parameters mentioned above, an X-ray image can be of poor quality because of the poor quality of the machine and the film used during the X-ray process.
Based on the sample test results in Table 4, several test results did not follow the initial hypothesis. For example, for some X-ray images of the lungs of patients with COVID-19, the area and number of white spots are the same as the test results for normal lungs, even though the image visibly contains many white spots. Conversely, for some normal lungs, the area and number of white spots are the same as the results for the COVID-19 group, even though the lungs in the image appear visibly clean. For the system to work optimally and produce segmentation results that follow the initial hypothesis used as the reference for the system tests, the input image must have good brightness, contrast, sharpness, and resolution. In addition, because images are resized to 500×500 pixels, the input image should be at least 500×500 pixels so that it does not suffer information loss. To measure the performance of the algorithm proposed in this paper, we compared segmentation using the Sauvola thresholding method with conventional segmentation methods; the results are shown in Table 5.
Table 5 shows a significant difference between lung segmentation using the conventional segmentation method and the Sauvola thresholding method. With the conventional segmentation method, we cannot calculate the lung area as a whole; the right and left parts of the lung must be calculated one by one. Calculating the lung area separately for right and left prevents the system from automatically calculating the area and number of white spots on the lungs, even though these two variables are decisive in later classifying the severity of COVID-19. With the Sauvola thresholding method, by contrast, we can measure the lung area as a whole without separating the left and right parts, which makes it easier to calculate the number and area of white spots in the lungs. Because of these advantages, we used the Sauvola thresholding method to calculate the lung area and the white spots it contains. The results of applying the Sauvola thresholding method to calculate lung area, white spot area, and the number of white spots are shown in Table 6.
Referring to Table 6, the test success rate of the Sauvola thresholding method is 71.11% on chest X-ray images of patients with COVID-19 and 54% on chest X-ray images of normal patients. These results show that the Sauvola thresholding method does not achieve a perfect success rate on chest X-ray images of patients with COVID-19. This can happen because the Sauvola thresholding method has several disadvantages for low-contrast images: some objects in such images are lost, which affects the accuracy of the segmentation results. In addition, the Sauvola method calculates the threshold only for every n-th pixel and interpolates the threshold for the other pixels to speed up computation, which reduces the accuracy of thresholding [34]. Apart from the algorithm itself, the accuracy is also affected by data anomalies in the input images used in the study; poor input image quality leads to test results that are not as expected.
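The reported success rates can be checked against the group sizes given in the Data section. The sketch below is a worked piece of arithmetic only: the success counts (32 and 27) and the combined rate are inferred by rounding from the reported percentages and are not values reported in this paper.

```python
def success_rate(successes, total):
    """Return the success rate as a percentage."""
    return 100.0 * successes / total


# Group sizes are taken from the Data section of this paper.
COVID_TOTAL, NORMAL_TOTAL = 45, 50

# Success counts are inferred by rounding; they are not reported in the paper.
covid_ok = round(71.11 / 100 * COVID_TOTAL)
normal_ok = round(54 / 100 * NORMAL_TOTAL)

# Combined rate over both groups, derived here for illustration only.
overall = success_rate(covid_ok + normal_ok, COVID_TOTAL + NORMAL_TOTAL)
print(covid_ok, normal_ok, round(overall, 2))
```

Under these assumptions, 71.11% is consistent with 32 of 45 COVID-19 images and 54% with 27 of 50 normal images, which would correspond to roughly 62% over all 95 images.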

CONCLUSION
In this research, pre-processing with the tuned tri-threshold fuzzy intensification operator method, which modifies the histogram values of an image using a fuzzy technique before segmentation, improved the quality of the chest X-ray images and strongly influenced the detection of lung objects and white patches. In the segmentation process, the Sauvola thresholding method, used to segment the lungs and the white spots in the lung X-ray images, can produce good segmentation results. However, when the percentage of white spots in lung X-ray images is calculated using the Sauvola thresholding method, the average percentage of white spots is 56.35837111% for patients with COVID-19, with a test success rate of 71.11%, and 50.941716% for normal patients, with a test success rate of 54%. These values are not optimal, but comparing the average percentage of white spots in patients with COVID-19 and in normal patients gives a difference of 5.41665511 percentage points, which we consider significant, so we conclude that the system is successful in testing the two data samples. The results of white spot segmentation using the Sauvola thresholding method can therefore simplify the analysis of thorax X-ray images of COVID-19 patients and provide more accurate, precise, and thorough image information than analysis by human sight alone. The results of the white spot segmentation test on thorax X-ray images of patients with COVID-19 are not optimal because the Sauvola method is sensitive to low-contrast images, to the interpolation used in thresholding, and to poor input image quality. The quality of the input image therefore significantly affects the experimental results.
In addition, the accuracy and precision of cropping strongly influence the image segmentation results. Anomalous test results in some test data samples can be caused by the poor quality of the input image. Overall, however, the results of this research can serve as a means of transferring knowledge from medical personnel who are experienced in detecting white spots on lung X-rays into a system, so that inexperienced medical personnel can detect them quickly, precisely, and accurately.