A novel CAD system to automatically detect cancerous lung nodules using wavelet transform and SVM

Received Jan 29, 2019; Revised Mar 23, 2020; Accepted Apr 30, 2020

A novel algorithm for detecting cancerous nodules in computed tomography (CT) images is presented in this paper. CT images are large, high-resolution images, and some cancerous lung nodule lesions may be missed by the radiologist due to fatigue. The CAD system proposed in this paper can help the radiologist detect cancerous nodules in CT images. The proposed algorithm is divided into four stages. In the first stage, an enhancement algorithm is implemented to highlight the suspicious regions. In the second stage, the regions of interest are detected. Adaptive SVM and wavelet transform techniques are then used to reduce the number of detected false positive regions. The algorithm is evaluated on 60 cases (normal and cancerous), and it shows high sensitivity in detecting cancerous lung nodules, with a TP ratio of 94.5% and an FP rate of 7 clusters/image.


INTRODUCTION
Lung cancer is one of the most relevant public health issues in the United States, Europe and the Middle East [1,2]. Early detection and treatment of this type of cancer are required to overcome this burden effectively. Chest X-ray is the initial and cheapest method for detecting lung nodules. Computed tomography (CT), used as a second diagnostic stage, is the best imaging modality for detecting small pulmonary nodules, particularly since the introduction of helical technology [3,4]. CT images are high-resolution images that require large amounts of data storage. Therefore, researchers have tried to help radiologists process these large images and automatically detect potential cancerous lung nodules using computer-aided diagnosis (CAD) systems [5,6].
Detecting cancerous lung nodules is one of the most difficult tasks for the radiologist, especially in CT images, since the nodules are closely connected to the surrounding parenchymal tissue [7,8]. As a result, cancerous nodules have visual characteristics similar to those of normal tissue [9]. Therefore, this paper proposes a novel method for detecting and classifying cancerous nodules. Three main stages are used to accurately detect the cancerous nodules in CT lung images [10].
The paper presents a brief description of lung CT images in Section 2, followed by a literature review in Section 3. The proposed detection and classification algorithm is introduced in Section 4. Finally, the discussion and conclusion are presented in Sections 5 and 6, respectively.

RESEARCH METHOD
The proposed CAD system is designed to accurately detect cancerous nodules in CT images with a minimum number of false positive (FP) regions. The technique is divided into four stages. In the first stage, the suspicious regions are enhanced using an adaptive average method. In the second stage, a region of interest detection algorithm locates the regions that may contain cancerous nodules. To reduce the number of detected FP regions, four wavelet features are then extracted from each detected region, and these features are used with an SVM to classify the detected regions as FP or true positive (TP) regions. The proposed algorithm is shown in Figure 1.

CT-image enhancement
Contrast enhancement is the fundamental operation needed to assist in detecting cancerous nodules in CT images. In many image processing applications, the Laplacian filter is one of the simplest and most effective techniques for intensity enhancement; it is given in (1), where S(x,y) is the intensity value of the processed image, f(x,y) is the intensity value of the input image, and c is set to one in this paper. The Laplacian filter improves the contrast of the cancerous nodules in the CT image by applying a Laplacian mask of size 9. Cancerous nodules appear on digitized CT as small regions less than 4 mm in size [23], with intensity values higher than those of the surrounding background. Enhancing the cancerous nodule regions is nevertheless difficult, since the surrounding lung tissue makes the abnormal areas almost invisible. Therefore, a modified average filter is applied to smooth the edges of the Laplacian-enhanced image [24]. This slightly enhances the cancerous nodule regions so that they can be detected more easily in the next stage.
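Equation (1) did not survive extraction. As a hedged illustration only, the sketch below assumes the common Laplacian sharpening form S(x,y) = f(x,y) + c·∇²f(x,y) with c = 1, and interprets the "mask of size 9" as a 3 × 3 eight-neighbour Laplacian with a positive centre coefficient (8), so that adding it brightens local intensity peaks. The function name `laplacian_enhance`, the edge padding and the clipping to [0, 255] are assumptions, not the authors' code.

```python
import numpy as np

def laplacian_enhance(f: np.ndarray, c: float = 1.0) -> np.ndarray:
    """Sharpen an 8-bit grey image as S = f + c * Laplacian(f).

    Assumption: 3x3 eight-neighbour Laplacian with centre +8 and -1 for each
    neighbour, so c = 1 boosts pixels that are brighter than their surroundings.
    """
    f = f.astype(float)
    h, w = f.shape
    p = np.pad(f, 1, mode="edge")  # replicate borders so the output keeps the input size
    # Sum of the full 3x3 neighbourhood (centre included), then remove the centre.
    window_sum = sum(p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    neighbours = window_sum - f
    lap = 8.0 * f - neighbours
    # Keep the result inside the valid 8-bit grey-level range.
    return np.clip(f + c * lap, 0, 255)
```

A single bright pixel is amplified while its darker neighbours are pushed down, which is the contrast boost the stage relies on; flat regions are left unchanged because their Laplacian is zero.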
After extensive analysis of 60 cancerous CT images, we concluded that all cancerous nodules have grey-scale values in the range from 80 to 230. In accordance with this observation, each CT image is processed using the modified average filter given in (2).
where S_k is the intensity value of the processed image, r_j is the intensity value of the input image, and m and n are the dimensions of the mask. After applying the Laplacian and modified average filters, the cancerous nodule regions become slightly brighter than the neighboring regions, as shown in Figure 2. This assists in detecting the regions of interest, as discussed in the next section.
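Equation (2) is likewise missing. From the variable definitions, a plausible reading is the local mean S_k = (1/(mn)) Σ r_j taken over an m × n window. The sketch below additionally assumes that the "modification" restricts the averaging to pixels whose grey level lies in the reported nodule range [80, 230], leaving all other pixels unchanged. The function `modified_average` and its defaults (3 × 3 window, edge padding) are hypothetical.

```python
import numpy as np

def modified_average(img: np.ndarray, m: int = 3, n: int = 3,
                     low: float = 80, high: float = 230) -> np.ndarray:
    """Replace in-range pixels by their m x n local mean (m, n assumed odd)."""
    img = img.astype(float)
    h, w = img.shape
    pm, pn = m // 2, n // 2
    p = np.pad(img, ((pm, pm), (pn, pn)), mode="edge")
    # S_k = (1 / (m * n)) * sum of the r_j inside the window centred on pixel k.
    mean = np.zeros_like(img)
    for dy in range(m):
        for dx in range(n):
            mean += p[dy:dy + h, dx:dx + w]
    mean /= m * n
    # Assumed modification: only pixels in the nodule grey-level range are smoothed.
    in_range = (img >= low) & (img <= high)
    return np.where(in_range, mean, img)
```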

Potential region of interest
To detect cancerous nodule regions, two concentric circular masks are used, as shown in Figure 3. When the masks are centered on a cancerous nodule, the inner masked region contains the nodule, while the outer masked region contains the surrounding tissue. The size of the inner mask is determined from the CT-image resolution of 45 µm × 45 µm, whereas the outer mask size was determined by trial and error, and a mask of size 120 µm × 120 µm was found to be suitable in this case [25].

Figure 3. Two concentric circular masks
The two concentric circular masks were tested on 60 CT images, and all cancerous nodules were detected, together with a large number of false positive regions. Detection of potential region of interest (PROI) clusters is based on the fact that cancerous nodules are brighter than their neighboring pixels. Therefore, a PROI is selected only if two conditions are satisfied: the average intensity within the inner mask must be greater than that within the outer mask, and the intensity of the center pixel of the inner mask must be the highest intensity in the mask.
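The two PROI conditions can be sketched as a sliding-window test. The radii below are in pixels and are hypothetical placeholders (the paper gives the mask sizes in µm), the outer mask is treated as the ring surrounding the inner disc, and "the highest intensity in the mask" is interpreted as the maximum of the inner disc. `detect_proi` is an illustrative name.

```python
import numpy as np

def detect_proi(img: np.ndarray, r_in: int = 2, r_out: int = 5) -> list:
    """Return (row, col) centres that satisfy both PROI conditions."""
    img = img.astype(float)
    h, w = img.shape
    yy, xx = np.mgrid[-r_out:r_out + 1, -r_out:r_out + 1]
    d2 = yy ** 2 + xx ** 2
    inner = d2 <= r_in ** 2                       # inner disc: candidate nodule
    ring = (d2 > r_in ** 2) & (d2 <= r_out ** 2)  # outer annulus: surrounding tissue
    hits = []
    for y in range(r_out, h - r_out):
        for x in range(r_out, w - r_out):
            win = img[y - r_out:y + r_out + 1, x - r_out:x + r_out + 1]
            inner_vals = win[inner]
            # Condition 1: the inner disc is brighter on average than the ring.
            brighter = inner_vals.mean() > win[ring].mean()
            # Condition 2: the centre pixel is the maximum of the inner disc.
            is_peak = img[y, x] >= inner_vals.max()
            if brighter and is_peak:
                hits.append((y, x))
    return hits
```

Any bright local peak passes both tests, which is consistent with the large number of FP regions reported for this stage.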
After the 60 CT images were processed with the PROI algorithm, all cancerous nodules were detected, but many false positive (FP) regions were detected as well, as shown in Figure 4. These FP regions reduce the specificity of the proposed CAD system. Therefore, wavelet features are applied to reduce the number of detected FP clusters and improve the specificity of the CAD system.

Wavelet features
The Daubechies (DB4) wavelet transform is used in this paper to generate the wavelet coefficients that are used to classify the detected regions as TP or FP regions. Each ROI of size 13 × 13 pixels is decomposed with the DB4 wavelet to generate four features of the wavelet coefficients: minimum_value, maximum_value, average_value and standard_deviation. These features are extracted from the low-frequency image of the level-2 DB4 decomposition, as shown in Figure 5.
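The following dependency-free sketch of the feature extraction assumes the standard 8-tap db4 decomposition low-pass filter (the "db4" of MATLAB and PyWavelets), symmetric boundary extension, and the approximation (LL) sub-band after two separable decomposition levels. In practice `pywt.wavedec2(patch, 'db4', level=2)[0]` from PyWavelets would be the usual route; its boundary handling may differ in detail from this hand-rolled version, and on a 13 × 13 patch the boundary extension dominates at level 2. The function names are hypothetical.

```python
import numpy as np

# Standard 8-tap db4 decomposition low-pass filter; the taps sum to sqrt(2).
DB4_LO = np.array([
    -0.010597401784997278, 0.032883011666982945, 0.030841381835986965,
    -0.18703481171888114, -0.02798376941698385, 0.6308807679295904,
    0.7148465705525415, 0.23037781330885523,
])

def _lowpass_1d(x: np.ndarray, h: np.ndarray = DB4_LO) -> np.ndarray:
    """One DWT analysis step (approximation only) with symmetric extension."""
    L = len(h)
    xp = np.pad(x, L - 1, mode="symmetric")
    y = np.convolve(xp, h, mode="valid")  # length len(x) + L - 1
    return y[1::2]                        # keep every second sample

def db4_approximation(patch: np.ndarray, level: int = 2) -> np.ndarray:
    """Low-frequency (LL) sub-band of a separable 2-D DWT after `level` steps."""
    a = patch.astype(float)
    for _ in range(level):
        a = np.apply_along_axis(_lowpass_1d, 1, a)  # filter and downsample rows
        a = np.apply_along_axis(_lowpass_1d, 0, a)  # filter and downsample columns
    return a

def wavelet_features(patch: np.ndarray) -> np.ndarray:
    """[minimum_value, maximum_value, average_value, standard_deviation]."""
    ca = db4_approximation(patch, level=2)
    return np.array([ca.min(), ca.max(), ca.mean(), ca.std()])
```

For a 13 × 13 ROI this yields an 8 × 8 approximation sub-band, from which the four statistics form one feature vector per detected region.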

Support vector machine
Support vector machines (SVM), a classification and regression technique, are used in this paper to classify the detected regions as TP or FP regions. SVM is known to be an excellent tool for binary classification problems such as this one: it seeks the optimal separating hyperplane, which separates the data efficiently while maximizing the margin. The SVM procedure is divided into two main stages, a data collection stage and a learning stage, presented as follows:

Collecting data stage
In SVM, both the input and output data must be known. Therefore, the four wavelet features are extracted from 50 normal regions and 60 cancerous nodule regions. As a result, the input consists of four feature vectors of size 50 for the normal cases and four feature vectors of size 60 for the cancerous cases. Two output values are used to classify the cases as normal or abnormal: 0.1 for normal cases and 0.9 for abnormal cases.
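A minimal sketch of how the collected features might be arranged for the SVM: one row per region, four feature columns, and targets 0.1 (normal) and 0.9 (cancerous). The split shown is a single seeded random 70%/30% hold-out, which is an assumption; the paper's jackknife-like arrangement is not specified in detail. The feature arrays would come from the wavelet step, and the function names are hypothetical.

```python
import numpy as np

def build_dataset(normal_feats, cancer_feats, normal_target=0.1, cancer_target=0.9):
    """Stack per-region feature rows and attach the 0.1 / 0.9 targets."""
    X = np.vstack([normal_feats, cancer_feats]).astype(float)
    y = np.concatenate([np.full(len(normal_feats), normal_target),
                        np.full(len(cancer_feats), cancer_target)])
    return X, y

def split_70_30(X, y, seed=0):
    """Shuffle once, then keep 70% of the rows for training and 30% for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = round(0.7 * len(X))  # round() avoids 0.7 * 110 truncating to 76
    train, test = idx[:n_train], idx[n_train:]
    return X[train], X[test], y[train], y[test]
```

With 50 normal and 60 cancerous regions this gives 77 training and 33 test rows. scikit-learn classifiers expect discrete class labels, so the 0.1/0.9 targets would be mapped to integers first (for example `(y > 0.5).astype(int)`) before being passed to one.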

Learning process stage
The values of the input matrix are arranged as training vectors in a manner similar to the jackknife technique, where 70% of the input data are used for SVM training and the remaining 30% for SVM testing. The ANOVA kernel is adopted, as it is considered the best classification tool compared with other kernel methods. The ANOVA kernel, shown in (3), has two parameters, gamma (γ) and the degree (d), which control the shape of the kernel.
Therefore, several experiments were carried out to find the optimal values of gamma and the degree, as shown in Figure 6.
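Equation (3) is also missing from the extracted text. The sketch below assumes the ANOVA radial-basis form in common use (for example `anovadot` in R's kernlab): K(x, y) = (Σ_k exp(-γ (x_k - y_k)²))^d, where γ and d are the two shape parameters named above. `anova_kernel` is an illustrative name that returns a Gram matrix.

```python
import numpy as np

def anova_kernel(X, Y, gamma=1.0, d=2):
    """Gram matrix K[i, j] = (sum_k exp(-gamma * (X[i, k] - Y[j, k])**2)) ** d."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    diff = X[:, None, :] - Y[None, :, :]  # shape (n_x, n_y, n_features)
    return np.exp(-gamma * diff ** 2).sum(axis=2) ** d
```

With scikit-learn, such a matrix can be used through `SVC(kernel="precomputed")`: fit on `anova_kernel(X_train, X_train, gamma, d)` and predict on `anova_kernel(X_test, X_train, gamma, d)`. Repeating this over a grid of (γ, d) values is one way to run parameter experiments like those summarized in Figure 6.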

ALGORITHM EVALUATION
The cancerous nodule enhancement and detection algorithm is applied to 60 CT images, and the four wavelet coefficient features are then generated. The algorithm is evaluated subjectively by three radiologists, who count the number of detected FP regions per image and record the TP percentage for each image. The average number of detected FP regions and the average TP percentage over the 60 CT images are presented in Table 1. Table 1 shows that the proposed CAD system accurately detects the cancerous nodules with a small number of detected FP regions: the TP ratio reaches 94.5% with an FP rate of 7 clusters/image. Figure 7 shows the complete detection stages of the CAD system.

Figure 7. Accurate detection of cancerous nodules (panels: original image, image enhancement, image with PROI)

CONCLUSION
An adaptive CAD system for accurately detecting cancerous nodules in CT images is proposed in this paper. The proposed algorithm is divided into four stages. First, the cancerous nodules are enhanced using the Laplacian filter, and the average filter is then modified based on the lower and upper grey levels of the cancerous nodules in the CT images. This slightly enhances the cancerous nodules in the CT images. In the second phase, the potential regions of interest are detected based on the visual appearance of cancerous nodule regions in CT images. Because the processed CT image contains many detected FP regions, SVM and wavelet features are used to reduce them: four wavelet features are generated, and an SVM is then used to classify the detected regions as TP or FP regions. The proposed algorithm was tested subjectively and objectively on 60 CT images, and the results show that it detects cancerous nodules with an average TP rate of 94.5% and an FP rate of 7 clusters/image.