Classification of Normal and Crackles Respiratory Sounds into Healthy and Lung Cancer Groups

ABSTRACT


INTRODUCTION
According to the International Agency for Research on Cancer (IARC) [1], lung cancer is the most common cancer worldwide, with more than 1.8 million new cases and 1.6 million deaths estimated in 2012. In Malaysia, it is the third most common cancer after colorectal and breast cancers, with 10,608 cases reported by the Malaysian National Cancer Registry between 2007 and 2011 [2]. The majority of patients present at a later stage of the disease, i.e. stage 3 or 4, so curative treatment is seldom an option and the prognosis is poor. Given its high prevalence in Malaysia, it is of utmost importance to detect the disease at an early stage, which offers a higher chance of cure and possibly better survival. Sputum cytology and chest X-ray (CXR) have been used for lung cancer screening, and recently low-radiation-dose helical CT (spiral CT) has been shown to be superior to conventional CXR [3]. Autofluorescence bronchoscopy is also a potential screening tool [4]. However, none of these tests is at once simple, inexpensive and safe, and they are not readily accessible in outpatient clinics.
Auscultation is a non-invasive, safe and inexpensive technique used to listen to lung and heart sounds. It is performed as part of the clinical examination and can provide useful information about lung condition. Computerized auscultation overcomes the limitations of the traditional technique that uses an analog stethoscope. Classification of respiratory sounds through computerized auscultation has shown promising results for the diagnosis of various lung diseases [5]; the analyses covered by that review reported good sensitivity and specificity. Most analyses pre-process the sound signal to reduce noise, extract useful features and apply machine learning for classification. Different filtering techniques have been used to suppress heart sounds in lung-sound recordings, such as the wavelet transform [6], adaptive filtering [7], [8] and bandpass filtering [8]. For further analysis, the fast Fourier transform (FFT) [9], short-time Fourier transform (STFT) [10] or discrete wavelet transform (DWT) [11] has been applied to transform the signals into a different domain, such as the frequency or time-frequency domain. A spectral representation of the signal makes it easier to extract the features required by learning algorithms. Respiratory sounds can be classified into normal and adventitious sounds, and adventitious sounds can be further classified as discontinuous or continuous based on their characteristics [12]. For instance, crackles, either fine or coarse, are classified as discontinuous adventitious sounds, while wheezes and rhonchi are continuous adventitious sounds [13]. Several studies have analysed respiratory sounds in asthma, pneumonia, chronic obstructive pulmonary disease (COPD) and idiopathic pulmonary fibrosis patients to characterize and classify them as normal, wheeze, rhonchi, coarse crackles and fine crackles [11], [13].
Different classification methods have been employed to classify crackles, for example Tsallis entropy with a multilayer perceptron [14], K-nearest neighbours [15] and support vector machines (SVM) [16].
Although many researchers have classified crackles sounds, to the best of the authors' knowledge none has used samples taken from lung cancer patients. Only [13] included lung cancer samples, but those were drawn randomly from various pulmonary diseases. In this study, the crackles sounds are extracted exclusively from lung cancer patients.

METHODOLOGY
This section presents the methodology used in this study to classify respiratory sounds as normal or crackles in healthy subjects and lung cancer patients. The proposed algorithm is shown in Figure 1.

Data collection and pre-processing of respiratory sounds
Data collection for this study was approved by the medical ethics committee of University Malaya Medical Centre (UMMC) with reference number MREC ID NO: 201698-4242. Twenty normal subjects and 23 lung cancer patients participated in the data collection, which took place at the Clinical Oncology Unit, University Malaya Medical Centre. The participating patients had no co-existing respiratory diseases, and all the healthy subjects recruited were non-smokers. All subjects gave informed consent and were briefed on the study protocol. The respiratory sounds of normal subjects and lung cancer patients were acquired using a Thinklabs digital stethoscope and saved as .au files on a laptop using the Thinklabs Phonocardiography by Audacity software. The stethoscope was connected to the laptop via a sound card (Xonar U3), and the laptop was disconnected from the mains power supply during recording. The sampling rate was 11025 Hz. All subjects were asked to breathe normally, and the respiratory sound was recorded for about 20 seconds at each auscultation point. In total, there were twenty-two auscultation points, eleven each on the anterior and posterior chest wall, including the trachea, as shown in Figure 2. Next, a bandpass filter with cut-off frequencies of 100 and 2000 Hz was applied to the raw respiratory sound signal using the Thinklabs Phonocardiography software to enhance the lung sound. This filtering reduces noise from the heart, muscles or ambient sources that is unrelated to the lung sound. Crackles present in the lung cancer patients' respiratory sounds were identified manually. Respiratory cycles containing crackles were extracted and exported as .wav files to be read by MATLAB for the signal decomposition and feature extraction processes. In total, there were 60 crackles samples and 60 normal samples.
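The 100–2000 Hz bandpass step above was performed in the Thinklabs software; as a rough illustration, an equivalent filter can be sketched in Python with SciPy. The function name, filter order and zero-phase filtering choice are assumptions for illustration, not part of the original pipeline:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 11025  # recording sampling rate (Hz), as used in the study

def bandpass_lung_sound(x, fs=FS, low=100.0, high=2000.0, order=4):
    """Zero-phase Butterworth bandpass (100-2000 Hz) to suppress heart,
    muscle and ambient noise lying outside the lung-sound band."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)  # filtfilt avoids phase distortion
```

A zero-phase filter is used here so that the timing of short transients such as crackles is preserved.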

Signal decomposition using discrete wavelet transforms and feature extraction
The wavelet transform provides a time-frequency representation of a signal. The discrete wavelet transform can be written as [17]

$W(j,b) = \frac{1}{\sqrt{2^{j}}} \sum_{n} x[n]\, \psi\!\left(\frac{n - b\,2^{j}}{2^{j}}\right),$

where $\psi$ is the wavelet function, or mother wavelet, $j$ is a positive value that defines the scaling and $b$ is a real number that defines the shifting. Two mother wavelets, namely Haar and db7, were used for the signal decomposition. The decomposition of the signal using the discrete wavelet transform involves a convolution operation given as

$y[n] = \sum_{k} x[k]\, g[n-k],$

where $g[n]$ is the discrete mother wavelet, in this case acting as the high-pass filter, and $h[n]$ is the corresponding low-pass filter. After the pre-processing stage, the signal was decomposed to seven levels using the discrete wavelet transform to obtain the prominent information in different frequency bands. The signal was passed through a high-pass filter and a low-pass filter, followed by subsampling by 2. The detail ($D_i$) and approximation ($A_i$) coefficients were obtained after subsampling the high-pass and low-pass outputs, respectively. The process is repeated on the approximation coefficients at successive levels until the desired level is reached, as shown in Figure 4. Three of the frequency bands have detail-coefficient amplitudes greater than 1; these bands contain most of the information about the signal. Although the amplitudes of the remaining bands, including D7, were not as high as those of the three dominant bands, they were included in the feature extraction, as some crackles information may lie in these frequency bands. The frequency content of crackles is 100 to 2000 Hz or higher [5]. Therefore, five frequency bands were selected for feature extraction; together they cover the range 86.13 Hz to 2756.25 Hz. The mean, standard deviation and maximum power spectral density (PSD) of the detail coefficients of these five bands were calculated using MATLAB.
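The decomposition and feature-extraction steps described above can be sketched with PyWavelets in place of MATLAB. The choice of detail levels D2–D6 below is an assumption inferred from the stated 86.13–2756.25 Hz range at fs = 11025 Hz (band $D_j$ spans roughly fs/2^{j+1} to fs/2^j), and the function name is illustrative:

```python
import numpy as np
import pywt
from scipy.signal import welch

def wavelet_features(x, fs=11025, wavelet="db7", levels=7, bands=range(2, 7)):
    """Seven-level DWT of a lung-sound segment; returns mean, standard
    deviation and maximum PSD of the selected detail bands (3 features
    per band)."""
    coeffs = pywt.wavedec(x, wavelet, level=levels)  # [A7, D7, D6, ..., D1]
    details = coeffs[1:][::-1]                       # reorder to [D1, ..., D7]
    feats = []
    for j in bands:                                  # assumed bands D2..D6
        d = details[j - 1]
        # detail coefficients at level j are effectively sampled at fs / 2**j
        _, pxx = welch(d, fs=fs / 2**j, nperseg=min(256, len(d)))
        feats.extend([np.mean(d), np.std(d), np.max(pxx)])
    return np.asarray(feats)
```

With five bands and three statistics each, every sound segment is reduced to a 15-element feature vector for the classifier.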

Classification using ANN and performance evaluation
Neural networks consist of nodes inspired by the neurons of the human nervous system [18]. The network is built from three types of layers, namely input, hidden and output, connected via nodes; there can be more than one hidden layer. Every node in the hidden layer is connected to all inputs (features) on one side and to all outputs (classes) on the other. Each connection between an input and a node carries a weight. The weighted sum of the inputs is fed into an activation function to obtain a new value that serves as input to the next layer. The most common activation function is the sigmoid function. In this study, an ANN was employed as the classifier in MATLAB to map the inputs to a set of target outputs, encoded as [1 0] for normal and [0 1] for crackles. In the training stage of the neural network, the backpropagation algorithm was used to adjust the weights in the network whenever the predicted output did not match the target output.
The neural network used in this study is a multilayer feed-forward neural network (MLFNN) trained with backpropagation (BP). The goal of backpropagation is to minimize the error, that is, to bring the outputs closer to the targets. Assuming the data inputs are represented by $X_i$ and the weights by $W$, the detailed explanation is based on Figure 5.
The neurons in each layer are fully connected to the neurons in the next layer, from layer $i$ to $j$ to $k$. Suppose the network is designed with one hidden layer and generates a single output. $W_{ij}$ is the weight that connects the $i$th neuron of the input layer to the $j$th neuron of the hidden layer, whereas $W_{jk}$ is the weight that connects the $j$th neuron of the hidden layer to the $k$th neuron of the output layer. In the BP algorithm, the generalized delta rule involves two phases, the forward phase and the backward phase [19].

Figure 5. Multilayer perceptron with backpropagation.

The forward phase: the output of hidden neuron $j$ is

$O_j = \Phi\!\left(\sum_i W_{ij} X_i + b_j\right),$

where $b_j$ is the bias of the hidden node, which can be set to zero, and $\Phi$ is the sigmoid activation function. For output layer $k$, the network output is given as

$O_k = \Phi\!\left(\sum_j W_{jk} O_j\right).$

The backward phase between the output and hidden layers: the backward phase includes the calculation of the error signal and the update of the network weights. The network error $E$ is defined as

$E = \frac{1}{2} \sum_k \left(t_k - O_k\right)^2, \qquad (8)$

where $t_k$ is the desired output and $O_k$ is the output produced by the output layer. The objective is to find the set of parameters that minimizes the sum of squared errors; the average sum-squared error of the network is defined as

$E_{av} = \frac{1}{N} \sum_{n=1}^{N} E(n),$

where $N$ is the total number of training patterns and $E$ is the error function to be minimized. The network weight update between hidden layer $j$ and output layer $k$ is given by

$\Delta W_{jk} = -\eta \,\frac{\partial E}{\partial W_{jk}}, \qquad (11)$

where $\eta$ is the learning rate and $\partial E / \partial W_{jk}$ is the gradient of the cost function. The backward phase between the hidden and input layers adjusts the weights analogously:

$\Delta W_{ij} = -\eta \,\frac{\partial E}{\partial W_{ij}}.$
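The forward and backward phases can be illustrated with a minimal NumPy implementation of a one-hidden-layer perceptron. This is a sketch of the generic BP algorithm described above, not the MATLAB toolbox used in the study; class and variable names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class MLP:
    """One-hidden-layer feed-forward network trained with backpropagation
    using the update rule dW = -eta * dE/dW on E = 0.5 * sum((t - O)^2)."""
    def __init__(self, n_in, n_hidden, n_out, eta=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))   # input -> hidden
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, n_out))  # hidden -> output
        self.eta = eta

    def forward(self, X):
        self.Oj = sigmoid(X @ self.W1)        # hidden-layer outputs O_j
        self.Ok = sigmoid(self.Oj @ self.W2)  # network outputs O_k
        return self.Ok

    def train_step(self, X, T):
        Ok = self.forward(X)
        # backward phase: error signals via the delta rule
        dk = (Ok - T) * Ok * (1.0 - Ok)                  # output layer
        dj = (dk @ self.W2.T) * self.Oj * (1.0 - self.Oj)  # hidden layer
        self.W2 -= self.eta * self.Oj.T @ dk             # update W_jk
        self.W1 -= self.eta * X.T @ dj                   # update W_ij
        return 0.5 * np.sum((T - Ok) ** 2)               # error E before update
```

Biases are omitted here for brevity, consistent with setting $b_j = 0$ in the description above.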

Performance evaluation
The output of the algorithm was evaluated using the counts of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) to determine the sensitivity, specificity and accuracy, as defined in Equations (14), (15) and (16) [20]:

$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (14)$

$\text{Specificity} = \frac{TN}{TN + FP} \qquad (15)$

$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (16)$
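Equations (14)–(16) translate directly into code; the small helper below (its name is illustrative) computes all three metrics from the confusion counts:

```python
def evaluate(tp, tn, fp, fn):
    """Sensitivity, specificity and accuracy from confusion-matrix counts,
    following Equations (14)-(16)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy
```

For example, a test fold with 8 crackles correctly detected, 9 normals correctly rejected, 1 false alarm and 2 misses gives sensitivity 0.8, specificity 0.9 and accuracy 0.85.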

RESULTS AND DISCUSSION
Most classification problems can be solved with only two hidden layers in an ANN architecture [18], [22]. In this study, two hidden layers were employed to classify between crackles and normal respiratory sounds. Sixty crackles and sixty normal sounds were used as samples and randomly divided into 70% for training, 15% for testing and 15% for validation. Eleven ANN models with different numbers of hidden nodes were used in training, validation and testing. For each configuration, the network was retrained five times, and the classification results for training, validation and testing are tabulated in Table 1, based on the best result obtained for testing.
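The 70/15/15 split and the search over eleven node counts could be reproduced along the following lines, with scikit-learn standing in for the MATLAB toolbox. The feature matrix here is synthetic and purely illustrative, and the node range is an assumption:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Hypothetical feature matrix: 120 samples (60 crackles, 60 normal),
# 15 features each (mean, std, max PSD of five detail bands).
rng = np.random.default_rng(1)
X = rng.standard_normal((120, 15))
y = np.repeat([0, 1], 60)  # 0 = normal, 1 = crackles

# 70% training, then the remaining 30% split evenly into validation and test
X_tr, X_tmp, y_tr, y_tmp = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)
X_val, X_te, y_val, y_te = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=1, stratify=y_tmp)

best = None
for n in range(5, 16):  # eleven candidate models, as in the study
    clf = MLPClassifier(hidden_layer_sizes=(n, n), activation="logistic",
                        max_iter=2000, random_state=1).fit(X_tr, y_tr)
    val_acc = accuracy_score(y_val, clf.predict(X_val))
    if best is None or val_acc > best[0]:
        best = (val_acc, n, clf)

test_acc = accuracy_score(y_te, best[2].predict(X_te))
```

Selecting the node count on the validation set and reporting the held-out test accuracy mirrors the study's emphasis on the test stage as the measure of generalization.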
As can be seen in Table 1 and Table 2, the best classification percentage was obtained with 15 nodes and 10 nodes in the hidden layer for db7 and Haar, respectively. db7 achieved 100% correct classification in both the test and validation stages, reaching the best possible result, while Haar achieved 100% correct classification in either the validation or the test stage, but not both. Among these percentages, the test stage is the most important criterion, because it assesses the network's generalization and predictive power. The number of epochs for both db7 and Haar was no more than 20, indicating good performance: the fewer the epochs, the quicker the network reaches its best optimization.
For the evaluation of classification performance, the sensitivity, specificity and accuracy percentages were calculated from Equations (14), (15) and (16) using the values of TP (correctly classified as crackles), TN (correctly classified as normal), FP (incorrectly classified as crackles) and FN (incorrectly classified as normal), as tabulated in Table 3 and Table 4 for db7 and Haar, respectively. From these tables, db7 shows better performance than Haar, with 100% sensitivity, specificity and accuracy in both the testing and validation stages. For Haar, only the testing stage achieved 100% for all of sensitivity, specificity and accuracy; nevertheless, the accuracy obtained with Haar is still good for the classification of crackles and normal sounds.

CONCLUSION
These preliminary results towards the development of a screening method for lung cancer using computerized auscultation are positive. In this study, normal and crackles respiratory sounds were successfully classified using an ANN with backpropagation and two hidden layers. Both mother wavelets, Haar and db7, provide the distinctive patterns needed as features for the learning algorithm through statistical and signal-strength measures such as the mean, standard deviation, and PSD. Nevertheless, other factors such as age, smoking habit, ambient air pollution and occupational exposure need to be considered when interpreting the results in future work; these factors could be added as features to the classifier.