Hybrid bat-ant colony optimization algorithm for rule-based feature selection in health care

Received May 27, 2019; Revised May 31, 2020; Accepted Jun 17, 2020

Rule-based classification in health care using artificial intelligence provides solutions to decision-making problems across different domains. An important challenge is providing access to good and fast health facilities. Cervical cancer is one of the most frequent causes of death in females, and the diagnostic methods for cervical cancer used in health centers are costly and time-consuming. In this paper, the bat algorithm (BA) for feature selection and an ant colony optimization (ACO)-based classification algorithm were applied to the cervical cancer data set obtained from the repository of the University of California, Irvine (UCI) to analyze the disease based on optimal features. The proposed algorithm outperforms other methods in terms of comprehensibility and obtains better classification accuracy.


INTRODUCTION
Various technologies have been applied to large volumes of data in many fields. In the medical field, data mining plays an important role in the prediction of diseases. Data mining uses machine learning, artificial intelligence, and statistics to develop highly accurate predictive models for critical domains [1]. Cancer is a major topic of interest in the medical field and a primary cause of mortality worldwide [2]. Cervical cancer is the fourth most common cancer in women [3] and the seventh among all cancers. It is caused by changes in the genes that control the growth and division of cells. The cancer can spread to other parts of the body, such as the lungs and the abdomen, without exhibiting any symptoms. Thus, no signs can be observed in the early stages of the disease, and symptoms including pelvic pain, back pain, fatigue, broken bones, leg pain, weight loss, and vaginal bleeding appear only in the late stages.
Moreover, routine screening for cervical cancer is often not practiced, especially in low-income countries. In such countries, cervical cancer tends to spread rapidly because of increased exposure to several risk factors, including human papillomavirus (HPV) [4], and other factors such as contraceptive use, cigarette smoking, and the number of pregnancies. Doctors therefore need to know the factors that cause the disease, which makes the extraction of relevant features important in the diagnosis of cervical cancer. The chances of cervical cancer increase two- to threefold when a person smokes or is infected with HPV [5]. The rate of cervical cancer in women who use contraceptives is three times higher than in those who do not, and women who have used contraceptives for more than 10 years have a fourfold higher risk. Moreover, the incidence of cervical cancer in women who are infected with HPV but have never been pregnant is lower than in women who have had at least one pregnancy. Extracting the relevant traits is therefore important for identifying the risk factors that cause the disease. Diffusion-weighted imaging and magnetic resonance imaging can detect cervical cancer at certain stages [6,7]. The Pap test [8] is a popular and preferred screening method for cervical cancer. However, awareness of routine screening is low in developing countries, and limited medical expertise and a lack of medical equipment further increase the mortality rate of cervical cancer in low-income countries.
In this paper, the bat algorithm (BA) is applied for feature selection (FS) to improve the performance of an ant colony optimization (ACO)-based classification algorithm in analyzing the cervical cancer data set from the repository of the University of California at Irvine (UCI). This work shows that the ACO method can classify cervical cancer, and that combining ACO with BA could reduce the computational burden and enable the extraction of highly correlated risk factors.

BAT ALGORITHM FOR FEATURE SELECTION
Several swarm intelligence algorithms have been used for attribute selection [9-17]. Unfortunately, no single stable strategy exists that reduces the computational burden, extracts the risk factors most highly correlated with the data, improves classifier performance, and achieves high accuracy. Swarm intelligence algorithms are important for solving attribute selection problems, but each has its limitations. The bat algorithm has been used for attribute selection, and the results showed that it outperformed other algorithms [18]. BA can therefore be a valuable option for high-dimensional data, selecting the best subset of attributes from data sets of different sizes. BA and the wrapper technique for FS are described next.
Bats are highly advanced in terms of echolocation and can distinguish between prey and barriers. This remarkable ability has attracted researchers' interest in applying it to various fields. A bat emits a short, high-pitched pulse of sound and waits for it to hit an object; after a brief time, the echo returns to its ears, which allows the bat to calculate how far away the object is. This sensing mechanism makes bats capable of distinguishing prey from obstacles and of hunting in complete darkness. Based on this behavior, Yang [19] developed the bat algorithm, a well-known metaheuristic optimization technique that models how bats use echolocation to track food and prey and to avoid barriers. The bat algorithm follows three rules:
- Bats use echolocation to sense distance and can differentiate between danger and food.
- Bats b_i fly randomly with velocity v_i at position x_i with a fixed frequency f_min, varying wavelength λ, and loudness A_0 to search for prey. They can automatically adjust the wavelength (or frequency) of their emitted pulses and the rate of pulse emission r ∈ [0, 1] depending on the proximity of their target.
- Loudness can vary in many ways; Yang (2010) assumed that it varies from a large (positive) A_0 to a minimum constant value A_min.
BA has been reported to be more efficient than particle swarm optimization (PSO) and the genetic algorithm (GA) [19] because it combines the main advantages of both and, owing to its flexibility, offers frequency tuning and automatic zooming [20]. First, the frequency f_i, velocity v_i, and position x_i are initialized for every bat b_i. At every time step t, up to the maximum number of iterations, the movement of each virtual bat is simulated by updating its frequency, velocity, and position using (1) to (3):
$$f_i = f_{\min} + (f_{\max} - f_{\min})\,\beta \qquad (1)$$
$$v_i^{t} = v_i^{t-1} + \left(x_i^{t} - x_*\right) f_i \qquad (2)$$
$$x_i^{t} = x_i^{t-1} + v_i^{t} \qquad (3)$$
where β ∈ [0, 1] is a uniformly distributed random number. The value x_{ij}^t corresponds to variable j of bat i at time step t. The frequency f_i in (1) controls the range and pace of the bats' movement. The variable x_* represents the current global best solution (position), found by comparing the solutions of all bats. To diversify the candidate solutions, Yang [19] proposed using random walks: once a solution is selected among the current best solutions, a random walk generates a new solution for each bat that satisfies the condition in Line 5 of Algorithm 1:
$$x_{\text{new}} = x_{\text{old}} + \varepsilon A^{t}$$
where A^t is the average loudness of all bats at time t and ε ∈ [−1, 1] is a random number that sets the strength and direction of the random walk. At every iteration, the pulse emission rate r_i and the loudness A_i are updated as follows:
$$A_i^{t+1} = \alpha A_i^{t}, \qquad r_i^{t+1} = r_i^{0}\left[1 - \exp(-\gamma t)\right]$$
where α and γ are ad-hoc constants. The first step of the method is to initialize the loudness A_i^0 and the pulse emission rate r_i^0.
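To make the update loop concrete, the following Python sketch implements (1) to (3), the local random walk around the current best solution, and the loudness and pulse-rate updates for a continuous minimization problem. It is a minimal illustration under stated assumptions, not the authors' implementation: the parameter values (f_min = 0, f_max = 2, α = γ = 0.9), the search bounds, the clipping of positions to those bounds, drawing ε independently per coordinate, and accepting a move by comparing it with each bat's own fitness are all choices made here.

```python
import math
import random
from typing import Callable, List, Tuple


def bat_algorithm(objective: Callable[[List[float]], float], dim: int,
                  n_bats: int = 20, n_iter: int = 200,
                  f_min: float = 0.0, f_max: float = 2.0,
                  alpha: float = 0.9, gamma: float = 0.9,
                  lower: float = -5.0, upper: float = 5.0,
                  seed: int = 0) -> Tuple[List[float], float]:
    """Minimize `objective` with a basic continuous bat algorithm (illustrative)."""
    rng = random.Random(seed)

    def clip(value: float) -> float:
        # Assumption: keep every coordinate inside [lower, upper].
        return max(lower, min(upper, value))

    x = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_bats)]
    v = [[0.0] * dim for _ in range(n_bats)]
    loud = [rng.uniform(1.0, 2.0) for _ in range(n_bats)]  # A_i^0 in [1, 2]
    r0 = [rng.uniform(0.0, 1.0) for _ in range(n_bats)]    # r_i^0 in [0, 1]
    rate = r0[:]
    fit = [objective(xi) for xi in x]
    b = min(range(n_bats), key=lambda i: fit[i])
    best, best_fit = x[b][:], fit[b]                       # x_* and its fitness

    for t in range(1, n_iter + 1):
        mean_loud = sum(loud) / n_bats                     # A^t, average loudness
        for i in range(n_bats):
            f_i = f_min + (f_max - f_min) * rng.random()   # Eq. (1)
            v[i] = [v[i][j] + (x[i][j] - best[j]) * f_i for j in range(dim)]  # Eq. (2)
            cand = [clip(x[i][j] + v[i][j]) for j in range(dim)]              # Eq. (3)
            if rng.random() > rate[i]:
                # Local random walk around the best solution: x_new = x_* + eps * A^t
                cand = [clip(best[j] + rng.uniform(-1.0, 1.0) * mean_loud)
                        for j in range(dim)]
            f_cand = objective(cand)
            # Accept an improving move with a probability tied to the loudness A_i.
            if f_cand <= fit[i] and rng.random() < loud[i]:
                x[i], fit[i] = cand, f_cand
                loud[i] *= alpha                                # A_i^{t+1} = alpha * A_i^t
                rate[i] = r0[i] * (1.0 - math.exp(-gamma * t))  # r_i^{t+1}
            if f_cand <= best_fit:
                best, best_fit = cand[:], f_cand
    return best, best_fit
```

For example, `bat_algorithm(lambda p: sum(c * c for c in p), dim=2)` minimizes the two-dimensional sphere function. The random walk around x_* supplies most of the exploitation, while the decaying loudness makes acceptance of new moves progressively rarer.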
In practice, the initial loudness A_i^0 and emission rate r_i^0 are often chosen randomly, typically with A_i^0 ∈ [1, 2] and r_i^0 ∈ [0, 1]. Wrapper FS was popularized by [21]. It differs from filter FS in its use of the learning algorithm: wrapper FS relies solely on maximizing the prediction accuracy produced by the learning algorithm. The wrapper approach couples an optimization (search) procedure with the learning algorithm and evaluates each candidate subset by building a model, whereas the filter approach uses a similar search but evaluates subsets with a simpler criterion instead of a model. Thus, wrapper methods use the inductive algorithm as the evaluation function, whereas filter methods are independent of it [22]. In the context of FS, the filter approach is faster and less computationally intensive, but also less accurate, than the wrapper approach [23]. The wrapper approach is one of the most widely used approaches because of its good results and its efficiency in handling large and complex data sets compared with the filter approach [24]. However, it is expensive: evaluating a single feature subset requires building a classifier from hundreds of instances, a cost that becomes prohibitive with huge numbers of features [25,26]. The search through the feature space therefore strongly influences the performance of the wrapper technique, especially how quickly it finds the best feature subset without an exhaustive search. The wrapper FS approaches used in this paper include three popular strategies: a) forward selection, b) backward elimination, and c) stochastic search. Forward selection starts with no features and adds them until all features have been considered. Backward elimination starts with all features and removes them. Stochastic approaches depend entirely on the specific search strategy of the algorithm.
For instance, in a genetic search, each state is represented by a feature mask so that genetic operations such as crossover and mutation can be applied [27].
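The two deterministic wrapper strategies can be sketched in Python with the same feature-mask representation. This is an illustrative sketch, not the paper's code: the `evaluate` callback is a hypothetical stand-in for training and validating a classifier (such as Ant-Miner) on the selected features, and stopping when no single addition or removal improves the score is an assumed stopping rule.

```python
from typing import Callable, List, Tuple

Mask = List[int]  # mask[j] == 1 means feature j is selected


def forward_selection(n: int, evaluate: Callable[[Mask], float]) -> Tuple[Mask, float]:
    """Greedy wrapper search: start empty, add the single best feature per round."""
    mask, best = [0] * n, evaluate([0] * n)
    while True:
        pick, pick_score = None, best
        for j in range(n):
            if not mask[j]:
                trial = mask[:]
                trial[j] = 1
                score = evaluate(trial)   # hypothetical: train and validate on this subset
                if score > pick_score:
                    pick, pick_score = j, score
        if pick is None:                  # no single addition improves the score
            return mask, best
        mask[pick], best = 1, pick_score


def backward_elimination(n: int, evaluate: Callable[[Mask], float]) -> Tuple[Mask, float]:
    """Greedy wrapper search: start full, drop the single least useful feature per round."""
    mask, best = [1] * n, evaluate([1] * n)
    while True:
        pick, pick_score = None, best
        for j in range(n):
            if mask[j]:
                trial = mask[:]
                trial[j] = 0
                score = evaluate(trial)
                if score > pick_score:
                    pick, pick_score = j, score
        if pick is None:                  # no single removal improves the score
            return mask, best
        mask[pick], best = 0, pick_score
```

Every call to `evaluate` is a full model fit in a real wrapper, which is why the number of evaluations, roughly n per round, dominates the cost discussed above.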

RULE STRUCTURE BASED ANT-MINER ALGORITHM
The ACO algorithm is the core of this study. It is based on the following ideas. The path followed by each ant corresponds to a candidate solution to the problem. Each ant deposits pheromone along its path in an amount proportional to the quality of the corresponding candidate solution for the target problem. The path with the highest pheromone concentration becomes the preferred path for the ants. ACO uses many ants to search the candidate solutions and converges to optimal or near-optimal solutions. Lopes, Parpinelli, et al. [28] were the first to propose using ACO, through a system called Ant-Miner, to discover classification rules. In the data mining task, the Ant-Miner algorithm [29] discovers a set of IF-THEN rules of the form IF <Term1 AND Term2 AND ...> THEN <Class>. In the antecedent (IF part), each term is a triple <attribute, operator, value>, where value is one of the possible values in the domain of the attribute. Only the '=' operator is used, as in <Day = Sunday>. The consequent (THEN part) assigns the predicted class to a case only if the case satisfies all terms in the antecedent. The rule set created by the algorithm covers all or almost all training cases, and each rule contains only a few terms. In data mining, a small number of rules is considered good because such a rule set is easier to interpret.
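The rule structure described above maps naturally onto a small data type. The Python sketch below is illustrative only: the attribute names in the usage example are hypothetical, and `choose_term` follows the usual Ant-Miner idea of picking the next term with probability proportional to pheromone τ times heuristic value η among attributes not yet in the rule; how η is computed is left open here.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Term = Tuple[str, str]  # (attribute, value); the operator is always '='


@dataclass
class Rule:
    terms: List[Term] = field(default_factory=list)
    predicted_class: str = ""

    def covers(self, case: Dict[str, str]) -> bool:
        # A case is covered only if it satisfies every term in the antecedent.
        return all(case.get(attr) == value for attr, value in self.terms)

    def __str__(self) -> str:
        antecedent = " AND ".join(f"{a} = {v}" for a, v in self.terms) or "TRUE"
        return f"IF {antecedent} THEN {self.predicted_class}"


def choose_term(candidates: List[Term], pheromone: Dict[Term, float],
                heuristic: Dict[Term, float], rule: Rule,
                rng: random.Random) -> Term:
    """Pick the next term with probability proportional to tau * eta,
    skipping attributes that already appear in the rule."""
    used = {attr for attr, _ in rule.terms}
    options = [t for t in candidates if t[0] not in used]
    weights = [pheromone[t] * heuristic[t] for t in options]
    return rng.choices(options, weights=weights, k=1)[0]
```

For instance, `str(Rule([("Smokes", "yes"), ("HPV", "yes")], "positive"))` yields `IF Smokes = yes AND HPV = yes THEN positive`. Restricting each attribute to one term per rule keeps the rules short, which supports the comprehensibility goal.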

RESEARCH METHOD
The proposed experimental framework consists of five steps. In the first step, the cervical cancer data sets used to test the performance of the proposed algorithms are selected; the numbers of attributes and classes are given in Table 1. In the second step, subsets of the original data set are used to train the Ant-Miner classifier to obtain its accuracy before the feature selection process. In the third step, BA is used to select the best subsets of features. In the fourth step, the prediction model is executed on the test data, and in the fifth step, the results are measured. The framework of the prediction model is shown in Figure 1. The raw data are loaded from the UCI repository [30]. The data set is described by 32 risk factors, including historical medical records, patient habits, and demographic information, as shown in Table 1. There are also four target variables: Hinselmann, Schiller, Cytology, and Biopsy. The Hinselmann test refers to colposcopy using acetic acid and the Schiller test to colposcopy using Lugol's iodine, while Cytology and Biopsy refer to cytological examination and biopsy, respectively. Some patients did not answer all of the questions for privacy reasons, so the data contain missing values that must be pre-processed.
b. Pre-processing Layer: pre-processing is a primary phase in data mining and machine learning. It involves operations such as data cleaning (removing noisy data, filling in missing values), data reduction, and data transformation (aggregation, normalization). The purpose of this step in this study is to prepare the data to reach the quality required for classification, because high-quality data support good decisions. The data are prepared by replacing the missing values and converting numerical values into nominal values using the Weka tool.
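The paper performs this step with Weka; the Python sketch below only illustrates the two operations in the same spirit and is not the Weka pipeline used. It assumes that missing answers are stored as '?', that missing numeric values are replaced by the column mean, and that numbers are converted to nominal labels with equal-width bins; the number of bins and the label format are hypothetical choices.

```python
from typing import List, Optional


def parse_column(raw: List[str]) -> List[Optional[float]]:
    # Assumption: missing answers are stored as '?'.
    return [None if s.strip() == "?" else float(s) for s in raw]


def impute_mean(values: List[Optional[float]]) -> List[float]:
    # Replace each missing entry with the mean of the observed entries.
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def equal_width_nominal(values: List[float], n_bins: int = 3) -> List[str]:
    # Map each number to a nominal label "bin0" ... "bin{n_bins-1}"
    # using equal-width intervals between the column minimum and maximum.
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for constant columns
    return [f"bin{min(int((v - lo) / width), n_bins - 1)}" for v in values]
```

For example, the raw column `["1", "?", "3", "5"]` becomes `[1.0, 3.0, 3.0, 5.0]` after imputation, and two equal-width bins label it `["bin0", "bin1", "bin1", "bin1"]`. Nominal values are needed because Ant-Miner terms use only the '=' operator.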
Two risk factors, attribute 27 (STDs: Time since first diagnosis) and attribute 28 (STDs: Time since last diagnosis), were deleted during pre-processing because too few of their values were available. The final analysis therefore uses 30 features for 858 patients.
c. Training Layer: in this layer, after pre-processing, the data set is divided into training and test groups according to the number of folds in the cross-validation. The experimental analysis is conducted on the data set for the four targets: Hinselmann, Schiller, Cytology, and Biopsy. The data are loaded into the proposed method, which is implemented in Java using Eclipse and run on a computer with an Intel(R) Core(TM) i5 Duo CPU @ 2.40 GHz and Windows 10. The parameters of the proposed method are Cross_validation = 5, Number_of_ants = 30, Min_cases_per_rule = 5, Max_uncovered_cases, and No_of_rules_converge = 10.
d. Experimental Layer: in this layer, a binary BA is used as the heuristic method to improve the effectiveness of Ant-Miner. Each bat's position in the search space encodes a subset of attributes. For each subset, an Ant-Miner classifier is trained on one part of the data and evaluated on another part unseen during training to obtain the fitness value of the bat. Training and evaluation may be performed several times across bats, because each bat may encode several subsets of attributes. This layer also contains the ACO-based classification algorithm, which learns rules from the training group and classifies the test cases; the algorithm performs its calculations on the data set and generates the results. The training and test cases are partitioned using five-fold cross-validation, so in each fold 80% of the data are used for training and 20% for testing.
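A binary BA wrapper of the kind described in the Experimental Layer can be sketched in Python as follows. This is a hedged illustration, not the authors' Java code: `fitness` is a hypothetical callback standing in for the Ant-Miner accuracy on held-out data; the sigmoid transfer from velocity to bit probability, the bit-flip local search around the best mask, and all parameter values are assumptions borrowed from common binary BA variants.

```python
import math
import random
from typing import Callable, List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def binary_bat_fs(n_features: int, fitness: Callable[[List[int]], float],
                  n_bats: int = 10, n_iter: int = 30,
                  f_min: float = 0.0, f_max: float = 2.0,
                  alpha: float = 0.9, gamma: float = 0.9,
                  seed: int = 1) -> Tuple[List[int], float]:
    """Wrapper FS with a binary bat algorithm; `fitness` is maximized."""
    rng = random.Random(seed)
    x = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_bats)]
    v = [[0.0] * n_features for _ in range(n_bats)]
    loud = [rng.uniform(1.0, 2.0) for _ in range(n_bats)]
    r0 = [rng.random() for _ in range(n_bats)]
    rate = r0[:]
    fit = [fitness(mask) for mask in x]      # hypothetical: validation accuracy
    b = max(range(n_bats), key=lambda i: fit[i])
    best, best_fit = x[b][:], fit[b]

    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            f_i = f_min + (f_max - f_min) * rng.random()                             # Eq. (1)
            v[i] = [v[i][j] + (x[i][j] - best[j]) * f_i for j in range(n_features)]  # Eq. (2)
            # Binary position: bit j is 1 with probability sigmoid(v_ij).
            cand = [1 if rng.random() < sigmoid(vj) else 0 for vj in v[i]]
            if rng.random() > rate[i]:
                # Local search (assumption): flip one random bit of the best mask.
                cand = best[:]
                k = rng.randrange(n_features)
                cand[k] = 1 - cand[k]
            f_cand = fitness(cand)
            if f_cand >= fit[i] and rng.random() < loud[i]:
                x[i], fit[i] = cand, f_cand
                loud[i] *= alpha
                rate[i] = r0[i] * (1.0 - math.exp(-gamma * t))
            if f_cand > best_fit:
                best, best_fit = cand[:], f_cand
    return best, best_fit
```

In the setting described above, each call to `fitness` would train Ant-Miner on one part of the data and score it on the unseen part, so caching the scores of masks that several bats revisit is a natural optimization.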
After the pheromone is initialized, rules are constructed in the inner Repeat-Until loop. Each iteration of this loop consists of rule construction, rule pruning, and pheromone updating. The loop stops when No_of_rules_converge consecutive ants have constructed the same rule, or when the number of constructed rules equals Number_of_ants. When the inner loop completes, the best rule is added to the list of discovered rules, and all training cases covered by this rule are removed from the training set. The pheromone is then initialized again, and the outer loop starts a new rule-discovery session. The outer loop continues as long as the number of uncovered training cases exceeds Max_uncovered_cases.
e. Performance Analysis Layer: performance analysis is used to measure the results of the experiment. This layer validates the results using several performance measures: the number of generated rules, the number of terms per rule, and accuracy.
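The nested loops can be sketched in Python as a simplified Ant-Miner. It is an illustration under explicit assumptions, not the implementation used in the paper: rule pruning is omitted, the heuristic η is dropped so terms are chosen by pheromone alone, rule quality is taken as sensitivity times specificity, and pheromone is reinforced in proportion to quality and then normalized. Parameter names mirror those listed in the Training Layer.

```python
import random
from collections import Counter
from typing import Dict, FrozenSet, List, Optional, Tuple

Case = Dict[str, str]   # attribute -> nominal value, plus a "class" key
Term = Tuple[str, str]  # (attribute, value); the operator is always '='


def covers(rule: List[Term], case: Case) -> bool:
    return all(case[a] == v for a, v in rule)


def quality(rule: List[Term], train: List[Case]) -> Tuple[float, str]:
    """Majority class of the covered cases and Q = sensitivity * specificity."""
    covered = [c for c in train if covers(rule, c)]
    cls = Counter(c["class"] for c in covered).most_common(1)[0][0]
    tp = sum(1 for c in covered if c["class"] == cls)
    fp = len(covered) - tp
    fn = sum(1 for c in train if c["class"] == cls) - tp
    tn = len(train) - tp - fp - fn
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens * spec, cls


def construct_rule(train: List[Case], terms: List[Term], tau: Dict[Term, float],
                   min_cases: int, rng: random.Random) -> List[Term]:
    """One ant adds terms, chosen in proportion to pheromone, while coverage allows."""
    rule: List[Term] = []
    covered = train
    while True:
        used = {a for a, _ in rule}
        options = [t for t in terms if t[0] not in used
                   and sum(c[t[0]] == t[1] for c in covered) >= max(1, min_cases)]
        if not options:
            return rule
        term = rng.choices(options, weights=[tau[t] for t in options], k=1)[0]
        rule.append(term)
        covered = [c for c in covered if c[term[0]] == term[1]]


def ant_miner(train: List[Case], attrs: List[str], n_ants: int = 30, min_cases: int = 5,
              max_uncovered: int = 10, n_converge: int = 10, seed: int = 0):
    rng = random.Random(seed)
    rules: List[Tuple[List[Term], str]] = []
    train = list(train)
    while len(train) > max_uncovered:                       # outer loop
        terms = sorted({(a, c[a]) for c in train for a in attrs})
        tau = {t: 1.0 / len(terms) for t in terms}           # (re)initialize pheromone
        best_rule: List[Term] = []
        best_q, best_cls = -1.0, ""
        prev: Optional[FrozenSet[Term]] = None
        same, ant = 0, 0
        while ant < n_ants and same < n_converge:           # inner Repeat-Until loop
            rule = construct_rule(train, terms, tau, min_cases, rng)
            q, cls = quality(rule, train)
            for t in rule:                                   # reinforce the terms used
                tau[t] += tau[t] * q
            total = sum(tau.values())
            tau = {t: val / total for t, val in tau.items()}  # normalization acts as evaporation
            if q > best_q:
                best_rule, best_q, best_cls = rule, q, cls
            key = frozenset(rule)
            same = same + 1 if key == prev else 1            # consecutive identical rules
            prev, ant = key, ant + 1
        rules.append((best_rule, best_cls))
        # Remove the training cases correctly covered by the best rule.
        train = [c for c in train if not (covers(best_rule, c) and c["class"] == best_cls)]
    default = Counter(c["class"] for c in train).most_common(1)[0][0] if train else None
    return rules, default
```

The returned list is an ordered rule list: a new case would be labeled by the first rule that covers it, falling back to the default class. Because each best rule removes at least one correctly covered case, the outer loop always terminates.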

RESULTS AND DISCUSSION
In this paper, the experiments began by testing how effectively BA for FS reduces the number of features in the cervical cancer data set. This section discusses the experimental results on the cervical cancer data set from the UCI repository, with features selected using BA and performance measured using the ACO classifier. Table 2 shows the characteristics of the cervical cancer data set and the number of features selected by BA for each target. As Table 2 shows, the greatest reduction is obtained for cervical-cancer (Hinselmann), with five features selected, followed by cervical-cancer (Cytology), with seven features selected, and cervical-cancer (Schiller), with eight features selected. Cervical-cancer (Biopsy) shows the smallest reduction, with nine attributes selected. BA thus removed more than 50% of the original features in the cervical cancer data set for every target. Because feature reduction alone does not show whether FS benefits classification, the classification performance is measured according to three criteria: the number of generated rules, the number of terms per rule, and accuracy, as shown in Table 3. For the Hinselmann test, five of the 30 features were selected, and the Ant-Miner classification algorithm achieved an accuracy of 95.93%, with a rule number of 7.4 and 9.6 terms per rule. For Schiller's test, eight of the 30 features were selected; the ACO classifier achieved 90.91% accuracy, with a rule number of 7.1 and 9.9 terms per rule. The Cytology test also gives strong diagnostic results: seven of the 30 features were selected, with 94.88% accuracy from the ACO classifier, a rule number of 8.4, and 9.5 terms per rule. The Biopsy test required more features than the three previous tests: nine of the 30 features were selected, with 95.8% accuracy from the classifier, a rule number of 8.5, and 9.7 terms per rule.
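The "more than 50%" claim can be checked directly from the selected-feature counts reported above. The short Python computation below assumes the baseline is the 30 features that remain after pre-processing.

```python
def reduction_percent(selected, total=30):
    """Percentage of the `total` features removed for each target."""
    return {target: 100.0 * (total - k) / total for target, k in selected.items()}


# Selected-feature counts per target, as reported in the text (Table 2).
selected = {"Hinselmann": 5, "Cytology": 7, "Schiller": 8, "Biopsy": 9}
for target, pct in reduction_percent(selected).items():
    print(f"{target}: {selected[target]} of 30 kept, {pct:.1f}% removed")
```

This prints reductions of 83.3%, 76.7%, 73.3%, and 70.0%, so even the smallest reduction (Biopsy) comfortably exceeds 50%.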
The classification results are compared in Table 4 with other approaches, such as SVM, that were applied to the same cervical cancer data set without FS techniques [31,32]. Furthermore, comparing Ant-Miner with and without BA-based FS shows that FS increases accuracy and decreases the number of terms per rule, with a small increase in the number of rules, as shown in Figure 2 and Figure 3.

CONCLUSION
Patients who have undergone cervical cancer screening tests provide accurate information to physicians, who use this information to detect, understand, and assess the symptoms of the disease. In this paper, a hybrid bat and ant colony optimization-based classification algorithm was proposed to analyze the cervical cancer data set. To the best of our knowledge, this study is the first to use metaheuristic algorithms for cervical cancer diagnosis. The hybrid algorithm was applied to examination data from 858 patients, with Hinselmann, Schiller, Cytology, and Biopsy as the target variables. Compared with other works, the hybrid algorithm discovers understandable rules with a high degree of classification accuracy while using the fewest selected features. These algorithms are therefore suitable for decision-making in the medical field, especially for identifying and detecting the risk factors that cause the disease.