Hybrid feature selection method based on particle swarm optimization and adaptive local search method

Malek Alzaqebah, Sana Jawarneh, Rami Mustafa A. Mohammad, Mutasem K. Alsmadi, Ibrahim Al-marashdeh, Eman A. E. Ahmed, Nashat Alrefai, Fahad A. Alghamdi
Department of Mathematics, College of Science, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
Basic and Applied Scientific Research Center, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
Computer Science Department, Community College Dammam, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
Computer Information Systems Department, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
Department of MIS, College of Applied Studies and Community Service, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia


INTRODUCTION
Machine learning has become prominent in many research fields because of rapid data growth and the need to use that data meaningfully. It is concerned with discovering useful information in large volumes of data through techniques such as anomaly detection, classification, and clustering [1,2]. Dimensionality is a major obstacle in this process because it incurs a high computational cost. A dataset comprises a set of examples, each describing a particular case in the form of features. Datasets can have substantial dimensionality and may also carry irrelevant and redundant features and a high level of noise. Traditional machine learning methods cannot handle such a large number of features. Feature selection is therefore vital as a preprocessing phase, since it decreases data dimensionality while removing duplicate and useless features from the dataset [2][3][4]. The feature selection process aims to obtain the optimal subset of useful features while still representing the original features of the dataset accurately. In this regard, classification involves determining the class value of each sample from the available class pool [5,6].
Feature selection techniques are divided into three categories according to the selection strategy: filter, wrapper, and embedded techniques. Filter approaches do not require a subsequent learning algorithm [7,8], while wrapper techniques require the use of a learning algorithm [9,10]. Compared with the filter approach, the wrapper approach has a higher computational cost and carries a risk of over-fitting. In embedded techniques, the feature selection method is embedded within the training process of the model(s) [2,4,11], and an ideal group of features is then generated by optimizing the objective function. Among these three types of feature selection, wrapper methods are chosen in this paper.
Metaheuristic optimization algorithms have shown good performance in the search for an optimal solution. These algorithms are also easy to implement and can solve a wide range of problems [12]. Among them are algorithms based on swarm intelligence, which study the behavior of a collection of agents in self-organized societies, e.g., bees, ants, birds, and moths [13][14][15][16][17]. Techniques based on swarm intelligence have been widely used as wrapper methods for feature selection [18], for instance, the bees algorithm [19], ant colony optimization (ACO) [20], the butterfly optimization algorithm [21], and the moth optimization algorithm [22].
Particle swarm optimization (PSO) was proposed by Kennedy and Eberhart [23]. The algorithm relies on the behavior of social organisms that live in groups, as exemplified by birds and fish. PSO mimics the way group members interact and share information, and it has been applied in various optimization domains, both alone and together with other algorithms. To combine the advantages of both approaches, a filter-wrapper feature selection technique based on PSO was introduced in [24]. The filter measure is applied in encoding the location of every particle, while the classification accuracy is used as the fitness function. The experiments showed that this method was marginally better than a binary PSO-based filter method. However, the method was not compared with any wrapper algorithm, even though wrapper algorithms generally outperform filter algorithms in classification performance.
In dealing with the feature selection (FS) problem, Ibrahim et al. [25] suggested a hybrid optimization technique that combines the salp swarm algorithm (SSA) with PSO. This combined method was called SSAPSO. The authors reported that it improved the effectiveness of the exploitation and exploration phases [25].
PSO and firefly (FF) techniques were hybridized as PSO-FF in [26] for the FS problem in the examination of childhood atypical teratoid/rhabdoid tumor (AT/RT) in brain MRI images and of hemochromatosis in computed tomography (CT) images of the liver. Meanwhile, the authors of [27] applied a hybrid bio-inspired technique to the FS process; the proposed method is based on two swarm intelligence techniques, namely PSO and ACO. Also for the FS problem, tabu search (TS) was combined with binary particle swarm optimization (BPSO) in [28]. In that study, BPSO acts as a local optimizer each time TS is executed for a generation, and the method is applied to cancer classification from gene expression data. However, this approach favors the smallest number of features, which may not be representative of the entire dataset, so the search may become stuck in local optima. Relatedly, [29] demonstrated a hybrid method comprising ACO, bee colony optimization (BCO), genetic algorithm (GA), and fuzzy C-means for selecting features from mammogram images.
In the current research, the PSO technique is combined with an adaptive local search approach to quickly attain suitable solutions by combining the exploration provided by the PSO algorithm with the exploitation ability of the local search method. The PSO algorithm ensures the diversity of the solutions, while the adaptive local search method exploits them. This combination increases the flexibility of the PSO algorithm and enhances its capability to exploit solutions in the search space, so that a possible ideal solution can be found quickly. The proposed adaptive local search method relies on the late acceptance hill-climbing algorithm [30,31]. The method is free from parameter tuning: its parameters are set during the search of the PSO algorithm, which makes the algorithm more flexible. The PSO algorithm passes the solution to be exploited to the local search, together with the current iteration and the number of tries used to improve the solutions. One of the most significant features of the proposed algorithm is that it takes advantage of both population-based algorithms, which preserve the diversity of the solutions, and local search algorithms, which exploit a solution very quickly. The results generated by the suggested algorithm are compared with those of the traditional PSO algorithm and of other contemporary approaches.
The structure of the article is as follows: Section 2 details the standard particle swarm optimization (PSO) algorithm, and Section 3 elaborates the suggested algorithm, namely "particle swarm optimization with adaptive local search method". Section 4 then reviews the empirical results, and Section 5 presents the conclusions of the study along with several suggestions for future studies.

PARTICLE SWARM OPTIMISATION ALGORITHM
The PSO algorithm was introduced by Eberhart and Kennedy in [23]. It mimics the communication behavior of a group of agents, for instance, flocking birds and schooling fish. In the PSO algorithm, the agents (particles) denote candidate solutions of the problem, and the swarm represents a population of solutions.
The PSO algorithm begins by generating a random solution for each particle and assigning it an initial velocity. Particles travel within the search space in order to find the ideal solution, and the location of every particle is updated based on its own knowledge and that of its neighboring particles. As the particles move, the current position of particle i is represented by a vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$. The best past position of a particle is recorded as its personal best, denoted pbest, and the best location achieved by the swarm is called the "global best" or gbest. PSO searches for the ideal solution by updating the particle positions using (1), while (2) is used to calculate the moving velocity, as follows [23]:

$$x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1} \qquad (1)$$

$$v_{id}^{t+1} = w\, v_{id}^{t} + c_1 r_{1i} \left(p_{id} - x_{id}^{t}\right) + c_2 r_{2i} \left(p_{gd} - x_{id}^{t}\right) \qquad (2)$$

where $t$ denotes the t-th iteration in the evolutionary process; $d \in D$ denotes the d-th dimension within the search space; $v_{id}$ is the velocity of particle i in dimension d; $w$ is the inertia weight; $c_1$ and $c_2$ are the acceleration constants; $r_{1i}$ and $r_{2i}$ are random values uniformly distributed in [0, 1]; and $p_{id}$ and $p_{gd}$ are the elements of pbest and gbest in the d-th dimension. Figure 1 depicts the pseudo-code of PSO.
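The velocity and position updates described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `pso_step` and the default values of `w`, `c1`, and `c2` are assumptions, and the sketch keeps positions continuous because this section does not state how positions are mapped to the 0/1 feature vector.

```python
import random


def pso_step(positions, velocities, pbest, gbest, w=0.7, c1=2.0, c2=2.0, rng=random):
    """Apply one velocity update and one position update to every particle, in place.

    positions, velocities, pbest: one list of floats per particle.
    gbest: list of floats, the best position found by the whole swarm.
    w, c1, c2: inertia weight and acceleration constants (default values are assumptions).
    """
    for i in range(len(positions)):
        # r_1i and r_2i: one pair of uniform random numbers in [0, 1) per particle
        r1 = rng.random()
        r2 = rng.random()
        for d in range(len(positions[i])):
            # velocity update: inertia + pull toward pbest + pull toward gbest
            velocities[i][d] = (
                w * velocities[i][d]
                + c1 * r1 * (pbest[i][d] - positions[i][d])
                + c2 * r2 * (gbest[d] - positions[i][d])
            )
            # position update: move the particle by its new velocity
            positions[i][d] += velocities[i][d]
```

A particle that already sits on both its personal best and the global best with zero velocity stays where it is, which is a quick sanity check on the signs of the two attraction terms.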

PROPOSED PSO ALGORITHM WITH ADAPTIVE LOCAL SEARCH

Solution depiction
A feature selection solution is represented as a vector of length N (the number of features in the dataset) whose entries are either 0 or 1, where 0 means the feature is not selected and 1 means it is selected. The PSO algorithm changes the values in the vector to improve classification accuracy; this study uses classification accuracy as the objective function to be maximized. The classification algorithm used in the present study is discussed in the ensuing section. The following figure shows the representation of the PSO algorithm for feature selection. For demonstration purposes, suppose that we have a solution for a dataset with 5 features; if the selected features are the first and the third, the solution will be [1,0,1,0,0].
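A small Python sketch may make the binary representation and the accuracy objective concrete. The function names are hypothetical, and because the classifier is not named in this section, a 1-nearest-neighbour rule is used purely as a stand-in; scoring an empty feature subset as 0.0 is likewise an assumption.

```python
def select_features(rows, mask):
    """Keep only the columns whose mask bit is 1."""
    return [[value for value, bit in zip(row, mask) if bit == 1] for row in rows]


def accuracy_fitness(mask, train_x, train_y, test_x, test_y):
    """Accuracy of a 1-nearest-neighbour rule restricted to the selected features.

    The 1-NN classifier is a placeholder assumption, not the paper's classifier.
    """
    if not any(mask):
        return 0.0  # assumption: an empty feature subset gets the worst score
    train_sel = select_features(train_x, mask)
    test_sel = select_features(test_x, mask)
    correct = 0
    for sample, label in zip(test_sel, test_y):
        # squared Euclidean distance to every training sample
        distances = [sum((a - b) ** 2 for a, b in zip(sample, ref)) for ref in train_sel]
        nearest = distances.index(min(distances))
        if train_y[nearest] == label:
            correct += 1
    return correct / len(test_y)
```

With the mask `[1, 0, 1, 0, 0]`, `select_features` keeps the first and third columns of every row, which mirrors the five-feature example in the text.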

Adaptive local search
The local search method relies on "late acceptance hill-climbing" (LAHC) [16,30]. The LAHC algorithm uses a memory (list) of length L to save the objective values of the solutions produced during the search. Whether a new candidate solution is accepted depends on comparing it with the value saved in the list L steps earlier. A worse candidate is accepted provided that its value is equal to or better than the value stored in the list at index v, the virtual beginning of the list. The index v is computed as the remainder of dividing the current iteration number I by the list length L, that is, v = I mod L (e.g., see figure line# 12), after which the list entry at index v is updated with the value of the current solution. Otherwise, the worse candidate is not accepted. In this way the "physical" list stays static, while its "virtual" beginning v is computed dynamically. The full pseudocode of late acceptance hill climbing is presented in Figure 2 [30].

Figure 2. The pseudo-code of late acceptance hill climbing [30]

The pseudocode of the proposed algorithm is depicted in Figure 3. In the proposed algorithm, the search stops when the number of idle steps (idelsteps) reaches its limit or when the maximum number of iterations is attained. The idle-step counter is increased by one whenever the algorithm cannot further improve the local best solution (Pbest) (see Figure 3, line# 7). The adaptive local search (ALS) is performed if a random number between 0 and 1 is greater than 0.75. This threshold was chosen experimentally to avoid applying the local search to every solution and to avoid long processing times. A further condition ensures that ALS is applied when the solution is stuck in a local optimum or when no further improvement is possible; at this stage a worse solution is accepted in order to escape the local optimum (see Figure 3, line# 15).
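The late acceptance rule described above can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not a transcription of Figure 2: it maximizes the objective (accuracy is maximized in this paper), it uses a hypothetical single-bit-flip move as the neighbourhood, and the function and parameter names are invented for the example.

```python
import random


def flip_one_bit(mask, rng):
    """Hypothetical neighbourhood move: toggle one randomly chosen feature bit."""
    neighbour = list(mask)
    index = rng.randrange(len(neighbour))
    neighbour[index] = 1 - neighbour[index]
    return neighbour


def late_acceptance_hill_climbing(start, fitness, num_iterations, list_length, rng=None):
    """Maximise `fitness` starting from `start`.

    Returns the best solution seen during the search and its fitness value.
    """
    rng = rng or random.Random()
    current, current_value = list(start), fitness(start)
    best, best_value = list(current), current_value
    # the fixed-size "physical" list, initialised with the starting value
    history = [current_value] * list_length
    for iteration in range(num_iterations):
        candidate = flip_one_bit(current, rng)
        candidate_value = fitness(candidate)
        v = iteration % list_length  # virtual beginning of the list: v = I mod L
        # late acceptance: compare with the value stored L steps earlier,
        # so a candidate worse than the current solution can still be accepted
        if candidate_value >= history[v]:
            current, current_value = candidate, candidate_value
        history[v] = current_value
        if current_value > best_value:
            best, best_value = list(current), current_value
    return best, best_value
```

Because the comparison is made against a stored value from `list_length` steps earlier rather than against the current solution, longer lists tolerate worse moves for longer, which is the mechanism the text relies on to leave local optima.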
Three parameters must be provided to the LAHC algorithm: the number of iterations (NumOfIte), the list size (L), and the solution (x_i) (see Figure 3, line# 17).
The adaptive local search method sets these parameters adaptively as follows. First, the number of iterations (NumOfIte) is calculated by multiplying the number of idle steps (idelsteps) by the current PSO iteration number (PsoIter). This gives the local search more iterations toward the final stage of the PSO search. Secondly, in terms of list size, a list from the PSO

EMPIRICAL EVALUATION RESULTS AND DISCUSSION
This part of the article examines the effectiveness and robustness of the suggested PSO with the adaptive local search algorithm (PSO_ALS). The study compares PSO_ALS with other population-based algorithms, including GA, MFO, and FFA. The tests involved 8 datasets with various characteristics.
Table 1 presents the eight datasets used in this study. These commonly used standard datasets were obtained from the UCI data source [32] and have been used in several well-established studies. Their primary attributes are the number of attributes (features), the number of examples (instances), and the number of possible class values.
For the purpose of this work, the instances in each dataset were split into training and testing groups: 80% of the instances were used for training and the remaining 20% for testing. This division was proposed in Friedman et al. [33]. The experiments were run on a system with an Intel i5-5200U 2.2 GHz CPU and 8.0 GB of RAM. The parameter values of the suggested algorithm are given in Table 2. These values were identified from the results of 10 warm-up experimental runs, and Table 2 lists the settings that generated the better accuracy.

Table 3 displays the number of selected features (NF) and the best-attained accuracy (ACC) used to compare the PSO_ALS algorithm with other state-of-the-art algorithms, namely GA, MFO, FFA, and PSO. The average accuracy results of GA, MFO, FFA, and PSO were also compared with those of PSO_ALS. The results in Table 3 show that PSO_ALS attains a higher accuracy than the other techniques on 75% of the datasets and a comparable (equal) accuracy on 12.5%.
GA did not obtain the best accuracy on any dataset, whereas MFO and PSO each obtained an accuracy equal to the best on one dataset, WDBC. Table 3 also shows that PSO_ALS selects the fewest features in only 2 datasets, while the MFO algorithm attains the best outcome in 4 datasets. The average accuracy of the same algorithms is also displayed in Table 3. As can be observed, the proposed PSO_ALS algorithm achieved the best average result on six datasets: German, Heart, WDBC, Parkinsons, SpectF, and WBC. Meanwhile, PSO shows the best average on the Ionosphere dataset, and MFO shows the best average on the Sonar dataset. The details are shown in Table 3, where the highest average accuracies are in bold.
The significance of the obtained results can be determined through the Mann Whitney test, as demonstrated by McKnight and Najab [22]. Table 4 shows the p-values of the Mann Whitney statistical test based on the fitness (accuracy) values. These tests indicate that the observed differences and improvements are meaningful, and Table 4 supports the advantage of PSO_ALS in average accuracy over the other algorithms: the proposed algorithm is statistically significant in most cases, with some exceptions. Table 5 presents the p-values of the Mann Whitney test based on the number of features. As can be seen from the table, the differences between PSO_ALS and GA are statistically significant for all datasets, and the differences from most other competitor methods are statistically significant, except for the PSO algorithm. Figure 4 compares the best accuracy of the GA, FFA, MFO, PSO, and PSO_ALS algorithms, and it can be seen that PSO_ALS achieves the best results in most datasets. Figure 5 demonstrates the behavior of PSO and PSO_ALS for the Heart dataset. As shown in Figure 5(a), the PSO solution does not improve from iteration 110 to iteration 180. In contrast, in Figure 5(b), which shows PSO_ALS applied to the same dataset, the algorithm converges smoothly and generates superior results because the adaptive local search accepts worse solutions, which lets the algorithm escape local optima. Further, the PSO algorithm reaches an accuracy of 87% at iteration# 120, whereas PSO_ALS produces a similar accuracy only after iteration# 170. Thus, it can be said that the adaptive local search, by accepting candidate solutions with lower accuracy in order to jump out of local optima, balances the local intensification and global diversification of the search, as can be seen from the final accuracy produced by both algorithms.

For most of the datasets considered in this research, PSO_ALS yielded equal or improved results in accuracy and in the number of selected features. However, it should be noted that not all of the observed differences are statistically significant when compared with the other competitors. Table 6 compares the best results obtained by the PSO_ALS algorithm with solutions reported in the literature on the eight tested datasets. For the purpose of this study, accuracy is used as the main criterion when comparing the performance of the algorithms, and the highest accuracy in Table 6 is shown in bold. Table 6 shows that PSO_ALS achieved accuracy values that are highly comparable to those of most competitors and that it shows superior performance on some datasets.
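The Mann Whitney comparison used for Tables 4 and 5 can be illustrated with a short Python sketch. The inputs would be, for example, the per-run accuracies of two algorithms on one dataset. The paper does not state which variant of the test was used, so the choices here are assumptions: a two-sided p-value from the normal approximation, average ranks for ties, and no tie or continuity correction. For very small samples an exact test would be more appropriate.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """Return (U, two-sided p-value) for two independent samples.

    Uses average ranks for ties and the normal approximation without
    tie or continuity correction (both simplifying assumptions).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    pooled = sorted([(v, 0) for v in sample_a] + [(v, 1) for v in sample_b])
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # tied values occupy ranks i+1 .. j and share the mean of those ranks
        average_rank = (i + 1 + j) / 2
        rank_sum_a += average_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u = min(u_a, n_a * n_b - u_a)
    mean_u = n_a * n_b / 2
    std_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mean_u) / std_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u, p_value
```

A small p-value suggests that one algorithm's results tend to be larger than the other's; identical samples give U equal to half of `n_a * n_b` and a p-value of 1.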

CONCLUSION AND FUTURE WORKS
In the current article, the application of the PSO algorithm with the adaptive local search method (the PSO_ALS algorithm) was demonstrated for the feature selection problem. The algorithms were applied to a benchmark of 8 standard UCI datasets, and the results of PSO_ALS were compared with those generated by four approaches from the literature. The proposed method outperformed the other comparable methods by balancing the local intensification and global diversification of the search: the PSO algorithm finds the best global solution within the search space, and the adaptive local search method explores the local search space. The use of the adaptive local search technique improves the results of the suggested algorithm. PSO_ALS shows performance that is superior to other comparable approaches and to the basic PSO algorithm.