Enhancing feature selection with a novel hybrid approach incorporating genetic algorithms and swarm intelligence techniques

ABSTRACT


INTRODUCTION
Feature selection plays a crucial role in the preprocessing phase of machine learning: it eliminates irrelevant and redundant features (noisy attributes), which increases the performance of a classifier and reduces the computational complexity [1]. In the healthcare sector, feature identification and selection play a vital role in enhancing accuracy in prediction, classification, and detection systems. This crucial preprocessing step not only enables dimensionality reduction but also permits a better understanding of pathologies [2]. In an exhaustive search, the number of possible combinations to determine the most relevant and non-redundant features is 2^n, where n represents the number of features (an NP-complete problem) [3]. Numerous feature selection algorithms have been suggested in the existing literature, and they are generally classified into three groups: filter algorithms, wrapper algorithms, and integrated (embedded) algorithms [4].
The filter approach is independent of the learning algorithm and uses information-theoretic measures to assess and rank the features [5]. The advantages of this method are its computational efficiency and its robustness against overfitting [6]. By contrast, wrapper approaches employ a learning algorithm to assess subsets of features, which yields high classification accuracy. However, they require a high computation time [7]. Integrated (embedded) approaches combine the advantages of filter and wrapper methods: they incorporate variable selection into the learning process, which allows a compromise between computational cost and model performance [8]. The fundamental difficulty with the filter technique is that the features are chosen autonomously, without involving the machine learning classifier, while the wrapper technique chooses features using an optimization algorithm and works directly with the classifier [9]. Compared to exhaustive search, optimization algorithms offer the advantage of efficiently selecting a near-optimal subset of features in a reasonable time. Numerous optimization methods have recently been utilized to tackle feature selection (FS) problems, and they significantly outperform more traditional FS techniques. However, no single meta-heuristic FS approach surpasses all other optimization algorithms across many datasets. For instance, Rostami et al. [10] compared the performance of different swarm intelligence (SI) based feature selection methods on several datasets. The findings indicate that with the support vector machine (SVM) classifier on the colon dataset, the cuckoo optimization algorithm (COA) outperforms the particle swarm optimization (PSO) method, whereas on the isolated letter speech recognition (ISOLET) dataset, the PSO-based method performs better than COA. In study [11], an improved salp swarm algorithm (ISSA) is developed and compared to other swarm techniques. The findings revealed that when employing the k-nearest neighbor (KNN) classifier on the Waveform dataset, the SSA-based method exhibited superior performance compared to PSO. Conversely, on the Parkinson's dataset, the PSO-based method surpassed the SSA-based method.
This paper seeks to remedy these limitations by proposing a powerful feature selection approach based on a genetic algorithm (GA) that combines the advantages of various swarm intelligence (SI)-based feature selection techniques. The objective is to efficiently use helpful information from various SI-based feature selection techniques to obtain a better average fitness value and higher classification performance than other optimization algorithms on many datasets from different fields. The suggested feature selection approach has been applied to seven publicly available UCI databases from different application fields (colon, breast cancer Wisconsin, heart, arrhythmia, sonar, ionosphere, waveform), and its potency was then tested. The remainder of this article is structured as follows: section 2 describes the literature survey and related works. Section 3 details the proposed feature selection method GASI. Experimental results and discussion are presented in section 4, followed by a conclusion and future perspectives in section 5.

LITERATURE SURVEY AND RELATED WORKS
A crucial issue for machine learning tasks is the large dimensionality of a data set with huge feature spaces and a limited number of samples [12]. Dimensionality reduction is a technique to tackle this issue by removing redundant and noisy features. This improves the classifier's performance and reduces its complexity in terms of computation and memory space. Dimensionality reduction approaches are typically categorized into two groups: feature selection and feature extraction. A reduction based on a data transformation is called feature extraction, which replaces the initial data set with a new, reduced one built from the initial set of features. A feature selection-based reduction chooses the most pertinent features from the dataset. In the following subsections, we briefly describe the main feature selection approaches: filter, wrapper, and embedded.
Filter techniques use statistical performance measures to evaluate features and select the best-ranking ones; these approaches are not dependent on the learning algorithm [5]. Filter methods fall into two categories: multivariate and univariate. The univariate approaches assess the pertinence of the attributes to the target class by an assessment criterion such as mutual information (MI), information gain (IG), or the Gini index (GI) [13]. This approach does not consider the interactions between features [3] and is prone to getting stuck in a local optimum [14]. The multivariate methods take into consideration the dependencies between features, which allows the elimination of irrelevant and redundant variables. Among the multivariate methods are the maximum relevance minimum redundancy (MRMR) approach [15] and the relevance redundancy feature selection (RRFS) method [16].
The wrapper method relies on the learning algorithm to assess the variables and choose an optimum subset of characteristics with high classification precision [3]. Although this method uses a classifier and considers the interactions between variables, it remains computationally expensive [17]. Generally, a cross-validation mechanism is used to reduce time complexity and avoid overfitting problems [18]. The embedded approach differs from the other feature selection approaches. On the one hand, learning algorithms are not employed in the filter procedures. On the other hand, the wrapper approaches utilize a learning machine technique to assess the quality of feature subsets, independent of knowledge about the classification or regression function's specific structure [19]. The embedded approach, by contrast, integrates feature selection into the training process: it uses a machine learning method to seek the best subset of features while assuring a balance between computational cost and model performance [8]. Among the learning algorithms that use this concept are decision trees (DT), support vector machines (SVM), and AdaBoost. For example, DT is a tree-based classifier with several nodes and leaves. Each leaf is a class label, while each node represents a particular feature. The relevance of a feature is determined by its location in the DT. Therefore, in DT-based embedded approaches, the tree is first generated, and subsequently, the features engaged in the classification are chosen as the definitive subset of features [20].
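As a concrete illustration of this embedded idea, the sketch below trains a decision tree and treats the features it actually splits on as the selected subset. It assumes scikit-learn, which is not named in this survey; the dataset and variable names are illustrative:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# The features the tree actually splits on form the embedded selection
selected = tree.feature_importances_ > 0
print(int(selected.sum()), "of", X.shape[1], "features used by the tree")
```

The selection here falls out of training itself, which is exactly what distinguishes embedded methods from filters and wrappers.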
Feature selection is considered an NP-hard problem because the number of feasible subsets of variables increases exponentially with the number of predictors [3]. Metaheuristics are approximate search methods often used for NP-hard problems, as they can achieve satisfactory (near-optimal) solutions in a short time [6]. Several feature selection methods use metaheuristics to escape local optima and decrease computational complexity in high-dimensional datasets [21]. A meta-heuristic typically starts from an initially randomly generated population and a fitness function that assesses the performance of the individual solutions; a new population is created as long as the termination criteria are not satisfied. This process is iterated until one of the stopping criteria is fulfilled [22].
The genetic algorithm (GA) is an evolutionary computation algorithm that draws inspiration from the Darwinian evolution of biological populations. This well-known approach imitates the mechanism of natural selection, where the most appropriate individuals are selected to reproduce the next generation's children; in a GA, child chromosomes are produced from their parents' chromosomes. Genetic operators, including crossover and mutation, are among the most crucial components of GAs and play a major part in exploring the search space to discover novel solutions. While the mutation operators are in charge of creating new information by altering part of it, the crossover operators search for new solutions using data already present in the population. In GAs, crossover operators are typically applied considerably more frequently than mutation operators, although during the search procedure the mutation operators assist in escaping local optima. GAs have successfully shown their high ability to solve optimization problems, including feature selection problems [7], and numerous authors have suggested several GA variants to solve the feature selection problem [4], [23]-[25]. In 2016, Cerrada et al. [26] showed that GA could effectively achieve optimal global solutions for problems with large search spaces.
PSO is a robust SI-based optimization approach developed by Kennedy and Eberhart [27]. The approach is founded on the collective behavior of shoals of fish and flocks of birds. PSO is utilized in various machine learning and feature selection applications. For instance, in [28], a multi-objective feature selection strategy using gray wolf optimization (GWO) and a PSO derived from Newton's laws is created to reduce the number of chosen features and the rate of classification errors. In [29], a comparison of classification accuracy between PSO and a hybrid method that employs the Harris hawks optimization algorithm (HHO) for optimizing SVM is performed. In [30], a hybrid meta-heuristic approach combining PSO and adaptive GA operators is introduced. This approach aims to optimize feature selection in machine learning models specifically designed to detect instances of tax avoidance. In [31], a thorough examination is conducted on current classification methods and gene selection techniques. The paper specifically emphasizes the effectiveness of emerging methods, like SI algorithms, in the tasks of feature selection and classification for high-dimensional microarray data. In [32], a proposed system achieves automatic classification and detection of different pest attacks and plant infections by employing a combination of a radial basis probabilistic network (RBPN) and a genetic algorithm-based particle swarm optimization (GA-based PSO) method. In [33], a feature selection technique derived from PSO is proposed, which incorporates multiple classifiers. This approach utilizes adaptive parameters and strategies to tackle feature selection problems on a large scale, with the aim of improving classification accuracy and reducing computational complexity.
Differential evolution (DE) is a stochastic population-based search algorithm proposed by Storn and Price [34]. This optimization method was first developed to solve the Chebyshev polynomial problem, but it has also been demonstrated effective in solving complex optimization problems [35]. Zhang et al. [36] presented a multi-objective feature selection method based on differential evolution and defined a mutation operator to evade local optima. Li et al. [37] proposed a novel large-scale multi-objective cooperative co-evolution method for feature selection to search for subsets of optimal features efficiently.

The cuckoo optimization algorithm (COA) is a recent SI-based evolutionary optimization approach derived from the life of the cuckoo bird [38]. Its principle is inspired by the nesting and egg-laying behavior of the cuckoo, applied to overcome optimization issues and find the global optimum [39]. In [40], a combination of a neural network and the cuckoo search algorithm is deployed for feature selection in heart disease classification. The firefly algorithm (FA) is an excellent instance of SI, in which underperforming entities collaborate to generate high-performance solutions. In [41], Yang introduced the FA, with the basic notion based on the optical communication between fireflies.
The salp swarm algorithm (SSA) is a recently developed SI-based algorithm that imitates the behavior of sea salps [42]. The SSA demonstrated high performance when evaluated on various optimization problems. In [43], a new combination of SSA and chaos theory is suggested to enhance feature selection accuracy. In [44], a dynamic salp swarm approach for feature selection is used to resolve the local optimum issue of SSA and to strike a balance between exploitation and exploration.
The Jaya algorithm (JA) is a recently introduced population-based meta-heuristic algorithm. Rao presented it in 2016 to handle constrained and unconstrained optimization problems. In [45], a novel hybrid feature selection approach incorporating the binary JA is developed for the classification of microarray data, seeking the optimum subset of features.
The flower pollination algorithm (FPA) is a meta-heuristic optimization technique that centers around the pollination process found in flowering plants. It was introduced by Yang in 2012 [46]. The primary goal of a flower is essentially to reproduce through the process of pollination, which involves the transfer of pollen and is frequently aided by pollinators such as birds and insects.

THE PROPOSED FEATURE SELECTION METHOD GASI
This section proposes a novel feature selection approach by integrating genetic algorithms with swarm intelligence-based feature selection techniques, incorporating particle swarm optimization, differential evolution, the cuckoo optimization algorithm, the firefly algorithm, the salp swarm algorithm, the Jaya algorithm, and the flower pollination algorithm, as well as other feature selection methods such as SelectFromModel and recursive feature elimination (RFE). The proposed approach GASI is founded on two principal pillars. The first axis builds an initial smart population composed of the valuable results of the swarm intelligence-based feature selection algorithms (PSO, DE, COA, FA, SSA, JA, FPA, SelectFromModel, and RFE), which aim to discover the most optimal subset of features. The second axis feeds this intelligent population to the genetic algorithm in order to search for a better subset of features that contains fewer features and improves the classification performance. The architecture of the suggested feature selection approach GASI is illustrated in Figure 1.
In this framework, as shown in Figure 1, several SI-based feature selection techniques are first applied to a dataset. Then, an intelligent population composed of the feature subsets produced by these techniques is fed to the GA in the second step. The GA starts with this population and attempts to converge to the optimal subset of features employing genetic operators. An evaluation is made for each individual in the current population based on a specified fitness function. A new population is produced using the genetic operations (selection, crossover, and mutation). This method is developed to maximize the classification accuracy and reduce the size of the feature subset. The following subsections describe the proposed GA method.
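The loop just described can be outlined in Python (the paper's implementation language). This is an illustrative sketch, not the authors' code; `evaluate`, `select`, `crossover`, and `mutate` are hypothetical placeholders for the operators detailed in the following subsections:

```python
import numpy as np

def run_ga(population, evaluate, select, crossover, mutate,
           n_generations=50, rng=None):
    """Minimal GA loop over binary feature masks: evaluate, keep the
    elite, then refill the population via selection, crossover, and
    mutation (illustrative sketch)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    for _ in range(n_generations):
        scores = np.array([evaluate(ind) for ind in population])
        next_pop = [population[scores.argmin()].copy()]  # elitism
        while len(next_pop) < len(population):
            p1 = select(population, scores, rng)
            p2 = select(population, scores, rng)
            c1, c2 = crossover(p1, p2, rng)
            next_pop.extend([mutate(c1, rng), mutate(c2, rng)])
        population = np.vstack(next_pop[:len(population)])
    scores = np.array([evaluate(ind) for ind in population])
    return population[scores.argmin()]
```

Because the elite individual is copied forward unmutated, the best fitness seen can never worsen from one generation to the next.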

Encoding of individuals
In this context, individuals are represented using binary arrays consisting of n bits, where n corresponds to the number of features in the original dataset. A bit with a value of 1 in this array indicates the inclusion of the corresponding feature in the subset, while a bit with a value of 0 signifies the exclusion of that feature. This binary encoding serves as an efficient means of representing feature subsets, enabling the algorithm to decide which features to include or exclude during the search. It is a fundamental approach in feature selection and dimensionality reduction tasks.
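A minimal sketch of this encoding, assuming NumPy (the helper names are illustrative, not from the paper):

```python
import numpy as np

def random_individual(n_features, rng):
    """Random binary mask over the features: 1 = keep, 0 = drop."""
    return rng.integers(0, 2, size=n_features)

def decode(individual):
    """Column indices of the features selected by a binary individual."""
    return np.flatnonzero(individual)

rng = np.random.default_rng(0)
ind = random_individual(8, rng)
print(ind, "->", decode(ind))
```

Decoding a mask into column indices is all that is needed to slice the dataset down to the selected subset before training a classifier.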

Smart population with SI-based feature selection
Instead of creating an initial population with a predetermined number of randomly generated individuals, we take advantage of the best solutions obtained by several powerful SI-based feature selection approaches. For that purpose, an intelligent population is constructed from the optimal subsets of features produced by the different SI-based feature selection approaches, with additional randomly generated individuals to keep the diversity of the next generation. This intelligent collection of feature subsets is fed into the genetic algorithm as the initial population to search for the optimal subset of features that maximizes classification performance and reduces the number of features.
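The construction can be sketched as follows, assuming each SI method returns a binary mask; the function name and the random-padding strategy are illustrative, not taken from the paper:

```python
import numpy as np

def smart_population(si_subsets, pop_size, n_features, rng):
    """Seed the GA's initial population with the subsets found by the
    SI-based selectors, then pad with random individuals for diversity."""
    population = [np.asarray(s, dtype=int) for s in si_subsets]
    while len(population) < pop_size:
        population.append(rng.integers(0, 2, size=n_features))
    return np.vstack(population[:pop_size])
```

Seeding the GA this way means the search starts from solutions that are already strong, rather than from pure noise.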

Fitness function
The fitness function operates by simultaneously considering two distinct objectives: the enhancement of classification accuracy and the reduction of the number of selected features. To convert this function into a minimization problem, we introduce weights for each of these objectives. These weights enable the amalgamation of these criteria into a unified fitness function, articulated as:

fitness(X) = α · E(X) + β · R(X) (1)

where fitness(X) in (1) assesses the fitness value attached to each individual, X is a vector of features illustrating a selected subset of features, and α and β are the weights assigned to the two objectives, the classification error and the proportion of features selected, respectively, which fulfill the conditions in (2) and (3).

The parameters α and β, where α ∈ [0, 1] and β = 1 − α, regulate the relative importance of classification accuracy and feature reduction:

α ∈ [0, 1] (2)

β = 1 − α (3)

The values of α and β used in previous studies [47], [48] are also employed in the current experiments, with α set to 0.99. Equation (4) defines the classification error rate E(X), which needs to be minimized and is complementary to the accuracy of the classifier defined in (5), while R(X) represents the proportion of selected predictors in (6), where D is the size of the individual and x_i is a binary variable that specifies whether feature i is present or not in a selected individual (7):

E(X) = 1 − Accuracy (4)

Accuracy = (TP + TN) / (TP + TN + FP + FN) (5)

R(X) = (1/D) Σ_{i=1}^{D} x_i (6)

x_i ∈ {0, 1} (7)

where TP, TN, FP, and FN correspond to the number of true positives, true negatives, false positives, and false negatives, respectively. The individuals are ranked according to their fitness value, and the chosen number of individuals with the lowest fitness values are considered the parents of the next generation.
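Equations (1)-(7) can be combined into a small helper; this is a hedged sketch, not the authors' code:

```python
def fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Weighted bi-objective fitness as in (1)-(3):
    alpha * classification error + (1 - alpha) * selected-feature ratio."""
    beta = 1.0 - alpha
    return alpha * error_rate + beta * (n_selected / n_total)
```

With α = 0.99, accuracy dominates the score, and the feature-count term only breaks ties between subsets of similar accuracy.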

Genetic operators

Selection
Once a population of chromosomes has been created, the GA searches for pairs of parent chromosomes on which to apply the crossover operation. To select the parent chromosomes, we employed the roulette wheel selection approach [49]. Every chromosome gets a slot on the roulette wheel sized according to its fitness. The wheel is rotated after the chromosomes have been placed on it; when it stops turning, a random pointer indicates the chosen chromosome. The best chromosomes occupy a large slot, implying a high probability of being selected. The likelihood of selecting an individual x_i is proportional to its fitness and determined by (8):

P(x_i) = fitness(x_i) / Σ_j fitness(x_j) (8)

where fitness(x_i) is the fitness value of the individual x_i. An elitist approach guarantees that the best individual is systematically moved to the next generation with no crossover or mutation; this is essential to maintain the steady convergence of the genetic algorithm. Tournament selection, where individuals are randomly selected and compete for survival, allows for the inclusion of individuals with lower fitness values, promoting diversity by giving them a chance to contribute to the next generation.
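A possible implementation of the roulette wheel is sketched below. Since fitness is minimized here, the sketch selects proportionally to inverse fitness — an assumption the text does not spell out, and the names are illustrative:

```python
import numpy as np

def roulette_select(population, fitness_values, rng):
    """Roulette wheel: selection probability proportional to inverse
    fitness, since lower fitness is better in this minimization setting."""
    inv = 1.0 / (np.asarray(fitness_values, dtype=float) + 1e-12)
    probs = inv / inv.sum()
    return population[rng.choice(len(population), p=probs)]
```

The small constant in the denominator guards against division by zero when an individual reaches perfect fitness.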

Crossover
Crossover is applied to each pair of chromosomes selected by the abovementioned method with a specified probability Pc. A high probability Pc encourages the appearance of new individuals in the population. The crossover is applied by selecting a random point on the chromosome where the exchange of the parents' parts occurs. This process then gives rise to new offspring based on the selected exchange point, combining particular parts of the parents as shown in Figure 2.
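A single-point crossover consistent with this description might look like the following NumPy sketch (names are illustrative):

```python
import numpy as np

def single_point_crossover(parent1, parent2, rng, pc=0.9):
    """With probability pc, swap the tails of the two parents at a
    random cut point to produce two offspring."""
    child1, child2 = parent1.copy(), parent2.copy()
    if rng.random() < pc:
        point = int(rng.integers(1, len(parent1)))
        child1[point:] = parent2[point:]
        child2[point:] = parent1[point:]
    return child1, child2
```

Note that crossover only recombines genes already present in the parents; it creates no new bit values, which is why mutation is still needed.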

Mutation
The term "mutation" refers to the random change in the value of a gene on a chromosome. Mutation acts as background noise that prevents evolution from freezing. It extends space exploration and ensures that the global optimum can be reached. Therefore, this operator avoids premature convergence to local optima. The technique used is uniform mutation, so each bit of a chromosome has a low probability Pm of being flipped.
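Uniform bit-flip mutation, as described above, can be sketched as (function name illustrative):

```python
import numpy as np

def uniform_mutation(individual, rng, pm=0.01):
    """Uniform mutation: flip each bit independently with probability pm."""
    flip = rng.random(individual.shape) < pm
    return np.where(flip, 1 - individual, individual)
```

Keeping pm small preserves most of each chromosome while still injecting enough novelty to escape local optima.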

RESULTS AND DISCUSSION
This section aims to assess the proposed genetic algorithm in terms of fitness values, feature space reduction, and prediction accuracy. The proposed method GASI uses an initial population based on all the feature selection techniques chosen in this paper (FPA, JA, SSA, DE, COA, FA, PSO, RFE, and SelectFromModel). We have also compared these results with other SI-based feature selection techniques, which allows a robust empirical study. It is worth noting that even if this genetic algorithm's computational cost is higher than that of other feature selection approaches, it surpasses them in terms of fitness value and accuracy. The rest of this section describes the employed datasets, the used classifiers, the evaluated approaches, the evaluation measures, the results, and the discussion in the following subsections.

Datasets
According to the World Health Organization (WHO), heart disorders and cancer are the two leading causes of mortality in developing and under-developed countries. Breast, lung, colon, and rectum tumors remain the most commonly diagnosed cancers worldwide [50]. For these reasons, we tested the proposed method on seven well-known datasets to evaluate its performance. These datasets are colon, breast cancer Wisconsin, heart, arrhythmia, sonar, ionosphere, and waveform, collected from the UCI repository. The description of the datasets is given in Table 1.

Classifier description
As learning algorithms, we used two popular classifiers, namely logistic regression (LR) and AdaBoost (AB), to evaluate the proposed method. LR is a method for predicting a dichotomous dependent variable. This approach finds the best-fitting model that describes the association between the dependent variable and a set of independent variables [51]. AdaBoost, an abbreviation for adaptive boosting, is a machine learning meta-algorithm proposed by Freund and Schapire [52]. An AdaBoost classifier is a meta-estimator that first fits a classifier and then refits it multiple times on the same dataset. At each round, the weights of the misclassified samples are adjusted to prioritize difficult cases, leading subsequent classifiers to focus more on them.

The evaluated methods
In the experiments, we utilized the SI-based feature selection approaches, specifically FPA, JA, SSA, DE, COA, FA, and PSO, defined in section 2. Additionally, we incorporated two more selection methods, SelectFromModel and recursive feature elimination (RFE), which are described subsequently. This diverse set of feature selection methods was chosen to comprehensively explore their effectiveness within the experimental context, enabling a thorough examination of their impact on the overall results. The inclusion of these various methods provides a robust foundation for assessing the role of feature selection in the study's outcomes.
SelectFromModel is a feature selection technique for extracting essential and relevant features. It removes features whose corresponding importance values are below a given threshold value. This model works with estimators that expose feature importances or coefficients [53].
RFE is an integrated technique compatible with various learning algorithms like SVMs and Lasso. Its primary function is to iteratively reduce the number of features by recursively eliminating those with low weights or importance scores. RFE is particularly useful for optimizing model performance by focusing on the most relevant features in a dataset [54].
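Both methods are available in scikit-learn (an assumption consistent with the Python setup); a minimal sketch, with the threshold and target feature count chosen purely for illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SelectFromModel
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
estimator = LogisticRegression(max_iter=5000)

# SelectFromModel keeps features whose |coefficient| exceeds the threshold
sfm = SelectFromModel(estimator, threshold="mean").fit(X, y)
# RFE recursively drops the weakest feature until the target count remains
rfe = RFE(estimator, n_features_to_select=10).fit(X, y)
print("SelectFromModel kept:", int(sfm.get_support().sum()))
print("RFE kept:", int(rfe.get_support().sum()))
```

The `get_support()` boolean masks returned by both selectors are directly compatible with the binary individual encoding used by GASI.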
Table 2 displays the average classification accuracy (in %), over ten trials, of SelectFromModel and RFE using the logistic regression and AdaBoost classifiers. The results indicate that on the colon cancer, breast cancer, and waveform datasets, the SelectFromModel method slightly improves the classification performance compared to the RFE method. In contrast, the latter enhances the classification accuracy on the remaining datasets.

Proposed method GASI parameters
As the parameters significantly impact the efficiency of genetic algorithms, they should be chosen carefully to obtain the highest performance. Table 3 presents the parameters employed in the GASI evaluation. The mentioned values were determined empirically through several experiments with the proposed approach.

Evaluation measures
The effectiveness of the proposed strategy GASI was evaluated in terms of the average, best, and worst fitness values as in (9)-(11), and the average classification accuracy, the minimum number of remaining features, and the rate of remaining features as in (12)-(14). The proposed strategy is then compared with other meta-heuristic algorithms using these measures:

AvgFitness = (1/M) Σ_{k=1}^{M} g_k* (9)

BestFitness = min_{k=1..M} g_k* (10)

WorstFitness = max_{k=1..M} g_k* (11)

AvgAccuracy = (1/M) Σ_{k=1}^{M} Acc_k* (12)

MinFeatures = min_{k=1..M} length(X_k*) (13)

AvgFeatureRate = (1/M) Σ_{k=1}^{M} length(X_k*) / TN (14)

where M is the maximum number of runs, g_k* represents the best fitness score attained at the k-th run, Acc_k* is the optimum accuracy of the classifier achieved at the k-th run, length(X_k*) indicates the number of features selected at the k-th run, and TN denotes the total number of features in the given dataset.
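The measures (9)-(14) can be computed from the per-run results with a small helper; this is an illustrative sketch with hypothetical names:

```python
import numpy as np

def summarize_runs(best_fitness_per_run, best_acc_per_run,
                   n_selected_per_run, total_features):
    """Aggregate the per-run results as in equations (9)-(14)."""
    f = np.asarray(best_fitness_per_run, dtype=float)
    return {
        "avg_fitness": f.mean(),                                   # (9)
        "best_fitness": f.min(),                                   # (10)
        "worst_fitness": f.max(),                                  # (11)
        "avg_accuracy": float(np.mean(best_acc_per_run)),          # (12)
        "min_features": int(np.min(n_selected_per_run)),           # (13)
        "avg_feature_rate": float(np.mean(n_selected_per_run))
                            / total_features,                      # (14)
    }
```

Aggregating over M independent runs smooths out the stochasticity inherent in all of the compared meta-heuristics.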

Experimental evaluation
In this subsection, the performance of the suggested method is evaluated and compared to other powerful competitors. The results are presented in terms of fitness values, remaining feature rates, and classification accuracy. Each feature selection approach is executed ten times in each experiment, and the average of these runs is utilized for comparing the different approaches. In addition, each dataset is normalized and randomly divided into a training set (70% of the dataset) and a testing set (30%). All these approaches are executed using Python on an Intel Core i7 CPU with 16 GB of RAM.
Table 4 displays the average classification accuracy and rate of remaining features (in %) over ten runs of the suggested approach GASI and the different SI-based feature selection approaches (i.e., FPA, JA, SSA, DE, COA, FA, and PSO) using the LR and AB classifiers; the best results are indicated in bold. The results in Table 4 show that the suggested approach outperforms many other SI-based feature selection techniques: it was able to select fewer features in most datasets while increasing the classification performance. Table 4 demonstrates that the proposed method GASI consistently outperforms the other SI-based feature selection techniques. For instance, on the colon cancer dataset using the LR classifier, the GASI method achieved a classification accuracy of 98.94%, whereas these values were 94.37, 96.31, 97.89, 94.73, 95.25, 94.73, and 95.25 for the FPA, JA, SSA, DE, COA, FA, and PSO approaches, respectively. In addition, the AdaBoost classifier enhanced the classification accuracy to 100% for the suggested approach GASI, while the accuracy of the FPA, JA, SSA, DE, COA, FA, and PSO methods was 92.62, 90.52, 95.25, 93.15, 99.47, 95.25, and 95.26, respectively. Table 4 also presents the number of selected features. The results show that all methods significantly reduce the dimensionality by selecting only a small part of the original features. For instance, with the LR classifier, the GASI approach performs better than the other SI-based methods on the colon cancer and sonar datasets, with feature rates of only 0.2746 and 0.2694, respectively. Furthermore, the PSO method retained average feature rates of 0.2343 and 0.3788 on the breast cancer and arrhythmia datasets, respectively, while the FPA method retained an average rate of 0.3615 on the heart dataset, compared to the JA method's average of 0.2411 on the ionosphere dataset and the DE method's average of 0.7190 on the waveform dataset.
Table 5 presents the evaluated results in terms of the average, best (minimum), and worst (maximum) fitness values. The results reveal that the proposed method GASI performed better on all datasets than the other SI-based feature selection algorithms and delivered the smallest average fitness value. For example, on the colon cancer dataset employing the LR classifier, the GASI approach provided an average fitness value of 0.0131, whereas for the FPA, JA, SSA, DE, COA, FA, and PSO methods these values were 0.0569, 0.0249, 0.0255, 0.0560, 0.0512, 0.0568, and 0.0507, respectively. Using the AdaBoost classifier, the mean fitness value is 0.0037 for the GASI method, while the values of the other methods FPA, JA, SSA, DE, COA, FA, and PSO were 0.0777, 0.0253, 0.0464, 0.0724, 0.0099, 0.0516, and 0.0044, respectively.
Figures 3 and 4 illustrate the mean classification accuracy on all datasets for the LR and AdaBoost classifiers, respectively. From these results, we can observe that, on both classifiers, the proposed method GASI obtained the highest average classification accuracy. The results in Figure 3 show that the GASI method achieved an average classification accuracy of 92.52%, which ranked first with a margin of 2.89% over the JA approach, which achieved the second-best average classification accuracy. The FA method scored third, with a margin of 3.54% relative to the best method. Furthermore, according to the results in Figure 4, on the AB classifier the suggested approach GASI obtained first place with an average classification accuracy of 91.53%, with a margin of 1.81% over the COA method, which achieved the second-best average classification accuracy. The DE approach secured the third position with an average classification accuracy of 89.12%.
Figure 5 provides an in-depth analysis of average fitness values across the various datasets, with Figures 5(a) to 5(g) offering specific comparisons of mean fitness values for the colon cancer, breast cancer, heart, sonar, ionosphere, waveform, and arrhythmia datasets. The results consistently affirm the superior performance of our proposed GASI approach when compared to other SI-based feature selection methods. This superiority is consistently demonstrated by GASI achieving the smallest average fitness value during evaluations conducted using both logistic regression and AdaBoost classifiers. These findings accentuate GASI's effectiveness in enhancing feature selection and its potential for wide-ranging applications across diverse datasets and machine learning algorithms.

Discussion
In this section, we discuss the main arguments that demonstrate the performance of the suggested approach. These points provide a comprehensive understanding of the approach's effectiveness and underscore its potential benefits, establishing it as a compelling and viable solution within its intended domain.
− A machine learning task requires an efficient feature selection approach that can choose the optimal number of features and achieve better performance. Using a wide range of features increases the probability of selecting irrelevant and redundant attributes, which degrades the model's performance, while an overly aggressive reduction in the number of features risks losing the original information of the dataset. The multi-objective fitness function proposed in this paper both reduces the number of features and minimizes the classification error. Consequently, the features selected from the cancer dataset carry the maximum information for diagnostic or predictive tasks.
− The main goal of the suggested method is to take advantage of the best solutions obtained by several different SI-based feature selection approaches. It uses a genetic algorithm with a distinct strategy to build a powerful feature selection technique that finds the best subset of features in datasets from different fields. The strategy starts from an intelligent initial population composed of the best solutions obtained by the different SI-based approaches. In addition, the genetic operators (selection, crossover, and mutation) maintain the diversity of each generation, which improves the quality of search-space exploration and helps avoid local optima.
− The temporal complexity is not a real obstacle, because feature selection is performed before the model is exploited. This preliminary stage is not repeated with each use of the machine learning model.

CONCLUSION
With the massive amounts of digital data of various types and the exponential growth of artificial intelligence-based applications, datasets are growing into massive databases with large numbers of features, especially in the medical field. At the same time, data mining and machine learning tasks require high speed and greater accuracy. Over the past few years, numerous meta-heuristic methods have been developed to reduce the size of the dataset by eliminating redundant and irrelevant features that represent noise for the model. This paper proposes a novel, powerful feature selection method that combines several SI-based feature selection approaches (i.e., FPA, JA, SSA, DE, COA, FA, and PSO) and employs a genetic algorithm with a multi-objective fitness function to discover the optimal subset of features in datasets from different areas. The approach is applied to seven well-known datasets from the UCI repository. The results were compared with several powerful SI-based feature selection approaches, and the experiments show that our method obtained better solutions in terms of both fitness value and classification accuracy. Day by day, world health is affected by numerous invasive pathologies, especially heart disorders and cancer. This study highlights the need to raise healthcare professionals' awareness of powerful feature selection techniques that can be successfully applied to medical databases for detecting, classifying, and predicting diseases. In future work, the suggested technique can be applied to high-dimensional datasets and combined with other metaheuristic techniques to further improve the exploration of the search space and accelerate convergence. Moreover, the suggested approach can be used to solve various other real-world problems.

Figure 1. Flowchart of the proposed method GASI
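The strategy described above — seeding a genetic algorithm's initial population with the best binary feature masks found by the SI-based selectors, then refining them with selection, crossover, and mutation — can be sketched as follows. The operator choices (truncation selection, one-point crossover, bit-flip mutation) and the rates are illustrative assumptions, not the paper's exact configuration.

```python
import random

def evolve(seed_solutions, evaluate, generations=50, pop_size=None,
           cx_rate=0.9, mut_rate=0.01):
    """Genetic search over binary feature masks, starting from an
    'intelligent' initial population: the best masks returned by the
    SI-based selectors (FPA, JA, SSA, DE, COA, FA, PSO).
    `evaluate` returns a lower-is-better fitness for a mask."""
    pop = [list(s) for s in seed_solutions]
    n = len(pop[0])
    pop_size = pop_size or len(pop)
    for _ in range(generations):
        scored = sorted(pop, key=evaluate)           # rank by fitness
        parents = scored[: max(2, pop_size // 2)]    # truncation selection
        children = []
        while len(children) < pop_size:
            a, b = random.sample(parents, 2)
            if random.random() < cx_rate:            # one-point crossover
                cut = random.randrange(1, n)
                child = a[:cut] + b[cut:]
            else:
                child = a[:]
            # bit-flip mutation preserves diversity across generations
            child = [bit ^ (random.random() < mut_rate) for bit in child]
            children.append(child)
        pop = children
    return min(pop, key=evaluate)
```

In the paper's setting, `evaluate` would be the multi-objective fitness computed by the wrapped classifier on the candidate subset; here, any callable returning a lower-is-better score can stand in for it.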

Figure 3. Average classification accuracy over all datasets on the logistic regression classifier
Figure 4. Average classification accuracy over all datasets on the AdaBoost classifier
Figure 5. Mean fitness values of the different feature selection methods across the studied datasets

Table 1. Description of the seven studied datasets

Table 2. A comparison between SelectFromModel and RFE in terms of classification accuracy using the logistic regression and AdaBoost classifiers

Table 3. Common parameters for the proposed method

Table 4. Average classification accuracy and remaining feature rates of the different feature selection approaches with the logistic regression and AdaBoost classifiers. The best results are marked in bold

Table 5. Average, best, and worst fitness values of the different feature selection methods using the logistic regression and AdaBoost classifiers. The best fitness values are indicated in bold