A novel ensemble modeling for intrusion detection system

ABSTRACT


INTRODUCTION
In this digital age, information maintained through online businesses and social networks remains insecure. Numerous intruders, both human and automated, are gaining unauthorized access to information, and their elusive nature on the internet has made intrusions harder to detect. Most prevailing Intrusion Detection Systems (IDSs) have shown inconsistent performance in identifying different attacks [1]. A more stable and accurate decision across attack types can be obtained by unifying the decisions of multiple classifiers [2,3]. Merging multiple IDSs is therefore not a great concern in terms of computation, and better solutions can be achieved. With better analysis of data using ensemble learning, a wider range of attacks can be identified, and this integration is likely to improve predictive accuracy. An ensemble of classifiers has also emerged as a feasible solution to the class imbalance problem [4].
Feature selection is of utmost significance for any learning algorithm. When it is done poorly (i.e., a poor set of features is selected), it may lead to problems associated with incomplete information and noisy or irrelevant features. The learning algorithm is slowed unnecessarily by the higher dimensionality of the feature space and also suffers lower prediction accuracy from learning irrelevant information. Effective feature selection methods yield better classification accuracies [5,6]. The crucial objective of feature selection is to attain a feature space with (1) low dimensionality and (2) retention of sufficient information [7]. In practice, suitable feature selection methods produce simplified models that are easy to interpret, reduce training time, and improve generalization ability.
Earlier machine learning work employed a single learning model. However, it has been observed that multiple prediction models can be used to solve the same problem. An approach known as ensemble learning was therefore built on the premise that combining the output of multiple experts is better than using the output of any single expert [8]. Ensemble learning has been applied successfully to classification problems and is also a mechanism for boosting other machine learning functions such as feature selection. In feature selection terminology, the individual selectors in an ensemble are called base selectors. If the base selectors are all of the same kind, the ensemble is termed homogeneous.
In this paper, we build a novel ensemble model for an Intrusion Detection System using a Fuzzy Ensemble Feature Selection (FEFS) algorithm and a Fusion of Multiple Classifier (FMC) algorithm. FEFS works as follows: after examining the prevalence of different feature selection methods, five methods are unified to obtain a strong feature set that benefits classification. The technique used to join their outputs is based on fuzzy logic. Its main purpose is to select the best features in the KDD Cup 99 dataset. KDD Cup 99 [9] is a well-known intrusion evaluation dataset and a classic example of a large-scale dataset. FMC classifies data as attack or normal through the unification of a Support Vector Machine (SVM), a K-Nearest Neighbor (KNN) classifier, and an Artificial Neural Network (ANN). With this ensemble classification method, we achieve better accuracy and a lower False Alarm Rate (FAR). The rest of this paper is organized as follows. Section 2 describes related work. Section 3 discusses the methodology for constructing the ensemble model in detail. Section 4 presents the experiments conducted and the results obtained. The last section gives the conclusions and discussion.

RELATED WORK
Ensemble feature selection procedures use an idea analogous to ensemble learning for classification [10]. Several works have constructed ensemble feature selection techniques for selecting the optimal feature set [11]. Olsson et al. specified an ensemble of multiple feature ranking methods that combines three commonly used filter-based feature ranking techniques, namely information gain, document frequency thresholding, and the chi-square method, mainly for text classification problems. In more recent work, Wang et al. integrated an ensemble of six filter-based rankers and achieved notable results [12]. Basant Subba et al. applied two statistical methods, Linear Discriminant Analysis (LDA) and Logistic Regression (LR), to develop new intrusion detection models [13]. In [14], Afef Ben Brahim et al. developed a robust feature aggregation technique for combining the results of three diverse filter methods. This aggregation technique relies on measuring each feature selection algorithm's confidence and its conflict with the other algorithms in order to assign a reliability factor that controls the final feature selection.
Because of the imbalanced distribution of classes in the KDD Cup 99 dataset, results can be imprecise. Recent studies have shown that incorporating ensemble learning is one solution. The major challenges and opportunities of imbalanced datasets are set out in [15]. Ensemble learning has been applied effectively to classification problems [16,17]. Bukhtoyarov et al. [18] developed an ensemble based on Genetic Programming (GPEN) to categorize input intrusions as Probe or non-Probe attacks using nine of the 41 features. Borji [19] presented an ensemble methodology using four base classifiers, viz. SVM, k-NN, ANN, and decision trees. In [20], a new ensemble approach is proposed for effective intrusion detection; it combines attribute selection, a multiclass SVM, and a k-NN classifier. In addition, an Incremental Particle Swarm Optimization is embedded as an ensemble classifier to boost classification accuracy. From this perspective, since ensemble learning and various fusion methods [21,22] are considered to have the potential to improve classifier performance, we have carried out the proposed investigations.
Figure 1 describes the proposed ensemble modeling architecture of the Intrusion Detection System. It comprises two phases. The first, Fuzzy Ensemble Feature Selection (FEFS), performs feature selection. The second, Fusion of Multiple Classifiers (FMC), is the classification phase, which classifies the data as attack or normal.

Fuzzy ensemble feature selection (FEFS)
Feature selection methods are merged to achieve stable and robust outputs. An ensemble can be built by means of aggregation operations. Here this is achieved by combining the advantages of five filtering methods: Canberra distance, city block distance, Euclidean distance, Chebyshev distance, and Minkowski distance. Fuzzy logic is applied to aggregate the five filters. The main idea behind employing fuzzy logic is backtracking. By contrast, traditional methods that apply a fixed threshold may leave out some features. Hence weights are allocated to all values. The filters are aggregated using the fuzzy union operation on the fuzzy sets. On the dataset, the Euclidean distance is computed for all the features. The Chebyshev, Canberra, city block, and Minkowski distances are then calculated for all the features on the same dataset. All these values are fuzzified, and the aggregator is then applied. The FEFS structure is shown in Figure 2.
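For reference, the five filters named above have well-known textbook forms, written here for two generic vectors x and y of length n. This is only a reminder of the standard definitions, not a transcription of the paper's own equations. Which two vectors are compared is an assumption: the conclusions speak of feature-class distance functions, so one plausible reading is x = the column of feature F_i and y = the class-label column. The Minkowski order p is likewise not stated in the text.

```latex
% Standard definitions; x, y, n and p are illustrative symbols, not the paper's notation.
\begin{align*}
d_{\mathrm{Euclidean}}(x, y)  &= \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2} \\
d_{\mathrm{city\,block}}(x, y) &= \sum_{k=1}^{n} |x_k - y_k| \\
d_{\mathrm{Chebyshev}}(x, y)  &= \max_{1 \le k \le n} |x_k - y_k| \\
d_{\mathrm{Canberra}}(x, y)   &= \sum_{k=1}^{n} \frac{|x_k - y_k|}{|x_k| + |y_k|} \\
d_{\mathrm{Minkowski}}(x, y)  &= \Bigl( \sum_{k=1}^{n} |x_k - y_k|^{p} \Bigr)^{1/p}, \qquad p \ge 1
\end{align*}
```

The Minkowski distance reduces to the city block distance for p = 1 and to the Euclidean distance for p = 2, and it tends to the Chebyshev distance as p grows, so the five filters are related but weight large per-record differences differently.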
For any particular feature F_i ∈ J, the Chebyshev distance, the Canberra distance, the city block distance, and the Minkowski distance are each computed on the same dataset. From equations (1) to (5) we get five sets of values, one per distance. These values are then converted into fuzzy values, a step known as fuzzification. The resulting fuzzy sets, f_ca1, f_ci1, f_eu1, f_mi1, and f_ce1, are called feature scores. The transformation uses a trapezoidal membership function, of which the L-function is a special case. Let y be the element to be transformed and f_y its fuzzy conversion; "a" and "b" are the minimum and maximum values in the whole set. The transformation is done after applying all the filters to all the features. Feature score calculation is shown in lines 9 to 13 of the algorithm given in Figure 3. For each feature, the feature scores {f_ca1, f_ci1, f_eu1, f_mi1, f_ce1} are then combined using the aggregator. The fuzzy union operation is used to combine them; it returns the maximum of the membership values obtained from the five feature scores [23]. This is shown in line 15 of Figure 3. The features whose F_i = 1 are then identified. For instance, consider a feature F_j. The five filters are applied to this feature, and each result is transformed to a fuzzy value. The five feature scores of F_j are then aggregated by applying the fuzzy union to them, so that F_j becomes a single value. The whole process is repeated for all the remaining features. Finally, all the features whose membership value is equal to 1 are selected as the best feature set, as shown in line 17 of Figure 3.
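The FEFS steps described above (five distance filters, fuzzification, fuzzy union, selection of features with membership 1) can be sketched roughly as follows. This is an illustration under several assumptions, not the algorithm of Figure 3: distances are taken between each feature column and the class-label column, the L-function is taken in its usual form (1 at the minimum a, falling linearly to 0 at the maximum b), the Minkowski order is set to p = 3, and the names `fefs`, `l_function`, and `FILTERS` are hypothetical.

```python
from math import sqrt

def euclidean(x, y):
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def city_block(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def chebyshev(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

def canberra(x, y):
    # Terms whose denominator is zero are skipped (a common convention).
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom:
            total += abs(a - b) / denom
    return total

def minkowski(x, y, p=3):
    # p = 3 is an assumption; the source does not state the order.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

FILTERS = (canberra, city_block, euclidean, minkowski, chebyshev)

def l_function(y, a, b):
    """Assumed L-shaped membership: 1 at or below a, 0 at or above b."""
    if a == b or y <= a:
        return 1.0
    if y >= b:
        return 0.0
    return (b - y) / (b - a)

def fefs(features, labels):
    """features: dict of name -> column values; labels: class column."""
    names = list(features)
    scores = {name: [] for name in names}
    for dist in FILTERS:
        raw = [dist(features[name], labels) for name in names]
        lo, hi = min(raw), max(raw)  # "a" and "b" over the whole set
        for name, value in zip(names, raw):
            scores[name].append(l_function(value, lo, hi))
    # Fuzzy union: maximum membership over the five feature scores.
    aggregated = {name: max(vals) for name, vals in scores.items()}
    selected = [name for name in names if aggregated[name] == 1.0]
    return aggregated, selected
```

Under these assumptions a feature is selected as soon as it attains the smallest distance for at least one of the five filters, which is one way of reading the statement that the union keeps features a single fixed threshold might discard.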

Fusion of multiple classifiers (FMC)
Merging multiple classifiers can be more robust and predict better than single classifiers [24]. The proposed FMC is based on majority voting over the individual base classifiers, which improves the detection of attacks. The FMC algorithm is built on three individual classifiers: (1) K-Nearest Neighbor (k-NN), (2) Support Vector Machine (SVM), and (3) Artificial Neural Network (ANN). Each of the three base classifiers is an expert in a specific region of the predictor space because they treat the attribute space on different theoretical bases [25]. The three classifiers can be combined to yield an ensemble majority voting classifier that is superior to any of the individual rules.
At this stage, the result of FEFS is provided to the FMC algorithm. The structure of the proposed FMC is depicted in Figure 4. The KDD dataset has n tuples and α features. The class label, termed classlab, can be either "0" or "1". The whole process is summarized in the FMC algorithm depicted in Figure 5. The preprocessed data is first fed to the k-NN classifier (step 2) and then to the SVM and the ANN in turn. Three local decisions Y_1, Y_2, Y_3 are thus produced, and the outputs of the three base classifiers are then fused. Each local decision Y_i is either "0" or "1", where "0" means attack and "1" means non-attack. The fusion of the local decisions from the three base classifiers is obtained using the ensemble decisive function, i.e. the majority voting method. The final decision Y of the ensemble classifier is defined as

$$Y = \arg\max_{j \in \{1,\dots,C\}} \sum_{t=1}^{T} d_{t,j}$$

where d_{t,j} ∈ {0,1} indicates whether classifier t votes for class j, t = 1,2,…,T and j = 1,2,…,C; T is the number of classifiers and C is the number of classes. Here we consider two classes and three classifiers. Y thus chooses the class that receives the highest number of votes.
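The majority vote described above can be sketched in a few lines. The function names `majority_vote` and `fuse` are hypothetical, and the three input lists stand in for the local decisions of k-NN, SVM, and ANN; no classifier is actually trained here. Classes are indexed from 0 so that the paper's labels "0" (attack) and "1" (non-attack) can be used directly.

```python
def majority_vote(local_decisions, n_classes=2):
    """Return the class with the most votes among the local decisions."""
    votes = [0] * n_classes
    for decision in local_decisions:
        votes[decision] += 1  # counts the classifiers voting for each class
    # With three classifiers and two classes a tie cannot occur.
    return max(range(n_classes), key=lambda j: votes[j])

def fuse(y1, y2, y3):
    """Fuse three per-record decision lists into one ensemble decision list."""
    return [majority_vote(record) for record in zip(y1, y2, y3)]

# Illustrative local decisions for four records (0 = attack, 1 = non-attack).
knn_out = [0, 1, 1, 0]
svm_out = [0, 0, 1, 1]
ann_out = [1, 0, 1, 0]
ensemble_out = fuse(knn_out, svm_out, ann_out)
```

Because the number of base classifiers is odd and there are only two classes, every record receives a strict majority, which is one practical reason for fusing three rather than two or four classifiers.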

EXPERIMENTS CONDUCTED & RESULTS OBTAINED
Experiments were conducted on the KDD Cup 99 dataset. Researchers building IDSs have typically used a portion of the KDD Cup 99 dataset rather than the complete training or test set [26]. We therefore took a subset of KDD Cup 99 containing 14207 records, which we call the "KDD dataset". The class sizes in the subset are taken in proportion to their relative sizes in KDD Cup 99, and the R2L and U2R records are taken as they are from the original dataset.

Data preprocessing stage
The raw KDD Cup 99 dataset is used for the investigations of the proposed approach, with appropriate preprocessing techniques applied. The data are converted to numeric form, and continuous variables are discretized. The symbolic values of three features are assigned numeric values ranging from 1 to N. The interquartile range (IQR) is also applied to remove noise and outliers from the dataset. A subset of the KDD Cup 99 dataset, with classes in the same proportions as in KDD Cup 99, is taken for the experiments and is therefore named the KDD dataset. It has 14207 instances, with 3000 Normal instances, 10000 DoS instances, 574 Probe instances, 401 R2L instances, and 52 U2R instances. All five classes in the KDD dataset are assigned numeric values: "0" for U2R, R2L, Probe, and DoS, and "1" for Normal. The 41 consecutive features are labeled F_a, F_b, F_c, F_d, …, F_ao respectively.
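The preprocessing steps listed above might look roughly like the sketch below. Several details are assumptions because the text does not give them: symbolic values are numbered in order of first appearance, the IQR filter uses the conventional 1.5 × IQR fences, and the helper names `encode_symbolic`, `binarize_label`, and `iqr_keep_mask` are hypothetical. Discretization is omitted because the binning scheme is not described.

```python
from statistics import quantiles

def encode_symbolic(values):
    """Map each distinct symbolic value to an integer in 1..N."""
    codes = {}
    for value in values:
        # Assumption: codes follow order of first appearance.
        codes.setdefault(value, len(codes) + 1)
    return [codes[value] for value in values], codes

def binarize_label(label):
    """Return 1 for Normal traffic and 0 for any attack class."""
    return 1 if label == "normal" else 0

def iqr_keep_mask(column, k=1.5):
    """Flag values inside [Q1 - k*IQR, Q3 + k*IQR]; k = 1.5 is assumed."""
    q1, _, q3 = quantiles(column, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [low <= value <= high for value in column]

# Illustrative values only, not taken from the KDD dataset.
protocol_codes, mapping = encode_symbolic(["tcp", "udp", "tcp", "icmp"])
keep = iqr_keep_mask([10, 12, 11, 13, 12, 11, 10, 500])
```

A record would then be dropped when any of its columns is flagged False, although whether the authors filtered per column or per record is not stated.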

Applying FMC
All experiments use 10-fold cross-validation to analyze the proposed novel ensemble model. 10-fold cross-validation, also referred to as rotation estimation, is recommended over the holdout and leave-one-out methods for estimating classifier performance. The dataset is split at random into ten parts of roughly equal size. Each part is held out in turn, training is conducted on the remaining nine parts, and testing is performed on the held-out part. Training is thus performed 10 times on different training sets, and the average of the ten error rates gives the overall error estimate. Four experiments were carried out: (1) FEFS outputs given to SVM, (2) FEFS outputs given to ANN, (3) FEFS outputs given to k-NN, and (4) the proposed novel ensemble model (FEFS+FMC). In the testing phase, instances of the KDD dataset are fed to the proposed FMC process with their class labels withheld. The ensemble classifier labels the network traffic data as either normal or attack. We performed our experiments using Java 1.8 and the R data mining software tool. Finally, the results are visualized and recorded. To determine the statistical significance of our results, we compare the proposed method against each individual classifier applied to the selected features.
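The 10-fold procedure described above can be sketched as follows. This is a generic illustration rather than the Java/R code used in the experiments; `k_fold_indices`, `cross_validate`, and the `train_and_test` callback (which is expected to train on one index list, test on the other, and return an error rate) are hypothetical names, and the fixed seed is only there to make the split repeatable.

```python
import random

def k_fold_indices(n_records, k=10, seed=0):
    """Shuffle record indices and deal them into k folds of near-equal size."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(n_records, train_and_test, k=10, seed=0):
    """Average the error returned by train_and_test over k held-out folds."""
    folds = k_fold_indices(n_records, k, seed)
    errors = []
    for held_out, test_idx in enumerate(folds):
        # Train on the other k - 1 folds, test on the held-out fold.
        train_idx = [i for f, fold in enumerate(folds)
                     if f != held_out for i in fold]
        errors.append(train_and_test(train_idx, test_idx))
    return sum(errors) / k
```

With 14207 records the folds cannot be exactly equal: seven folds hold 1421 records and three hold 1420.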
Ideally, some parameters, such as accuracy and the true positive rate, should take maximum values, while others, such as the number of features and the error, should be as small as possible. In exceptional circumstances, however, some parameters may matter more than others, so the weights have to be adjusted accordingly. The target metrics for classification are listed in Table 1. The performance of all four experiments on the KDD dataset in terms of accuracy, Detection Rate (DR), FAR, and precision is compared in Figures 6-9 respectively. The proposed ensemble approach performs significantly better than well-known individual methods such as Support Vector Machines (SVMs), K-Nearest Neighbor (KNN), and ANN. Relative to these, the proposed approach improves accuracy and Detection Rate and decreases the False Alarm Rate.
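As a sketch of how the four reported metrics are commonly computed from a confusion matrix: the definitions below are the usual ones and are assumed here, since the contents of Table 1 are not reproduced in the text. The attack class ("0") is treated as the positive class, and `confusion` and `ids_metrics` are hypothetical names.

```python
def confusion(y_true, y_pred, positive=0):
    """Count TP, FP, TN, FN with the attack class (0) as positive."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def _ratio(numerator, denominator):
    # Avoid division by zero when a class is absent.
    return numerator / denominator if denominator else 0.0

def ids_metrics(y_true, y_pred, positive=0):
    tp, fp, tn, fn = confusion(y_true, y_pred, positive)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "detection_rate": _ratio(tp, tp + fn),    # attacks correctly flagged
        "false_alarm_rate": _ratio(fp, fp + tn),  # normal traffic flagged
        "precision": _ratio(tp, tp + fp),
    }
```

For example, with four attack and four normal records where one of each is misclassified, accuracy, detection rate, and precision are all 0.75 and the false alarm rate is 0.25.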
The classification models are also evaluated using the area under the ROC curve (AUC). AUC is widely used and gives a general idea of the predictive potential of a classifier. A higher AUC is better, as it indicates that the classifier has a higher true positive rate across the entire range of decision thresholds. Some studies report that AUC has lower variance and is more consistent than other performance metrics (such as precision, recall, and F-measure) [27]. The ROC curve obtained for the proposed model is shown in Figure 10; the AUC is 0.9. The results for the KDD dataset are summarized in Table 2.
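One way to see what the AUC measures is its pairwise form: the probability that a randomly chosen positive record receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition. It assumes the classifier outputs a continuous score per record, which the text does not describe, and `auc` is a hypothetical helper, not the routine used to produce Figure 10.

```python
def auc(scores, labels, positive=1):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, l in zip(scores, labels) if l == positive]
    neg = [s for s, l in zip(scores, labels) if l != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one record of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))
```

This quadratic version is meant for illustration only; on a dataset of the size used here a rank-based implementation would be preferable.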

DISCUSSIONS AND CONCLUSIONS
This research introduced a novel ensemble architecture designed for IDSs. It is based on two algorithms, Fuzzy Ensemble Feature Selection (FEFS) and Fusion of Multiple Classifier (FMC). FEFS is an ensemble of five scores obtained using feature-class distance functions and aggregated using the fuzzy union operation. FMC is the fusion of three classifiers and works on the basis of an ensemble decisive function. Experiments on the KDD Cup 99 dataset have shown that the proposed system performs better than well-known methods such as Support Vector Machines (SVMs), K-Nearest Neighbor (KNN), and Artificial Neural Network (ANN). Our examinations clearly confirm the value of the ensemble methodology for modeling IDSs, and consequently our system is robust and proficient. Since all the performance measures considered were improved, such systems could be beneficial in numerous real-world applications. Our experimental results indicate that ensemble learning is effective in enhancing the attack detection rate and reducing the FAR. Performance comparisons were made between the proposed framework and the base classifier methods on the reduced feature set. The proposed model achieved a precision of 0.9, a recall of 0.95, an F-measure of 0.96, and an ROC area (AUC) of 0.9. Current IDSs are unable to detect all kinds of new attacks because they are designed for restricted applications in limited environments. There is thus a need both to safeguard networks from known attacks and, in parallel, to take measures to identify new and unseen but possible system abuses by developing novel, reliable, and efficient IDSs. Future research includes improving machine learning methods to detect novel and unseen attacks.

ISSN: 2088-8708