LSTM deep learning method for network intrusion detection system

ABSTRACT


INTRODUCTION
Nowadays, the world is experiencing a great revolution in information technology, and everybody continuously exchanges information across the network. This calls for new tools and mechanisms of prevention and detection, and for strengthening those that already exist, such as the Network Intrusion Detection System (NIDS), in order to enhance security and protect the network from intrusions. The function of a NIDS is to observe, evaluate and classify the traffic transiting through the network; it relies on previously established methods and techniques to differentiate between normal and suspicious traffic. Furthermore, attackers are attracted by the information and knowledge passing through the network, and to exploit and profit from it they must overcome security obstacles and barriers by creating new attacks and evolving existing ones. Current NIDS, however, are not evolutionary: their identification algorithms do not progress to identify new menaces automatically, which pushes us to think about advanced and intelligent detection methods that can identify new attacks and keep pace with the progression of existing ones.
Moreover, attacks can be of different types, such as DoS (Denial-of-Service) and U2R (User to Root); this diversity leads us to seek a resolution that detects and stops them all in a unique way. Currently, Deep Learning is experiencing huge success in several domains; it is a set of techniques used to recognize objects, extract information hidden in data, and make predictive analytics [1]. One of these methods, characterized by its long-term memory, is the Long Short-Term Memory (LSTM) [2]. To solve the issues cited above, we propose in this paper a new approach for NIDS based on the Deep Learning method LSTM. Our work presents a new approach compared to the other solutions: it is oriented to memorize attacks over the long term in order to discover new ones, and to deal with all intrusions in a unique manner.

EXPERIMENTAL ENVIRONMENT
In this section, we describe the data for experimentation, the evaluation indicators, the work environment, the adopted method and the compared methods.

Dataset and pre-treatments

Dataset
To evaluate our model, we used the NSL-KDD [3] dataset. As mentioned in the previous section, it is a version derived from KDD Cup 99 [6], which groups network traffic collected by the 1998 DARPA IDS evaluation [4]. NSL-KDD contains normal records and records of attacks, namely: DoS (Denial-of-Service), which destroys service availability [13]; Probe, which extracts detailed information from servers [14]; U2R (User to Root), which tries to exploit vulnerabilities in the system in order to obtain super-user privileges [15]; and R2L (Remote to Local), which sends packets over the network to a machine on which the attacker has no account, in order to exploit a vulnerability and access secure information [16]. The distribution is illustrated in Table 1 and Table 2: Table 1 shows the distribution in two classes, whereas Table 2 shows the distribution in five classes. The dataset does not include redundant records. It contains 43 columns: 42 columns define the characteristics of a record, such as Duration, Protocol_Type, Service and Flag, and one column defines whether it is a normal record or an attack; this column represents the label of the record.

Pre-treatment
To prepare the data before treatment, on the one hand, we normalized the dataset by converting the character columns to numeric columns with the well-known 1-to-n (one-hot) encoding technique; Figure 1(a) and Figure 1(b) show the data before and after this pre-treatment. On the other hand, we separated the label column from the other columns via an Extract-Transform-Load (ETL) tool.

Work environment, evaluation method and performance indicators

Work environment
The configuration of our machine is: Windows 7 operating system, Intel(R) Core(TM) i3-2370M CPU @ 2.40 GHz (4 CPUs), and 4096 MB of RAM. The ETL is Talend Open Studio (TOS) for Data Integration, an open-source software [17] that is efficient, flexible and easy to handle [18].

Evaluation method
The k-fold cross validation method is employed to measure the success of a classifier. It splits the dataset into k subsets; at each iteration, one subset is held out for testing and the remaining k-1 subsets are used for training [19]. The operation is repeated k times, once per subset, and the average of the k performances is calculated and returned. The advantage of this method is that the entire dataset is used for both training and testing, which makes the evaluation more accurate. We adopted 5-fold cross validation to evaluate our model: if we increased k, rare attacks such as U2R and R2L would become scarcer in each subset and could be neglected during treatment. To separate the training and testing subsets, we also employed the ETL (TOS).
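The 5-fold procedure above can be sketched as follows. This is a generic illustration of k-fold index splitting, not the paper's TOS-based implementation; the sample count and placeholder scoring are invented for the example.

```python
# Minimal sketch of k-fold cross validation (k = 5): each record is
# used exactly once for testing and k-1 times for training.

def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for fold in range(k):
        start = fold * fold_size
        # the last fold absorbs any remainder
        stop = n_samples if fold == k - 1 else start + fold_size
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test

scores = []
for train, test in k_fold_indices(100, 5):
    # here a classifier would be trained on `train` and scored on `test`;
    # a placeholder score stands in for that evaluation
    scores.append(len(test) / 100)
print(sum(scores) / len(scores))  # average of the k fold performances
```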

Performance indicators
The model assessment indicators are:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Sensitivity = TP / (TP + FN)
FPR = FP / (FP + TN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F-Measure = 2 x (Precision x Recall) / (Precision + Recall)

Where:
- Accuracy is the fraction of true detections over all data instances.
- Sensitivity defines the ability of the model to detect correctly.
- False Positive Rate (FPR) is the ratio of negative events wrongly classified as positive to the total negative events.
- Precision is the fraction of relevant instances among all proposed instances.
- Recall is the fraction of relevant instances found over the total of relevant instances.
- F-Measure gives the harmonic mean of Precision and Recall.
- TP, TN, FP and FN are retrieved from the confusion matrix; they mean, respectively, True Positive, True Negative, False Positive and False Negative.
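The indicators above follow directly from the four confusion-matrix counts. The sketch below computes them from illustrative counts (the numbers are invented for the example, not taken from the paper's experiments).

```python
# Evaluation metrics computed from confusion-matrix counts.

def metrics(tp, tn, fp, fn):
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    recall    = tp / (tp + fn)            # recall = sensitivity
    fpr       = fp / (fp + tn)            # false positive rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, recall, fpr, precision, f_measure

# toy counts for illustration only
acc, rec, fpr, prec, f1 = metrics(tp=95, tn=90, fp=10, fn=5)
print(f"accuracy={acc:.3f} recall={rec:.3f} fpr={fpr:.3f}")
```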

Adopted method and compared methods

Adopted method
As mentioned above, LSTM is a Deep Learning method, specifically a Recurrent Neural Network (RNN) [20], characterized by its memory; that is why it is adopted in this work, in order to memorize attacks for as long as possible and predict new ones. As shown in Figure 2 [21], the LSTM gathers: an Input Gate, which determines whether a new input can pass; a Forget Gate, which deletes information if it is not important or lets it impact the output; an Output Gate, which determines the output; a single Cell, which represents the Constant Error Carousel; and the activation functions, which compute the activation of the three gates.
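The gate structure described above can be sketched as a single LSTM cell step in plain Python. The scalar toy weights are invented for illustration (a trained model would learn weight matrices over the 42 encoded features); the point is only to show how the three gates and the cell state interact.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One time step of a scalar LSTM cell.
    w holds one (input weight, recurrent weight, bias) triple per gate."""
    i = sigmoid(w["i"][0] * x + w["i"][1] * h_prev + w["i"][2])    # input gate
    f = sigmoid(w["f"][0] * x + w["f"][1] * h_prev + w["f"][2])    # forget gate
    o = sigmoid(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])    # output gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h_prev + w["g"][2])  # candidate input
    c = f * c_prev + i * g      # cell state: the Constant Error Carousel
    h = o * math.tanh(c)        # new hidden state (the output)
    return h, c

# toy weights, identical for every gate
weights = {k: (0.5, 0.1, 0.0) for k in ("i", "f", "o", "g")}
h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.3]:      # a short input sequence
    h, c = lstm_step(x, h, c, weights)
print(h, c)
```

The forget gate multiplying the previous cell state is what lets the network keep or discard long-term information, which is the property exploited here to memorize attack patterns.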

Compared methods
There are several methods in the Machine Learning domain, so it is difficult to compare our suggested method with all of them; we will compare it only with the most efficient and popular ones, namely SVM, KNN [22] and Decision Tree. SVM is a margin-based classifier; it works with small samples and achieves good generalization results [23]. KNN is a simple and efficient technique which uses the closest training examples in the feature space to classify an object [24]. A Decision Tree is a classification method in which the various possible decisions are located at the ends of the branches (the leaves of the tree) and are reached according to the decisions taken at each step [25].
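As an illustration of the KNN baseline just mentioned, the sketch below classifies a point by the label of its nearest training examples. The 2-D points and labels are toy data, not NSL-KDD features.

```python
import math

def knn_predict(train, labels, point, k=1):
    """Classify `point` by majority vote among its k closest training examples."""
    nearest = sorted(range(len(train)),
                     key=lambda i: math.dist(train[i], point))
    votes = [labels[i] for i in nearest[:k]]
    return max(set(votes), key=votes.count)

# toy 2-D feature space: one cluster of normal traffic, one of attacks
X = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.8)]
y = ["normal", "normal", "attack", "attack"]
print(knn_predict(X, y, (0.2, 0.1)))   # → normal
print(knn_predict(X, y, (4.9, 5.1)))   # → attack
```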

RESULTS AND ANALYSIS
This part is dedicated to announcing and discussing the various obtained results. We evaluated the model for two types of classification: binary and multi-classification. For binary classification, the dataset is divided into two classes: a class of normal records and a class of attacks. For multi-classification, given that U2R attacks are not dense in terms of number, the classification results for this type of attack are not satisfactory, so we decided to group them with R2L attacks in one class; the dataset is thus divided into four classes: a class of normal records and three classes of attack categories (Probe, DoS, U2R-R2L). We evaluated the metrics Accuracy, Sensitivity, False Positive Rate, Precision and Recall, and compared them with those of the other classifiers. Figure 3, Figure 4 and Figure 5 show, respectively, the Accuracy, the average Sensitivity and the average False Positive Rate (FPR) for binary and multi-classification, while Table 3 and Table 4 reveal the Precision, Recall and F-Measure for binary and multi-classification. As indicated by Figure 3, the Accuracy reaches 99.98% for two-class classification and 99.93% for four-class classification, which means that LSTM can properly memorize and identify traffic, and its detection capacity is better than that of the other machine learning classifiers. Also, as Figure 4 shows, the average Sensitivity reached by the LSTM model is 99.986% for binary recognition and 99.738% for multi-recognition; this shows that the suggested model is very able to differentiate correctly between the different types of traffic, better than the other models.
In addition, as Figure 5 shows, the average False Positive Rate (FPR) achieved by LSTM is only 0.068% for two-class classification and 0.023% for multi-class classification, which means that the margin of error of the method's detection is minimal, and the values achieved are the lowest compared to the other classifiers. The supreme values of Precision, as illustrated by Table 3 and Table 4, are those of our proposed Deep Learning model LSTM. For two-class identification, Precision reaches 99.999% and 99.969% for normal traffic and attack traffic respectively, more than the other classifiers. For four-class identification, Precision reaches 100% for normal records, due to the density of this traffic class; for Probe and DoS attacks, the maximum Precisions achieved are 99.863% and 99.924% respectively, more than the other classifiers. Only one lower Precision value, 95.896%, is noted by LSTM (below only that of KNN, 97.483%), in the case of the U2R-R2L class, explained by its minimal density. This justifies that LSTM is generally more accurate than the others. The Recall values of LSTM for two-class identification, as shown in Table 3, are very high: 99.973% for normal traffic and 99.998% for attack traffic. The Recall values for four-class identification, as shown in Table 4, are also very high: 99.938% for the normal class, 99.106% for the U2R-R2L class, 100% for the Probe class, and 99.906% for the DoS class (below only that of KNN, 99.980%). This proves that LSTM can find normal instances and attack instances better than the other models. The experimental results have proved that the LSTM method is very efficient: it can effectively memorize traffic and differentiate between normal and attack traffic in both cases of classification, binary and multi-classification.

CONCLUSION AND FUTURE WORK
In this paper, we proposed a new idea for a NIDS based on the Deep Learning method LSTM, which recognizes attacks and keeps a long-term memory of them in order to block new attacks and, at the same time, treats all types of these attacks in a unique manner. To validate the effectiveness of our suggested approach, we employed the well-known NSL-KDD dataset for training and testing, used Accuracy, Sensitivity, False Positive Rate, Precision and Recall as evaluation metrics, and compared the new LSTM method with the other Machine Learning classifiers. The experiment has demonstrated that the detection metrics of the LSTM method reach higher values than those of the other classifiers, which proves that our proposed method is effective for NIDS. In the future, we plan to implement a new intelligent NIDS in the real world using our proposed Deep Learning model LSTM.