Deep learning algorithms for intrusion detection systems in the Internet of Things using the CIC-IDS 2017 dataset

Due to technological advancements in recent years, the availability and use of smart electronic gadgets have increased drastically, and adopting these devices for a variety of day-to-day applications has become the new normal. Because these devices collect and store data of prime importance, securing them against intruders is a mandatory requirement. Many traditional security techniques exist, but they may not suit devices with resource constraints, and artificial intelligence has a notable role to play here. This study attempts to understand and analyze the performance of deep learning algorithms in intrusion detection through a comparative analysis of a deep neural network (DNN), a convolutional neural network (CNN), and a long short-term memory (LSTM) network on the CIC-IDS 2017 dataset. The results show that the DNN achieves 94.61% accuracy, while the LSTM and CNN achieve 97.67% and 98.61%, respectively. This comparative study and the literature review indicate that deep learning models outperform the other methods applied in the IoT IDS environment. Our future scope, which is the need of the hour, is to develop a hybrid deep learning model for IoT intrusion detection with better attack-prediction accuracy and to experiment with real-time data. The hybrid model is intended to combine not only two models but also detection methods and IoT IDS placement strategies.


INTRODUCTION
The Internet of Things (IoT) can be considered a boon of this century. The adoption of this technology in many walks of life and in business, medicine, and engineering shows the extent to which it has been embraced. Because artificial intelligence (AI) is also being incorporated into IoT, devices become smarter and can take better decisions. According to the International Data Corporation (IDC), IoT devices are projected to generate 73.1 ZB of data in 2025, and the number of IoT devices is estimated to reach 41.6 billion [1]. Even though IoT helps automate many applications and thereby reduces human intervention, security is the primary concern to be addressed, so identifying evolving attacks is a significant concern among researchers.
From the beginning of the design of IoT networks and smart devices, there have also been attempts to protect data and devices from intruders. Securing the collected and stored data is a major concern for researchers in this area because the modes and types of attack change constantly. Different approaches exist for protection and attack detection: packet filtering with firewalls and proxies; encryption with cryptographic protocols, encrypted data storage, or virtual private networks; password authentication; auditing and logging of activity on web, database, and application servers; and attack identification with intrusion detection and intrusion prevention systems [2]. An intrusion detection system (IDS) is a technique, combining software and hardware, that tracks network traffic, identifies malicious traffic or any other kind of attack, and raises alerts [3]. The idea of the IDS originated in 1970 [2]. IDSs are categorized according to three criteria: occurrence, placement strategy, and detection method. By occurrence, that is, where information is collected, an IDS can be host-based, network-based, network node-based, or hybrid. By placement, an IDS can be centralized, distributed, or hybrid. By detection method, it can be signature-based, anomaly-based, or hybrid [4], [5].
The concept of IDS started with computer networks, for identifying abnormal traffic. IDSs have been implemented with methods based on game theory, complex event processing, automata [6], data mining, statistical models, payload models, rules [4], and AI. Among these, AI plays a prominent role because it has been shown to detect attacks more effectively, and AI-based IoT IDSs can overcome shortcomings of traditional methods. Most current IoT IDS technologies are static and cannot learn from previous attacks, whereas AI methods can learn from past attacks over time, distinguish attacks from normal traffic, and alert the corresponding system. AI methods such as machine learning (ML) and deep learning can provide powerful capabilities for IoT security requirements [7]. From the early stages of applying AI to IoT IDS, researchers have experimented with different ML techniques. Although ML techniques give better accuracy and overcome other shortcomings of traditional methods, they have limitations of their own. Among ML techniques, the classification and regression tree (CART) plays a significant role: it performs well with low training time, but its performance drops on complex datasets [8]. Conventional ML methods follow shallow learning, which relies heavily on feature engineering and feature selection. Their learning capacity is limited, which reduces their effectiveness on complex datasets. Because the learning process gathers only partial information from each data sample, a large amount of training data is needed, and this is especially crucial for heterogeneous datasets.
Deep learning is well suited to large amounts of data: it learns features automatically and can handle advanced problems on bulk data [9]. This paper focuses on three deep learning models: the deep neural network (DNN), the long short-term memory network (LSTM), and the convolutional neural network (CNN). Section 1 introduces IoT, its security issues, and existing solutions. Section 2 reviews recent studies in the literature on the impact of deep learning in IoT IDS. Section 3 explains the method adopted for this study in detail. Section 4 presents the results and discussion, followed by the conclusions and future scope in section 5.

RELATED WORK
Deep learning has played a more vital role in IoT intrusion detection than conventional methods. This section gives a glimpse of the importance of deep learning in IoT attack detection. Yin et al. [10] proposed a recurrent neural network (RNN) for the NSL-KDD dataset and performed binary and multi-class classification. Another study presented a DNN model on the KDD CUP 99 dataset [11]; it focused on multi-class classification and reported 99% accuracy from the first epochs onward.
A bi-directional long short-term memory recurrent neural network (BLSTM-RNN) was applied to binary classification for IoT intrusion detection [12], and the proposed model achieved 95% accuracy. The study in [13] reported that CNN gives higher accuracy in intrusion detection: the proposed CNN model was tested on two datasets, NSL-KDD and UNSW-NB15, compared with other deep learning methods, and gave better results than the existing deep learning models. In another study, Ding and Zhai [14] presented a CNN-based intrusion detection model focused on multi-class classification with the NSL-KDD dataset. The performance of the proposed model was compared with other ML and deep learning models such as random forest (RF), support vector machine (SVM), deep belief network (DBN), and LSTM.
A novel feed-forward neural network (FNN) was proposed for binary and multi-class classification using the BoT-IoT dataset [15]; the study explained the proposed framework in detail and used accuracy, precision, recall, and F1-score as evaluation metrics. Wu and Guo [16] proposed the LuNet model and tested it with two datasets. In another study, a DNN model was proposed for binary and multi-class classification and tested with six datasets [17]. Sindian [18] proposed an enhanced autoencoder approach called EDSA for detecting DDoS attacks. Ahmad et al. [19] proposed a new DNN model for identifying attacks from both authentic and non-authentic sources. Researchers are increasingly working with newer datasets rather than traditional ones. Using the BoT-IoT dataset, Popoola et al. [20] proposed a hybrid model to detect botnet attacks in IoT, covering both binary and multi-class classification.
Syed et al. [21] introduced an intrusion detection system for IoT time-series data using RNN and bi-LSTM with feature selection; they evaluated the model on the BoT-IoT dataset with different feature selection methods. Another model identifies three different DDoS attacks using DNN and LSTM models for binary classification [22]. An enhanced UNSW-NB15 dataset has been used for intrusion detection with deep learning models [23], and a network anomaly detection method using deep learning in an unsupervised active inference layer has been suggested for the NSL-KDD dataset [24]. The reviewed literature shows that most research uses existing datasets and that newly proposed models are rarely compared with the latest deep learning models.

METHOD
This section describes data pre-processing and the detailed implementation of three deep learning algorithms. The CIC-IDS2017 dataset is used for the DNN, CNN, and LSTM models. Figure 1 gives a pictorial representation of the proposed method and its overall workflow.

Data pre-processing
Data pre-processing is an essential step before feeding data into a model. The entire dataset consists of eight CSV files. First, all the files were appended into a single dataset, after which data pre-processing and data cleaning were performed. Column names were standardized by checking for commas and other special characters and removing them. To correct the dataset, we checked for infinite values and found that 'flowbytes/s' and 'flowpackets/s' contain 1,509 and 2,867 infinite values, respectively. Next, we counted the null values in each column and found that 'flowbytes/s' and 'flowpackets/s' each have 2,867 null values. We then generated a description of every column (count, mean, standard deviation, minimum, 25th, 50th, and 75th percentiles, and maximum). Finally, all null values were replaced by zeros, and the head of the cleaned dataset was displayed.
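The cleaning steps above can be sketched in plain Python as follows. This is a minimal, hypothetical illustration, not the authors' code: it reads CSV text with the standard `csv` module, assumes the standardization rule lower-cases names and strips every character except letters, digits and '/', and, because the text does not say how infinite values were handled, treats them like nulls when replacing missing values with zeros. The two tiny "files" are synthetic.

```python
import csv
import io
import math
import re


def standardize_name(name):
    """Lower-case a column name and drop spaces, commas and other special characters.

    Assumption: '/' is kept, so ' Flow Bytes/s' becomes 'flowbytes/s'.
    """
    return re.sub(r"[^0-9a-z/]", "", name.strip().lower())


def parse_value(text):
    """Convert a raw cell to float; an empty cell becomes None (a null)."""
    text = text.strip()
    return None if text == "" else float(text)  # float() accepts 'Infinity' and 'NaN'


def load_and_merge(csv_texts, label_col="label"):
    """Append several CSV files (given here as strings) into one list of row dicts."""
    rows = []
    for text in csv_texts:
        for raw in csv.DictReader(io.StringIO(text)):
            row = {}
            for key, value in raw.items():
                col = standardize_name(key)
                row[col] = value.strip() if col == label_col else parse_value(value)
            rows.append(row)
    return rows


def count_infinite(rows, col):
    return sum(1 for r in rows if r[col] is not None and math.isinf(r[col]))


def count_null(rows, col):
    # Empty cells (None) and NaN are both counted as nulls.
    return sum(1 for r in rows if r[col] is None or math.isnan(r[col]))


def replace_missing_with_zero(rows, label_col="label"):
    """Return new rows where null, NaN and (assumption) infinite values become 0.0."""
    cleaned = []
    for r in rows:
        new = {}
        for col, v in r.items():
            if col != label_col and (v is None or math.isnan(v) or math.isinf(v)):
                v = 0.0
            new[col] = v
        cleaned.append(new)
    return cleaned


# Two tiny synthetic "files" (not real CIC-IDS2017 rows).
part1 = "Flow Duration, Flow Bytes/s, Flow Packets/s, Label\n10,Infinity,Infinity,BENIGN\n20,,5.0,DDoS\n"
part2 = "Flow Duration, Flow Bytes/s, Flow Packets/s, Label\n30,NaN,2.0,BENIGN\n"
rows = load_and_merge([part1, part2])
print(count_infinite(rows, "flowbytes/s"), count_null(rows, "flowbytes/s"))  # 1 2
clean = replace_missing_with_zero(rows)
```

On the real dataset the same functions would be applied to the eight CSV files; a dataframe library would be the usual choice at that scale, but plain Python keeps each step visible.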
The next step was exploratory data analysis (EDA), an approach to analyzing data and summarizing its characteristics, frequently through visual methods. Principal component analysis (PCA) was used to remove highly correlated data, after which standardization and label encoding of the data were performed. The data were reduced to 71 features, including the label. The dataset was split into training (70%) and testing (30%) sets; both sets were then normalized and transformed, and the transformed data were summarized with a precision of three decimal places.
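A minimal sketch of the label encoding, 70/30 split and standardization steps, in plain Python. Assumptions not stated in the text: classes are encoded in sorted order, the split is a seeded random shuffle, and the z-score standardizer is fitted on the training set only and then applied to both sets. PCA is omitted for brevity, and the toy data are synthetic.

```python
import random
import statistics


def label_encode(labels):
    """Map each class name to an integer index (classes in sorted order)."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    return [index[c] for c in labels], classes


def train_test_split(X, y, test_size=0.3, seed=0):
    """Shuffle row indices with a fixed seed and hold out test_size of the rows."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(X) * test_size)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [X[i] for i in test],
            [y[i] for i in train], [y[i] for i in test])


def fit_standardizer(X):
    """Per-column mean and population standard deviation (1.0 for constant columns)."""
    columns = list(zip(*X))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return means, stds


def standardize(X, means, stds):
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


# Synthetic example: 10 rows, 2 features.
X = [[float(i), float(i % 3)] for i in range(10)]
y_names = ["BENIGN", "DDoS"] * 5
y, classes = label_encode(y_names)
X_train, X_test, y_train, y_test = train_test_split(X, y)
means, stds = fit_standardizer(X_train)
X_train_std = standardize(X_train, means, stds)
X_test_std = standardize(X_test, means, stds)   # reuse training statistics
print(len(X_train), len(X_test), classes)  # 7 3 ['BENIGN', 'DDoS']
```

Fitting the scaling statistics on the training split only keeps information from the test split out of training.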

Deep learning models
Deep learning is a subset of machine learning that learns from large datasets using multilayered neural networks. Rather than following a shallow learning approach, it learns hierarchical representations in which each layer builds on the features learned by the previous layers, which can yield better classification accuracy. This section describes the implementation of three deep learning architectures: DNN, LSTM, and CNN. To evaluate the models, we used the confusion matrix, accuracy, precision, recall, and F1-score as evaluation metrics. A confusion matrix is a table that summarizes the predictions of a classification model, giving the numbers of correct and incorrect predictions for each class. Drawing up the confusion matrix requires counting true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). We then calculated the metrics as in (1) to (4).
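Assuming (1) to (4) are the standard definitions, Accuracy = (TP + TN)/(TP + TN + FP + FN), Precision = TP/(TP + FP), Recall = TP/(TP + FN), and F1 = 2 · Precision · Recall/(Precision + Recall). The plain-Python sketch below computes them from a confusion matrix. For the multi-class case it assumes one-vs-rest counts per class and macro averaging, which the text does not specify.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def one_vs_rest_counts(cm, k):
    """TP, TN, FP, FN for class k, treating every other class as negative."""
    total = sum(map(sum, cm))
    tp = cm[k][k]
    fp = sum(row[k] for row in cm) - tp   # predicted k, actually another class
    fn = sum(cm[k]) - tp                  # actually k, predicted another class
    tn = total - tp - fp - fn
    return tp, tn, fp, fn


def safe_div(a, b):
    return a / b if b else 0.0


def evaluate(cm):
    """Accuracy plus macro-averaged precision, recall and F1-score."""
    total = sum(map(sum, cm))
    accuracy = safe_div(sum(cm[k][k] for k in range(len(cm))), total)
    precisions, recalls, f1s = [], [], []
    for k in range(len(cm)):
        tp, tn, fp, fn = one_vs_rest_counts(cm, k)
        p, r = safe_div(tp, tp + fp), safe_div(tp, tp + fn)
        precisions.append(p)
        recalls.append(r)
        f1s.append(safe_div(2 * p * r, p + r))
    n = len(cm)
    return {"accuracy": accuracy, "precision": sum(precisions) / n,
            "recall": sum(recalls) / n, "f1": sum(f1s) / n}


cm = confusion_matrix([0, 0, 1, 1, 1], [0, 1, 1, 1, 0], n_classes=2)
print(cm)  # [[1, 1], [1, 2]]
print(evaluate(cm))
```

For two classes the trace-based accuracy equals (TP + TN)/(TP + TN + FP + FN) for either choice of positive class.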

Deep neural network
A DNN is a feed-forward neural network that contains multiple fully connected hidden layers between the input and output layers. Information passes from the input layer through the hidden layers in a feed-forward manner, and the network learns its weights iteratively with the backpropagation algorithm [25]. The proposed DNN architecture contains an input layer with 250 neurons, three hidden layers with 32, 72, and 32 neurons, respectively, and an output layer with five neurons, all fully connected. The hidden layers use the rectified linear unit (ReLU) activation function, and the output layer uses SoftMax. Categorical cross-entropy was used as the loss function, and the Adam optimizer was used to minimize it. Table 1 shows the values of the evaluation metrics of the DNN.
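To make the layer sizes concrete, the sketch below builds the described fully connected network in plain Python and runs one forward pass with categorical cross-entropy. Assumptions not stated in the text: the 250-neuron "input layer" is read as the first dense layer receiving 70 input features (71 columns minus the label), weights use Glorot-uniform initialization with zero biases (the Keras `Dense` defaults), and training with Adam is omitted.

```python
import math
import random


def relu(v):
    return [max(0.0, x) for x in v]


def softmax(v):
    m = max(v)                        # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def dense(x, W, b):
    """Fully connected layer: W holds one row of input weights per output neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]


def init_mlp(sizes, seed=0):
    """Glorot-uniform weights and zero biases for each consecutive pair of sizes."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        limit = math.sqrt(6.0 / (n_in + n_out))
        W = [[rng.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((W, [0.0] * n_out))
    return layers


def forward(layers, x):
    """ReLU on the hidden layers, SoftMax on the output layer."""
    for i, (W, b) in enumerate(layers):
        z = dense(x, W, b)
        x = softmax(z) if i == len(layers) - 1 else relu(z)
    return x


def categorical_cross_entropy(probs, one_hot):
    return -sum(t * math.log(max(p, 1e-12)) for p, t in zip(probs, one_hot))


def count_params(sizes):
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


# Hypothetical sizes: 70 input features -> 250 -> 32 -> 72 -> 32 -> 5 classes.
SIZES = [70, 250, 32, 72, 32, 5]
layers = init_mlp(SIZES)
rng = random.Random(1)
x = [rng.random() for _ in range(70)]       # one synthetic, already-scaled record
probs = forward(layers, x)
loss = categorical_cross_entropy(probs, [1, 0, 0, 0, 0])
print(count_params(SIZES))  # 30659
```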

Long short-term memory
LSTM works efficiently on time-series data. The LSTM architecture uses recurrent (looping) feedback connections in addition to feed-forward connections, which help the model hold information for a while. LSTM can learn both long- and short-term dependencies without losing information or accumulating it excessively, so it can remember relevant past information and use it to predict subsequent scenarios. LSTM uses a series of gates, namely the forget gate, input gate, and output gate, to control the flow of information through each cell of the architecture [25], [26]. In the LSTM formulation, F_t denotes the output of the forget gate, and W_F, U_F, and b_F are the weight and bias parameters of the forget gate. Similarly, I_t denotes the output of the input gate, and W_I, U_I, and b_I are the input gate weights and bias. These weight and bias parameters are optimized during training. x_t and h_t are the input vector and hidden vector at time t.
C_t is the value held in the memory cell; it is computed from the outputs of the input and forget gates together with the current input. These values are then used to calculate the output and hidden states. ⨀ denotes the elementwise vector product.
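The gate equations themselves are not reproduced in this text. For reference, the standard LSTM formulation in the notation above is given below; it is an assumption that the omitted equations follow this common form. Here $\sigma$ is the logistic sigmoid, $\tilde{C}_t$ is the candidate cell state, and $O_t$, $W_O$, $U_O$, $b_O$, $W_C$, $U_C$, $b_C$ are output-gate and candidate parameters that the text does not name.

$$
\begin{aligned}
F_t &= \sigma\left(W_F x_t + U_F h_{t-1} + b_F\right)\\
I_t &= \sigma\left(W_I x_t + U_I h_{t-1} + b_I\right)\\
O_t &= \sigma\left(W_O x_t + U_O h_{t-1} + b_O\right)\\
\tilde{C}_t &= \tanh\left(W_C x_t + U_C h_{t-1} + b_C\right)\\
C_t &= F_t \odot C_{t-1} + I_t \odot \tilde{C}_t\\
h_t &= O_t \odot \tanh\left(C_t\right)
\end{aligned}
$$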
The implemented LSTM model contains four hidden layers with 64, 64, 128, and 128 neurons. ReLU is used as the activation function of the hidden layers, and SoftMax as the activation function of the output layer. When fitting the model, categorical cross-entropy was used as the loss function for categorical (multi-class) data and binary cross-entropy for binary data. Table 2 gives the values of the evaluation metrics of the LSTM.
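A plain-Python sketch of one LSTM cell step following the standard gate equations, stacked as four layers of 64, 64, 128 and 128 units. Assumptions: each record is fed as a short sequence of scalar values (input size 1), weights are small random values, and the `act` argument stands in for the candidate and output activation (tanh by default; the text's choice of ReLU corresponds to passing `relu`, as Keras' `activation` argument does). Only the forward pass is shown, together with the usual per-layer parameter count 4 × (units × (inputs + units) + units).

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relu(z):
    return max(0.0, z)


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def init_lstm(n_in, units, rng):
    """One (W, U, b) triple per gate: F (forget), I (input), O (output), C (candidate)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {g: (mat(units, n_in), mat(units, units), [0.0] * units) for g in "FIOC"}


def lstm_step(x_t, h_prev, c_prev, params, act=math.tanh):
    def gate(name, f):
        W, U, b = params[name]
        return [f(a + u + bb) for a, u, bb in zip(matvec(W, x_t), matvec(U, h_prev), b)]
    f_t = gate("F", sigmoid)       # forget gate F_t
    i_t = gate("I", sigmoid)       # input gate I_t
    o_t = gate("O", sigmoid)       # output gate O_t
    c_tilde = gate("C", act)       # candidate cell state
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * act(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def run_stack(sequence, layers, act=math.tanh):
    """Feed a sequence through stacked LSTM layers; return the last layer's final h_t."""
    for params in layers:
        units = len(params["F"][2])
        h, c = [0.0] * units, [0.0] * units
        outputs = []
        for x_t in sequence:
            h, c = lstm_step(x_t, h, c, params, act)
            outputs.append(h)
        sequence = outputs         # the next layer consumes this layer's hidden states
    return sequence[-1]


def lstm_param_count(n_in, units):
    return 4 * (units * (n_in + units) + units)


UNITS = [64, 64, 128, 128]
rng = random.Random(0)
sizes_in = [1] + UNITS[:-1]                  # assumption: scalar input per time step
layers = [init_lstm(n, u, rng) for n, u in zip(sizes_in, UNITS)]
sequence = [[0.2], [0.5], [0.1], [0.9]]      # toy sequence of four time steps
h_last = run_stack(sequence, layers)
print(len(h_last), [lstm_param_count(n, u) for n, u in zip(sizes_in, UNITS)])
# 128 [16896, 33024, 98816, 131584]
```

In practice the SoftMax output layer would sit on top of `h_last`.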

Convolutional neural network
CNN is a supervised learning method used to classify labelled data into different patterns. A CNN has several building blocks, such as convolution layers, pooling layers, and fully connected layers. The architecture stacks multiple nonlinear layers followed by fully connected layers, so it can automatically learn important hierarchical features from raw data, and it is mostly used for complex feature extraction with good accuracy. The CNN architecture also reduces the number of parameters and mitigates the gradient diffusion problem, which allows the model to be trained successfully and efficiently [17], [25], [27].
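The parameter saving that comes from sharing convolution weights can be checked with simple arithmetic. The sketch compares a 1D convolution layer with a fully connected layer that produces an output of the same size. The 70-feature input, the single input channel and the kernel size of 3 are illustrative assumptions; 120 filters matches the first hidden layer of the implemented model.

```python
def conv1d_params(kernel_size, in_channels, filters):
    # One shared kernel per filter plus one bias per filter, reused at every position.
    return kernel_size * in_channels * filters + filters


def dense_params(n_inputs, n_outputs):
    # Every output unit has its own weight for every input, plus a bias.
    return n_inputs * n_outputs + n_outputs


n_features, kernel, filters = 70, 3, 120       # illustrative assumptions
out_positions = n_features - kernel + 1         # 'valid' convolution: 68 positions
conv = conv1d_params(kernel, 1, filters)
dense = dense_params(n_features, out_positions * filters)
print(conv, dense)  # 480 579360
```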
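The time-series network traffic input vector is $y = (y_1, y_2, \dots, y_{n-1}, cl)$, where $y_1, \dots, y_{n-1}$ are the features and $cl$ is the class label. The convolution operation applies a filter $w \in \mathbb{R}^{l}$ to a window of $l$ consecutive features, and each element $h_i$ of the feature map is obtained as
$$h_i = \phi\left(w \cdot y_{i:i+l-1} + b\right) \tag{12}$$
where $b$ is the bias term and $\phi$ is a nonlinear activation function. Applying the filter to every window $\{y_{1:l}, y_{2:l+1}, \dots, y_{n-l:n-1}\}$ of the record generates the feature map
$$h = [h_1, h_2, \dots, h_{n-l}], \quad h \in \mathbb{R}^{n-l} \tag{13}$$
and a max pooling operation is applied to each feature map as $\hat{h} = \max\{h\}$. A fully connected layer is then expressed mathematically as
$$z = \phi\left(W_{fc}\,\hat{h} + b_{fc}\right) \tag{14}$$
where $W_{fc}$ and $b_{fc}$ are the weights and bias of the fully connected layer.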
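Equations (12) and (13) and the max-pooling step can be traced on a toy record with the plain-Python sketch below. The feature values, filter weights, bias and the ReLU nonlinearity are illustrative assumptions; `max_pool` with a window of 2 mirrors the pool size used in the implemented model, while `global_max` is the single-maximum form max{h}.

```python
def relu(z):
    return max(0.0, z)


def conv1d_valid(features, w, b, act=relu):
    """h_i = act(w . y[i:i+l] + b) for every window of l consecutive features."""
    l = len(w)
    return [act(sum(wj * features[i + j] for j, wj in enumerate(w)) + b)
            for i in range(len(features) - l + 1)]


def max_pool(h, size=2):
    """Non-overlapping max pooling (stride = size); a trailing partial window is dropped."""
    return [max(h[i:i + size]) for i in range(0, len(h) - size + 1, size)]


def global_max(h):
    return max(h)


record = [1.0, 3.0, 2.0, 5.0, 4.0, "BENIGN"]   # (y_1, ..., y_{n-1}, cl) with n - 1 = 5
features = record[:-1]                          # the class label is not convolved
h = conv1d_valid(features, w=[1.0, -1.0], b=0.5)
print(h, max_pool(h), global_max(h))  # [0.0, 1.5, 0.0, 1.5] [1.5, 1.5] 1.5
```

The feature map has n - l = 4 entries here, matching the length given in (13).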
In the implemented model, there are three hidden layers with the ReLU activation function, containing 120, 60, and 30 neurons, respectively, and the output layer contains 15 neurons. A MaxPooling layer with a pool size of 2 is used between the hidden layers. The architecture uses sparse categorical cross-entropy as the loss function with the Adam optimizer. Table 3 gives the values of the evaluation metrics of the CNN.
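The categorical cross-entropy used for the DNN and the sparse categorical cross-entropy used here differ only in label format: the sparse variant takes an integer class index, the categorical variant a one-hot vector, and both give the same loss for the same prediction. The plain-Python sketch below also includes the binary cross-entropy mentioned for the LSTM; the probabilities are made up for illustration.

```python
import math


def categorical_cross_entropy(probs, one_hot):
    # Label given as a one-hot vector, e.g. [0, 0, 1]; zero entries contribute nothing.
    return -sum(t * math.log(p) for p, t in zip(probs, one_hot) if t)


def sparse_categorical_cross_entropy(probs, class_index):
    # Label given as an integer class index, e.g. 2.
    return -math.log(probs[class_index])


def binary_cross_entropy(p, y):
    # Binary label y in {0, 1}; p is the predicted probability of class 1.
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


probs = [0.1, 0.2, 0.7]
print(sparse_categorical_cross_entropy(probs, 2))
print(categorical_cross_entropy(probs, [0, 0, 1]))
```

Sparse labels avoid building one-hot vectors, which is convenient when there are many classes, such as the 15 output classes here.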

RESULTS AND DISCUSSION
This section compares the three implemented architectures (DNN, LSTM, and CNN) with each other and with existing models, using accuracy, precision, recall, and F1-score as evaluation metrics. Figure 2 compares the evaluation metrics of the three implemented models.

Figure 2. Comparison of DNN, LSTM and CNN models
The comparison in Figure 2 shows that CNN yields the best results in terms of accuracy, precision, recall, and F1-score. Table 4 gives an overview of the comparison between our models and other existing models. The results show that the LSTM and CNN models achieve better accuracy, whereas the DNN model does not; we attribute the DNN's lower accuracy mainly to system dependencies and the lack of appropriate feature selection. The main advantage of LSTM over conventional feed-forward neural networks is its recurrent memory, which lets it capture temporal dependencies in the traffic. The CNN allows the model to exploit both temporal and spatial correlations for better performance.

CONCLUSION
In this emerging technological era, IoT devices have a very important role in the day-to-day life of all human beings. We can see various applications of IoT in all fields such as automation, health care, enhancement of customer experiences, and smart safety. Even for most people depending on the IoT devices security is the major concern for all of them. For this security purpose, researchers are focused on IoT intrusion detection systems. Even though there are various traditional methods and machine learning models available for the implementation of IoT IDS, deep learning models have a significant role in that, because deep learning method has ability to maximize the utilization of unstructured data as well as it can work on huge amount of data and perform better than other techniques. This paper carried out a brief study of the relevance of deep learning in IoT IDS and did a comparative study with three deep learning models such as DNN, LSTM, and CNN. The results show DNN gives 94.61% accuracy, while LSTM and CNN achieves 97.67% and 98.61%, respectively. From this comparative study and literature review, it has been proven that deep learning models outperform the other methods applied in IoT IDS environment. Despite the deep learning models having better accuracy, our future scope is to develop a hybrid deep learning model for IoT intrusion detection with better accuracy in attack prediction and experimenting with the real-time dataset. The hybrid model is not only for combining two models but also for detection methods and IoT IDS placement strategy. Developing a hybrid deep learning model for IoT intrusion detection for better accuracy in attack prediction and experimenting with real-time dataset is the future scope which is the need of the hour.