An ensemble model to detect packet length covert channels

ABSTRACT


INTRODUCTION
A covert channel is a channel that is exploited to transfer secret data covertly in a way violates system security policy [1], [2] a sender uses covert channels to communicate with a receiver in order to send him\her a covert message that cannot be seen by others [3].In 1973, the concept of covert channel was presented by Lampson [4] to be executed in stand-alone machine environment.Where, in 1987, this concept was expanded to include computer networks [5].Recent research indicated that the existence of network covert channels, which offer significant assistance for undertaking covered communications including transmitting secret data and/or stealing confidential information, is one of the major risks to internet security [6].Covert channels in networks, are techniques that aim to hide information in normal network traffic to stay undiscovered [7], and have recently attracted more attention [8].Actually, the environment for building various types of covert channels on computer networks has gotten rich [9], [10].In addition, even the new technologies in this area do not pay much attention in the design phases to the weaknesses that can be used to establish different kinds of covert channel threats [11].Some factors that play an important role in the development of covert channel attacks on networks were presented in [12].These involve i) the rapid advancement of network techniques that help in introducing different scenarios for committing serious attacks that based on covert channels, ii) switch techniques that offer the capability for secret messages to change their locations, which complicates the way to detect such attacks, and iii) internal control protocol

5297
techniques that utilize micro protocol approaches to secure dynamic route and reliable communication to covert traffic.Covert channels that use data packet lengths as a carrier have become popular approaches to hiding secret messages.In which the difference of data packet lengths is exploited to send a secret data.In the literature, there are many techniques that exploit network packet lengths which lead to form different types of covert channel attacks.Voice over internet protocol (VoIP) traffic is an enrich environment for building packet length covert channels, as it includes continued transmission of huge amount of data that attract creating such covert channels.Especially mobile VoIP, which becomes popular for transmitting large amounts of data, and therefore mobile VoIP traffic becomes a potential target to establish covert channels with high throughput.
Unfortunately, packet length covert channels can be applied, even if encryption methods are implemented and can exploit several network protocols.Resolving such covert channels can be attained by identifying and developing some indicators to help in building adequate detection approaches.Simply it is possible to eliminate this kind of threats, packet length covert channels, by having all packets taking the same length but this approach diminishes the network capacity [13] and therefore is not considered as an adequate solution to such type of covert channels.The use of machine learning methods to combat covert channel attacks has become a hot topic for network security [14].Eventually, the ongoing development of different techniques to construct covert channel attacks is clearly noticeable.Moreover, covert communication can be exploited to carry out different malicious activities that pose massive risks to our data and confidentiality.Therefore, more research efforts are required to develop adequate solutions capable to counter these threats.
In this paper stacking ensemble model has been introduced to predict the presence of any attack caused by covert channels that exploiting packet lengths.This model is a machine learning design that uses stacking method to aggregate the results of the base classifiers to generate better results.The ensemble model has five classifiers as the base classifiers, whereas the logistic regression (LR) classifier is the meta-classifier that is used to combined the base classifiers' output to form the ensemble model output.This section provides brief information to covert channels, illustrating their basic concept with focusing on packet-length-based covert channels.The next section highlighted the main common types of covert channels.Subsequently, relevant work is given in section 3 which provides general overview in covert channels that use packet lengths and sheds lights on some detection methods that were presented to counter them.Section 4 gives a description to our proposed classifier.It illustrates the methods to construct and evaluate the proposed approach.Section 5 provides a discussion of the results obtained, whereas the study is concluded in section 6.

COVERT CHANNEL TYPES
Covert channels are mainly divided into two types: storage covert channels (SCC) and timing covert channels (TCT) [15].In storage channels, storage location is used to hold a covert message [16] such as protocol fields, whereas in timing channels, a covert message is modulated using the timing characteristics of the network traffic.In other words, covert storage channels use an object to send a covert message while covert timing channels use timing aspects to convey covert messages.The construction of covert channels by combining both storage and timing techniques leads to a third type that takes the advantages of both categories, timing and storage, and therefore it has the capability to pose real challenge.It is known as hybrid channels.

RELATED WORK
To avoid being detected, the attackers are motivated to construct hidden channels to convey secret data.These channels violate the security policy of the system and cannot be detected using traditional intrusion detection methods.The construction of covert channel over network has become modern way to leak information and poses real challenges.For covert channels that exploit packet lengths, the detection of such channels may be more difficult, as some of them generate covert traffic that is highly imitated normal traffic, and therefore the detection methods that watch the variations between normal and covert traffic may fail to catch them.

Overview
The concept of covert channels that exploit the variations of network packet lengths to pass secret messages was initially proposed by Padlipsky et al. [17] and Girling [5].In these covert channels which presented in [5], [17], the transmission parties (sender and receiver) should share some rules before starting their transmission session.However, these approaches are vulnerable to detection.They cause notable changes that can be caught by detection methods and therefore it is easy to identify such channels [18].In the covert channel proposed in [19], a matrix of unique packet lengths should be shared between the two parts that intend to initiate covert communication before starting their communication session.However, their proposed covert channel can be detected [20].Ji et al. [20] proposed a packet length covert channel which has the ability to deliver normal network traffic; therefore, detection methods fail to discover it.However, Nair et al. [21] proposed a detection method that is capable to detect their channel.A packet length-based covert channel proposed by Ji et al. [22] that was expected to cause no abnormal behaviours and therefore the authors indicated that their approach has great resistance to detection methods.
Hussain and Hussain [23] proposed a covert channel with high capacity of covert message.It exploits packet lengths and data payload.However, the construction of this type of covert channel is complicated and may be vulnerable to detection due to the use of the data payload [24].
Abdullaziz et al. [25] proposed a covert channel based on network packet length.One packet can carry 1 bit of a covert message.This covert channel has high resistance to be detected and does not require a shared key as many packet-length covert channels require; however, it has low bandwidth [26] and data shifting may occur as a result of user datagram protocol (UDP) packet drop [27].
Sabeti and Shoaei [26] introduced two approaches that are expected to have greater resistance against detection methods.In these approaches, two lengths are used to modulate one bit and it may necessary to change their order to fit the secret message.However, long covert messages require huge number of network packets as one bit of a covert message requires pair of network packets.Our ensemble approach that has been proposed by this study is expected to detect these channels because swapping of the packets causes abnormality that can be discovered by our proposed approach.
Liang et al. [28] constructed a packet length covert channel using mobile VoIP by mapping the length of the packet with covert data.As the authors stated, the throughput, undetectability, and reliability of the scheme were evaluated.However, in the event that a malicious party is able to decrypt the packet content, the original ordering of the packets can be obtained and therefore this breaks the covert communication.

Detection using machine learning
Packet length covert channels can be eliminated by getting all packet lengths equal in size.However, this method is computationally expensive and thus will affect network performance.The process of reconstructing network packets by examining the size of each packet and then applying either padding or splitting technique definitely causes more overheads.Therefore, developing detection methods that capable to spot such attack is appropriate alternative.
The use of machine learning and deep learning technologies has been demonstrated in a number of domains, including information security.Numerous machine learning techniques have been used to identify harmful behaviour and exhibit high performance.Tang et al. [14] indicated that the application of machine learning techniques to thwart covert communication has become a hot topic in network security.Readers who are interested can view some recent studies that use machine learning or deep learning techniques to thwart covert channel attacks in [29]- [34].As an example, Dua et al. [6] proposed a two-layered system to locate and detect storage covert channels that use the IPv6 protocol.deep neural networks (DNN) and the one-vsrest technique with support vector machine (SVM) are used in their suggested system.The authors use their own dataset, which was produced using the pcapStego tool and the CAIDA dataset, to evaluate their system.With a prediction time of 0.0719 seconds and high accuracy that reaches 99.7, their suggested system is suitable for use in real-time applications as the authors stated.
Each learning algorithm has its advantages and drawbacks, not a perfect one at all; therefore, ensemble approaches may improve performance by aggregating different classification models so to boost up their advantages and to reduce their limitations.Stacking is an ensemble technique where a meta-classifier is used to combining the outputs of different classifiers to enhance the prediction process.This method produces powerful models that capable to improve classification accuracy.Yang et al. [35] proposed a detection approach based on stacking technique, which is an aggregation method that combines weak learners in order to produce a more powerful classifier.The base classifiers for their model are k-nearest neighbors (KNN), SVM, and random forest (RF), while a neural network (NN) model is used to aggregate them.The suggested approach is aimed to detect covert channel attacks that exploiting DNS.In comparison with many existing approaches, this approach was performing better.In the same context, Yang et al. [36] proposed a detection method based on stacking technique to detect domain name system (DNS) covert channel.Their results showed the effectiveness of their model to identify the targeted covert channels.The authors of this model stated that their model is capable for identifying even unknown covert channel traffic.Furthermore, Yang et al. [37] proposed stacking model to detect a DNS covert channel.Their model showed excellent performance compared to existing approaches, as the authors indicated.
Cassavia et al. [38] proposed a deep ensemble-based model to detect covert channels that exploits time to live (TTL) filed of IPv4 protocol.The model's ability to detect the presence of these covert channels

5299
was proved by the authors, who also emphasized that the model was built to be lightweight and only require a small number of training cases.Whereas, Li et al. [39] stated that stacking classifications models may be used as a more affordable alternative to deep learning classification approaches and they presented a stacking-based ensemble classification approach.In which, the employed primary classifiers (base classifiers) are KNN, LR, SVM, and naive Bayes (NB) whereas RF was used as a secondary classifier (metaclassifier).Their proposed model showed better performance compared to the other models in both middle and small size of dataset with acceptable time-consuming.
A review article reviewed some techniques of network steganalysis and steganography with focusing on their shortcomings and novelties.The article reported that more research work is required on network steganalysis [40].This is to fill the gap as the network steganography techniques have witnessed rapid increases as a result of the advanced development of network technologies.

METHOD
This research presents an ensemble classification model for covert channel attacks that are exploited network packet lengths to initiate illegal ways to exchange hidden data.In this type of covert channels, the covert sender exploits the packet length values to convey a covert message he/she wants to send.Odd represent 1 and even represent 0 or the opposite scenario; therefore, it may require to change the packet lengths to fit the covert message.The covert receiver watches the lengths of the received packets to decipher the covert message.

Dataset
A dataset has been created according to the above description of our target covert channel.Wireshark 3.4.2was employed to capture network traffic.Then, the recorded traffic was divided into two parts.The first part was kept as is to represent normal traffic.The Scapy 2.4.4 and Python 2.7.18 were used to modify the rest of the captured traffic to construct covert traffic.Accordingly, a dataset of 300 instances was developed.It contains 150 cases of normal traffic and 150 cases of covert traffic.

Data preprocessing and feature extraction
In this study, a bag of words method is used for feature extraction purposes which is a flexible way for feature extraction.This method is applied after the preprocessing phase.The bag-of-words method is a simplified representation of text using criteria such as occurrence of words (word frequency).It has been used in many fields such as natural language processing, and computer vision [41].Many recent works employed this method to enhance classification accuracy such as [42]- [46].The preprocessing phase includes applying a filter to consider only the most important tokens that have significant influence in prediction accuracy.Orange 3.28 was used to implement both phases, data preprocessing and feature extraction.Orange is a data mining package to analyse data [47].

The proposed classifier
The proposed ensample model uses a stacking strategy to aggregate the output of a group of classifies to enhance the prediction accuracy.It is commonly known that ensemble classifiers achieve a higher accuracy rate compared to single classification models.The base classifiers of our ensemble model were selected carefully based on their reputation in reflecting good performance.They include KNN, decision tree (DT), SVM, NB and RF, while LR classifier is used as meta-model to aggregate the outputs of the base classifiers.

Experiments and evaluation
The base classifiers of our proposed ensemble model were fully trained and tested using the created dataset that consists of 300 samples of normal (overt) traffic and malicious (covert) traffic.To build the ensemble model, LR classifier is used to combine the base classifiers whereas a cross validation method is applied for the model evaluation.The cross-validation method is a sampling method that uses different parts of the given dataset to train and test a machine learning model on different iterations.It is a resampling method used to evaluate machine learning models [48].Interested readers for more information on cross-validation methods are referred to [49].In our case, the 10-fold cross-validation method was used.90% of the dataset used for training, and the rest for testing.In the 10-fold cross-validation method, the dataset is divided into ten parts (folds).Then the model is trained and tested ten times.In each time, different fold is used for testing and the rest of the dataset for training.Each time the testing and training data are changed, which leads to a more comprehensive validation.The performance of our proposed ensemble model compared to individual classifies was examined by computing the confusion matrix that gives the number of cases that are correctly classified compared to misclassified cases.Furthermore, some other evaluation  ISSN: 2088-8708 Int J Elec & Comp Eng, Vol. 13, No. 5, October 2023: 5296-5304 5300 measures were calculated: the classification accuracy, sensitivity, precision, the area under the curve (AUC) and specificity were calculated.

RESULTS AND DISCUSSION
As it shown by Table 1, the reported results indicated that our proposed ensemble classifier achieved high accuracy reached 99.3% to detect packet length covert channel attacks.It outperformed all ensemble base classifiers (DT, KNN, SVM, RF, and NB).In addition, the ensemble classier reported high performance when considering other performance evaluation metrics that include AUC, recall, specificity and precision.Figure 1 visualizes the accuracy of the proposed stacking ensemble model compared to other classifiers, which clearly depicts that the new model has outperformed the others.The classification errors, that were evaluated using confusion matrix, are displayed in Table 2, which illustrates good performance of the proposed classifier by casing negligible classification errors that tend to zero.This confirms our assumption that ensemble models have the capabilities to improve perdition accuracy.More evaluation was done by computing the receiver operating characteristic (ROC) curves of all classifiers.Figures 2 to 7 show these curves for KNN, RF, NB, SVM, DT and the proposed classifier, respectively.ROC curves have proven that the proposed classifier outperformed all other classifiers by showing better performance.The performed experiments and the evaluation measures employed have proven that our proposed classifier has enhanced the detection accuracy of packet length covert channel.It achieved the high accuracy with the least classification errors compared to other classifiers.It is noteworthy to mention that RF classifier achieved good classification accuracy that reaches 98.7% followed by NB which achieved 97.7% but still our proposed classifier outperforms them by reaching 99.3% accuracy rate.

CONCLUSION
Machine learning technology has proven its usefulness in various fields, especially in information security.Many machine learning approaches have been applied to detect malicious activities and reflected high performance.In this study, an ensemble classification model was presented to discover the existence of covert channels that exploit data packet lengths to pass covert messages.This attack exploits the variations of packet lengths to leak secret information.It is an undetectable and dangerous attack which is capable of generating malicious traffic (covert traffic) that is typically look like normal traffic (overt traffic), and thus detection approaches would fail to spot the variations.The developed ensemble detection scheme achieved outstanding performance compared to the single classification models.Its detection accuracy reached 99.3% with neglected classification errors.Up to our knowledge, the proposed ensemble model has achieved a classification accuracy rate that outperformed all existing detection methods presented to discover such attack.
Int J Elec & Comp Eng ISSN: 2088-8708  An ensemble model to detect packet length covert channels (Muawia A. Elsadig)

Int
An ensemble model to detect packet length covert channels (Muawia A.Elsadig)

Table 1 .
Classification performance