Detection of the botnets’ low-rate DDoS attacks based on self-similarity

Received Aug 4, 2019 Revised Jan 16, 2020 Accepted Feb 1, 2020 The article presents an approach for detecting botnets’ low-rate DDoS attacks based on the botnet’s behavior in the network. The detection process involves analysis of the network traffic generated by a botnet’s low-rate DDoS attack. The proposed technique is part of the botnet detection system BotGRABBER. The novelty of the paper is that low-rate DDoS attack detection involves not only the network features inherent to botnets but also an analysis of network traffic self-similarity, which is determined using the Hurst coefficient. The detection process consists of knowledge formation based on the features that may indicate a low-rate DDoS attack performed by a botnet; network monitoring, which analyzes information obtained from the network and draws a conclusion about a possible DDoS attack in the network; and application of a security scenario for the corporate area network’s infrastructure in the event of low-rate attacks.


INTRODUCTION
Nowadays cybercriminals use different ways to profit from the legitimate businesses that have become their targets. Malware is one of the most powerful tools cybercriminals have for attaining such goals [1][2]. One type of malicious action against users' computer systems and cloud infrastructure is the distributed denial-of-service (DDoS) attack: an attempt to disrupt the normal traffic of a targeted server, service or network by overwhelming the target or its surrounding infrastructure with a flood of Internet traffic [3].
In the modern cyber world, botnets are the main tool for performing this type of attack [4]. The bots of a botnet are compromised devices used to attack a single server, network or application with an overwhelming number of requests, packets or messages. A low and slow attack is another type of DoS or DDoS attack; it relies on a small stream of very slow traffic with requests that target application or server resources, thereby preventing genuine users from accessing the service. To carry out low and slow attacks, attackers can use HTTP headers, HTTP POST requests, or TCP traffic.
Unlike brute-force attacks, low and slow attacks require very little bandwidth and can be hard to mitigate, as each bot is a legitimate Internet device and the slow attack traffic it generates is very difficult to distinguish from that of legitimate clients [5][6]. One way of detecting low-rate DDoS attacks is to analyze the traffic for self-similarity. This method allows hidden malicious traffic to be identified in real time.
Low-rate DDoS attack identification based on traffic self-similarity analysis is part of the botnet detection process performed by the self-adaptive system BotGRABBER [25]. BotGRABBER is a framework for assuring network resilience under botnet cyberattacks. To detect a botnet, its main features have to be gathered and analyzed. The features are assembled into feature vectors, which are then clustered; the result of the clustering is the assignment of each feature vector to a cluster that corresponds to a given cyberattack.
This article presents a detailed description of the botnet detection process based on traffic self-similarity analysis, since self-similarity may indicate a botnet's presence in the network. Low-rate DDoS attack detection includes the learning, monitoring and detecting stages, followed by the application of a security scenario.
a. The learning stage consists of the following steps:
- knowledge formation based on the features that may indicate a low-rate DDoS attack performed by a botnet;
- presentation of the knowledge about the low-rate DDoS attack as a set of feature vectors;
- labelling of the obtained feature vectors of the low-rate DDoS attack in order to form clusters, where each cluster corresponds to some type of low-rate DDoS attack.
b. The monitoring stage includes the following steps:
- gathering the inbound and outbound network traffic;
- construction of the feature vectors from the information obtained from the network, based on the botnet's features and the self-similarity of the traffic generated by the botnet's low-rate DDoS attack.
c. The detecting stage includes semi-supervised fuzzy c-means clustering of the obtained feature vectors in order to assign each of them to one of the clusters and to choose the proper security scenario for attack mitigation.
d. The application of the security scenario to the corporate area network's infrastructure.
The subject of this paper is the approach for detecting botnets' low-rate DDoS attacks via the BotGRABBER system. Let us discuss these steps in detail.
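The assignment of a feature vector to a cluster in the detecting stage can be illustrated with the standard fuzzy c-means membership formula. The sketch below is only an illustration under stated assumptions: it uses plain (unsupervised) fuzzy c-means memberships with fixed cluster centers rather than the semi-supervised variant used by BotGRABBER, and the function names, the fuzzifier value m = 2 and the cluster labels are hypothetical.

```python
import math


def fcm_memberships(vector, centers, m=2.0):
    """Return the fuzzy c-means membership degree of `vector` in each cluster.

    Standard formula with fuzzifier m > 1 (an assumed default of 2):
    u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)),
    where d_j is the Euclidean distance from the vector to center j.
    """
    distances = [math.dist(vector, center) for center in centers]
    # A vector that coincides with a center belongs fully to that cluster.
    if any(d == 0.0 for d in distances):
        hits = [1.0 if d == 0.0 else 0.0 for d in distances]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_j / d_k) ** exponent for d_k in distances)
        for d_j in distances
    ]


def assign_cluster(vector, centers, labels, m=2.0):
    """Pick the label of the cluster with the highest membership degree."""
    memberships = fcm_memberships(vector, centers, m)
    best = max(range(len(centers)), key=memberships.__getitem__)
    return labels[best], memberships[best]
```

In this sketch the membership degrees of one vector always sum to one, so the returned degree can be read as how strongly the vector belongs to the chosen cluster relative to the others.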

Presentation of the knowledge concerning the botnets' low-rate DDoS attacks as the set of feature vectors
Let us define the set of features to be analyzed in order to identify the above-mentioned botnets' low-rate DDoS attacks as $x_1, x_2, \dots, x_{19}$, where:
- $x_1$: transmission protocol;
- $x_2$: average payload length per connection;
- $x_3$: ratio of the number of different packet sizes transferred to the total number of frames per connection;
- $x_4$: total number of bytes per connection excluding the header;
- $x_5$: total number of bytes transmitted per connection;
- $x_6$: duration of the connection;
- $x_7$: number of bytes transmitted from origin to destination;
- $x_8$: number of packets transmitted from origin to destination;
- $x_9$: boolean feature that indicates whether the inbound traffic has an associated outbound traffic record;
- $x_{10}$: duration of the connection, observed from the earliest of the associated inbound or outbound traffic until the end of the latter traffic;
- $x_{11}$: total size of the session in bytes;
- $x_{12}$: total number of packets in the session;
- $x_{13}$: self-similarity of the outbound/inbound packets in the session, determined by examining the variance in size of the outbound/inbound packets using the Hurst coefficient;
- $x_{14}$: velocity of outbound/inbound traffic measured in packets per second;
- $x_{15}$: velocity of outbound/inbound traffic measured in bits per second;
- $x_{16}$: velocity of outbound/inbound traffic measured in bytes per packet;
- $x_{17}$: standard deviation of packet size within the session, measured in bytes;
- $x_{18}$: invalid values of TCP flags seen in this session;
- $x_{19}$: the ratio of the number of most common packet.
All of the aforementioned features form the basis of the set of feature vectors $X = \{x_k\}_{k=1}^{N}$, where each feature vector $x_k$ has 19 components and describes a specified low-rate attack, and $N$ is the number of feature vectors. The main feature indicating the presence of a low-rate attack is the self-similarity of the outbound/inbound packets in the session, determined by examining the variance in size of the outbound/inbound packets using the Hurst coefficient.
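Several of the listed features are simple aggregates over the packets of one session. As a rough sketch, the volume and velocity features (total bytes, total packets, packets per second, bits per second, bytes per packet, and standard deviation of packet size) could be computed as follows. The input format (a list of timestamp and size pairs), the dictionary keys and the use of the population standard deviation are assumptions made for illustration and are not taken from the BotGRABBER implementation.

```python
import statistics


def session_rate_features(packets):
    """Compute volume and velocity features for one session.

    `packets` is a non-empty list of (timestamp_seconds, size_bytes) pairs;
    this input format is an assumption of the sketch.
    """
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    duration = max(times) - min(times)
    total_bytes = sum(sizes)
    total_packets = len(sizes)
    # Rates are reported as 0.0 for a zero-length session to avoid dividing by zero.
    return {
        "x11_total_bytes": total_bytes,
        "x12_total_packets": total_packets,
        "x14_packets_per_second": total_packets / duration if duration > 0 else 0.0,
        "x15_bits_per_second": 8 * total_bytes / duration if duration > 0 else 0.0,
        "x16_bytes_per_packet": total_bytes / total_packets,
        "x17_packet_size_std": statistics.pstdev(sizes),
    }
```

A session dominated by many tiny, evenly spaced packets, as is typical for slow HTTP attacks, would show up here as a low bytes-per-packet value together with a small standard deviation of packet size.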

Self-similarity of network traffic and the Hurst coefficient
The main point of detecting a botnet's low-rate DDoS attack is to distinguish malicious traffic from legitimate traffic, taking into account the self-similarity features of attack and normal traffic. For this purpose, the proposed technique estimates the self-similarity features based on the values of the Hurst coefficient H. In general, network traffic can be represented as a fractal: a figure whose small, arbitrarily enlarged parts are similar to the whole. In other words, an object can be considered self-similar if it coincides exactly or approximately with a part of itself. Network traffic can have the property of self-similarity. It manifests itself in the frequency of received data packets, which looks similar at different time scales, like a fractal. Because network traffic is a random process, the degree of self-similarity can be determined by the Hurst coefficient, which is computed from the time series gathered during traffic observation.
In general, if the coefficient H takes a value of 0.5, the events are random and there is no long-term dependence between them. In this case, network traffic is not self-similar. If the coefficient H takes values from 0.5 to 1, the observed time series is persistent, that is, it exhibits long-term dependence. Furthermore, the higher the value of H, the stronger the long-term relationship between events and the greater the degree of self-similarity. When the Hurst coefficient is close to 1, network traffic reaches the maximum degree of self-similarity, which means that under any scaling of the time series the frequency of data packets keeps the most similar form. This value is defined as a function of the time interval of the time series as follows:
$$\mathbb{E}\left[\frac{R(n)}{S(n)}\right] = C n^{H}, \quad n \to \infty, \tag{1}$$
where $R(n)$ is the range of the first $n$ cumulative deviations from the mean; $S(n)$ is the standard deviation; $\mathbb{E}[\cdot]$ is the expected value; $n$ is the time span of the observation; and $C$ is a constant.
For the most accurate determination of the Hurst coefficient, the time interval should be sufficiently large. Therefore, the effectiveness of detecting low-rate DDoS attacks based on traffic self-similarity depends significantly on the length of the time interval during which the network traffic was collected and analyzed. In order to evaluate the network traffic self-similarity, let us define the traffic as a random process divided into discrete time intervals, $X = (X_1, X_2, \dots)$. If the process is aggregated over time intervals of length $n$, it takes the form $X^{(n)} = (X^{(n)}_1, X^{(n)}_2, \dots)$, whose components are determined by the formula:
$$X^{(n)}_k = \frac{1}{n}\left(X_{kn-n+1} + \dots + X_{kn}\right), \quad k = 1, 2, \dots \tag{2}$$
To describe the dependence within the random processes $X$ and $X^{(n)}$, let us determine the correlation coefficient $r(k)$, which describes the process $X$, and the correlation coefficient $r^{(n)}(k)$, which describes the process $X^{(n)}$. In general, the process $X$ can be considered self-similar if the Hurst coefficient takes values from 0.5 to 1 and the following equality is fulfilled:
$$r^{(n)}(k) = r(k) \tag{3}$$
In this case, the self-similar process $X$ is very similar to the process $X^{(n)}$, since the correlation coefficient $r(k)$ does not change after time scaling by the length $n$. This means that the frequency of the received data packets over a certain time interval keeps approximately the same form after the scaling.
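The comparison between a process and its time-scaled version can be checked numerically: average the series over non-overlapping blocks of length n and compare the correlation coefficients of the original and aggregated series at the same lag. The following is a minimal sketch under the assumption that the series is a list of per-interval packet counts; the function names are hypothetical and the sample autocorrelation estimator is one common choice, not necessarily the one used by the authors.

```python
def aggregate(series, n):
    """Average consecutive non-overlapping blocks of length n.

    Trailing values that do not fill a whole block are dropped.
    """
    blocks = len(series) // n
    return [sum(series[k * n:(k + 1) * n]) / n for k in range(blocks)]


def autocorrelation(series, lag):
    """Sample autocorrelation coefficient of `series` at the given lag."""
    size = len(series)
    mean = sum(series) / size
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        raise ValueError("constant series has undefined autocorrelation")
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(size - lag)
    )
    return covariance / variance
```

For a self-similar series, `autocorrelation(aggregate(series, n), lag)` is expected to stay close to `autocorrelation(series, lag)` for a range of block lengths, whereas for uncorrelated traffic both values stay near zero.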
In order to determine the Hurst coefficient, let us divide the captured network traffic into fixed time intervals. To describe the arrival time of traffic, let us define the time domain, which is considered as an independent variable for the analysis of time phenomena. The obtained time intervals $X_i$ are described as $X = (X_i \mid i = 0, 1, 2, \dots, N)$, where $N$ is the total time of the traffic monitoring. Let us define the mean value of the packet arrival frequency as $\bar{X}$. The range between the maximum and minimum accumulated deviations of the frequency from its mean over the time intervals is described by the function $R(N)$ [26]:
$$R(N) = \max_{1 \le j \le N} \sum_{i=1}^{j}\left(X_i - \bar{X}\right) - \min_{1 \le j \le N} \sum_{i=1}^{j}\left(X_i - \bar{X}\right) \tag{4}$$

To describe the average deviation of the data packet frequency from its mean value, let us determine the mean square deviation $S(N)$ by the formula:
$$S(N) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2} \tag{5}$$
where $X_i$ is the packet frequency in the $i$-th time interval and $\bar{X}$ is its mean value. In this case, the ratio $R(N)/S(N)$ becomes:
$$\frac{R(N)}{S(N)} = c N^{H} \tag{6}$$
where $H$ is the Hurst coefficient and $c$ is a constant. Then the Hurst coefficient is evaluated as follows:
$$H = \frac{\log\left(R(N)/S(N)\right) - \log c}{\log N} \tag{7}$$
In general, to determine the degree of self-similarity, we calculate the value of the function $R(N)$ and the standard deviation $S(N)$ for each of the time intervals of length $N$. Further, for each of the time intervals the ratio $R(N)/S(N)$ is computed. It is also worth mentioning that increasing the value of $N$ requires re-evaluating formula (8) and recomputing the Hurst coefficient by formula (7), since a change in the number of investigated time slots changes the Hurst coefficient and hence the degree of self-similarity of the traffic. The constructed feature vector, which includes the value of $H$, is then clustered by semi-supervised fuzzy c-means clustering, where each cluster corresponds to a specified cyberattack (and the security scenario to be applied) and one cluster corresponds to the absence of an attack [26][27].
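The rescaled-range computation described above can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: it takes the range of the cumulative deviations from the mean as R, the population standard deviation as S, and assumes the relation R/S = c * N^H, so that H = log((R/S) / c) / log(N). The function names and the default c = 1.0 are hypothetical.

```python
import math


def rescaled_range(series):
    """Return R/S for a series of per-interval packet frequencies."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(series) / n
    # Cumulative deviations from the mean; their range is R.
    cumulative = []
    running = 0.0
    for value in series:
        running += value - mean
        cumulative.append(running)
    value_range = max(cumulative) - min(cumulative)
    # Population standard deviation; this is S.
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    if std == 0.0:
        raise ValueError("constant series: R/S is undefined")
    return value_range / std


def hurst_from_ratio(ratio, n, c=1.0):
    """Solve ratio = c * n**H for H (assumed form of the relation)."""
    if n < 2 or ratio <= 0 or c <= 0:
        raise ValueError("need n >= 2 and positive ratio and c")
    return math.log(ratio / c) / math.log(n)


def hurst_estimate(series, c=1.0):
    """Single-window Hurst estimate for the whole series."""
    return hurst_from_ratio(rescaled_range(series), len(series), c)
```

A single-window estimate of this kind is sensitive to the series length and to the choice of c, which is consistent with the remark above that the coefficient has to be recomputed whenever the number of time slots changes.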

Evaluation settings
In order to evaluate the efficiency of the approach for detecting botnets' low-rate DDoS attacks, detection accuracy tests using real-world network traffic were carried out. For this purpose, the Slowloris and R.U.D.Y. attack tools [4][5] were employed. The main capability of these tools is generating malicious low-rate attacks. On the other hand, the experiment included generated real traffic that mimics users' behavior (e.g. SSH, HTTP, and SMTP) using the malicious traffic dataset [28].
To carry out the experiments, a university local area network comprising 50 hosts (running the Microsoft Windows operating system), one dedicated server (Linux openSUSE operating system with the nginx HTTP server) and network devices (MikroTik CCR1009-8G-1S-1S+PC routers) was employed. Network traffic was captured with the tcpdump utility. All experiments were conducted in real time on a real network and lasted from several seconds to one hour. In the experiments, the mentioned web server was subjected to different attacks with different sets of parameters.
The main parameters of a low-rate DDoS attack (e.g. the R.U.D.Y. attack) are: the number of network connections to the server; the value of the Content-Length field of the corresponding HTTP POST requests; and the frequency of sending packets from each open connection. The parameters of the DDoS attacks used in the experiments, for the case of the R.U.D.Y. attack, are presented in Table 1. The set of parameters involved in the traffic self-similarity detection is:
- total time duration, T, sec;
- number of time intervals, I;
- number of data packets in each time interval, $k_1, \dots, k_i$;
- scaling coefficient, c.
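The per-interval packet counts listed above can be derived from captured packet timestamps by splitting the total duration T into I equal slots. The following is a minimal sketch; it assumes that timestamps are given in seconds relative to the start of the capture, and the function and parameter names are hypothetical.

```python
def packets_per_interval(timestamps, total_duration, interval_count):
    """Count packets in each of `interval_count` equal slots over [0, total_duration].

    Timestamps outside the observation window are ignored; a packet that
    arrives exactly at `total_duration` is counted in the last slot.
    """
    if total_duration <= 0 or interval_count <= 0:
        raise ValueError("duration and interval count must be positive")
    width = total_duration / interval_count
    counts = [0] * interval_count
    for t in timestamps:
        if 0 <= t <= total_duration:
            index = min(int(t / width), interval_count - 1)
            counts[index] += 1
    return counts
```

The resulting list of counts is the kind of time series on which a self-similarity estimate can then be computed for different choices of T and I.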

Results
The results of the experiment, which included different sets of parameters for the malicious traffic samples, are presented in Table 2 for five different samples. Traffic samples that exhibit the self-similarity property were used as data samples of the low-rate DDoS attacks. The results demonstrated that the obtained values of the Hurst coefficient for different malicious samples varied depending on the parameters. In particular, the number of time intervals had a strong effect.
The largest values of the Hurst coefficient were obtained when the total time duration was higher. At the same time, the values of the Hurst coefficient decreased when the total time duration was lower and the number of time intervals was higher. For the five samples, the highest values of the Hurst coefficient were in the range of 0.713 to 0.804. This indicates that the traffic generated by low-rate DDoS attacks was self-similar, which made it possible to detect the malicious traffic among normal traffic. The results of low-rate DDoS attack detection via BotGRABBER with and without network traffic self-similarity analysis are presented in Table 3; the overall accuracy is 97.46% and 90.06% respectively. Thus, the proposed approach is suitable for inclusion in the BotGRABBER botnet detection system as the engine of the low-rate attack detection unit.

DISCUSSION
Since the BotGRABBER system uses network traffic self-similarity analysis to detect botnets' low-rate DDoS attacks, several factors may affect the prediction accuracy. One of them is the diversity of training samples. Most notably, not all possible feature vectors that describe different low-rate DDoS attacks may be adequately represented in the training set. Thus, the system may be further improved by choosing a more refined set of malicious traffic samples for different types of low-rate DDoS attacks.
The experiments demonstrated that BotGRABBER is able to achieve acceptable detection results, but the detection efficiency may decrease because the traffic flow of some attacks is very similar to that of legitimate users and because some botnet features were not taken into account in the detection process. On the other hand, to detect a low-rate DDoS attack the system has to evaluate the result in real time for several parameters with different values (the total time duration, the number of time intervals, the number of data packets in each time interval, and the scaling coefficient), which increases the computational cost.

CONCLUSION
The article presents an approach for detecting botnets' low-rate DDoS attacks based on the self-similarity of network traffic. The proposed technique is part of the botnet detection system BotGRABBER. The novelty of the approach is that low-rate DDoS attack detection is based on analysis of network traffic self-similarity, which is determined using the Hurst coefficient, together with the features inherent to botnets. Experimental research demonstrated that the Hurst coefficients obtained in the network traffic self-similarity analysis (in the range of 0.713 to 0.804) were determined correctly, which made it possible to detect low-rate DDoS attacks with high accuracy. The experiments showed that including network traffic self-similarity analysis can increase the botnet detection efficiency up to 97%.