An adaptive anomaly request detection framework based on dynamic web application profiles

Received Nov 15, 2019 Revised Mar 23, 2020 Accepted Apr 13, 2020 Web application firewalls are highly effective in protecting the application and database layers of websites from attacks. This paper proposes a new web application firewall deployment method based on the dynamic web application profiling (DWAP) analysis technique, a method that deploys a firewall based on analyzing website access data. DWAP is designed to integrate deeply into the structure of each website, increasing the compatibility of the anomaly detection system with that website and thereby improving its ability to detect abnormal requests. To improve this compatibility with the protected objects, the proposed system consists of two parts with the following main tasks: i) detecting abnormal accesses in web application (WA) traffic; ii) semi-automatically updating the attack data used by the detection system during WA operation. This new method is applicable in real-time detection systems, where updating with new attack data is essential since web attacks are increasingly complex and sophisticated.


INTRODUCTION
Currently, web application security is a hot topic for many researchers and internet service providers. According to the Symantec report [1], web-based attacks accounted for 10 percent of total malicious requests detected in 2018, and website security remains a pressing issue. In [2], Mookhey presented the characteristics, composition and operating principles of WAs. Besides, other works [2][3][4] have shown several vulnerabilities and threats that attackers could exploit to attack web applications. According to the surveys in [3] and [4], vulnerabilities of the hypertext transfer protocol (HTTP) and hypertext transfer protocol secure (HTTPS) are the ones attackers most often prefer to exploit. HTTP and HTTPS are the two most popular protocols for end-user communication. Before returning the contents to display in the web browser, web applications process the content of user requests. According to the standard described in [5], the structural components of an HTTP or HTTPS packet include the request line, status line, header fields, message body, and some other components. In order to attack web applications, attackers will try to change the content of these components, thereby creating a vulnerability in the request processing and making the web application return the outputs the attacker desires.
So far, there are two main methods of detecting web application attacks: signature-based methods, based on a set of predefined rules, and anomaly-based methods, which rely on data analysis and statistics to find abnormal characteristics in the requests. Both signature-based and anomaly-based methods have certain limitations. In Figure 1, it can be seen that in normal requests all Accept values are "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", while in operations that do not resemble normal user behavior, possibly due to scanner tools or intentional changes by an attacker, the Accept value is set to "*/*" (accept all). This is an important characteristic for determining the abnormal requests in Figure 1.
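The Accept-header check described above can be sketched as a simple lookup against the values observed in normal traffic. A minimal sketch, assuming a single known-good Accept value per URI (the constant and function names are illustrative, not from the paper):

```python
# Accept values seen in normal traffic for a given URI (from the example above).
NORMAL_ACCEPT = {"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"}

def is_accept_anomalous(headers: dict) -> bool:
    """Return True when the Accept header is missing or not among the
    values seen in normal requests (e.g. the catch-all '*/*' set by
    scanner tools or tampering attackers)."""
    value = headers.get("Accept")
    return value is None or value not in NORMAL_ACCEPT
```

For instance, `is_accept_anomalous({"Accept": "*/*"})` flags the scanner-style request, while the normal browser value passes.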

"*/*" across the whole website is considerably greater than that of other request types. In this situation, the abnormal characteristic of the abnormal request will be ignored and the request is classified as normal. On the contrary, FP represents normal requests misclassified as abnormal ones. For example, when the website has a form-filling function and the content of the form contains a keyword flagged as abnormal by rule-based systems such as ModSecurity, the request will be blocked. However, when that content is acceptable for the website, the administrator needs to edit the filter to add an exception for this case and to enrich the rule list. This process is sometimes complicated and can make web application protection less effective. The issues discussed in a) and b) reflect our practical experience when implementing different anomaly detection models. The practical effectiveness of previous studies in detecting unusual requests is not high: when those approaches are applied to abnormal data using the ModSecurity tool [8], the highest recall score is only about 30%. In order to overcome these weaknesses, this paper proposes a new method to build a web firewall based on dynamic web application profiling (DWAP) analysis. DWAP summarizes the characteristics of a specific website's URIs, including the methods (GET, POST), headers and parameters of each URI. Based on DWAP analysis, the following contributions are presented in this paper:
- Applying DWAP to abnormal request detection systems. The problems discussed above can be solved if the detection model is trained per URI, since the variable values generated from each URI are no longer sparse and anomalies are easily recognized when the abnormal feature described in example b) is present. Moreover, by developing a separate model for each URI, it is possible to extract new features capturing the characteristics of the method, header and parameters of each URI, which could not be done in previous approaches. These features fulfill the ultimate purpose of DWAP: optimizing abnormal request detection on each URI.
- Apart from applying DWAP to detect abnormal requests, a real-time model update method is also presented. This issue plays a very important role in anomaly-based web attack detection, yet previous works did not pay much attention to it. All current security applications need to be constantly updated to accommodate new attacks; that is the main reason ModSecurity is still a popular detection tool today, as its rule system is kept up to date and maintained by community contributions. Anomaly-based models, in turn, need to be trained on data from the specific website concerned. In practice, the number of unusual requests is much smaller than that of normal requests, which makes composing training data a burden for administrators. To tackle this problem, a request grouping method is proposed to support the data classification process. This method can reduce the administrator's data composing time by 50-70%, making the proposed anomaly-based detection model easy to deploy in practice.
Experimental results on the same dataset show a significant improvement in detection performance: the recall of the new approach can reach 90%. The rest of the paper is organized as follows. Section 2 presents related work on abnormal request detection techniques. The newly proposed method is presented in section 3. Section 4 introduces the main applications of the new framework.
Experimental results and all discussions are included in section 5. Section 6 concludes what has been done and discusses some suggestions for future work.

RELATED WORK
2.1. Web attack detection research
There are two main types of web attack detection systems, classified mainly by their detection mechanism.
- Signature-based methods [3, 4]: this is a well-known approach that has been investigated by many researchers. The web attack detection community has built up a complete Core Rule Set [9] to support network users, and the Core Rule Set is currently used in most web firewalls [3].
- Anomaly-based methods: there have been many different anomaly-based approaches in network security.
One of those approaches is based on manual feature extraction techniques. Shi et al. [10] present a list of features for queries that includes URI properties such as length, quantity, type and danger level of each feature. They then applied Naïve Bayes, decision tree and SVM algorithms to those features to detect abnormal requests. Another approach is based on natural language processing. Zhang et al. [11] introduced a method that uses a CNN to classify the attacks: a Word2vec model transforms the raw request into a matrix, and a CNN is then adopted to extract the request's features. The research in [12] introduces another approach using a gated recurrent unit (GRU) to analyze the contents of the requests. Every character in the request is converted into a one-hot vector with 129 dimensions, and every cell in the GRU analyzes part of the request's content. Yang [13] also attempts a similar method that uses a GRU to classify requests, with an encoding method that reconstructs a character into a 2-dimensional matrix. The authors of [14] use N-gram and generic feature selection algorithms to extract features from the DARPA and ECML/PKDD 2007 datasets [4]; to detect abnormal requests, they applied classification algorithms such as C4.5, CART, random forest and random tree. Aside from anomaly-based detection of abnormal requests in general, some other research focuses on detecting specific common attacks on web applications [15, 16]. In particular, Nagarjun and Ahamad [17] presented an attack detection method based on image processing techniques to detect special characters that represent XSS attacks. Yong Yang [18] introduced an approach to detect anomalies by analyzing sequences of web access behaviors. In addition, Jagdish et al. [19] designed an anomaly detection system for e-commerce systems based on features showing business characteristics such as price, goods, etc.
These features are also adopted in this paper, but at a more general level, and their extraction is implemented automatically.

Data updating and monitoring research
To overcome the data imbalance problem in the training process as well as in the abnormal request detection process, there have been some studies and proposals. Hu [20] proposes a human-machine system to improve detection models, in which the role of the expert is to re-classify the data after the unlabeled classification has run. The author uses K-means to divide the dataset into two groups and selects a certain percentage from those two groups to reclassify. In [21], Dong et al. present a solution to reclassify requests that fall outside the boundary trained by the SVM algorithm.

Feature extraction in DWAP analysis
The request's features in the DWAP analysis method are built to detect abnormal requests at the component level. By analyzing each URI, every request component, such as a header or a set of parameters, is deeply analyzed. To do that, the feature set is divided into two groups: the first is used to look for abnormal characteristics that appear in attacks, and the second is exploited to analyze abnormal content in each component of the request.

Malicious keywords feature
a. Attack keywords
Keywords are the main identifying characteristic of some types of attacks. For example, in a SQL injection attack, the attacker tries to insert SQL queries into the data sent to the server. The appearance of such keywords in a request is a sign for determining whether the request is an attack. The keywords listed in Table 1 are summarized from OWASP's documentation on web attacks [9]. In those attacks, the keyword is the most important component for inserting illegal queries, and these keywords are used in OWASP's rules to detect the attacks. However, evaluation based only on the appearance of a keyword may lead to incorrect warnings, because some websites legitimately allow those keywords. Therefore, when building a URI's features, the appearance of these keywords should be combined with other factors in order to conclude whether the request is an attack. Table 1 summarizes some malicious keywords.
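As a minimal sketch, the keyword feature can be computed as a case-insensitive occurrence count over a request component. The keyword list below is an assumed SQL-injection subset for illustration, not the paper's exact Table 1:

```python
import re

# Hypothetical subset of malicious keywords (SQL injection); Table 1 in the
# paper lists the full set drawn from OWASP documentation.
SQLI_KEYWORDS = ["union", "select", "insert", "drop", "or 1=1", "--"]

def keyword_feature(text: str) -> int:
    """Count occurrences of known attack keywords, case-insensitively.
    A non-zero count is only a hint: as noted above, it must be combined
    with other features before a request is labelled abnormal."""
    lowered = text.lower()
    return sum(len(re.findall(re.escape(k), lowered)) for k in SQLI_KEYWORDS)
```

For example, `keyword_feature("id=1 UNION SELECT password FROM users")` counts two keyword hits, while a benign parameter value counts zero.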

b. Anomaly non-letter
Non-letter features are one part of the attack signs when they accompany the keywords shown in Table 1. In order to insert malicious keywords, attackers must find a way to bypass the application's input parsing; for example, they may insert comments or special characters to deceive the input structure. In this paper, the frequency of those characters in the queries is used to verify the input information. These character groups are listed in Table 2.
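The character-frequency idea can be sketched as a simple ratio. The character set below is an assumed sample, not the paper's exact Table 2:

```python
# Assumed sample of suspicious non-letter character groups (quotes, comment
# markers, comparison and grouping characters); Table 2 lists the real groups.
SUSPICIOUS_CHARS = set("'\";=<>()/*#-")

def non_letter_ratio(value: str) -> float:
    """Fraction of characters in an input value that belong to the
    suspicious non-letter groups; a high ratio alongside keyword hits
    is a stronger attack sign than either feature alone."""
    if not value:
        return 0.0
    return sum(c in SUSPICIOUS_CHARS for c in value) / len(value)
```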

Anomaly request content feature
In previous research on abnormal access detection, the content values of each request component are not thoroughly analyzed. The main reason is the difference between the functions of different URIs: each URI uses a different configuration for its header as well as its query values. This makes it difficult to propose an effective method based on analyzing multiple URIs to detect anomalies such as logic-error exploits or requests from scanning tools. In this paper, a set of new features extracted through content analysis of each component of a request is utilized for the anomaly detection process. This feature extraction helps review and evaluate the values of the request components in detail and clearly. For instance, in real scenarios, abnormal requests can be identified if there is a strange URI (such as a webshell), a strange field in the header (such as X-Forwarded-For used to bypass a firewall), or a new parameter that seldom or never appears in requests from normal users. The proposed features can be used to build the profile tree, which is then used to detect abnormal requests by comparing their contents with the data in the tree: if there is any mismatch between a request and the profile tree, that request may be considered abnormal. This is the improvement of the DWAP-analysis-based method over previous methods.
a. Anomaly header value
Headers are important and frequently changed targets. In this paper, four main header fields are investigated: Content-Type, Accept, Accept-Charset and Accept-Encoding. The values of those headers are extracted and normalized to form a vector. To facilitate the feature classification process, the features are divided into groups as below:
- Group 1: including the values of Content-Type [5] and Accept [5]. Their structure contains a type and a sub-type. The values of this type and subtype are compared with those in the type and subtype list of normal requests. The methods and procedures of this investigation are described in Algorithm 1.
b. Anomaly parameter value
User requests usually contain important information for the web server to process. The content of the request may be presented in the form of structured data, such as the query in the GET method or the payload in the POST/PUT method, or unstructured data such as documents, files, etc. For structured data, the length and the ratios of letters and numbers in each input string are extracted. Additionally, the existence of abnormal parameters is also checked. The methods and procedures to examine anomaly parameters are described in Algorithm 3.
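The profile-tree comparison can be sketched as follows. This is a minimal illustration, not the paper's Algorithms 1-3: the `profile` structure, the `/login` entry and all function names are assumptions made for the example.

```python
# Hypothetical per-URI profile: for each URI, the header type/subtype values
# and parameter names observed in normal traffic.
profile = {
    "/login": {
        "headers": {"Content-Type": {("application", "x-www-form-urlencoded")}},
        "params": {"user", "pass"},
    }
}

def split_type(value: str):
    """'type/subtype;params' -> ('type', 'subtype'), per the header structure."""
    main = value.split(";")[0]
    t, _, s = main.partition("/")
    return (t.strip(), s.strip())

def is_request_anomalous(uri: str, headers: dict, params: dict) -> bool:
    node = profile.get(uri)
    if node is None:                      # unknown URI (e.g. a webshell)
        return True
    for name, value in headers.items():
        allowed = node["headers"].get(name)
        if allowed is not None and split_type(value) not in allowed:
            return True                   # header value never seen in normal traffic
    return not set(params) <= node["params"]  # unexpected parameter name
```

Any mismatch with the profile (unknown URI, unseen header type/subtype, or a parameter name absent from normal traffic) marks the request as possibly abnormal, mirroring the mismatch rule described above.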

APPLICATION OF DWAP ANALYSIS ON WEB APPLICATION SECURITY
4.1. DWAP analysis for anomaly request detection
Based on the features obtained from the DWAP analysis technique applied to the request components presented in Section 3.2, further processing steps are needed to discriminate normal accesses from abnormal ones. In this paper, Random Forest classifiers [22] are adopted to distinguish between abnormal and normal requests. Random Forest is an ensemble classification method [23]: it combines an ensemble of classifiers, typically decision trees, to make the final prediction. The theoretical foundation of this algorithm is based on Jensen's inequality [24], which, applied to classification problems, shows that the combination of many models may produce a lower error rate than each individual model.
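The ensemble intuition can be illustrated with a toy majority-vote simulation (this is not the paper's model; the accuracy `p = 0.6` and tree count are assumptions chosen for the demonstration): many weak classifiers that are each right only 60% of the time are, in combination, almost always right.

```python
import random

random.seed(0)

def weak_vote(truth: int, p: float = 0.6) -> int:
    """One weak classifier: returns the true label with probability p."""
    return truth if random.random() < p else 1 - truth

def ensemble_predict(truth: int, n_trees: int = 301) -> int:
    """Majority vote over n_trees independent weak classifiers."""
    votes = sum(weak_vote(truth) for _ in range(n_trees))
    return 1 if votes > n_trees / 2 else 0

# Over 1000 trials, the ensemble is correct far more often than any
# single 60%-accurate voter, as the Jensen-style argument predicts.
correct = sum(ensemble_predict(1) == 1 for _ in range(1000))
```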

DWAP analysis for constructing training datasets
The main characteristic of the abnormal request detection method using DWAP analysis is that it does not use existing datasets as training data, but utilizes the data of the deployed website. In fact, the number of anomaly requests is much smaller than that of normal requests in the whole dataset. As a result, suitable sampling methods and techniques are necessary to create a good training dataset that makes the abnormal request detection process more effective. From this point, a new sampling method based on the DWAP analysis technique and an unsupervised learning algorithm is proposed. This method first divides the data into different clusters, then selects requests from the newly divided clusters. The combination of the DWAP analysis technique and unsupervised learning not only ensures the randomness of sampling, but also increases the rate of abnormal requests appearing in the sampled data. Consequently, this helps generate a more balanced training dataset and reduces the time and effort needed to search for abnormal requests. The proposed method can be summarized as follows. Step 1: Data clustering. This step aims at aggregating requests that have similar characteristics. Data clustering is a method to gather correlated observations into separate groups; it has been deployed by Riyaz [25] on large databases, showing that practical applications of clustering algorithms are promising. Since the features are extracted such that they can distinguish between normal and abnormal requests, clustering on these features not only separates normal and abnormal requests, but also groups the attack requests by form. The remaining issue is to find the optimum number of clusters for the data. In this paper, the K-means algorithm is adopted for the clustering task.
This clustering method is based on minimizing the distances from all data points within each cluster to the cluster centroid [26]. To find the number of clusters for the K-means algorithm, the Elbow method is used. This method is based on the graph of the correlation between the total distance from all data points in each cluster to their cluster centroid and the number of clusters. The Elbow criterion is met when the number of clusters N is chosen such that the ratio between the total distance with N groups and that with N+1 groups is smallest. The Elbow method is summarized as follows:
- Let ΔSSEi be the total sum of squared error distances with i clusters.
- Let ri be the ratio between ΔSSEi and ΔSSEi+1.
- The optimal number of clusters N corresponds to the smallest ri.
Step 2: Sampling data from clusters. The process to take M samples from N groups:
- If the number of samples in one particular cluster is smaller than M/N, then all samples in that cluster are selected. The reason is that the number of abnormal requests is very small compared to normal ones and, due to their anomalous characteristics, they are usually not in the same category as normal requests; abnormal requests therefore tend to be separated into small clusters. After this step, the remaining number of samples to be taken is M1 and the remaining number of clusters is N1.
- Repeat the previous step with the number of samples to be taken set to M1 and the number of clusters set to N1. The sampling process ends after i iterations, when the numbers of samples in all remaining clusters are greater than the current per-cluster ratio.
- From each of the remaining clusters, samples are randomly selected.
The whole process is presented in Algorithm 4. Discussion: if the rate of abnormal requests, K1, in a dataset is very small, i.e. K1 << 1, then among M randomly selected samples the rate of anomaly requests is still K1. Moreover, anomaly requests are usually separated from normal requests after clustering. Although there is no guarantee that all data samples in each cluster have the same label, if K1 << 1 there is a great chance that almost all anomaly requests are selected from small clusters. As a result, the clustering method combined with the proposed sampling algorithm can efficiently retain almost all abnormal requests, which helps reduce the time needed to build the dataset for the DWAP analysis. This sampling process therefore has advantages over the random sampling approach.
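The two-step procedure can be sketched in code. This is a reconstruction of the idea behind Algorithm 4 under the stated assumption that clusters smaller than the current per-cluster quota are absorbed whole, and that the leftover budget is then drawn uniformly from the large clusters; the function name and signature are illustrative:

```python
import random

def sample_from_clusters(clusters, M, seed=0):
    """clusters: list of lists of requests; returns about M sampled requests.
    Small clusters (where anomalies tend to concentrate) are taken entirely;
    large clusters contribute an equal random share of the remaining budget."""
    rng = random.Random(seed)
    remaining = [list(c) for c in clusters]
    sampled = []
    while remaining:
        quota = (M - len(sampled)) / len(remaining)
        small = [c for c in remaining if len(c) <= quota]
        if not small:
            break                       # all remaining clusters exceed the quota
        for c in small:                 # absorb small clusters entirely
            sampled.extend(c)
            remaining.remove(c)
    if remaining:
        per = (M - len(sampled)) // len(remaining)
        for c in remaining:             # uniform random draw from large clusters
            sampled.extend(rng.sample(c, min(per, len(c))))
    return sampled
```

With three clusters of sizes 2, 10 and 10 and M = 8, the 2-element cluster is taken whole (quota 8/3) and the two large clusters each contribute 3 random samples, so rare small clusters are never diluted away.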

Dataset
In order to evaluate the efficiency of the proposed algorithms, two datasets are used.
- Dataset 1: the first dataset is CSIC 2010 [7], developed by Carmen. The dataset includes about 36000 normal requests and 25000 abnormal requests. Since most of the samples in the CSIC 2010 dataset are attack requests, it may not be suitable for evaluating the detection of abnormal requests; the CSIC 2010 dataset is therefore filtered and divided into 8 main URI groups, as presented in Table 3.
- Dataset 2: the second dataset is made by using security tools such as Acunetix, Burp Suite and SQLMap to scan for vulnerabilities on our prototype websites. Those scanning tools search for and exploit vulnerabilities in both the query and the request's headers. The collected data is classified following the criteria defined in the previous section. Besides the abnormal requests collected by scanning tools, we made some normal requests by operating normally on the same URIs. Each URI contains 5000 normal requests and 5000 abnormal requests.

Classification measures
5.2.1. Evaluation criteria to detect abnormal requests
In this research, three evaluation metrics are used:
- Precision is defined as the ratio between the number of true positive alarms (TP) and all samples classified as positive (TP + FP). The higher the precision score, the larger the fraction of positive alarms that are correct.
- Recall is defined as the ratio of true positive alarms among all samples that are actually positive (TP + FN).
- F1-score is the harmonic mean of precision and recall.
where TP (true positive) is the number of records correctly labeled as "abnormal requests"; FN (false negative) is the number of records that are actually "abnormal requests" but are classified as "normal requests"; TN (true negative) is the number of records correctly labeled as "normal requests"; and FP (false positive) is the number of records that are actually "normal requests" but are misclassified as "abnormal requests".
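Written out from the counts defined above, the three metrics are:

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP)."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN)."""
    return tp / (tp + fn)

def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)
```

For example, with TP = 9, FP = 1 and FN = 3, precision is 0.9, recall is 0.75 and F1 is 9/11 ≈ 0.818.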

Criteria for evaluating the effectiveness of applying DWAP analysis for constructing training dataset
In order to evaluate the efficiency of the sampling method in the construction of the training data, the imbalance between the random sampling method and the newly proposed sampling method is compared. This imbalance is expressed by the parameters K1 and K2:
- K1 represents the proportion of abnormal requests in the data obtained by the random sampling method.
- K2 is the ratio of abnormal requests in the data obtained by the proposed sampling method.
In this paper, these two values are compared with respect to different K1 values and different numbers of selected samples M.

Experimental results and comments
5.3.1. Experimental scenarios and results for detecting abnormal requests
a. Experimental scenarios
The efficiency of the DWAP analysis technique in detecting abnormal requests using the Random Forest classification algorithm is evaluated on both datasets described in section 5.1, and all three performance metrics are recorded. Each dataset is divided into two subsets: the training data, containing 80% of the dataset, is used to train the classification model; the remaining 20% is used for testing. The number of trees for the Random Forest algorithm is set to 300.
b. Experimental results and comments
Experimental results of using the DWAP analysis technique to detect abnormal requests on datasets 1 and 2 are shown in Tables 4 and 5. The results in Table 4 show that the DWAP analysis technique can accurately and efficiently detect abnormal requests: precision scores across all data range from 99.46% to 100%, showing that the positive alarms of this method are very reliable. Table 5 shows that even when the dataset contains a higher percentage of normal requests, as in dataset 2, the new DWAP analysis technique is still highly effective, while a traditional toolset using ModSecurity rules is not: the recall of the toolset is just 30%, while that of the proposed method is more than 90%. Besides, the new method obtains a perfect precision score on all URI sets, and its F1 score, at over 95%, is also much higher than that of the toolset. The results in Tables 4 and 5 demonstrate that DWAP analysis techniques are not only able to efficiently detect attack requests, but are also capable of correctly detecting abnormal requests. This factor plays an important role in the sampling process, in which the abnormal request distribution in the dataset is optimized. Figure 3 presents the clustering results based on the SSE values of the clusters. From Figure 3 it can be seen that the SSE value varies a lot between N = 2 and N = 3, so the ratios r2 and r3 are almost equal to 1.
When N = 4, the variation of SSE decreases significantly, and so does the value of r4. SSE returns to little variation when N > 4, so r5 and r6 are almost equal to 1. Therefore, N = 4 is chosen as the number of clusters for the data. Figure 4 illustrates the distribution of normal and abnormal requests in each cluster after the K-means algorithm. The data is divided into 4 clusters and the distribution of the labels is shown on the graph. The ratio of abnormal requests in this scenario is K1 = 31%. When applying the proposed sampling method with M = 1000, the percentage of abnormal requests in the sampled data reaches K2 = 71%. The results show the effectiveness of the proposed sampling method compared to the random sampling method. Additionally, the distributions of anomaly requests in the data sampled by both methods are also recorded. Table 6 shows the change in K2 when K1 varies from 1 to 30%. The results in Table 6 show that K2 is greater than K1 across the different K1 distributions: the proportion of anomaly requests in the proposed sampled data is higher than that obtained by random selection, with K2 on average 1.5 to 2 times higher than K1. This also shows that when the DWAP analysis method is combined with the proposed sampling method, the ratio of abnormal requests in the sampled data is much higher than with random sampling. Table 7 shows how K2 changes when the number of samples M changes. Based on the results of Table 7, it can be seen that when M is reduced, K2 tends to increase because the number of items taken from each group becomes smaller. As a result, the number of requests retrieved from each group decreases, which makes it more likely that some small groups are taken entirely when they have fewer than M/N elements.
In this case, although the number of selected anomaly requests is reduced, the number of selected normal requests is reduced much more; consequently, the proportion of anomaly requests increases. This means small anomaly groups are treated fairly with respect to normal request groups. When the number of clusters increases, the number of anomaly request groups also increases, and the ratio of anomaly requests selected from the dataset approaches the proportion of anomaly clusters over the total number of clusters. The best value of M is chosen as:

M = (number of clusters) × min(number of requests in each cluster)    (6)
From formula (6), it can be seen that the DWAP analysis technique helps extract features that allow the clustering model to separate anomaly clusters: when M is reduced, the ratio K2 increases in proportion to the ratio of anomaly clusters.

CONCLUSION
Detecting anomalous access to web applications is a challenging problem in ensuring information security. Web application attack techniques constantly transform and hide themselves to bypass the monitoring of web firewalls. This paper has proposed a method to extract request features using DWAP analysis techniques. Experimental results show that the proposed method has many advantages over traditional approaches. The proposed method works well because it does not try to find anomaly information by comparing requests across the whole dataset, but makes this comparison on local data: the values of requests are compared within the same URI. Experimental results not only show significant improvements in the detection accuracy of the new model, but also show that the DWAP analysis method can be applied in many different areas thanks to its ability to extract the correct characteristics of requests.