Survey on detecting and preventing web application broken access control attacks

ABSTRACT

The rest of this paper is organized as follows. Section 2 explores the categories of existing countermeasures. Section 3 analyzes the relevant literature, its contributions, and its limitations. Section 4 discusses gaps and potential research directions. Section 5 concludes the paper and outlines directions for future work.

COUNTERMEASURES FAMILIES
Research on mitigating the risk of broken access control vulnerabilities can be grouped into three major categories. The first focuses on preventing the vulnerability from being introduced during the coding phase, and on detecting and fixing it through code review. The second aims to detect the vulnerability in the running application so that it can be mitigated. The third targets detecting and preventing the exploitation of existing broken access control vulnerabilities in running applications.
The existing literature proposes ways to secure web applications against logic flaws during the different phases of the software life cycle. Researchers suggest a wide range of coding practice recommendations for the development stage (e.g., [11], [12]), as well as secure web frameworks [13]-[17] that address security from different perspectives. Some framework functions give developers tools to implement security policies during the development and architecture phases (e.g., [18], [19]), while others support auditing and authorization. Source code review plays a crucial role in detecting security flaws in software systems, including broken access control (BAC) vulnerabilities. Studies (e.g., [20]-[22]) found that source code review supported by automated review tools can effectively detect several categories of security issues, including broken access control. These studies emphasize the importance of incorporating source code review into the software development life cycle to improve application security and harden the application against potential exploitation.
Security testing is one of the most effective and practical techniques for identifying vulnerabilities in software, including broken access control vulnerabilities. Studies (e.g., [22]-[27]) found that security testing can effectively find BAC vulnerabilities because it analyzes the behavior of the running system and evaluates how access control policies and security controls are implemented. Another study [28] discussed combining security testing with code review and threat modeling to identify BAC vulnerabilities in web applications. These studies also emphasize the value of automated security testing tools, which can provide a thorough assessment of the target application and help identify issues that manual testing may overlook.
Limited research addresses preventing the exploitation of broken access control vulnerabilities at runtime, that is, preventing exploits and attacks rather than detecting the vulnerabilities. Work in this area uses different techniques and methodologies to distinguish an application's benign requests from malicious ones, typically by learning the application's valid invariants and states and rejecting invalid states (e.g., [29]-[31]). This category of research aims to overcome the limitations of black-box and white-box testing, such as the cost and resources needed to apply the required mitigations. In addition, some of this research targets the constraint of requiring access to the source code.
Approaches based on frameworks and coding practices can be considered ineffective by design for applications that are already running, because they must be enforced at the early development stage. They require the source code when refactoring is needed, and they address only the existence of the vulnerability, not the prevention of active exploitation. Research on static code analysis [32] has a major limitation in that it also requires the source code, which is not feasible in most cases because a large share of applications are closed source and source code owners are unwilling to provide the code for multiple reasons. Fixes also take time to develop and deploy, which places the whole effort on developers. Static analysis is highly dependent on the technology and programming language, which makes it inefficient in environments that host a diversity of applications and technologies, and it still suffers from high false positive rates. In addition, it detects the existence of a vulnerability rather than preventing active exploitation.
Dynamic analysis solutions such as DetLogic [23] and LogicScope [24], along with similar models, aim to detect vulnerabilities in running applications but are also limited by design. These methods depend heavily on the assessor's ability to crawl the application efficiently during the learning phase. As a result, any areas, pages, or functions that the assessor does not explore will not be assessed or protected. Furthermore, implementing fixes for the discovered vulnerabilities requires additional cost and resources, and there is no guarantee that closed-source providers and product manufacturers will rectify the issues in a timely and acceptable manner.
ISSN: 2088-8708. Survey on detecting and preventing web application broken access control attacks (Ahmed Anas)
Other approaches defend against logic attacks from more specific perspectives, such as the requirements engineering language for adaptive information security (RELAIS) approach [33]. RELAIS addresses parameter tampering attacks in settings where it is not feasible to establish all defensive mechanisms during the design phase. It uses knowledge about the environment gained at runtime and from failures, together with machine learning techniques such as Bayesian classification, logistic regression, and input approximation. Its output is limited to extracting requirements and determining behavior in order to identify adaptations that must be implemented later.

ATTACK AND EXPLOIT DETECTION TECHNIQUES
One approach to mitigating the risk of broken access control vulnerabilities is to detect and prevent their exploitation. A common methodology for detecting exploitation and anomalies is to learn the application's specification and then identify deviations between the intended behavior and the runtime behavior. Researchers have used different data sources and parameters for the learning process, as well as different algorithms and concepts, to reach better verdicts and overcome limitations.
Researchers presented a BLack-bOx approach for detecting state violation attaCKs (BLOCK) [29]. BLOCK detects certain types of broken access control attacks in web applications using a black-box method. The web application is viewed as a stateless system, and its intended behavior is inferred by examining how clients interact with it. During normal operation, invariants are identified from web request/response sequences and the associated session variable values; at runtime, web requests and responses are assessed against these invariants. A potential state violation attack is a request or response that deviates from the invariants. The authors implemented the technique as a proxy [34] between the client and the web application server that intercepts their messages and, in training mode, collects session data from session files. They then generated malicious traffic to evaluate the solution's effectiveness. The results show high false positive rates, which stem from the following limitations. BLOCK focuses only on the relations among requests, responses, and session variables, so it lacks visibility into persistent information stored in the database, which may be used to maintain state across more complex scenarios, workflows, and multiple web sessions. BLOCK cannot capture indirect relations, and it requires manual intervention to guarantee adequate learning and to filter out false positives. By design, BLOCK treats every unvisited path, or any request without a learned invariant, as an attack, which also raises the false positive rate. Finally, the solution does not take into account tasks that must be executed by humans, such as workflow procedures [35].
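The learn-then-enforce idea behind black-box invariant checking can be illustrated with a minimal sketch. It is not the BLOCK implementation: the class `InvariantModel`, the request keys, and the session variable names are all hypothetical, and the only invariant modeled is "which session variables were set every time this request was observed".

```python
from dataclasses import dataclass, field


@dataclass
class InvariantModel:
    # Hypothetical model: request key -> session variables that were set
    # in every benign observation of that request.
    required_session_vars: dict = field(default_factory=dict)

    def train(self, traces):
        """traces: iterable of (request_key, session_dict) from benign traffic."""
        for request_key, session in traces:
            present = {name for name, value in session.items() if value is not None}
            if request_key in self.required_session_vars:
                # Keep only the variables seen in every observation.
                self.required_session_vars[request_key] &= present
            else:
                self.required_session_vars[request_key] = present

    def check(self, request_key, session):
        """Return a list of violations; an empty list means the request passes."""
        if request_key not in self.required_session_vars:
            # Unvisited path: treated as an attack, a source of false positives.
            return [f"no invariant learned for {request_key}"]
        present = {name for name, value in session.items() if value is not None}
        missing = self.required_session_vars[request_key] - present
        return [f"missing session variable: {name}" for name in sorted(missing)]


training = [
    ("GET /login", {}),
    ("POST /login", {"user_id": 7}),
    ("GET /admin/report", {"user_id": 1, "is_admin": True}),
    ("GET /admin/report", {"user_id": 2, "is_admin": True, "theme": "dark"}),
]
model = InvariantModel()
model.train(training)

ok = model.check("GET /admin/report", {"user_id": 1, "is_admin": True})
forced_browse = model.check("GET /admin/report", {"user_id": 7})
unvisited = model.check("GET /export", {"user_id": 1, "is_admin": True})
```

In this sketch the forced-browsing request is rejected because `is_admin` is absent, while the legitimate but never-crawled `/export` request is rejected as well, which mirrors the false positive behavior described above.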
Researchers introduced SENTINEL (securing database from logic flaws in web application) [30], a black-box approach that detects SQL queries that violate the application's intended behavior and produce logic flaws. The methodology systematically extracts a set of invariants from observed SQL queries, responses, and session variables, and models the interactions between the web application and the database on the principles of the extended finite state machine. An SQL query that has no corresponding invariant is considered a potential attack and is dropped. The solution is implemented through two components, a sensor and an analyzer. The sensor collects traffic, namely SQL queries and their responses along with session variables, and communicates with the analyzer. The analyzer performs offline training by extracting SQL signatures and inferring the set of invariants associated with each signature. At runtime, the analyzer evaluates incoming SQL queries and directs the sensor to block any violating query. SENTINEL overcomes some of the limitations of BLOCK because it takes the persistent state in the database into consideration, and its visibility into SQL queries gives it more capability to block attacks that target database integrity.
SENTINEL has several limitations. It does not consider web applications with NoSQL [36] database backends and can only be applied to the traditional flat relational data model. It can only address traditional SQL queries that have the same patterns in different languages [37], and it can only be applied to specific web development languages and platforms. From a performance point of view, it adds overhead to SQL response time because of the communication between the sensor and the analyzer and the time the analyzer needs to extract the SQL signature and evaluate the query. SENTINEL slightly improves the false positive rate compared to BLOCK but still requires additional techniques to suppress false positives. Like BLOCK, SENTINEL by design treats every unvisited path, or any query without a recorded invariant, as an attack, which also raises the false positive rate. Moreover, any change in the application structure or data layer invalidates the learned invariants and requires a new round of learning to avoid false positives and an incorrect application-wide blocking state.
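A simplified sketch of the signature-plus-invariant idea follows. It is an illustration under stated assumptions, not SENTINEL's algorithm: the helper names, the regular expressions used to strip literals, and the single invariant type ("a literal at a given position always equals a given session variable") are all hypothetical.

```python
import re


def sql_signature(query):
    """Replace string and numeric literals with '?' to obtain a query skeleton."""
    no_strings = re.sub(r"'(?:[^']|'')*'", "?", query)
    no_numbers = re.sub(r"\b\d+\b", "?", no_strings)
    return re.sub(r"\s+", " ", no_numbers).strip().lower()


def sql_literals(query):
    """Return the literals of a query, in order of appearance, as strings."""
    pairs = re.findall(r"'((?:[^']|'')*)'|\b(\d+)\b", query)
    return [text if text else number for text, number in pairs]


class QueryInvariants:
    def __init__(self):
        # signature -> set of (literal position, session variable) pairs
        # that held in every training observation of that signature
        self.bindings = {}

    def train(self, query, session):
        signature = sql_signature(query)
        observed = {
            (position, name)
            for position, literal in enumerate(sql_literals(query))
            for name, value in session.items()
            if str(value) == literal
        }
        if signature in self.bindings:
            self.bindings[signature] &= observed
        else:
            self.bindings[signature] = observed

    def allows(self, query, session):
        signature = sql_signature(query)
        if signature not in self.bindings:
            return False  # no invariant for this signature: potential attack
        literals = sql_literals(query)
        return all(
            position < len(literals) and str(session.get(name)) == literals[position]
            for position, name in self.bindings[signature]
        )


invariants = QueryInvariants()
invariants.train("SELECT * FROM orders WHERE owner_id = 7 AND status = 'open'", {"user_id": 7})
invariants.train("SELECT * FROM orders WHERE owner_id = 12 AND status = 'paid'", {"user_id": 12})

own_orders = invariants.allows(
    "SELECT * FROM orders WHERE owner_id = 7 AND status = 'open'", {"user_id": 7})
other_orders = invariants.allows(
    "SELECT * FROM orders WHERE owner_id = 12 AND status = 'open'", {"user_id": 7})
unseen_query = invariants.allows("DELETE FROM orders WHERE id = 3", {"user_id": 7})
```

Because the check runs on the query rather than on the HTTP request, it can use values that only appear at the database layer. The same design also explains the limitations noted above: the sketch assumes SQL text, and any schema or query change invalidates the learned signatures.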
Invariant detector (IVD) [31] is an approach for automatically learning and enforcing authorization rules in online social networks (OSNs) [38]. It blocks requests that attempt to exploit vulnerabilities in the authorization logic of an OSN. The technique starts with a learning phase, which can take place during the staging, testing, or pre-release stages of the features in scope. IVD intercepts the requests that the OSN makes to its database and stores the likely invariants as an attributed graph model [39]. Its design allows it to adapt automatically to OSN changes by continuously learning invariants, which minimizes manual learning requirements. IVD is composed of three components: the request sampler, which handles the database queries; the invariant inference component, which uses the sampler's output for offline learning; and the invariant checker, which validates requests and blocks malicious ones.
IVD is specific to online social networks and requires manual intervention, in the form of hand-written custom rules, to minimize potential false positives and false negatives, which makes it challenging to apply to other applications and domains. Its scope is limited to write access control policies and their related vulnerabilities and exploits, for example, "only Alice's friends can post to Alice's profile". IVD can contribute to confidentiality indirectly by blocking exploits in which an attacker maliciously creates a "friend" relation between two users and then collects information from other users in the friend role. However, IVD does not validate access control on reads and thus cannot completely enforce data confidentiality [40]. The authors plan to overcome some of these limitations in future work by extending the approach to cover more complex invariants. The solution also does not consider web applications with NoSQL database backends.
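The example invariant "only Alice's friends can post to Alice's profile" can be sketched as a relation check over a small graph. This is an assumption-laden illustration rather than IVD's inference procedure: the class `WriteInvariantChecker`, the relation names, and the rule "keep the relations that held in every sampled write" are hypothetical.

```python
from collections import defaultdict


class WriteInvariantChecker:
    def __init__(self):
        self.edges = defaultdict(set)  # (relation, source) -> set of targets
        self.invariants = {}           # write type -> relations held in every sample

    def add_edge(self, relation, source, target):
        self.edges[(relation, source)].add(target)

    def relations_between(self, source, target):
        return {
            relation
            for (relation, node), targets in self.edges.items()
            if node == source and target in targets
        }

    def learn(self, write_type, author, owner):
        """Record one benign write sampled during the learning phase."""
        held = self.relations_between(author, owner)
        if write_type in self.invariants:
            self.invariants[write_type] &= held
        else:
            self.invariants[write_type] = held

    def allows(self, write_type, author, owner):
        required = self.invariants.get(write_type)
        if not required:
            return False  # nothing learned, or no relation held consistently
        return required <= self.relations_between(author, owner)


checker = WriteInvariantChecker()
checker.add_edge("friend", "bob", "alice")
checker.add_edge("friend", "carol", "alice")
checker.add_edge("colleague", "carol", "alice")

checker.learn("post_to_profile", "bob", "alice")
checker.learn("post_to_profile", "carol", "alice")

friend_post = checker.allows("post_to_profile", "bob", "alice")
stranger_post = checker.allows("post_to_profile", "mallory", "alice")
```

The sketch only guards writes. A read of Alice's profile never passes through `allows`, which is the confidentiality gap noted for IVD.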
InteGuard [41] tackles the threat surface of logic flaws that result from integrating third-party service APIs. The research discusses the new threats that arise when integrating external web services such as payment services and single sign-on (SSO), and aims to secure the integration among the three parties involved: the merchant service (the integrator), the provider of the third-party service (such as PayPal), and the web clients using the application. InteGuard acts as a proxy that analyzes the ingress and egress traffic of the service integrator's website. It is composed of three main components. A trace collector generates and labels learning traffic, and an internet content adaptation protocol (ICAP) server intercepts the direct communication between the integrator and the provider. Both components analyze response scripts and hypertext markup language (HTML) content, extract the needed parameters and relations, and send them to the third component, the security policy generator, which extracts the invariants and constructs a finite state machine (FSM) that reflects the security policies for the integration. The ICAP server inspects HTTP messages sent to the integrator's website to enforce the global policy by extracting elements of interest from the messages and checking their compliance with the security policies. Policies can be tuned when false positives are discovered during operation.
This approach does not protect against provider-side flaws, although several attacks can only be detected on that side, such as variants of the unauthorized login by auth. code (or token) redirection attack from [42]. InteGuard analyzes only application and network traffic to create invariants and so loses the advantages of inferring invariants at the database layer [31], which offers both more comprehensive invariants and scalability. The solution does not take into account tasks that must be executed by humans, such as workflow and multistep procedures [35], and it does not enforce authorization policies or constraints. InteGuard focuses primarily on browser-based web applications and not on mobile-based merchant platforms [43].
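To make the notion of integration invariants concrete, the following sketch checks one checkout transaction against an expected step sequence and two cross-message equalities. Everything in it is hypothetical: the step names, the field names, and the hand-written invariants stand in for what a policy generator would infer from labeled traces, and the sketch does not reproduce InteGuard's actual policies.

```python
# Hypothetical step order for one merchant checkout that uses an external payment provider.
EXPECTED_STEPS = [
    "place_order",
    "redirect_to_provider",
    "provider_confirmation",
    "finish_order",
]

# Hypothetical invariants: (step, field) on the left must equal (step, field) on the right.
EQUALITY_INVARIANTS = [
    (("place_order", "order_id"), ("provider_confirmation", "order_id")),
    (("place_order", "total"), ("provider_confirmation", "amount_paid")),
]


def check_transaction(messages):
    """messages: ordered list of (step_name, fields) observed for one session."""
    violations = []
    steps = [step for step, _ in messages]
    if steps != EXPECTED_STEPS:
        violations.append(f"unexpected step sequence: {steps}")
    seen = {step: fields for step, fields in messages}
    for (step_a, field_a), (step_b, field_b) in EQUALITY_INVARIANTS:
        left = seen.get(step_a, {}).get(field_a)
        right = seen.get(step_b, {}).get(field_b)
        if left is None or right is None or left != right:
            violations.append(f"{step_a}.{field_a} != {step_b}.{field_b}")
    return violations


honest = [
    ("place_order", {"order_id": "A17", "total": "100.00"}),
    ("redirect_to_provider", {"order_id": "A17"}),
    ("provider_confirmation", {"order_id": "A17", "amount_paid": "100.00"}),
    ("finish_order", {"order_id": "A17"}),
]
underpaid = [
    ("place_order", {"order_id": "A17", "total": "100.00"}),
    ("redirect_to_provider", {"order_id": "A17"}),
    ("provider_confirmation", {"order_id": "A17", "amount_paid": "1.00"}),
    ("finish_order", {"order_id": "A17"}),
]
skipped_payment = [honest[0], honest[3]]

honest_result = check_transaction(honest)
underpaid_result = check_transaction(underpaid)
skipped_result = check_transaction(skipped_payment)
```

The check sees only the messages that cross the integrator's proxy, so a flaw that manifests entirely on the provider side would leave all of these invariants satisfied.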
Swaddler [44] is an anomaly detection method for workflow-related attacks. It applies anomaly detection to the internal state of the web application to detect attacks. The model works in one of two modes, training and detection. The training phase records the characteristics of normal events, including the code execution paths linked to session variables. The implementation uses two components. The sensor is an extension of the hypertext preprocessor (PHP) interpreter that analyzes the ongoing state of the application; when it has finished its processing, it invokes the original handler of the statement and resumes the application's execution path. The analyzer is an anomaly-based system that assesses the regularity of the application's state and establishes anomaly score thresholds that separate regular values from anomalous ones. Once the profiles have been created, that is, once the models have collected the needed information about normal events and appropriate thresholds have been established, the system can operate in detection mode, in which it calculates anomaly scores and reports any anomalous states. One of Swaddler's main limitations is that it requires source code access, which makes it dependent on the technology and the source code. It also requires modifying the PHP engine to monitor the web application's execution flow and needs access to the execution paths it monitors, which could present major practical deployment difficulties. The performance trade-off of Swaddler must also be considered. Moreover, Swaddler cannot detect categories of data flow violation [45] such as insecure direct object reference (IDOR) and other types of broken access control (BAC). Follow-up research such as [46] notes the challenge of tuning in operational settings and the large number of false positives caused by the use of unsupervised learning.
DoubleGuard [47] is an intrusion detection system (IDS) that aims to detect privilege escalation attacks, hijack future session attacks, injection attacks, and direct database attacks. It monitors both the user requests to the web server and the corresponding database requests to the back end in order to detect malicious activity. DoubleGuard uses a container-based web server architecture in which each session is assigned to a separate, isolated web server instance and mapped to a single user, which separates the information flows of different sessions and allows each web request to be linked to its corresponding database queries. DoubleGuard cannot detect some broken access control categories; in particular, it cannot detect insecure direct object reference (IDOR) attacks because it focuses on query structure anomalies. It is designed to work in detection mode rather than prevention mode [48], and it does not employ incremental learning, so the learning phase must be repeated whenever legitimate new features are added to the application. Scalability and performance management are also a major challenge, because the proposed architecture requires a container instance for every user session. Finally, DoubleGuard identifies user sessions by internet protocol (IP) address, which is unreliable for groups of users behind a shared proxy (e.g., corporations or countries) and for mobile users whose IP addresses change dynamically.
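The request-to-query mapping idea, and the reason a structure-only comparison misses IDOR, can be shown with a short sketch. It is a simplification under assumptions and not DoubleGuard's implementation: `MappingModel`, `skeleton`, and the example requests and queries are hypothetical, and per-session container isolation is not modeled.

```python
import re


def skeleton(sql):
    """Strip literals so that queries are compared by structure only."""
    sql = re.sub(r"'(?:[^']|'')*'", "?", sql)
    sql = re.sub(r"\b\d+\b", "?", sql)
    return " ".join(sql.lower().split())


class MappingModel:
    def __init__(self):
        # request key -> set of query skeletons observed for it during training
        self.allowed = {}

    def train(self, request_key, queries):
        known = self.allowed.setdefault(request_key, set())
        known.update(skeleton(query) for query in queries)

    def alerts(self, request_key, queries):
        """Return the queries whose structure was never seen for this request."""
        known = self.allowed.get(request_key, set())
        return [query for query in queries if skeleton(query) not in known]


model = MappingModel()
model.train("GET /invoice", ["SELECT * FROM invoices WHERE id = 10"])

# A query with an unexpected structure for this request raises an alert.
escalation = model.alerts("GET /invoice", ["SELECT * FROM users"])

# An IDOR attempt only changes the literal, so the skeleton still matches.
idor = model.alerts("GET /invoice", ["SELECT * FROM invoices WHERE id = 11"])
```

The second call returns no alert even though invoice 11 may belong to another user, because ownership is encoded in the literal that the skeleton discards.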
TamperProof [49] is an inline defense tool deployed as a proxy between the client and the server that can protect legacy applications against parameter tampering attacks. Its approach is to dynamically infer and enforce field and value constraints on each input submitted to the server. It analyzes each form generated by the server, extracts the constraints enforced by HTML and JavaScript, and records them. Validation then rejects any request that does not satisfy the constraints of the form used to submit the input. TamperProof also injects into each web form a hidden field containing an identifier, referred to as patchID, which is used to validate requests and their sequencing. The main limitation of TamperProof is that it cannot protect applications that alter the client-side code of a web form, including applications that use Asynchronous JavaScript and XML (AJAX) requests, applications written for Web 2.0 or Web 3.0, and any application that modifies client code dynamically, which makes it almost unusable for most currently running applications. Web pages that employ HTML and JavaScript obfuscation pose another challenge.
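The record-then-enforce flow around the hidden identifier can be sketched as follows. This is a minimal illustration, not TamperProof's code: the class `FormGuard`, the constraint format, and the single-use treatment of the identifier are assumptions, and the constraints are supplied directly instead of being extracted from HTML and JavaScript.

```python
import secrets


class FormGuard:
    def __init__(self):
        # patchID -> constraints recorded for the form that was served
        self.issued = {}

    def register_form(self, constraints):
        """constraints: field name -> set of allowed values, or None for free text."""
        patch_id = secrets.token_hex(8)
        self.issued[patch_id] = constraints
        return patch_id  # a proxy would inject this as a hidden form field

    def validate(self, submitted):
        """Accept a submission only if it satisfies the form it claims to come from."""
        # Assumption: each identifier is consumed on first use.
        constraints = self.issued.pop(submitted.get("patchID"), None)
        if constraints is None:
            return False  # unknown, missing, or replayed identifier
        fields = {name: value for name, value in submitted.items() if name != "patchID"}
        if set(fields) != set(constraints):
            return False  # fields were added or removed
        return all(
            allowed is None or fields[name] in allowed
            for name, allowed in constraints.items()
        )


guard = FormGuard()
form = {"plan": {"basic", "pro"}, "comment": None}

first_id = guard.register_form(form)
legitimate = guard.validate({"patchID": first_id, "plan": "pro", "comment": "hi"})
replayed = guard.validate({"patchID": first_id, "plan": "pro", "comment": "hi"})

second_id = guard.register_form(form)
tampered = guard.validate({"patchID": second_id, "plan": "enterprise", "comment": ""})
```

The sketch depends on the served form being the complete description of valid input. When client-side code rewrites the form or issues AJAX requests, the recorded constraints no longer match what a legitimate client sends, which is the limitation described above.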
PHP-Sensor [50] is a model for discovering workflow violation attacks and cross-site scripting (XSS) attacks that target PHP applications. To detect workflow violations, the model observes sequences of HTTP requests and responses and their associated session variables in offline mode and extracts a specific set of axioms. These axioms are later applied to HTTP requests and responses in online mode: if a request or response fails to comply with its corresponding axiom, it is considered a workflow violation attack against the PHP web application. The model can also detect XSS worms by monitoring HTTP requests and responses. The deployment uses a web proxy that is responsible for the workflow violation filtering process; it identifies workflow violation attacks by constructing the expected flow model of the web application from the communication between the web browser and the server. The defensive model comprises two primary phases, the offline phase and the recognition phase. Each HTTP web request key is linked to two specific types of axioms, while input/output pairs are linked to different types of axioms. Axioms are converted into estimation functions that yield true or false depending on whether the input pairs satisfy them. The recognition phase validates input and output pairs by checking whether the associated axioms are fulfilled; if they are not, the HTTP request or HTML page is blocked. PHP-Sensor is technology dependent, since it works only on PHP. It also does not detect other categories of broken access control, such as insecure direct object reference (IDOR) attacks, unauthorized access to functions, and privilege escalation; moreover, 
PHP-Sensor is not applicable to web applications built with technologies such as AJAX. Additionally, PHP-Sensor's capability to handle complex restrictions within the database is limited. The authors plan to overcome some of these limitations by considering other internal states of PHP web applications, and to optimize several methods to reduce the performance overhead caused by PHP-Sensor.
Current approaches and techniques for preventing and detecting BAC exploitation have different working models and implementations, which result in different strengths and shortcomings. To infer the challenges of the current techniques and the resulting research directions, a thorough evaluation and comparison was carried out, as illustrated in Tables 1 and 2, based on the following characteristics: false positives (FP); persistency aware (PA), whether the technique takes into account the persistent state information in the application logic; workflow aware (WA), whether the technique takes into account workflow-related BAC attacks; manual intervention required (MIR) for false positive filtering; manual crawling required (MCR) of the application functions, since any function that is not manually learned is treated as an attack and becomes a false positive; source code required (SCR); noticeable performance overhead (PO); NoSQL aware (NSA), whether the technique can handle persistency when the backend is a non-relational NoSQL database; specific web development languages and platforms (SDLP); IDOR, whether the technique can detect IDOR; broken access control privilege escalation (BACPE); business or domain specific (BDS); and blocking mode (BM), whether the technique can work in blocking mode and not only in detection mode with no blocking capabilities. Based on this comparison, the current gaps and limitations, along with the corresponding research directions, can be grouped as follows:
a. A wide spectrum of the existing literature that aims to detect broken access control vulnerabilities and their subcategories, such as insecure direct object reference, has a serious limitation: by design, these solutions require the application to be crawled by humans and spiders, and they then treat every path that was not crawled, or any request without a recorded invariant, as an attack. This strategy impacts model efficiency, including the false positive rate. A further consequence is that any change in the application structure or data layer invalidates the learning phase or the learned invariants and requires a new round of learning to avoid false positives and an incorrect application-wide blocking state. Research is required to overcome this challenge by exploring machine learning and artificial intelligence techniques and algorithms that can address this focal issue.
b. Most of the provided solutions are highly dependent on, and customized for, a specific technology, either the programming language (for example, PHP applications only) or specific database types, and are unable to protect web applications with other backends such as NoSQL databases. Research is required to provide less technology-dependent and more generalized solutions that cover the wide spectrum of technologies and implementation variations.
c. Some solutions require source code access during the learning and blocking modes. Providing the source code is not feasible in most cases, because a large share of applications are closed source and source code owners are unwilling to provide it for multiple reasons. It also makes the solution highly dependent on the language used. Research is required to provide techniques that depend less on the source code.
d. Some of the solutions are highly dependent on a specific business type or service and need further research to be generalized into a more generic solution for securing against broken access control attacks.
e. Most of the provided solutions cannot detect the severe insecure direct object reference category. Research is required to close this gap and find innovative and effective techniques to safeguard against this vulnerability.
f. Across all of these areas, deeper research is required to reduce the performance impact of most of the solutions so that they become practically viable in enterprises, and to improve false positive and false negative rates.

CONCLUSION AND FUTURE WORK
The existing solutions cannot be considered reliable and comprehensive enough to provide an adequate level of protection against critical broken access control attacks, including subcategories such as insecure direct object reference. This research provides a thorough analysis of the current literature and identifies current gaps and limitations. It also suggests research areas and directions for tackling this critical issue.
Future work can evaluate machine learning and artificial intelligence techniques to develop a methodology and model that detect and prevent broken access control attacks effectively and efficiently. The target model implementation should ensure that the identified limitations and gaps have been addressed. The research will also assess the principles of incremental learning, a process in which the model continues to learn and adapt from new data, thereby enhancing its predictions or decisions over time.

Table 1. Solutions comparison part 1

Table 2. Solutions comparison part 2