In this blog post I will discuss regular expressions (regex) and their dark side. In particular, I will show how a regex can be exploited to carry out a denial-of-service attack.
What is a Regular Expression Denial-of-Service (ReDoS) attack?
A ReDoS attack is one of many different denial-of-service attacks. The main goal of these attacks is to make a service unavailable to the end user.
In a denial-of-service attack, an attacker uses various techniques to paralyze a specific system or parts of the infrastructure. For example, a large number of requests could be sent to a server at once. The server would have to process and answer all of them, which would lead to disproportionately long response times. If enough resources are consumed, the system may fail altogether.
ReDoS attacks follow the same pattern. The attacker exploits the way various regex engines work: an input is constructed that forces the engine to do far more matching work than it normally would. This can provoke a system crash or failure.
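To make this concrete, here is a minimal sketch in Python. The pattern `(a+)+$` and the input (a run of `a` characters followed by `!`) are my own illustrative assumptions, chosen because nested quantifiers are a commonly cited trouble spot in backtracking engines such as Python's `re` module. The helper name `time_match` is hypothetical. Exact timings depend on your machine and engine version, but the time should grow very quickly as the input gets longer.

```python
import re
import time

# Illustrative pattern (an assumption, not taken from a real application):
# nested quantifiers let the engine split a run of "a"s in many different ways.
PATTERN = re.compile(r"(a+)+$")

def time_match(n):
    """Return (match result, seconds taken) for n 'a' characters followed by '!'."""
    text = "a" * n + "!"  # the trailing "!" makes every attempt fail
    start = time.perf_counter()
    result = PATTERN.match(text)
    elapsed = time.perf_counter() - start
    return result, elapsed

if __name__ == "__main__":
    # Keep n small: each extra character roughly doubles the work.
    for n in (10, 14, 18, 20):
        result, elapsed = time_match(n)
        print(f"n={n:2d}  matched={result is not None}  {elapsed:.4f}s")
```

The input never matches, yet the engine tries a huge number of ways to divide the `a` characters between the inner and outer repetition before it gives up. A few dozen characters can be enough to keep a request handler busy for a very long time.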
How do regular expressions work?
Before we proceed, let's look at how regular expression matching works under the hood - this will help us understand why certain regular expressions are particularly vulnerable to attacks.
Regular expression patterns can be matched by constructing a finite state machine (FSM). This can be imagined as an abstract machine: it reads the input one character at a time, moves from state to state according to the pattern, and at the end reports whether the input matches.
An FSM can be in exactly one state at any given time. The set of states is finite. A transition occurs when a finite state machine goes from one state to another. An example of a finite state machine is a coffee machine that dispenses a certain type of coffee depending on the user's selection.
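The coffee machine can be sketched as a small transition table in Python. The state names, event names, and the `run` helper are hypothetical, picked only to illustrate the idea of exactly one current state and explicit transitions.

```python
# Hypothetical states and events for a toy coffee machine.
# Each key is (current state, event); each value is the next state.
TRANSITIONS = {
    ("idle", "insert_coin"): "ready",
    ("ready", "choose_espresso"): "dispensing_espresso",
    ("ready", "choose_latte"): "dispensing_latte",
    ("dispensing_espresso", "take_cup"): "idle",
    ("dispensing_latte", "take_cup"): "idle",
}

def run(events, state="idle"):
    """Feed events to the machine one by one and return the final state."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"no transition from {state!r} on {event!r}")
        state = TRANSITIONS[key]  # the machine is always in exactly one state
    return state
```

For example, `run(["insert_coin", "choose_latte"])` ends in the state `"dispensing_latte"`, while choosing a drink before inserting a coin raises an error because no such transition is defined.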
As mentioned earlier, regular expression matching can be done by constructing an FSM. A regular expression can also be readily converted into a nondeterministic finite automaton (NFA), in which a state may have several possible next states for the same input. In such cases, the regular expression engine can use one of several algorithms to determine the next states.
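One such algorithm keeps track of every state the automaton could be in at once. The sketch below does this for a small hand-built NFA that accepts strings over `a` and `b` ending in `ab`. The automaton, its state numbers, and the `accepts` helper are assumptions made up for illustration, and this is not how Python's own `re` module works internally. The alternative is to follow one path at a time and backtrack when it fails, which is the approach that can become very expensive.

```python
# Hand-built NFA (illustrative assumption): accepts strings ending in "ab".
# State 0 has two possible next states on "a", which makes it nondeterministic.
NFA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
START = 0
ACCEPTING = {2}

def accepts(text):
    """Simulate the NFA by tracking the set of all currently possible states."""
    current = {START}
    for ch in text:
        # Follow every possible transition from every current state.
        current = {nxt for s in current for nxt in NFA.get((s, ch), set())}
    return bool(current & ACCEPTING)
```

Because the set of current states can never be larger than the number of states in the automaton, this simulation does a bounded amount of work per input character, no matter how many paths the NFA allows.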