# Infinite backtracking problem
Some regular expressions look simple, but can take a veeeeeery long time to execute, and even "hang" the JavaScript engine.
Sooner or later most developers face such behavior.
The typical situation -- a regular expression works fine sometimes, but for certain strings it "hangs", consuming 100% of CPU.
In a web browser it kills the page. Not a good thing for sure.
For server-side JavaScript it may become a vulnerability if regular expressions are used to process user data. Bad input will make the process hang, causing denial of service. The author personally saw and reported such vulnerabilities even for very well-known and widely used programs.
So the problem is definitely worth dealing with.
## Introduction
The plan will be like this:
1. First we see how the problem may occur.
2. Then we simplify the situation and see why it occurs.
3. Then we fix it.
For instance, let's consider searching for tags in HTML.
We want to find all tags, with or without attributes -- like `subject:<a href="..." class="doc">`. We need the regexp to work reliably, because HTML comes from the internet and can be messy.
In particular, we need it to match tags like `<a test="<>" href="#">` -- with `<` and `>` in attributes. That's allowed by the [HTML standard](https://html.spec.whatwg.org/multipage/syntax.html#syntax-attributes).
Now we can see that a simple regexp like `pattern:<[^>]+>` doesn't work, because it stops at the first `>`, and we need to ignore `<>` if inside an attribute.
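```js run
// the match doesn't reach the end of the tag - wrong!
alert( '<a test="<>" href="#">'.match(/<[^>]+>/) ); // <a test="<>
```
We need the whole tag. To handle such cases we need a more complex regexp: a tag name followed by any number of `name="value"` pairs, that is `pattern:<\w+(\s*\w+="[^"]*"\s*)*>`.
That regexp is not perfect! It doesn't yet support all the details of HTML, for instance unquoted values, and there are other ways to improve it, but let's not add complexity. It is enough to demonstrate the problem.
The regexp seems to work: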
```js run
let reg = /<\w+(\s*\w+="[^"]*"\s*)*>/g;
let str = '...<a test="<>" href="#">... <b>...';
alert( str.match(reg) ); // <a test="<>" href="#">, <b>
```
Great! It found both the long tag `match:<a test="<>" href="#">` and the short one `match:<b>`.
Now that we've got a seemingly working solution, let's get to the infinite backtracking itself.
## Infinite backtracking
If you run our regexp on the input below, it may hang the browser (or another JavaScript host):
```js run
let reg = /<\w+(\s*\w+="[^"]*"\s*)*>/g;
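let str = `<tag a="b"  a="b"  a="b"  a="b"  a="b"  a="b"  a="b"  a="b"  a="b"  a="b"
  a="b"  a="b"  a="b"  a="b"  a="b"  a="b"  a="b"  a="b"  a="b"  a="b"  a="b"`;
// the search will take a long, long time
alert( str.match(reg) );
```
What's the matter? To find out, let's simplify the regexp: drop the tag name and the quotes, and look only for space-delimited attributes, `pattern:<(\s*\w+=\w+\s*)*>`.
Unfortunately, the regexp still hangs: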
```js run
// only search for space-delimited attributes
let reg = /<(\s*\w+=\w+\s*)*>/g;
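let str = `<a=b  a=b  a=b  a=b  a=b  a=b  a=b  a=b  a=b  a=b
  a=b  a=b  a=b  a=b  a=b  a=b  a=b  a=b  a=b  a=b  a=b`;
// the search will take a long, long time
alert( str.match(reg) );
```
There's no `>` at the end of the string `subject:<a=b  a=b  a=b ...`, so the match is impossible, but the regexp engine doesn't know about it. The search backtracks trying different combinations of `pattern:(\s*\w+=\w+\s*)`: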
```
(a=b a=b a=b) (a=b)
(a=b a=b) (a=b a=b)
(a=b) (a=b a=b a=b)
...
```
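One way to watch the growth without freezing the page is to time the simplified regexp on short inputs of increasing length. This is only a rough sketch: `buildInput` and `timeSearch` are hypothetical helper names, the sizes are kept small on purpose, and the actual timings depend on the engine and the machine.

```js
// Simplified regexp from the text: space-delimited name=value pairs
const reg = /<(\s*\w+=\w+\s*)*>/;

// Hypothetical helper: n pairs separated by two spaces, with no closing ">"
function buildInput(n) {
  return '<' + Array(n).fill('a=b').join('  ');
}

// Hypothetical helper: run one search, report its result and elapsed time
function timeSearch(n) {
  const input = buildInput(n);
  const start = Date.now();
  const result = reg.exec(input);
  return { n, result, ms: Date.now() - start };
}

const timings = [8, 10, 12, 14].map(timeSearch);
for (const { n, ms } of timings) {
  console.log(`${n} pairs: ${ms} ms`);
}
```

Every search returns `null`. Each extra pair adds one more two-space gap that can be divided in three ways between the trailing `pattern:\s*` of one repetition and the leading `pattern:\s*` of the next, so the amount of work roughly triples per pair.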
## How to fix?
Backtracking checks many variants that a human can see are bound to fail.
For instance, in the pattern `pattern:(\d+)*$` a human can easily see that there's no need to backtrack over the `pattern:+` in `pattern:(\d+)*`. It makes no difference whether the digits are matched by one `pattern:\d+` or by two:
```
\d+........
(123456789)z
\d+...\d+....
(1234)(56789)z
```
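To get a feel for how many such equivalent variants exist, the splits can simply be enumerated. A rough sketch, where `enumerateSplits` is a hypothetical helper that lists every way to cut a digit string into consecutive non-empty chunks, which is what the repetitions of `pattern:(\d+)*` may end up trying in the worst case:

```js
// Hypothetical helper: all ways to cut `digits` into consecutive non-empty chunks
function enumerateSplits(digits) {
  if (digits.length === 0) return [[]]; // one way to split nothing: no chunks

  const result = [];
  // Longest first chunk goes first, similar to a greedy quantifier
  for (let len = digits.length; len >= 1; len--) {
    const head = digits.slice(0, len);
    for (const rest of enumerateSplits(digits.slice(len))) {
      result.push([head, ...rest]);
    }
  }
  return result;
}

console.log( enumerateSplits('1234').length ); // 8
console.log( enumerateSplits('123456789').length ); // 256
```

For `n` digits there are `2^(n-1)` such splits, so every extra digit doubles the number of variants, although all of them cover exactly the same characters.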
Let's get back to a more real-life example: `pattern:<(\s*\w+=\w+\s*)*>`. We want it to find `name=value` pairs (as many as it can).
What we would like to do is to forbid backtracking.
There's no need at all to decrease the number of repetitions.
In other words, if it found three `name=value` pairs and then can't find `>` after them, there's no point in retrying with fewer repetitions. There's definitely no `>` after the first two, because the pair we gave up is still there:
```
(name=value) name=value
```
Many regexp engines (for instance PCRE and Java's `java.util.regex`) support so-called "possessive" quantifiers for that. They are like greedy ones, but don't backtrack at all. Pretty simple: they capture whatever they can, and the search continues. There's also another tool called "atomic groups" that forbids backtracking inside parentheses.
Unfortunately, JavaScript supports neither of these features.
### Lookahead to the rescue
We can forbid backtracking using lookahead.
The pattern to take as many repetitions of `pattern:a` as possible without backtracking is: `pattern:(?=(a+))\1`.
In other words:
- The lookahead `pattern:?=` looks for the longest possible `pattern:a+` starting from the current position.
- And then it is "consumed into the result" by the backreference `pattern:\1` (`pattern:\1` corresponds to the content of the inner parentheses, that is `pattern:a+`; the lookahead parentheses themselves don't capture).
There will be no backtracking, because once a lookahead has matched, the engine doesn't go back into it. If it found 5 characters `a` and the further match failed, it doesn't retry with 4.
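The difference from an ordinary greedy quantifier shows up on a tiny input. A minimal sketch, with the sample string `aaab` chosen only for illustration:

```js
const str = 'aaab';

// Greedy a+ first takes "aaa", then gives one "a" back so that "ab" can match
console.log( /^a+ab/.test(str) ); // true

// Lookahead + backreference: "aaa" is taken and never given back,
// so there is no "a" left for the following "ab"
console.log( /^(?=(a+))\1ab/.test(str) ); // false

// When no backtracking is needed, both variants behave the same
console.log( /^a+b/.test(str) ); // true
console.log( /^(?=(a+))\1b/.test(str) ); // true
```

So the trick changes what the pattern can match: it is only safe where giving characters back could never lead to a successful match anyway.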
```smart
There's more about the relation between possessive quantifiers and lookahead in articles [Regex: Emulate Atomic Grouping (and Possessive Quantifiers) with LookAhead](http://instanceof.me/post/52245507631/regex-emulate-atomic-grouping-with-lookahead) and [Mimicking Atomic Groups](http://blog.stevenlevithan.com/archives/mimic-atomic-groups).
```
So this trick makes the problem disappear.
Let's fix the regexp for a tag with attributes, this time also allowing unquoted values: `pattern:<\w+(\s*\w+=(\w+|"[^"]*")\s*)*>`. We'll use lookahead to prevent backtracking of `name=value` pairs:
```js run
// regexp to search name=value
let attrReg = /(\s*\w+=(\w+|"[^"]*")\s*)/;
// use new RegExp to nicely insert its source into (?=(a+))\1
let fixedReg = new RegExp(`<\\w+(?=(${attrReg.source}*))\\1>`, 'g');
let goodInput = '...<a test="<>" href="#">... <b>...';
let badInput = `<tag a=b  a=b  a=b  a=b  a=b  a=b  a=b  a=b  a=b  a=b
  a=b  a=b  a=b  a=b  a=b  a=b  a=b  a=b  a=b  a=b  a=b`;
alert( goodInput.match(fixedReg) ); // <a test="<>" href="#">, <b>
alert( badInput.match(fixedReg) ); // null (no results, fast!)
```
Great, it works! We found both the long tag `match:<a test="<>" href="#">` and the short one `match:<b>`, and (!) didn't hang the engine on the bad input.
Please note the `attrReg.source` property. `RegExp` objects provide access to their source string in it. That's convenient when we want to insert one regexp into another.
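When a pattern is assembled from parts like this, counting parentheses to pick the right backreference number gets awkward. One possible variation, shown here only as a sketch, is to use a named group instead. The helper `atomic` and the group name `attrs` are hypothetical, and the code assumes an engine with named capture groups (ES2018).

```js
// Hypothetical helper: wrap a regexp source so that, once matched,
// the engine never backtracks into it (lookahead + named backreference)
function atomic(source, name) {
  return `(?=(?<${name}>${source}))\\k<${name}>`;
}

// A single name=value attribute, using non-capturing groups only
const attr = /\s*\w+=(?:\w+|"[^"]*")\s*/;

// Any number of attributes, taken as one chunk that is never given back
const attrs = atomic(`(?:${attr.source})*`, 'attrs');
const tagReg = new RegExp(`<\\w+${attrs}>`, 'g');

const goodInput = '...<a test="<>" href="#">... <b>...';
const badInput = '<tag' + ' a=b '.repeat(30); // many attributes, no closing ">"

console.log( goodInput.match(tagReg) ); // both tags are found
console.log( badInput.match(tagReg) ); // null
```

The named backreference `pattern:\k<attrs>` plays the same role as `pattern:\1` above, but keeps working if more groups are later added in front of it.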