reg->regexp

Ilya Kantor 2019-09-06 16:50:41 +03:00
parent 4232a53219
commit 32e20fc97c
35 changed files with 132 additions and 132 deletions

@ -1,8 +1,8 @@
Answer: `pattern:\d\d[-:]\d\d`.
```js run
let reg = /\d\d[-:]\d\d/g;
alert( "Breakfast at 09:00. Dinner at 21-30".match(reg) ); // 09:00, 21-30
let regexp = /\d\d[-:]\d\d/g;
alert( "Breakfast at 09:00. Dinner at 21-30".match(regexp) ); // 09:00, 21-30
```
Please note that the dash `pattern:'-'` has a special meaning in square brackets, but only between other characters, not at the beginning or at the end, so we don't need to escape it here.
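
A small sketch of that rule, using short sample strings chosen here for illustration (the variable names are not part of the task): a dash at the edge of the brackets is an ordinary character, while a dash between two characters forms a range.

```js
// Dash at the start of the set: a literal dash, no escaping needed
let edgeMatches = "09:00 21-30".match(/[-:]/g);
console.log( edgeMatches ); // :, -

// Dash between two characters: the range 0..5, the dash itself is not matched
let rangeMatches = "21-30".match(/[0-5]/g);
console.log( rangeMatches ); // 2, 1, 3, 0
```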

@ -5,8 +5,8 @@ The time can be in the format `hours:minutes` or `hours-minutes`. Both hours and
Write a regexp to find time:
```js
let reg = /your regexp/g;
alert( "Breakfast at 09:00. Dinner at 21-30".match(reg) ); // 09:00, 21-30
let regexp = /your regexp/g;
alert( "Breakfast at 09:00. Dinner at 21-30".match(regexp) ); // 09:00, 21-30
```
P.S. In this task we assume that the time is always correct, there's no need to filter out bad strings like "45:67". Later we'll deal with that too.

@ -130,18 +130,18 @@ In the example below the regexp `pattern:[-().^+]` looks for one of the characte
```js run
// No need to escape
let reg = /[-().^+]/g;
let regexp = /[-().^+]/g;
alert( "1 + 2 - 3".match(reg) ); // Matches +, -
alert( "1 + 2 - 3".match(regexp) ); // Matches +, -
```
...But if you decide to escape them "just in case", then there would be no harm:
```js run
// Escaped everything
let reg = /[\-\(\)\.\^\+]/g;
let regexp = /[\-\(\)\.\^\+]/g;
alert( "1 + 2 - 3".match(reg) ); // also works: +, -
alert( "1 + 2 - 3".match(regexp) ); // also works: +, -
```
## Ranges and flag "u"

@ -2,8 +2,8 @@
Solution:
```js run
let reg = /\.{3,}/g;
alert( "Hello!... How goes?.....".match(reg) ); // ..., .....
let regexp = /\.{3,}/g;
alert( "Hello!... How goes?.....".match(regexp) ); // ..., .....
```
Please note that the dot is a special character, so we have to escape it and insert as `\.`.
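
To see why the escape matters, here is a minimal comparison on the same string (the variable names are illustrative). Without the backslash the dot means "any character except a newline", so the pattern matches far more than an ellipsis.

```js
let str = "Hello!... How goes?.....";

// Escaped dot: only literal dots, three or more in a row
let escaped = str.match(/\.{3,}/g);
console.log( escaped ); // ..., .....

// Unescaped dot: any 3+ characters, so the whole string is one greedy match
let unescaped = str.match(/.{3,}/g);
console.log( unescaped ); // Hello!... How goes?.....
```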

@ -9,6 +9,6 @@ Create a regexp to find ellipsis: 3 (or more?) dots in a row.
Check it:
```js
let reg = /your regexp/g;
alert( "Hello!... How goes?.....".match(reg) ); // ..., .....
let regexp = /your regexp/g;
alert( "Hello!... How goes?.....".match(regexp) ); // ..., .....
```

@ -7,11 +7,11 @@ Then we can look for 6 of them using the quantifier `pattern:{6}`.
As a result, we have the regexp: `pattern:/#[a-f0-9]{6}/gi`.
```js run
let reg = /#[a-f0-9]{6}/gi;
let regexp = /#[a-f0-9]{6}/gi;
let str = "color:#121212; background-color:#AA00ef bad-colors:f#fddee #fd2"
alert( str.match(reg) ); // #121212,#AA00ef
alert( str.match(regexp) ); // #121212,#AA00ef
```
The problem is that it finds the color in longer sequences:

@ -5,11 +5,11 @@ Create a regexp to search HTML-colors written as `#ABCDEF`: first `#` and then 6
An example of use:
```js
let reg = /...your regexp.../
let regexp = /...your regexp.../
let str = "color:#121212; background-color:#AA00ef bad-colors:f#fddee #fd2 #12345678";
alert( str.match(reg) ) // #121212,#AA00ef
alert( str.match(regexp) ) // #121212,#AA00ef
```
P.S. In this task we do not need other color formats like `#123` or `rgb(1,2,3)` etc.

@ -5,11 +5,11 @@ An acceptable variant is `pattern:<!--.*?-->` -- the lazy quantifier makes the d
Otherwise multiline comments won't be found:
```js run
let reg = /<!--.*?-->/gs;
let regexp = /<!--.*?-->/gs;
let str = `... <!-- My -- comment
test --> .. <!----> ..
`;
alert( str.match(reg) ); // '<!-- My -- comment \n test -->', '<!---->'
alert( str.match(regexp) ); // '<!-- My -- comment \n test -->', '<!---->'
```

@ -3,11 +3,11 @@
Find all HTML comments in the text:
```js
let reg = /your regexp/g;
let regexp = /your regexp/g;
let str = `... <!-- My -- comment
test --> .. <!----> ..
`;
alert( str.match(reg) ); // '<!-- My -- comment \n test -->', '<!---->'
alert( str.match(regexp) ); // '<!-- My -- comment \n test -->', '<!---->'
```

@ -2,9 +2,9 @@
The solution is `pattern:<[^<>]+>`.
```js run
let reg = /<[^<>]+>/g;
let regexp = /<[^<>]+>/g;
let str = '<> <a href="/"> <input type="radio" checked> <b>';
alert( str.match(reg) ); // '<a href="/">', '<input type="radio" checked>', '<b>'
alert( str.match(regexp) ); // '<a href="/">', '<input type="radio" checked>', '<b>'
```

@ -5,11 +5,11 @@ Create a regular expression to find all (opening and closing) HTML tags with the
An example of use:
```js run
let reg = /your regexp/g;
let regexp = /your regexp/g;
let str = '<> <a href="/"> <input type="radio" checked> <b>';
alert( str.match(reg) ); // '<a href="/">', '<input type="radio" checked>', '<b>'
alert( str.match(regexp) ); // '<a href="/">', '<input type="radio" checked>', '<b>'
```
Here we assume that tag attributes may not contain `<` and `>` (inside quotes too), which simplifies things a bit.

@ -17,11 +17,11 @@ A regular expression like `pattern:/".+"/g` (a quote, then something, then the o
Let's try it:
```js run
let reg = /".+"/g;
let regexp = /".+"/g;
let str = 'a "witch" and her "broom" is one';
alert( str.match(reg) ); // "witch" and her "broom"
alert( str.match(regexp) ); // "witch" and her "broom"
```
...We can see that it doesn't work as intended!
@ -105,11 +105,11 @@ To make things clear: usually a question mark `pattern:?` is a quantifier by its
The regexp `pattern:/".+?"/g` works as intended: it finds `match:"witch"` and `match:"broom"`:
```js run
let reg = /".+?"/g;
let regexp = /".+?"/g;
let str = 'a "witch" and her "broom" is one';
alert( str.match(reg) ); // "witch", "broom"
alert( str.match(regexp) ); // "witch", "broom"
```
To clearly understand the change, let's trace the search step by step.
@ -175,11 +175,11 @@ With regexps, there's often more than one way to do the same thing.
In our case we can find quoted strings without lazy mode using the regexp `pattern:"[^"]+"`:
```js run
let reg = /"[^"]+"/g;
let regexp = /"[^"]+"/g;
let str = 'a "witch" and her "broom" is one';
alert( str.match(reg) ); // "witch", "broom"
alert( str.match(regexp) ); // "witch", "broom"
```
The regexp `pattern:"[^"]+"` gives correct results, because it looks for a quote `pattern:'"'` followed by one or more non-quotes `pattern:[^"]`, and then the closing quote.
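
One practical difference between the two approaches, sketched here with an assumed second string that contains a line break: the dot does not match a newline unless flag `s` is set, while the negated set `pattern:[^"]` matches any character that is not a quote, newlines included.

```js
let oneLine = 'a "witch" and her "broom" is one';
let twoLines = 'say "two\nlines" please'; // hypothetical multiline sample

// On a single line both patterns find the same quoted strings
let lazy = oneLine.match(/".+?"/g);
let negated = oneLine.match(/"[^"]+"/g);
console.log( lazy );    // "witch", "broom"
console.log( negated ); // "witch", "broom"

// Across a line break only the negated set still matches
let lazyMultiline = twoLines.match(/".+?"/g);
let negatedMultiline = twoLines.match(/"[^"]+"/g);
console.log( lazyMultiline );    // null
console.log( negatedMultiline ); // "two\nlines" (with a real line break inside)
```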
@ -201,20 +201,20 @@ The first idea might be: `pattern:/<a href=".*" class="doc">/g`.
Let's check it:
```js run
let str = '...<a href="link" class="doc">...';
let reg = /<a href=".*" class="doc">/g;
let regexp = /<a href=".*" class="doc">/g;
// Works!
alert( str.match(reg) ); // <a href="link" class="doc">
alert( str.match(regexp) ); // <a href="link" class="doc">
```
It worked. But let's see what happens if there are many links in the text.
```js run
let str = '...<a href="link1" class="doc">... <a href="link2" class="doc">...';
let reg = /<a href=".*" class="doc">/g;
let regexp = /<a href=".*" class="doc">/g;
// Whoops! Two links in one match!
alert( str.match(reg) ); // <a href="link1" class="doc">... <a href="link2" class="doc">
alert( str.match(regexp) ); // <a href="link1" class="doc">... <a href="link2" class="doc">
```
Now the result is wrong for the same reason as our "witches" example. The quantifier `pattern:.*` took too many characters.
@ -230,10 +230,10 @@ Let's modify the pattern by making the quantifier `pattern:.*?` lazy:
```js run
let str = '...<a href="link1" class="doc">... <a href="link2" class="doc">...';
let reg = /<a href=".*?" class="doc">/g;
let regexp = /<a href=".*?" class="doc">/g;
// Works!
alert( str.match(reg) ); // <a href="link1" class="doc">, <a href="link2" class="doc">
alert( str.match(regexp) ); // <a href="link1" class="doc">, <a href="link2" class="doc">
```
Now it seems to work, there are two matches:
@ -247,10 +247,10 @@ Now it seems to work, there are two matches:
```js run
let str = '...<a href="link1" class="wrong">... <p style="" class="doc">...';
let reg = /<a href=".*?" class="doc">/g;
let regexp = /<a href=".*?" class="doc">/g;
// Wrong match!
alert( str.match(reg) ); // <a href="link1" class="wrong">... <p style="" class="doc">
alert( str.match(regexp) ); // <a href="link1" class="wrong">... <p style="" class="doc">
```
Now it fails. The match includes not just a link, but also a lot of text after it, including `<p...>`.
@ -281,11 +281,11 @@ A working example:
```js run
let str1 = '...<a href="link1" class="wrong">... <p style="" class="doc">...';
let str2 = '...<a href="link1" class="doc">... <a href="link2" class="doc">...';
let reg = /<a href="[^"]*" class="doc">/g;
let regexp = /<a href="[^"]*" class="doc">/g;
// Works!
alert( str1.match(reg) ); // null, no matches, that's correct
alert( str2.match(reg) ); // <a href="link1" class="doc">, <a href="link2" class="doc">
alert( str1.match(regexp) ); // null, no matches, that's correct
alert( str2.match(regexp) ); // <a href="link1" class="doc">, <a href="link2" class="doc">
```
## Summary

@ -9,13 +9,13 @@ Now let's show that the match should capture all the text: start at the beginnin
Finally:
```js run
let reg = /^[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}$/i;
let regexp = /^[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}$/i;
alert( reg.test('01:32:54:67:89:AB') ); // true
alert( regexp.test('01:32:54:67:89:AB') ); // true
alert( reg.test('0132546789AB') ); // false (no colons)
alert( regexp.test('0132546789AB') ); // false (no colons)
alert( reg.test('01:32:54:67:89') ); // false (5 numbers, need 6)
alert( regexp.test('01:32:54:67:89') ); // false (5 numbers, need 6)
alert( reg.test('01:32:54:67:89:ZZ') ) // false (ZZ at the end)
alert( regexp.test('01:32:54:67:89:ZZ') ) // false (ZZ at the end)
```

@ -8,13 +8,13 @@ Write a regexp that checks whether a string is MAC-address.
Usage:
```js
let reg = /your regexp/;
let regexp = /your regexp/;
alert( reg.test('01:32:54:67:89:AB') ); // true
alert( regexp.test('01:32:54:67:89:AB') ); // true
alert( reg.test('0132546789AB') ); // false (no colons)
alert( regexp.test('0132546789AB') ); // false (no colons)
alert( reg.test('01:32:54:67:89') ); // false (5 numbers, must be 6)
alert( regexp.test('01:32:54:67:89') ); // false (5 numbers, must be 6)
alert( reg.test('01:32:54:67:89:ZZ') ) // false (ZZ at the end)
alert( regexp.test('01:32:54:67:89:ZZ') ) // false (ZZ at the end)
```

@ -9,19 +9,19 @@ Here the pattern `pattern:[a-f0-9]{3}` is enclosed in parentheses to apply the q
In action:
```js run
let reg = /#([a-f0-9]{3}){1,2}/gi;
let regexp = /#([a-f0-9]{3}){1,2}/gi;
let str = "color: #3f3; background-color: #AA00ef; and: #abcd";
alert( str.match(reg) ); // #3f3 #AA00ef #abc
alert( str.match(regexp) ); // #3f3 #AA00ef #abc
```
There's a minor problem here: the pattern found `match:#abc` in `subject:#abcd`. To prevent that we can add `pattern:\b` to the end:
```js run
let reg = /#([a-f0-9]{3}){1,2}\b/gi;
let regexp = /#([a-f0-9]{3}){1,2}\b/gi;
let str = "color: #3f3; background-color: #AA00ef; and: #abcd";
alert( str.match(reg) ); // #3f3 #AA00ef
alert( str.match(regexp) ); // #3f3 #AA00ef
```
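
As a further check of the `pattern:\b` variant, here is a sketch on a longer sample string (assumed for illustration, with an extra 8-digit value added). After 3 or 6 hex digits the pattern requires a word boundary, so values with 4 or 8 digits produce no match at all, not even a partial one.

```js
let regexp = /#([a-f0-9]{3}){1,2}\b/gi;

// Sample input with valid (3 and 6 digits) and invalid (4 and 8 digits) values
let str = "#3f3 #AA00ef #abcd #12345678";

let colors = str.match(regexp);
console.log( colors ); // #3f3, #AA00ef
```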

@ -4,11 +4,11 @@ Write a RegExp that matches colors in the format `#abc` or `#abcdef`. That is: `
Usage example:
```js
let reg = /your regexp/g;
let regexp = /your regexp/g;
let str = "color: #3f3; background-color: #AA00ef; and: #abcd";
alert( str.match(reg) ); // #3f3 #AA00ef
alert( str.match(regexp) ); // #3f3 #AA00ef
```
P.S. This should be exactly 3 or 6 hex digits. Values with 4 digits, such as `#abcd`, should not match.

@ -3,9 +3,9 @@ A positive number with an optional decimal part is (per previous task): `pattern
Let's add the optional `pattern:-` in the beginning:
```js run
let reg = /-?\d+(\.\d+)?/g;
let regexp = /-?\d+(\.\d+)?/g;
let str = "-1.5 0 2 -123.4.";
alert( str.match(reg) ); // -1.5, 0, 2, -123.4
alert( str.match(regexp) ); // -1.5, 0, 2, -123.4
```

@ -5,9 +5,9 @@ Write a regexp that looks for all decimal numbers including integer ones, with t
An example of use:
```js
let reg = /your regexp/g;
let regexp = /your regexp/g;
let str = "-1.5 0 2 -123.4.";
alert( str.match(reg) ); // -1.5, 0, 2, -123.4
alert( str.match(regexp) ); // -1.5, 0, 2, -123.4
```

@ -18,9 +18,9 @@ To make each of these parts a separate element of the result array, let's enclos
In action:
```js run
let reg = /(-?\d+(\.\d+)?)\s*([-+*\/])\s*(-?\d+(\.\d+)?)/;
let regexp = /(-?\d+(\.\d+)?)\s*([-+*\/])\s*(-?\d+(\.\d+)?)/;
alert( "1.2 + 12".match(reg) );
alert( "1.2 + 12".match(regexp) );
```
The result includes:
@ -42,9 +42,9 @@ The final solution:
```js run
function parse(expr) {
let reg = /(-?\d+(?:\.\d+)?)\s*([-+*\/])\s*(-?\d+(?:\.\d+)?)/;
let regexp = /(-?\d+(?:\.\d+)?)\s*([-+*\/])\s*(-?\d+(?:\.\d+)?)/;
let result = expr.match(reg);
let result = expr.match(regexp);
if (!result) return [];
result.shift();

@ -56,9 +56,9 @@ The email format is: `name@domain`. Any word can be the name, hyphens and dots a
The pattern:
```js run
let reg = /[-.\w]+@([\w-]+\.)+[\w-]+/g;
let regexp = /[-.\w]+@([\w-]+\.)+[\w-]+/g;
alert("my@mail.com @ his@site.com.uk".match(reg)); // my@mail.com, his@site.com.uk
alert("my@mail.com @ his@site.com.uk".match(regexp)); // my@mail.com, his@site.com.uk
```
That regexp is not perfect, but it mostly works and helps to catch accidental typos. The only truly reliable check for an email address is to send a letter to it.
@ -110,9 +110,9 @@ In action:
```js run
let str = '<span class="my">';
let reg = /<(([a-z]+)\s*([^>]*))>/;
let regexp = /<(([a-z]+)\s*([^>]*))>/;
let result = str.match(reg);
let result = str.match(regexp);
alert(result[0]); // <span class="my">
alert(result[1]); // span class="my"
alert(result[2]); // span
@ -336,10 +336,10 @@ let str = "Gogogo John!";
*!*
// ?: excludes 'go' from capturing
let reg = /(?:go)+ (\w+)/i;
let regexp = /(?:go)+ (\w+)/i;
*/!*
let result = str.match(reg);
let result = str.match(regexp);
alert( result[0] ); // Gogogo John (full match)
alert( result[1] ); // John

@ -17,10 +17,10 @@ We can put both kinds of quotes in the square brackets: `pattern:['"](.*?)['"]`,
```js run
let str = `He said: "She's the one!".`;
let reg = /['"](.*?)['"]/g;
let regexp = /['"](.*?)['"]/g;
// The result is not what we'd like to have
alert( str.match(reg) ); // "She'
alert( str.match(regexp) ); // "She'
```
As we can see, the pattern found an opening quote `match:"`, then the text is consumed till the other quote `match:'`, that closes the match.
@ -33,10 +33,10 @@ Here's the correct code:
let str = `He said: "She's the one!".`;
*!*
let reg = /(['"])(.*?)\1/g;
let regexp = /(['"])(.*?)\1/g;
*/!*
alert( str.match(reg) ); // "She's the one!"
alert( str.match(regexp) ); // "She's the one!"
```
Now it works! The regular expression engine finds the first quote `pattern:(['"])` and memorizes its content. That's the first capturing group.
@ -65,8 +65,8 @@ In the example below the group with quotes is named `pattern:?<quote>`, so the b
let str = `He said: "She's the one!".`;
*!*
let reg = /(?<quote>['"])(.*?)\k<quote>/g;
let regexp = /(?<quote>['"])(.*?)\k<quote>/g;
*/!*
alert( str.match(reg) ); // "She's the one!"
alert( str.match(regexp) ); // "She's the one!"
```
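
With flag `pattern:g`, `str.match` returns only the full matches. If the text between the quotes is needed as well, one possible approach (a sketch, not part of the change above) is to iterate with `str.matchAll` and read the groups from each match:

```js
let str = `He said: "She's the one!".`;
let regexp = /(?<quote>['"])(.*?)\k<quote>/g;

// matchAll yields one match object per result, with all groups available
let found = [...str.matchAll(regexp)].map(match => ({
  quote: match.groups.quote, // the named group: which quote opened the string
  text: match[2]             // the second (unnamed) group: the quoted text
}));

console.log( found[0].quote ); // "
console.log( found[0].text );  // She's the one!
```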

@ -4,11 +4,11 @@ The first idea can be to list the languages with `|` in-between.
But that doesn't work right:
```js run
let reg = /Java|JavaScript|PHP|C|C\+\+/g;
let regexp = /Java|JavaScript|PHP|C|C\+\+/g;
let str = "Java, JavaScript, PHP, C, C++";
alert( str.match(reg) ); // Java,Java,PHP,C,C
alert( str.match(regexp) ); // Java,Java,PHP,C,C
```
The regular expression engine looks for alternations one-by-one. That is: first it checks if we have `match:Java`, otherwise -- looks for `match:JavaScript` and so on.
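
Because alternatives are tried left to right, simply listing the longer variant first changes the outcome. A minimal sketch of that idea on the same string (the name `longerFirst` is illustrative):

```js
let str = "Java, JavaScript, PHP, C, C++";

// Longer alternatives come before their shorter prefixes
let longerFirst = /JavaScript|Java|PHP|C\+\+|C/g;

let languages = str.match(longerFirst);
console.log( languages ); // Java,JavaScript,PHP,C,C++
```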
@ -25,9 +25,9 @@ There are two solutions for that problem:
In action:
```js run
let reg = /Java(Script)?|C(\+\+)?|PHP/g;
let regexp = /Java(Script)?|C(\+\+)?|PHP/g;
let str = "Java, JavaScript, PHP, C, C++";
alert( str.match(reg) ); // Java,JavaScript,PHP,C,C++
alert( str.match(regexp) ); // Java,JavaScript,PHP,C,C++
```

@ -5,7 +5,7 @@ There are many programming languages, for instance Java, JavaScript, PHP, C, C++
Create a regexp that finds them in the string `subject:Java JavaScript PHP C++ C`:
```js
let reg = /your regexp/g;
let regexp = /your regexp/g;
alert("Java JavaScript PHP C++ C".match(reg)); // Java JavaScript PHP C++ C
alert("Java JavaScript PHP C++ C".match(regexp)); // Java JavaScript PHP C++ C
```

@ -8,7 +8,7 @@ The full pattern: `pattern:\[(b|url|quote)\].*?\[/\1\]`.
In action:
```js run
let reg = /\[(b|url|quote)\].*?\[\/\1\]/gs;
let regexp = /\[(b|url|quote)\].*?\[\/\1\]/gs;
let str = `
[b]hello![/b]
@ -17,7 +17,7 @@ let str = `
[/quote]
`;
alert( str.match(reg) ); // [b]hello![/b],[quote][url]http://google.com[/url][/quote]
alert( str.match(regexp) ); // [b]hello![/b],[quote][url]http://google.com[/url][/quote]
```
Please note that we had to escape a slash for the closing tag `pattern:[/\1]`, because normally the slash closes the pattern.
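
For comparison, a sketch of the same pattern built with `new RegExp` from a string. There the slash needs no escaping, because nothing delimits the pattern, but every backslash has to be doubled inside the string literal. The sample strings are taken from the task examples.

```js
// Literal syntax: the slash must be escaped as \/
let literal = /\[(b|url|quote)\].*?\[\/\1\]/gs;

// Constructor syntax: plain slash, doubled backslashes
let fromString = new RegExp("\\[(b|url|quote)\\].*?\\[/\\1\\]", "gs");

let simple = "..[url]http://google.com[/url]..";
let nested = "..[url][b]http://google.com[/b][/url]..";

console.log( simple.match(literal) );    // [url]http://google.com[/url]
console.log( simple.match(fromString) ); // [url]http://google.com[/url]
console.log( nested.match(fromString) ); // [url][b]http://google.com[/b][/url]
```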

View file

@ -32,17 +32,17 @@ Create a regexp to find all BB-tags with their contents.
For instance:
```js
let reg = /your regexp/flags;
let regexp = /your regexp/flags;
let str = "..[url]http://google.com[/url]..";
alert( str.match(reg) ); // [url]http://google.com[/url]
alert( str.match(regexp) ); // [url]http://google.com[/url]
```
If tags are nested, then we need the outer tag (if we want we can continue the search in its content):
```js
let reg = /your regexp/flags;
let regexp = /your regexp/flags;
let str = "..[url][b]http://google.com[/b][/url]..";
alert( str.match(reg) ); // [url][b]http://google.com[/b][/url]
alert( str.match(regexp) ); // [url][b]http://google.com[/b][/url]
```

@ -10,8 +10,8 @@ Step by step:
In action:
```js run
let reg = /"(\\.|[^"\\])*"/g;
let regexp = /"(\\.|[^"\\])*"/g;
let str = ' .. "test me" .. "Say \\"Hello\\"!" .. "\\\\ \\"" .. ';
alert( str.match(reg) ); // "test me","Say \"Hello\"!","\\ \""
alert( str.match(regexp) ); // "test me","Say \"Hello\"!","\\ \""
```

View file

@ -10,7 +10,7 @@ In the regexp language: `pattern:<style(>|\s.*?>)`.
In action:
```js run
let reg = /<style(>|\s.*?>)/g;
let regexp = /<style(>|\s.*?>)/g;
alert( '<style> <styler> <style test="...">'.match(reg) ); // <style>, <style test="...">
alert( '<style> <styler> <style test="...">'.match(regexp) ); // <style>, <style test="...">
```

View file

@ -7,7 +7,7 @@ Write a regexp to find the tag `<style...>`. It should match the full tag: it ma
For instance:
```js
let reg = /your regexp/g;
let regexp = /your regexp/g;
alert( '<style> <styler> <style test="...">'.match(reg) ); // <style>, <style test="...">
alert( '<style> <styler> <style test="...">'.match(regexp) ); // <style>, <style test="...">
```

View file

@ -11,11 +11,11 @@ The corresponding regexp: `pattern:html|php|java(script)?`.
A usage example:
```js run
let reg = /html|php|css|java(script)?/gi;
let regexp = /html|php|css|java(script)?/gi;
let str = "First HTML appeared, then CSS, then JavaScript";
alert( str.match(reg) ); // 'HTML', 'CSS', 'JavaScript'
alert( str.match(regexp) ); // 'HTML', 'CSS', 'JavaScript'
```
We already saw a similar thing -- square brackets. They allow choosing between multiple characters, for instance `pattern:gr[ae]y` matches `match:gray` or `match:grey`.
@ -64,7 +64,7 @@ But that's wrong, the alternation should only be used in the "hours" part of the
The final solution:
```js run
let reg = /([01]\d|2[0-3]):[0-5]\d/g;
let regexp = /([01]\d|2[0-3]):[0-5]\d/g;
alert("00:00 10:10 23:59 25:99 1:2".match(reg)); // 00:00,10:10,23:59
alert("00:00 10:10 23:59 25:99 1:2".match(regexp)); // 00:00,10:10,23:59
```

View file

@ -6,11 +6,11 @ We can exclude negatives by prepending it with the negative lookbehind: `pattern:
Although, if we try it now, we may notice one more "extra" result:
```js run
let reg = /(?<!-)\d+/g;
let regexp = /(?<!-)\d+/g;
let str = "0 12 -5 123 -18";
console.log( str.match(reg) ); // 0, 12, 123, *!*8*/!*
console.log( str.match(regexp) ); // 0, 12, 123, *!*8*/!*
```
As you can see, it matches `match:8`, from `subject:-18`. To exclude it, we need to ensure that the regexp starts matching a number not from the middle of another (non-matching) number.
@ -20,9 +20,9 @@ We can do it by specifying another negative lookbehind: `pattern:(?<!-)(?<!\d)\d
We can also join them into a single lookbehind here:
```js run
let reg = /(?<![-\d])\d+/g;
let regexp = /(?<![-\d])\d+/g;
let str = "0 12 -5 123 -18";
alert( str.match(reg) ); // 0, 12, 123
alert( str.match(regexp) ); // 0, 12, 123
```
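
A short sketch confirming that the two spellings behave the same on this input: two chained lookbehinds and a single lookbehind with a character set (the variable names are illustrative).

```js
let str = "0 12 -5 123 -18";

// Two separate conditions: not preceded by a dash, not preceded by a digit
let chained = str.match(/(?<!-)(?<!\d)\d+/g);

// The same two conditions joined into one set
let joined = str.match(/(?<![-\d])\d+/g);

console.log( chained ); // 0, 12, 123
console.log( joined );  // 0, 12, 123
```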

View file

@ -6,9 +6,9 @@ Create a regexp that looks for only non-negative ones (zero is allowed).
An example of use:
```js
let reg = /your regexp/g;
let regexp = /your regexp/g;
let str = "0 12 -5 123 -18";
alert( str.match(reg) ); // 0, 12, 123
alert( str.match(regexp) ); // 0, 12, 123
```

View file

@ -7,7 +7,7 @@
For instance:
```js
let reg = /your regexp/;
let regexp = /your regexp/;
let str = `
<html>
@ -17,7 +17,7 @@ let str = `
</html>
`;
str = str.replace(reg, `<h1>Hello</h1>`);
str = str.replace(regexp, `<h1>Hello</h1>`);
```
After that the value of `str`:

View file

@ -96,18 +96,18 @@ In the example below the currency sign `pattern:(€|kr)` is captured, along wit
```js run
let str = "1 turkey costs 30€";
let reg = /\d+(?=(€|kr))/; // extra parentheses around €|kr
let regexp = /\d+(?=(€|kr))/; // extra parentheses around €|kr
alert( str.match(reg) ); // 30, €
alert( str.match(regexp) ); // 30, €
```
And here's the same for lookbehind:
```js run
let str = "1 turkey costs $30";
let reg = /(?<=(\$|£))\d+/;
let regexp = /(?<=(\$|£))\d+/;
alert( str.match(reg) ); // 30, $
alert( str.match(regexp) ); // 30, $
```
## Summary

View file

@ -19,10 +19,10 @@ We'll use a regexp `pattern:^(\w+\s?)*$`, it specifies 0 or more such words.
In action:
```js run
let reg = /^(\w+\s?)*$/;
let regexp = /^(\w+\s?)*$/;
alert( reg.test("A good string") ); // true
alert( reg.test("Bad characters: $@#") ); // false
alert( regexp.test("A good string") ); // true
alert( regexp.test("Bad characters: $@#") ); // false
```
It seems to work. The result is correct. However, on certain strings it takes a lot of time, so long that the JavaScript engine "hangs" with 100% CPU consumption.
@ -30,11 +30,11 @@ It seems to work. The result is correct. Although, on certain strings it takes a
If you run the example below, you probably won't see anything, as JavaScript will just "hang". The web browser will stop reacting to events and the UI will stop working. After some time it will suggest reloading the page. So be careful with this:
```js run
let reg = /^(\w+\s?)*$/;
let regexp = /^(\w+\s?)*$/;
let str = "An input string that takes a long time or even makes this regexp to hang!";
// will take a very long time
alert( reg.test(str) );
alert( regexp.test(str) );
```
Some regular expression engines can handle such search, but most of them can't.
@ -50,12 +50,12 @@ And, to make things more obvious, let's replace `pattern:\w` with `pattern:\d`.
<!-- let str = `AnInputStringThatMakesItHang!`; -->
```js run
let reg = /^(\d+)*$/;
let regexp = /^(\d+)*$/;
let str = "012345678901234567890123456789!";
// will take a very long time
alert( reg.test(str) );
alert( regexp.test(str) );
```
So what's wrong with the regexp?
@ -189,10 +189,10 @@ Let's rewrite the regular expression as `pattern:^(\w+\s)*\w*` - we'll look for
This regexp is equivalent to the previous one (matches the same) and works well:
```js run
let reg = /^(\w+\s)*\w*$/;
let regexp = /^(\w+\s)*\w*$/;
let str = "An input string that takes a long time or even makes this regex to hang!";
alert( reg.test(str) ); // false
alert( regexp.test(str) ); // false
```
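
A quick sanity check of the equivalence claim, on a few short sample strings chosen here for illustration (short enough that the slow pattern still finishes quickly). It is only a spot check, not a proof:

```js
let slow = /^(\w+\s?)*$/;    // the original pattern
let fast = /^(\w+\s)*\w*$/;  // the rewritten pattern

let samples = ["A good string", "Bad characters: $@#", "", "two  spaces", "trailing "];

// For every sample, record the verdict of both patterns
let results = samples.map(sample => [slow.test(sample), fast.test(sample)]);

// Both patterns should agree on each sample
console.log( results.every(([a, b]) => a === b) ); // true
```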
Why did the problem disappear?
@ -272,26 +272,26 @@ There's more about the relation between possessive quantifiers and lookahead in
Let's rewrite the first example using lookahead to prevent backtracking:
```js run
let reg = /^((?=(\w+))\2\s?)*$/;
let regexp = /^((?=(\w+))\2\s?)*$/;
alert( reg.test("A good string") ); // true
alert( regexp.test("A good string") ); // true
let str = "An input string that takes a long time or even makes this regex to hang!";
alert( reg.test(str) ); // false, works and fast!
alert( regexp.test(str) ); // false, works and fast!
```
Here `pattern:\2` is used instead of `pattern:\1`, because there are additional outer parentheses. To avoid mixing up the numbers, we can give the parentheses a name, e.g. `pattern:(?<word>\w+)`.
```js run
// parentheses are named ?<word>, referenced as \k<word>
let reg = /^((?=(?<word>\w+))\k<word>\s?)*$/;
let regexp = /^((?=(?<word>\w+))\k<word>\s?)*$/;
let str = "An input string that takes a long time or even makes this regex to hang!";
alert( reg.test(str) ); // false
alert( regexp.test(str) ); // false
alert( reg.test("A correct string") ); // true
alert( regexp.test("A correct string") ); // true
```
The problem described in this article is called "catastrophic backtracking".

View file

@ -71,9 +71,9 @@ Usage example:
```js run
let str = '<h1>Hello, world!</h1>';
let reg = /<(.*?)>/g;
let regexp = /<(.*?)>/g;
let matchAll = str.matchAll(reg);
let matchAll = str.matchAll(regexp);
alert(matchAll); // [object RegExp String Iterator], not array, but an iterable
@ -118,7 +118,7 @@ alert( str.search( /ink/i ) ); // 10 (first match position)
If we need positions of further matches, we should use other means, such as finding them all with `str.matchAll(regexp)`.
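
For example, a minimal sketch that collects the positions of all matches through the `index` property of each match object. The sample string is an assumption, picked so that the first position agrees with the `search` result of 10 mentioned above.

```js
let str = "A drop of ink may make a million think"; // assumed sample string

// matchAll needs flag "g"; each match object carries its position in .index
let positions = [...str.matchAll(/ink/gi)].map(match => match.index);

console.log( positions ); // 10, 35
```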
## str.replace(str|reg, str|func)
## str.replace(str|regexp, str|func)
This is a generic method for searching and replacing, one of the most useful ones. It is the Swiss army knife of searching and replacing.
@ -238,7 +238,7 @ The method `regexp.exec(str)` returns a match for `regexp` in the string
It behaves differently depending on whether the regexp has flag `pattern:g`.
If there's no `pattern:g`, then `regexp.exec(str)` returns the first match exactly as `str.match(reg)`. This behavior doesn't bring anything new.
If there's no `pattern:g`, then `regexp.exec(str)` returns the first match exactly as `str.match(regexp)`. This behavior doesn't bring anything new.
But if there's flag `pattern:g`, then:
- A call to `regexp.exec(str)` returns the first match and saves the position immediately after it in the property `regexp.lastIndex`.
@ -272,7 +272,7 @@ For instance:
```js run
let str = 'Hello, world!';
let reg = /\w+/g; // without flag "g", lastIndex property is ignored
let regexp = /\w+/g; // without flag "g", lastIndex property is ignored
regexp.lastIndex = 5; // search from 5th position (from the comma)
alert( regexp.exec(str) ); // world
@ -285,7 +285,7 @@ Let's replace flag `pattern:g` with `pattern:y` in the example above. There will
```js run
let str = 'Hello, world!';
let reg = /\w+/y;
let regexp = /\w+/y;
regexp.lastIndex = 5; // search exactly at position 5
alert( regexp.exec(str) ); // null