regexp

2019-09-06 01:15:24 +03:00 · 2019-09-06 01:15:24 +03:00 · 681cae4b6a
commit 681cae4b6a
parent 20547570ff
16 changed files with 505 additions and 362 deletions
--- a/9-regular-expressions/14-regexp-lookahead-lookbehind/2-insert-after-head/solution.md
+++ b/9-regular-expressions/14-regexp-lookahead-lookbehind/2-insert-after-head/solution.md
@ -0,0 +1,29 @@
+
+To insert after the `<body>` tag, we must first find it. We'll use the regular expression `pattern:<body.*>`.
+
+Next, we need to leave the `<body>` tag itself in place and add the text after it.
+
+That can be done like this:
+```js run
+let str = '...<body style="...">...';
+str = str.replace(/<body.*>/, '$&<h1>Hello</h1>');
+
+alert(str); // ...<body style="..."><h1>Hello</h1>...
+```
+
+In the replacement string, `$&` means the match itself, that is, `pattern:<body.*>` is replaced by itself plus `<h1>Hello</h1>`.
+
+An alternative is to use lookbehind:
+
+```js run
+let str = '...<body style="...">...';
+str = str.replace(/(?<=<body.*>)/, `<h1>Hello</h1>`);
+
+alert(str); // ...<body style="..."><h1>Hello</h1>...
+```
+
+At every position, such a regular expression checks whether `pattern:<body.*>` comes immediately before it. If so, a match is found. But the tag `pattern:<body.*>` itself is not part of the match, it only takes part in the check. And the pattern has no other characters after the check, so the matched text is empty.
+
+So the "empty string" preceded by `pattern:<body.*>` is replaced with `<h1>Hello</h1>`. That is exactly the insertion of this string after `<body>`.
+
+P.S. Flags wouldn't hurt this regular expression: `pattern:/<body.*>/si`, so that the "dot" includes a newline (the tag may span several lines), and so that tags in a different case, such as `match:<BODY>`, are found too.
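
A hedged note on those flags, not part of the original solution: with the `pattern:s` flag the greedy `pattern:.*` can run across lines up to the last `>` in the document, so the `$&` variant would insert the heading at the very end. The lookbehind variant is not affected when used without the `pattern:g` flag, because `replace` stops at the first position that passes the check. The lazy `pattern:.*?` in the sketch below is an assumption added here, and it would still be confused by a `>` inside an attribute value.

```javascript
const html = `
<html>
  <body style="height: 200px">
  ...
  </body>
</html>
`;

const heading = "<h1>Hello</h1>";

// Greedy .* with the s flag matches up to the LAST ">" in the document.
const greedy = html.replace(/<body.*>/si, "$&" + heading);

// Lazy .*? stops at the first ">" after "<body".
const lazy = html.replace(/<body.*?>/si, "$&" + heading);

// Lookbehind without the g flag: only the first passing position is replaced.
const lookbehind = html.replace(/(?<=<body.*>)/si, heading);

console.log(greedy.trimEnd().endsWith("</html>" + heading)); // true
console.log(lazy === lookbehind); // true
console.log(lazy.includes('<body style="height: 200px">' + heading)); // true
```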
--- a/9-regular-expressions/14-regexp-lookahead-lookbehind/2-insert-after-head/task.md
+++ b/9-regular-expressions/14-regexp-lookahead-lookbehind/2-insert-after-head/task.md
@ -0,0 +1,30 @@
+# Insert after a fragment
+
+We have a string with an HTML document.
+
+Insert the string `<h1>Hello</h1>` after the `<body>` tag (the tag may have attributes).
+
+For example:
+
+```js
+let reg = /your regular expression/;
+
+let str = `
+<html>
+  <body style="height: 200px">
+  ...
+  </body>
+</html>
+`;
+
+str = str.replace(reg, `<h1>Hello</h1>`);
+```
+
+After that, the value of `str` is:
+```html
+<html>
+  <body style="height: 200px"><h1>Hello</h1>
+  ...
+  </body>
+</html>
+```
--- a/9-regular-expressions/14-regexp-lookahead-lookbehind/article.md
+++ b/9-regular-expressions/14-regexp-lookahead-lookbehind/article.md
@ -1,54 +1,82 @@
 # Lookahead and lookbehind

-Sometimes we need to match a pattern only if followed by another pattern. For instance, we'd like to get the price from a string like `subject:1 turkey costs 30€`.
+Sometimes we need to find only those matches for a pattern that are followed or preceded by another pattern.

-We need a number (let's say a price has no decimal point) followed by `subject:€` sign.
+There's a special syntax for that, called "lookahead" and "lookbehind", together referred to as "lookaround".

-That's what lookahead is for.
+To start, let's find the price in a string like `subject:1 turkey costs 30€`. That is: a number followed by the `subject:€` sign.

 ## Lookahead

-The syntax is: `pattern:x(?=y)`, it means "look for `pattern:x`, but match only if followed by `pattern:y`".
+The syntax is: `pattern:X(?=Y)`. It means "look for `pattern:X`, but match only if followed by `pattern:Y`". Any patterns may be used in place of `pattern:X` and `pattern:Y`.

-For an integer amount followed by `subject:€`, the regexp will be `pattern:\d+(?=€)`:
+For an integer number followed by `subject:€`, the regexp will be `pattern:\d+(?=€)`:

 ```js run
 let str = "1 turkey costs 30€";

-alert( str.match(/\d+(?=€)/) ); // 30 (correctly skipped the sole number 1)
+alert( str.match(/\d+(?=€)/) ); // 30, the number 1 is ignored, as it's not followed by €
 ```

-Let's say we want a quantity instead, that is a number, NOT followed by `subject:€`.
+Please note: the lookahead is merely a test, the contents of the parentheses `pattern:(?=...)` are not included in the result `match:30`.

-Here a negative lookahead can be applied.
+When we look for `pattern:X(?=Y)`, the regular expression engine finds `pattern:X` and then checks if there's `pattern:Y` immediately after it. If it's not so, then the potential match is skipped, and the search continues.

-The syntax is: `pattern:x(?!y)`, it means "search `pattern:x`, but only if not followed by `pattern:y`".
+More complex tests are possible, e.g. `pattern:X(?=Y)(?=Z)` means:
+
+1. Find `pattern:X`.
+2. Check if `pattern:Y` is immediately after `pattern:X` (skip if it isn't).
+3. Check if `pattern:Z` is immediately after `pattern:X` (skip if it isn't).
+4. If both tests pass, then it's a match.
+
+In other words, such a pattern means that we're looking for `pattern:X` followed by `pattern:Y` and `pattern:Z` at the same time.
+
+That's only possible if patterns `pattern:Y` and `pattern:Z` aren't mutually exclusive.
+
+For example, `pattern:\d+(?=\s)(?=.*30)` looks for `pattern:\d+` only if it's followed by a space, and there's `30` somewhere after it:
+
+```js run
+let str = "1 turkey costs 30€";
+
+alert( str.match(/\d+(?=\s)(?=.*30)/) ); // 1
+```
+
+In our string that exactly matches the number `1`.
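
To illustrate the remark about mutually exclusive conditions, here is a small sketch (the second pattern is made up for this illustration). Both lookaheads are tested at the same position, right after `pattern:X`, so requiring whitespace and `subject:€` there at once can never succeed.

```javascript
const str = "1 turkey costs 30€";

// Both conditions can hold at once: whitespace right after the number, and "30" further on.
const compatible = str.match(/\d+(?=\s)(?=.*30)/);

// The next character cannot be whitespace and "€" at the same time.
const exclusive = str.match(/\d+(?=\s)(?=€)/);

console.log(compatible[0]); // 1
console.log(exclusive);     // null
```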
+
+## Negative lookahead
+
+Let's say that we want a quantity instead, not a price from the same string. That's a number `pattern:\d+`, NOT followed by `subject:€`.
+
+For that, a negative lookahead can be applied.
+
+The syntax is: `pattern:X(?!Y)`, it means "search `pattern:X`, but only if not followed by `pattern:Y`".

 ```js run
 let str = "2 turkeys cost 60€";

-alert( str.match(/\d+(?!€)/) ); // 2 (correctly skipped the price)
+alert( str.match(/\d+(?!€)/) ); // 2 (the price is skipped)
 ```
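
One caveat, sketched under the assumption that we search with the `pattern:g` flag (the example above doesn't): `pattern:\d+(?!€)` can also match a part of the price, because the engine may backtrack to a shorter number whose next character is a digit rather than `subject:€`. Forbidding a following digit as well, as in the hypothetical pattern `pattern:\d+(?![€\d])`, avoids that.

```javascript
const str = "2 turkeys cost 60€";

// "60" is rejected, but the shorter "6" is followed by "0", not "€", so it matches.
const naive = str.match(/\d+(?!€)/g);

// Also reject a number that is followed by another digit.
const strict = str.match(/\d+(?![€\d])/g);

console.log(naive);  // [ '2', '6' ]
console.log(strict); // [ '2' ]
```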

 ## Lookbehind

-Lookahead allows to add a condition for "what goes after".
+Lookahead allows us to add a condition for "what follows".

-Lookbehind is similar, but it looks behind. That is, it allows to match a pattern only if there's something before.
+Lookbehind is similar, but it looks behind. That is, it allows us to match a pattern only if there's something before it.

 The syntax is:
- Positive lookbehind: `pattern:(?<=y)x`, matches `pattern:x`, but only if it follows after `pattern:y`.
- Negative lookbehind: `pattern:(?<!y)x`, matches `pattern:x`, but only if there's no `pattern:y` before.
+- Positive lookbehind: `pattern:(?<=Y)X`, matches `pattern:X`, but only if there's `pattern:Y` before it.
+- Negative lookbehind: `pattern:(?<!Y)X`, matches `pattern:X`, but only if there's no `pattern:Y` before it.

 For example, let's change the price to US dollars. The dollar sign is usually before the number, so to look for `$30` we'll use `pattern:(?<=\$)\d+` -- an amount preceded by `subject:$`:

 ```js run
 let str = "1 turkey costs $30";

+// the dollar sign is escaped \$
 alert( str.match(/(?<=\$)\d+/) ); // 30 (skipped the sole number)
 ```

-And, to find the quantity -- a number, not preceded by `subject:$`, we can use a negative lookbehind `pattern:(?<!\$)\d+`:
+And, if we need the quantity -- a number, not preceded by `subject:$`, then we can use a negative lookbehind `pattern:(?<!\$)\d+`:

 ```js run
 let str = "2 turkeys cost $60";
@ -56,15 +84,15 @@ let str = "2 turkeys cost $60";
 alert( str.match(/(?<!\$)\d+/) ); // 2 (skipped the price)
 ```

-## Capture groups
+## Capturing groups

-Generally, what's inside the lookaround (a common name for both lookahead and lookbehind) parentheses does not become a part of the match.
+Generally, the contents of lookaround parentheses do not become a part of the result.

 E.g. in the pattern `pattern:\d+(?=€)`, the `pattern:€` sign doesn't get captured as a part of the match. That's natural: we look for a number `pattern:\d+`, while `pattern:(?=€)` is just a test that it should be followed by `subject:€`.

-But in some situations we might want to capture the lookaround expression as well, or a part of it. That's possible. Just  wrap that into additional parentheses.
+But in some situations we might want to capture the lookaround expression as well, or a part of it. That's possible. Just wrap that part into additional parentheses.

-For instance, here the currency `pattern:(€|kr)` is captured, along with the amount:
+In the example below the currency sign `pattern:(€|kr)` is captured, along with the amount:

 ```js run
 let str = "1 turkey costs 30€";
@ -82,28 +110,21 @@ let reg = /(?<=(\$|£))\d+/;
 alert( str.match(reg) ); // 30, $
 ```
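
A group inside a lookaround is an ordinary capturing group, so it appears in the match array after the full match. A minimal sketch, with a sample string and variable names chosen here for illustration:

```javascript
const str = "1 turkey costs $30";

// The group (\$|£) sits inside the lookbehind, so it is captured
// but does not become part of the full match.
const [amount, currency] = str.match(/(?<=(\$|£))\d+/);

console.log(amount);   // 30
console.log(currency); // $
```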

-Please note that for lookbehind the order stays be same, even though lookahead parentheses are before the main pattern.
-
-Usually parentheses are numbered left-to-right, but lookbehind is an exception, it is always captured after the main pattern. So the match for `pattern:\d+` goes in the result first, and then for `pattern:(\$|£)`.
-
-
 ## Summary

-Lookahead and lookbehind (commonly referred to as "lookaround") are useful when we'd like to take something into the match depending on the context before/after it.
+Lookahead and lookbehind (commonly referred to as "lookaround") are useful when we'd like to match something depending on the context before/after it.

 For simple regexps we can do a similar thing manually. That is: match everything, in any context, and then filter by context in a loop.

-Remember, `str.matchAll` and `reg.exec` return matches with `.index` property, so we know where exactly in the text it is, and can check the context.
+Remember, `str.match` (without flag `pattern:g`) and `str.matchAll` (always) return matches as arrays with an `index` property, so we know exactly where in the text each match is, and can check the context.

-But generally regular expressions are more convenient.
+But generally lookaround is more convenient.
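
For comparison, the manual approach described above might look like this. It's only a sketch: the helper name `findPricesManually` is hypothetical, and it handles just the "number followed by `subject:€`" case.

```javascript
function findPricesManually(str) {
  const prices = [];
  for (const match of str.matchAll(/\d+/g)) {
    // match.index is where the number starts; look at the character right after it
    const next = str[match.index + match[0].length];
    if (next === "€") prices.push(match[0]);
  }
  return prices;
}

const str = "1 turkey costs 30€";

console.log(findPricesManually(str)); // [ '30' ]
console.log(str.match(/\d+(?=€)/g));  // [ '30' ]
```

The lookahead version expresses the same filter inside the pattern itself, which is why it is usually shorter.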

 Lookaround types:

 | Pattern            | type             | matches |
 |--------------------|------------------|---------|
-| `pattern:x(?=y)`   | Positive lookahead | `x` if followed by `pattern:y` |
-| `pattern:x(?!y)`   | Negative lookahead | `x` if not followed by `pattern:y` |
-| `pattern:(?<=y)x` |  Positive lookbehind | `x` if after `pattern:y` |
-| `pattern:(?<!y)x` | Negative lookbehind | `x` if not after `pattern:y` |
-
-Lookahead can also used to disable backtracking. Why that may be needed and other details  -- see in the next chapter.
+| `X(?=Y)`   | Positive lookahead | `pattern:X` if followed by `pattern:Y` |
+| `X(?!Y)`   | Negative lookahead | `pattern:X` if not followed by `pattern:Y` |
+| `(?<=Y)X` |  Positive lookbehind | `pattern:X` if after `pattern:Y` |
+| `(?<!Y)X` | Negative lookbehind | `pattern:X` if not after `pattern:Y` |