From 7ddea43ab46f264f3b6e763cb081b50ca402f6ca Mon Sep 17 00:00:00 2001 From: Ilya Kantor Date: Mon, 20 Mar 2017 20:52:29 +0300 Subject: [PATCH] up --- .../01-regexp-introduction/article.md | 3 +- .../02-regexp-methods/article.md | 43 +++- .../09-regexp-groups/article.md | 31 ++- .../01-find-programming-language/solution.md | 28 +-- .../01-find-programming-language/task.md | 11 +- .../03-match-quoted-string/solution.md | 20 +- .../03-match-quoted-string/task.md | 37 ++-- .../04-match-exact-tag/solution.md | 15 +- .../04-match-exact-tag/task.md | 13 +- .../12-regexp-ahchors/1-start-end/solution.md | 6 - .../12-regexp-ahchors/1-start-end/task.md | 4 - .../12-regexp-ahchors/2-test-mac/solution.md | 20 -- .../12-regexp-ahchors/2-test-mac/task.md | 21 -- .../12-regexp-ahchors/article.md | 85 -------- .../12-regexp-anchors/1-start-end/solution.md | 6 + .../12-regexp-anchors/1-start-end/task.md | 3 + .../12-regexp-anchors/2-test-mac/solution.md | 21 ++ .../12-regexp-anchors/2-test-mac/task.md | 20 ++ .../12-regexp-anchors/article.md | 57 +++++ .../13-regexp-multiline-mode/article.md | 83 ++++---- .../14-regexp-lookahead/article.md | 4 - .../article.md | 194 +++++++++--------- 22 files changed, 382 insertions(+), 343 deletions(-) delete mode 100644 10-regular-expressions-javascript/12-regexp-ahchors/1-start-end/solution.md delete mode 100644 10-regular-expressions-javascript/12-regexp-ahchors/1-start-end/task.md delete mode 100644 10-regular-expressions-javascript/12-regexp-ahchors/2-test-mac/solution.md delete mode 100644 10-regular-expressions-javascript/12-regexp-ahchors/2-test-mac/task.md delete mode 100644 10-regular-expressions-javascript/12-regexp-ahchors/article.md create mode 100644 10-regular-expressions-javascript/12-regexp-anchors/1-start-end/solution.md create mode 100644 10-regular-expressions-javascript/12-regexp-anchors/1-start-end/task.md create mode 100644 10-regular-expressions-javascript/12-regexp-anchors/2-test-mac/solution.md create mode 100644 
10-regular-expressions-javascript/12-regexp-anchors/2-test-mac/task.md create mode 100644 10-regular-expressions-javascript/12-regexp-anchors/article.md delete mode 100644 10-regular-expressions-javascript/14-regexp-lookahead/article.md diff --git a/10-regular-expressions-javascript/01-regexp-introduction/article.md b/10-regular-expressions-javascript/01-regexp-introduction/article.md index 4230296c..829d0809 100644 --- a/10-regular-expressions-javascript/01-regexp-introduction/article.md +++ b/10-regular-expressions-javascript/01-regexp-introduction/article.md @@ -102,8 +102,7 @@ There are only 5 of them in JavaScript: : Enables full unicode support. The flag enables correct processing of surrogate pairs. More about that in the chapter . `y` -: Sticky mode (covered in [todo]) - +: Sticky mode (covered in the [next chapter](info:regexp-methods#y-flag)) ## The "i" flag diff --git a/10-regular-expressions-javascript/02-regexp-methods/article.md b/10-regular-expressions-javascript/02-regexp-methods/article.md index 38e90f5d..ddd987b3 100644 --- a/10-regular-expressions-javascript/02-regexp-methods/article.md +++ b/10-regular-expressions-javascript/02-regexp-methods/article.md @@ -345,6 +345,42 @@ alert( regexp.exec(str).index ); // 34, the search starts from the 30th position ``` ```` +## The "y" flag [#y-flag] + +The `y` flag means that the search should find a match exactly at the position specified by the property `regexp.lastIndex` and only there. + +In other words, normally the search is made in the whole string: `pattern:/javascript/` looks for "javascript" everywhere in the string. + +But when a regexp has the `y` flag, then it only looks for the match at the position specified in `regexp.lastIndex` (`0` by default). 
+ +For instance: + +```js run +let str = "I love JavaScript!"; + +let reg = /javascript/iy; + +alert( reg.lastIndex ); // 0 (default) +alert( str.match(reg) ); // null, not found at position 0 + +reg.lastIndex = 7; +alert( str.match(reg) ); // JavaScript (right, that word starts at position 7) + +// for any other reg.lastIndex the result is null +``` + +The regexp `pattern:/javascript/iy` can only be found if we set `reg.lastIndex=7`, because due to `y` flag the engine only tries to find it in the single place within a string -- from the `reg.lastIndex` position. + +So, what's the point? Where do we apply that? + +The reason is performance. + +The `y` flag works great for parsers -- programs that need to "read" the text and build in-memory syntax structure or perform actions from it. For that we move along the text and apply regular expressions to see what we have next: a string? A number? Something else? + +The `y` flag allows to apply a regular expression (or many of them one-by-one) exactly at the given position and when we understand what's there, we can move on -- step by step examining the text. + +Without the flag the regexp engine always searches till the end of the text, that takes time, especially if the text is large. So our parser would be very slow. The `y` flag is exactly the right thing here. + ## Summary, recipes Methods become much easier to understand if we separate them by their use in real-life tasks. @@ -365,4 +401,9 @@ To search and replace: To split the string: : - `str.split(str|reg)` -Now we know the methods and can use regular expressions. But we need to learn their syntax and capabilities, so let's move on. +We also covered two flags: + +- The `g` flag to find all matches (global search), +- The `y` flag to search at exactly the given position inside the text. + +Now we know the methods and can use regular expressions. But we need to learn their syntax, so let's move on. 
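Reviewer note on the `y` flag section added in the hunk above: the parser use case it describes can be sketched roughly as follows. This is only an illustration, not part of the patch. The `rules` table, the token type names and the `tokenize` function are hypothetical; the only behavior taken from the article is that a sticky regexp is tried exactly at `lastIndex` and nowhere else.

```js
// Hypothetical token rules, tried in order. Every regexp carries the `y` flag,
// so it can only match exactly at `lastIndex`, never further along the string.
const rules = [
  { type: "number", reg: /\d+/y },
  { type: "word", reg: /[A-Za-z_]\w*/y },
  { type: "space", reg: /\s+/y },
  { type: "punct", reg: /[^\s\w]/y },
];

// Walks the text step by step, asking at each position "what do we have next?"
function tokenize(str) {
  const tokens = [];
  let pos = 0;

  while (pos < str.length) {
    let found = null;

    for (const { type, reg } of rules) {
      reg.lastIndex = pos;        // anchor the attempt at the current position
      const m = reg.exec(str);    // null unless the match starts exactly at pos
      if (m) {
        found = { type, value: m[0] };
        break;
      }
    }

    if (!found) {
      throw new SyntaxError(`Unexpected character at position ${pos}`);
    }

    if (found.type !== "space") tokens.push(found); // whitespace is skipped
    pos += found.value.length;    // move on to the next piece of text
  }

  return tokens;
}

console.log(tokenize("let x = 42;").map(t => t.value)); // [ 'let', 'x', '=', '42', ';' ]
```

Because each attempt either matches at `pos` or fails immediately, no rule scans ahead through the rest of the text, which is the performance point the section makes. Without `y`, each rule would need a leading `^` and a fresh substring at every step to get the same effect.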
diff --git a/10-regular-expressions-javascript/09-regexp-groups/article.md b/10-regular-expressions-javascript/09-regexp-groups/article.md index 7ea81e74..6bd14e7c 100644 --- a/10-regular-expressions-javascript/09-regexp-groups/article.md +++ b/10-regular-expressions-javascript/09-regexp-groups/article.md @@ -1,4 +1,4 @@ -# Bracket groups +# Capturing groups A part of the pattern can be enclosed in parentheses `pattern:(...)`. That's called a "capturing group". @@ -21,6 +21,35 @@ Without parentheses, the pattern `pattern:/go+/` means `subject:g`, followed by Parentheses group the word `pattern:(go)` together. +Let's make something more complex -- a regexp to match an email. + +Examples of emails: + +``` +my@mail.com +john.smith@site.com.uk +``` + +The pattern: `pattern:[-.\w]+@([\w-]+\.)+[\w-]{2,20}`. + +- The first part before `@` may include wordly characters, a dot and a dash `pattern:[-.\w]+`, like `match:john.smith`. +- Then `pattern:@` +- And then the domain. May be a second-level domain `site.com` or with subdomains like `host.site.com.uk`. We can match it as "a word followed by a dot" repeated one or more times for subdomains: `match:mail.` or `match:site.com.`, and then "a word" for the last part: `match:.com` or `match:.uk`. + + The word followed by a dot is `pattern:(\w+\.)+` (repeated). The last word should not have a dot at the end, so it's just `\w{2,20}`. The quantifier `pattern:{2,20}` limits the length, because domain zones are like `.uk` or `.com` or `.museum`, but can't be longer than 20 characters. + + So the domain pattern is `pattern:(\w+\.)+\w{2,20}`. Now we replace `\w` with `[\w-]`, because dashes are also allowed in domains, and we get the final result. + +That regexp is not perfect, but usually works. It's short and good enough to fix errors or occasional mistypes. 
+ +For instance, here we can find all emails in the string: + +```js run +let reg = /[-.\w]+@([\w-]+\.)+[\w-]{2,20}/g; + +alert("my@mail.com @ his@site.com.uk".match(reg)); // my@mail.com,his@site.com.uk +``` + ## Contents of parentheses diff --git a/10-regular-expressions-javascript/11-regexp-alternation/01-find-programming-language/solution.md b/10-regular-expressions-javascript/11-regexp-alternation/01-find-programming-language/solution.md index 289ba6d0..3419aa49 100644 --- a/10-regular-expressions-javascript/11-regexp-alternation/01-find-programming-language/solution.md +++ b/10-regular-expressions-javascript/11-regexp-alternation/01-find-programming-language/solution.md @@ -1,33 +1,33 @@ -Сначала неправильный способ. -Если перечислить языки один за другим через `|`, то получится совсем не то: +The first idea can be to list the languages with `|` in-between. + +But that doesn't work right: ```js run -var reg = /Java|JavaScript|PHP|C|C\+\+/g; +let reg = /Java|JavaScript|PHP|C|C\+\+/g; -var str = "Java, JavaScript, PHP, C, C++"; +let str = "Java, JavaScript, PHP, C, C++"; alert( str.match(reg) ); // Java,Java,PHP,C,C ``` -Как видно, движок регулярных выражений ищет альтернации в порядке их перечисления. То есть, он сначала смотрит, есть ли `match:Java`, а если нет -- ищет `match:JavaScript`. +The regular expression engine looks for alternations one-by-one. That is: first it checks if we have `match:Java`, otherwise -- looks for `match:JavaScript` and so on. -Естественно, при этом `match:JavaScript` не будет найдено никогда. +As a result, `match:JavaScript` can never be found, just because `match:Java` is checked first. -То же самое -- с языками `match:C` и `match:C++`. +The same with `match:C` and `match:C++`. -Есть два решения проблемы: +There are two solutions for that problem: -1. Поменять порядок, чтобы более длинное совпадение проверялось первым: `pattern:JavaScript|Java|C\+\+|C|PHP`. -2. 
Соединить длинный вариант с коротким: `pattern:Java(Script)?|C(\+\+)?|PHP`. +1. Change the order to check the longer match first: `pattern:JavaScript|Java|C\+\+|C|PHP`. +2. Merge variants with the same start: `pattern:Java(Script)?|C(\+\+)?|PHP`. -В действии: +In action: ```js run -var reg = /Java(Script)?|C(\+\+)?|PHP/g; +let reg = /Java(Script)?|C(\+\+)?|PHP/g; -var str = "Java, JavaScript, PHP, C, C++"; +let str = "Java, JavaScript, PHP, C, C++"; alert( str.match(reg) ); // Java,JavaScript,PHP,C,C++ ``` - diff --git a/10-regular-expressions-javascript/11-regexp-alternation/01-find-programming-language/task.md b/10-regular-expressions-javascript/11-regexp-alternation/01-find-programming-language/task.md index b93570f3..61b9526f 100644 --- a/10-regular-expressions-javascript/11-regexp-alternation/01-find-programming-language/task.md +++ b/10-regular-expressions-javascript/11-regexp-alternation/01-find-programming-language/task.md @@ -1,6 +1,11 @@ -# Найдите языки программирования +# Find programming languages -Существует много языков программирования, например Java, JavaScript, PHP, C, C++. +There are many programming languages, for instance Java, JavaScript, PHP, C, C++. -Напишите регулярное выражение, которое найдёт их все в строке "Java JavaScript PHP C++ C" +Create a regexp that finds them in the string `subject:Java JavaScript PHP C++ C`: +```js +let reg = /your regexp/g; + +alert("Java JavaScript PHP C++ C".match(reg)); // Java JavaScript PHP C++ C +``` diff --git a/10-regular-expressions-javascript/11-regexp-alternation/03-match-quoted-string/solution.md b/10-regular-expressions-javascript/11-regexp-alternation/03-match-quoted-string/solution.md index ece03804..143be870 100644 --- a/10-regular-expressions-javascript/11-regexp-alternation/03-match-quoted-string/solution.md +++ b/10-regular-expressions-javascript/11-regexp-alternation/03-match-quoted-string/solution.md @@ -1,17 +1,17 @@ -Решение задачи: `pattern:/"(\\.|[^"\\])*"/g`. 
+The solution: `pattern:/"(\\.|[^"\\])*"/g`. -То есть: +Step by step: -- Сначала ищем кавычку `pattern:"` -- Затем, если далее слэш `pattern:\\` (удвоение слэша -- техническое, для вставки в регэксп, на самом деле там один слэш), то после него также подойдёт любой символ (точка). -- Если не слэш, то берём любой символ, кроме кавычек (которые будут означать конец строки) и слэша (чтобы предотвратить одинокие слэши, сам по себе единственный слэш не нужен, он должен экранировать какой-то символ) `pattern:[^"\\]` -- ...И так жадно, до закрывающей кавычки. +- First we look for an opening quote `pattern:"` +- Then if we have a backslash `pattern:\\` (we technically have to double it in the pattern, because it is a special character, so that's a single backslash in fact), then any character is fine after it (a dot). +- Otherwise we take any character except a quote (that would mean the end of the string) and a backslash (to prevent lonely backslashes, the backslash is only used with some other symbol after it): `pattern:[^"\\]` +- ...And so on till the closing quote. -В действии: +In action: ```js run -var re = /"(\\.|[^"\\])*"/g; -var str = '.. "test me" .. "Скажи \\"Привет\\"!" .. "\\r\\n\\\\" ..'; +let reg = /"(\\.|[^"\\])*"/g; +let str = ' .. "test me" .. "Say \\"Hello\\"!" .. "\\\\ \\"" .. 
'; -alert( str.match(re) ); // "test me","Скажи \"Привет\"!","\r\n\\" +alert( str.match(reg) ); // "test me","Say \"Hello\"!","\\ \"" ``` diff --git a/10-regular-expressions-javascript/11-regexp-alternation/03-match-quoted-string/task.md b/10-regular-expressions-javascript/11-regexp-alternation/03-match-quoted-string/task.md index 2bde0073..2ccac4bd 100644 --- a/10-regular-expressions-javascript/11-regexp-alternation/03-match-quoted-string/task.md +++ b/10-regular-expressions-javascript/11-regexp-alternation/03-match-quoted-string/task.md @@ -1,25 +1,32 @@ -# Найдите строки в кавычках +# Find quoted strings -Найдите в тексте при помощи регэкспа строки в двойных кавычках `subject:"..."`. +Create a regexp to find strings in double quotes `subject:"..."`. -В строке поддерживается экранирование при помощи слеша -- примерно в таком же виде, как в обычных строках JavaScript. То есть, строка может содержать любые символы, экранированные слэшем, в частности: `subject:\"`, `subject:\n`, и даже сам слэш в экранированном виде: `subject:\\`. +The important part is that strings should support escaping, in the same way as JavaScript strings do. For instance, quotes can be inserted as `subject:\"` a newline as `subject:\n`, and the slash itself as `subject:\\`. -Здесь особо важно, что двойная кавычка после слэша не оканчивает строку, а считается её частью. В этом и состоит основная сложность задачи, которая без этого условия была бы элементарной. - -Пример совпадающих строк: ```js -.. *!*"test me"*/!* .. (обычная строка) -.. *!*"Скажи \"Привет\"!"*/!* ... (строка с кавычками внутри) -.. *!*"\r\n\\"*/!* .. (строка со спец. символами и слэшем внутри) +let str = "Just like \"here\"."; ``` -Заметим, что в JavaScript такие строки удобнее всего задавать в одинарных кавычках, и слеши придётся удвоить (в одинарных кавычках они являются экранирующими символами): +For us it's important that an escaped quote `subject:\"` does not end a string. 
+ +So we should look from one quote to the other ignoring escaped quotes on the way. + +That's the essential part of the task, otherwise it would be trivial. + +Examples of strings to match: +```js +.. *!*"test me"*/!* .. +.. *!*"Say \"Hello\"!"*/!* ... (escaped quotes inside) +.. *!*"\\"*/!* .. (double slash inside) +.. *!*"\\ \""*/!* .. (double slash and an escaped quote inside) +``` + +In JavaScript we need to double the slashes to pass them right into the string, like this: -Пример задания тестовой строки в JavaScript: ```js run -var str = ' .. "test me" .. "Скажи \\"Привет\\"!" .. "\\r\\n\\\\" .. '; +let str = ' .. "test me" .. "Say \\"Hello\\"!" .. "\\\\ \\"" .. '; -// эта строка будет такой: -alert(str); // .. "test me" .. "Скажи \"Привет\"!" .. "\r\n\\" .. +// the in-memory string +alert(str); // .. "test me" .. "Say \"Hello\"!" .. "\\ \"" .. ``` - diff --git a/10-regular-expressions-javascript/11-regexp-alternation/04-match-exact-tag/solution.md b/10-regular-expressions-javascript/11-regexp-alternation/04-match-exact-tag/solution.md index e8895fdd..70c4de91 100644 --- a/10-regular-expressions-javascript/11-regexp-alternation/04-match-exact-tag/solution.md +++ b/10-regular-expressions-javascript/11-regexp-alternation/04-match-exact-tag/solution.md @@ -1,17 +1,16 @@ -Начало шаблона очевидно: `pattern:`, так как `match:` удовлетворяет этому регэкспу. +...But then we can't simply write `pattern:`, because `match:` would match it. -Нужно уточнить его. После `match:`. -На языке регэкспов: `pattern:|\s.*?>)`. +In the regexp language: `pattern:|\s.*?>)`. -В действии: +In action: ```js run -var re = /|\s.*?>)/g; +let reg = /|\s.*?>)/g; -alert( "