diff --git a/9-regular-expressions/16-regexp-sticky/article.md b/9-regular-expressions/16-regexp-sticky/article.md
index 31c87fc0..161ce4dd 100644
--- a/9-regular-expressions/16-regexp-sticky/article.md
+++ b/9-regular-expressions/16-regexp-sticky/article.md
@@ -3,11 +3,9 @@
 
 The flag `pattern:y` allows to perform the search at the given position in the source string.
 
-To grasp the use case of `pattern:y` flag, and see how great it is, let's explore a practical example.
+To grasp the use case of `pattern:y` flag, and better understand the ways of regexps, let's explore a practical example.
 
-One of common tasks for regexps is "lexical analysis": we get a text, e.g. in a programming language, and analyze it for structural elements.
-
-For instance, HTML has tags and attributes, JavaScript code has functions, variables, and so on.
+One of common tasks for regexps is "lexical analysis": we get a text, e.g. in a programming language, and need to find its structural elements. For instance, HTML has tags and attributes, JavaScript code has functions, variables, and so on.
 
 Writing lexical analyzers is a special area, with its own tools and algorithms, so we don't go deep in there, but there's a common task: to read something at the given position.
 
@@ -15,24 +13,27 @@ E.g. we have a code string `subject:let varName = "value"`, and we need to read
 
 We'll look for variable name using regexp `pattern:\w+`. Actually, JavaScript variable names need a bit more complex regexp for accurate matching, but here it doesn't matter.
 
-A call to `str.match(/\w+/)` will find only the first word in the line. Or all words with the flag `pattern:g`. But we need only one word at position `4`.
+- A call to `str.match(/\w+/)` will find only the first word in the line (`var`). That's not it.
+- We can add the flag `pattern:g`. But then the call `str.match(/\w+/g)` will look for all words in the text, while we need one word at position `4`. Again, not what we need.
 
-To search from the given position, we can use method `regexp.exec(str)`.
+**So, how to search for a regexp exactly at the given position?**
 
-If the `regexp` doesn't have flags `pattern:g` or `pattern:y`, then this method looks for the first match in the string `str`, exactly like `str.match(regexp)`. Such simple no-flags case doesn't interest us here.
+Let's try using method `regexp.exec(str)`.
 
-If there's flag `pattern:g`, then it performs the search in the string `str`, starting from position stored in its `regexp.lastIndex` property. And, if it finds a match, then sets `regexp.lastIndex` to the index immediately after the match.
+For a `regexp` without flags `pattern:g` and `pattern:y`, this method looks only for the first match, it works exactly like `str.match(regexp)`.
 
-When a regexp is created, its `lastIndex` is `0`.
+...But if there's flag `pattern:g`, then it performs the search in `str`, starting from position stored in the `regexp.lastIndex` property. And, if it finds a match, then sets `regexp.lastIndex` to the index immediately after the match.
+
+In other words, `regexp.lastIndex` serves as a starting point for the search, that each `regexp.exec(str)` call resets to the new value ("after the last match"). That's only if there's `pattern:g` flag, of course.
 
 So, successive calls to `regexp.exec(str)` return matches one after another.
 
-An example (with flag `pattern:g`):
+Here's an example of such calls:
 
 ```js run
-let str = 'let varName';
-
+let str = 'let varName'; // Let's find all words in this string
 let regexp = /\w+/g;
+
 alert(regexp.lastIndex); // 0 (initially lastIndex=0)
 
 let word1 = regexp.exec(str);
@@ -48,8 +49,6 @@ alert(word3); // null (no more matches)
 alert(regexp.lastIndex); // 0 (resets at search end)
 ```
 
-Every match is returned as an array with groups and additional properties.
-
 We can get all matches in the loop:
 
 ```js run
@@ -65,11 +64,13 @@ while (result = regexp.exec(str)) {
 }
 ```
 
-Such use of `regexp.exec` is an alternative to method `str.matchAll`.
+Such use of `regexp.exec` is an alternative to method `str.matchAll`, with a bit more control over the process.
 
-Unlike other methods, we can set our own `lastIndex`, to start the search from the given position.
+Let's go back to our task.
 
-For instance, let's find a word, starting from position `4`:
+We can manually set `lastIndex` to `4`, to start the search from the given position!
+
+Like this:
 
 ```js run
 let str = 'let varName = "value"';
@@ -84,9 +85,15 @@ let word = regexp.exec(str);
 alert(word); // varName
 ```
 
+Hooray! Problem solved!
+
 We performed a search of `pattern:\w+`, starting from position `regexp.lastIndex = 4`.
 
-Please note: the search starts at position `lastIndex` and then goes further. If there's no word at position `lastIndex`, but it's somewhere after it, then it will be found:
+The result is correct.
+
+...But wait, not so fast.
+
+Please note: the `regexp.exec` call start searching at position `lastIndex` and then goes further. If there's no word at position `lastIndex`, but it's somewhere after it, then it will be found:
 
 ```js run
 let str = 'let varName = "value"';
@@ -94,17 +101,19 @@ let str = 'let varName = "value"';
 let regexp = /\w+/g;
 
 *!*
+// start the search from position 3
 regexp.lastIndex = 3;
 */!*
 
-let word = regexp.exec(str);
+let word = regexp.exec(str);
+// found the match at position 4
 alert(word[0]); // varName
 alert(word.index); // 4
 ```
 
-...So, with flag `pattern:g` property `lastIndex` sets the starting position for the search.
+For some tasks, including the lexical analysis, that's just wrong. We need to find a match exactly at the given position at the text, not somewhere after it. And that's what the flag `y` is for.
 
-**Flag `pattern:y` makes `regexp.exec` to look exactly at position `lastIndex`, not before, not after it.**
+**The flag `pattern:y` makes `regexp.exec` to search exactly at position `lastIndex`, not "starting from" it.**
 
 Here's the same search with flag `pattern:y`:
 
@@ -122,6 +131,8 @@ alert( regexp.exec(str) ); // varName (word at position 4)
 
 As we can see, regexp `pattern:/\w+/y` doesn't match at position `3` (unlike the flag `pattern:g`), but matches at position `4`.
 
-Imagine, we have a long text, and there are no matches in it, at all. Then searching with flag `pattern:g` will go till the end of the text, and this will take significantly more time than the search with flag `pattern:y`.
+Not only that's what we need, there's an important performance gain when using flag `pattern:y`.
 
-In such tasks like lexical analysis, there are usually many searches at an exact position. Using flag `pattern:y` is the key for a good performance.
+Imagine, we have a long text, and there are no matches in it, at all. Then a search with flag `pattern:g` will go till the end of the text and find nothing, and this will take significantly more time than the search with flag `pattern:y`, that checks only the exact position.
+
+In tasks like lexical analysis, there are usually many searches at an exact position, to check what we have there. Using flag `pattern:y` is the key for correct implementations and a good performance.
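
Note on the patch: the closing paragraph says lexical analysis involves many searches at an exact position. A minimal sketch of that idea follows, in case it helps to see the `y` flag in a loop. It is not part of the patch or the article; the helper names `readTokenAt` and `tokenize` and the three token kinds are assumptions made up for illustration.

```js
// Hypothetical helper (not from the article): reads one token exactly at `pos`.
function readTokenAt(str, pos) {
  // Assumed token kinds, chosen only for this illustration
  const kinds = [
    ['word', /\w+/y],
    ['space', /\s+/y],
    ['punct', /[^\w\s]/y],
  ];

  for (const [type, regexp] of kinds) {
    regexp.lastIndex = pos; // with flag y the match must begin exactly here
    const match = regexp.exec(str);
    if (match) {
      // after a successful match, lastIndex points right after it
      return { type, value: match[0], end: regexp.lastIndex };
    }
  }
  return null; // nothing recognized at this position
}

// Hypothetical driver: walks the string token by token, skipping whitespace.
function tokenize(str) {
  const tokens = [];
  let pos = 0;
  while (pos < str.length) {
    const token = readTokenAt(str, pos);
    if (!token) throw new Error(`Unexpected character at position ${pos}`);
    if (token.type !== 'space') tokens.push(token.value);
    pos = token.end;
  }
  return tokens;
}

const tokens = tokenize('let varName = "value"');
// tokens: ['let', 'varName', '=', '"', 'value', '"']
```

Each check looks only at the current position, so a failed attempt never scans the rest of the string, which is the performance point the last two paragraphs of the patch make.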