ok
This commit is contained in:
parent
af0ee2a49e
commit
e2443e8de6
115 changed files with 3177 additions and 866 deletions
370
10-regular-expressions-javascript/02-regexp-methods/article.md
Normal file
@@ -0,0 +1,370 @@

# Methods of RegExp and String

There are two sets of methods to deal with regular expressions.

1. First, regular expressions are objects of the built-in [RegExp](mdn:js/RegExp) class, which provides many methods.
2. Besides that, there are methods of regular strings that can work with regexps.

The structure is a bit messed up, so we'll first consider the methods separately, and then give practical recipes for common tasks.

[cut]

## str.search(reg)

We've seen this method already. It returns the position of the first match, or `-1` if none is found:

```js run
let str = "A drop of ink may make a million think";

alert( str.search( *!*/a/i*/!* ) ); // 0 (the first position)
```

**The important limitation: `search` always looks for the first match.**

We can't find the next positions using `search`; there's just no syntax for that. But there are other methods that can.

## str.match(reg), no "g" flag

The behavior of `str.match` varies depending on the `g` flag. First let's see the case without it.

Then `str.match(reg)` looks for the first match only.

The result is an array with that match and additional properties:

- `index` -- the position of the match inside the string,
- `input` -- the subject string.

For instance:

```js run
let str = "Fame is the thirst of youth";

let result = str.match( *!*/fame/i*/!* );

alert( result[0] );    // Fame (the match)
alert( result.index ); // 0 (at the zero position)
alert( result.input ); // "Fame is the thirst of youth" (the string)
```

The array may have more than one element.

**If a part of the pattern is delimited by brackets `(...)`, then it becomes a separate element of the array.**

For instance:

```js run
let str = "Javascript is a programming language";

let result = str.match( *!*/JAVA(SCRIPT)/i*/!* );

alert( result[0] ); // Javascript (the whole match)
alert( result[1] ); // script (the part of the match that corresponds to the brackets)
alert( result.index ); // 0
alert( result.input ); // Javascript is a programming language
```

Due to the `i` flag the search is case-insensitive, so it finds `match:Javascript`. The part of the match that corresponds to `pattern:SCRIPT` becomes a separate array item.

We'll be back to brackets later in the chapter [todo]. They are great for search-and-replace.

## str.match(reg) with "g" flag

When there's a `"g"` flag, `str.match` returns an array of all matches. There are no additional properties in that array, and brackets do not create any elements.

For instance:

```js run
let str = "HO-Ho-ho!";

let result = str.match( *!*/ho/ig*/!* );

alert( result ); // HO, Ho, ho (all matches, case-insensitive)
```

With brackets nothing changes, here we go:

```js run
let str = "HO-Ho-ho!";

let result = str.match( *!*/h(o)/ig*/!* );

alert( result ); // HO, Ho, ho
```

So, with the `g` flag the `result` is a simple array of matches. No additional properties.

If we want to get information about match positions and use brackets, then we should use the [RegExp#exec](mdn:js/RegExp/exec) method that we'll cover below.

````warn header="If there are no matches, the call to `match` returns `null`"
Please note, that's important. If there were no matches, the result is not an empty array, but `null`.

Keep that in mind to avoid pitfalls like this:

```js run
let str = "Hey-hey-hey!";

alert( str.match(/ho/gi).length ); // error! there's no length of null
```
````
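
One common way around the `null` result is to fall back to an empty array. The snippet below is only a sketch: `countMatches` is a hypothetical helper name chosen for illustration, not a built-in method.

```js
// Hypothetical helper (not a built-in): counts matches without failing on null
function countMatches(str, regexp) {
  // str.match returns null when nothing is found, so substitute an empty array
  let matches = str.match(regexp) || [];
  return matches.length;
}

console.log( countMatches("Hey-hey-hey!", /ho/gi) ); // 0
console.log( countMatches("HO-Ho-ho!", /ho/gi) );    // 3
```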

## str.split(regexp|substr, limit)

Splits the string using the regexp (or a substring) as a delimiter.

We already used `split` with strings, like this:

```js run
alert('12-34-56'.split('-')) // [12, 34, 56]
```

But we can also pass a regular expression:

```js run
alert('12-34-56'.split(/-/)) // [12, 34, 56]
```
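
A regexp delimiter becomes useful when the separator is not a fixed substring. As a small sketch (the sample string and variable names are made up for illustration), here the delimiter is a comma with optional spaces around it, and the optional second argument `limit` caps the number of returned items:

```js
let fruitList = "apples ,  pears,bananas , plums";

// the delimiter: a comma surrounded by any number of whitespace characters
let fruits = fruitList.split(/\s*,\s*/);
console.log(fruits); // [ 'apples', 'pears', 'bananas', 'plums' ]

// limit = 2: only the first two items are returned
let firstTwo = fruitList.split(/\s*,\s*/, 2);
console.log(firstTwo); // [ 'apples', 'pears' ]
```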

## str.replace(str|reg, str|func)

This is the Swiss army knife for search and replace in strings.

The simplest use -- search and replace a substring, like this:

```js run
// replace a dash by a colon
alert('12-34-56'.replace("-", ":")) // 12:34-56
```

When the first argument of `replace` is a string, it only looks for the first match.

To find all dashes, we need to use not the string `"-"`, but a regexp `pattern:/-/g`, with an obligatory `g` flag:

```js run
// replace all dashes by a colon
alert( '12-34-56'.replace( *!*/-/g*/!*, ":" ) )  // 12:34:56
```

The second argument is a replacement string.

We can use special characters in it:

| Symbol | Inserts |
|--------|--------|
|`$$`|`"$"` |
|`$&`|the whole match|
|<code>$`</code>|a part of the string before the match|
|`$'`|a part of the string after the match|
|`$n`|if `n` is a 1-2 digit number, then it means the contents of n-th brackets counting from left to right|

For instance, let's use `$&` to replace all occurrences of `"John"` with `"Mr.John"`:

```js run
let str = "John Doe, John Smith and John Bull.";

// for each John - replace it with Mr. and then John
alert(str.replace(/John/g, 'Mr.$&'));
// "Mr.John Doe, Mr.John Smith and Mr.John Bull.";
```

Brackets are very often used together with `$1`, `$2`, like this:

```js run
let str = "John Smith";

alert(str.replace(/(John) (Smith)/, '$2, $1')) // Smith, John
```
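
The remaining symbols from the table can be tried out in the same way. The strings and variable names below are made up purely for illustration:

```js
let priceText = "cost: 5";

// "$$" inserts a literal dollar sign, "$&" inserts the match itself
let withDollar = priceText.replace(/\d+/, "$$$&");
console.log(withDollar); // cost: $5

let pair = "a-b";

// "$`" is the part of the string before the match, "$'" is the part after it
let surrounded = pair.replace(/-/, "[$`|$']");
console.log(surrounded); // a[a|b]b
```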

**For situations that require "smart" replacements, the second argument can be a function.**

It will be called for each match, and its result will be inserted as a replacement.

For instance:

```js run
let i = 0;

// replace each "ho" by the result of the function
alert("HO-Ho-ho".replace(/ho/gi, function() {
  return ++i;
})); // 1-2-3
```

In the example above the function just returns the next number every time, but usually the result is based on the match.

The function is called with arguments `func(str, p1, p2, ..., pn, offset, s)`:

1. `str` -- the match,
2. `p1, p2, ..., pn` -- contents of brackets (if there are any),
3. `offset` -- position of the match,
4. `s` -- the source string.

If there are no brackets in the regexp, then the function always has 3 arguments: `func(str, offset, s)`.

Let's use it to show full information about matches:

```js run
// show and replace all matches
function replacer(str, offset, s) {
  alert(`Found ${str} at position ${offset} in string ${s}`);
  return str.toLowerCase();
}

let result = "HO-Ho-ho".replace(/ho/gi, replacer);
alert( 'Result: ' + result ); // Result: ho-ho-ho

// shows each match:
// Found HO at position 0 in string HO-Ho-ho
// Found Ho at position 3 in string HO-Ho-ho
// Found ho at position 6 in string HO-Ho-ho
```

In the example below there are two brackets, so `replacer` is called with 5 arguments: `str` is the full match, then brackets, and then `offset` and `s`:

```js run
function replacer(str, name, surname, offset, s) {
  // name is the first bracket, surname is the second one
  return surname + ", " + name;
}

let str = "John Smith";

alert(str.replace(/(John) (Smith)/, replacer)) // Smith, John
```

Using a function gives us the ultimate replacement power, because it gets all the information about the match, has access to outer variables and can do everything.
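
As one more sketch of a replacement that is computed from the match, here is a conversion of dash-separated names to camelCase. The helper name `toCamelCase` and the sample input are assumptions made for this illustration:

```js
// Hypothetical helper: turns "border-left-width" into "borderLeftWidth"
function toCamelCase(cssName) {
  // match is the whole "-x" fragment, letter is the contents of the brackets
  return cssName.replace(/-([a-z])/g, function(match, letter) {
    return letter.toUpperCase();
  });
}

console.log( toCamelCase("border-left-width") ); // borderLeftWidth
```

The dash is part of the match but not part of the brackets, so it disappears from the result while the letter after it is uppercased.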

## regexp.test(str)

Let's move on to the methods of the `RegExp` class, which are called on regexps themselves.

The `test` method looks for any match and returns `true` or `false` depending on whether it found one.

So it's basically the same as `str.search(reg) != -1`, for instance:

```js run
let str = "I love Javascript";

// these two tests do the same
alert( *!*/love/i*/!*.test(str) ); // true
alert( str.search(*!*/love/i*/!*) != -1 ); // true
```

An example with the negative answer:

```js run
let str = "Bla-bla-bla";

alert( *!*/love/i*/!*.test(str) ); // false
alert( str.search(*!*/love/i*/!*) != -1 ); // false
```

## regexp.exec(str)

We've already seen these searching methods:

- `search` -- looks for the position of the match,
- `match` -- if there's no `g` flag, returns the first match with brackets,
- `match` -- if there's a `g` flag -- returns all matches, without separating brackets.

The `regexp.exec` method is a bit harder to use, but it allows us to search for all matches with brackets and positions.

It behaves differently depending on whether the regexp has the `g` flag.

- If there's no `g`, then `regexp.exec(str)` returns the first match, exactly like `str.match(reg)`.
- If there's `g`, then `regexp.exec(str)` returns the first match and *remembers* the position after it in the `regexp.lastIndex` property. The next call starts searching from `regexp.lastIndex` and returns the next match. If there are no more matches, then `regexp.exec` returns `null` and `regexp.lastIndex` is set to `0`.

As we can see, the method gives us nothing new if we use it without the `g` flag, because `str.match` does exactly the same.

But the `g` flag allows us to get all matches with their positions and bracket groups.

Here's an example of how subsequent `regexp.exec` calls return matches one by one:

```js run
let str = "A lot about Javascript at https://javascript.info";

let regexp = /JAVA(SCRIPT)/ig;

*!*
// Look for the first match
*/!*
let matchOne = regexp.exec(str);
alert( matchOne[0] ); // Javascript
alert( matchOne[1] ); // script
alert( matchOne.index ); // 12 (the position of the match)
alert( matchOne.input ); // the same as str

alert( regexp.lastIndex ); // 22 (the position after the match)

*!*
// Look for the second match
*/!*
let matchTwo = regexp.exec(str); // continue searching from regexp.lastIndex
alert( matchTwo[0] ); // javascript
alert( matchTwo[1] ); // script
alert( matchTwo.index ); // 34 (the position of the match)
alert( matchTwo.input ); // the same as str

alert( regexp.lastIndex ); // 44 (the position after the match)

*!*
// Look for the third match
*/!*
let matchThree = regexp.exec(str); // continue searching from regexp.lastIndex
alert( matchThree ); // null (no match)

alert( regexp.lastIndex ); // 0 (reset)
```

As we can see, each `regexp.exec` call returns the match in a "full format": as an array with brackets, `index` and `input` properties.

The main use case for `regexp.exec` is to find all matches in a loop:

```js run
let str = 'A lot about Javascript at https://javascript.info';

let regexp = /javascript/ig;

let result;

while (result = regexp.exec(str)) {
  alert( `Found ${result[0]} at ${result.index}` );
}
```

The loop continues until `regexp.exec` returns `null`, which means "no more matches".
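
The same loop can be wrapped into a reusable function. This is a minimal sketch rather than a definitive implementation: the name `findAllMatches` and the shape of the returned objects are assumptions made for illustration.

```js
// Hypothetical helper: collects every match with its brackets and position
function findAllMatches(str, regexp) {
  // without the g flag exec always returns the first match: an endless loop
  if (!regexp.global) throw new Error("the regexp must have the g flag");

  regexp.lastIndex = 0; // start from the beginning (this modifies the passed regexp)

  let found = [];
  let result;

  while ((result = regexp.exec(str)) !== null) {
    found.push({
      match: result[0],          // the whole match
      brackets: result.slice(1), // contents of the brackets
      index: result.index        // position of the match
    });

    // an empty match does not advance lastIndex, so move it manually
    if (result[0] === "") regexp.lastIndex++;
  }

  return found;
}

let allMatches = findAllMatches(
  "A lot about Javascript at https://javascript.info",
  /JAVA(SCRIPT)/ig
);

for (let item of allMatches) {
  console.log(`Found ${item.match} (${item.brackets[0]}) at ${item.index}`);
}
// Found Javascript (script) at 12
// Found javascript (script) at 34
```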

````smart header="Search from the given position"
We can force `regexp.exec` to start searching from the given position by setting `lastIndex` manually:

```js run
let str = 'A lot about Javascript at https://javascript.info';

let regexp = /javascript/ig;
regexp.lastIndex = 30;

alert( regexp.exec(str).index ); // 34, the search starts from the 30th position
```
````
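
Note that `lastIndex` is only taken into account when the regexp has the `g` flag (or the sticky `y` flag); otherwise `exec` searches from the start of the string. The sketch below wraps this into a function; the name `findFrom` and its `-1` return value for "not found" are assumptions made for illustration.

```js
// Hypothetical helper: position of the first match at or after `position`, or -1
function findFrom(str, regexp, position) {
  // work on a copy that has the g flag, so that lastIndex is respected
  // and the regexp passed by the caller is left untouched
  let flags = regexp.flags.includes("g") ? regexp.flags : regexp.flags + "g";
  let copy = new RegExp(regexp.source, flags);

  copy.lastIndex = position;

  let result = copy.exec(str);
  return result ? result.index : -1;
}

let sample = 'A lot about Javascript at https://javascript.info';

console.log( findFrom(sample, /javascript/i, 0) );  // 12
console.log( findFrom(sample, /javascript/i, 30) ); // 34
console.log( findFrom(sample, /javascript/i, 40) ); // -1
```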

## Summary, recipes

Methods become much easier to understand if we separate them by their use in real-life tasks.

To search for the first match only:
: - Find the position of the first match -- `str.search(reg)`.
    - Find the full match -- `str.match(reg)`.
    - Check if there's a match -- `regexp.test(str)`.
    - Find the match from the given position -- `regexp.exec(str)`, set `regexp.lastIndex` to position.

To search for all matches:
: - An array of matches -- `str.match(reg)`, the regexp with `g` flag.
    - Get all matches with full information about each one -- `regexp.exec(str)` with `g` flag in a loop.

To search and replace:
: - Replace with another string or a function result -- `str.replace(reg, str|func)`

To split the string:
: - `str.split(str|reg)`

Now we know the methods and can use regular expressions. But we need to learn their syntax and capabilities, so let's move on.