en.javascript.info/10-regular-expressions-javascript/02-regexp-methods/article.md
Ilya Kantor e2443e8de6 ok
2017-03-19 16:59:53 +03:00

# Methods of RegExp and String
There are two sets of methods to deal with regular expressions.
1. First, regular expressions are objects of the built-in [RegExp](mdn:js/RegExp) class, which provides many methods.
2. Besides that, there are methods of regular strings that can work with regexps.
The structure is a bit messed up, so we'll first consider methods separately, and then -- practical recipes for common tasks.
[cut]
## str.search(reg)
We've seen this method already. It returns the position of the first match or `-1` if none found:
```js run
let str = "A drop of ink may make a million think";
alert( str.search( *!*/a/i*/!* ) ); // 0 (the first position)
```
**The important limitation: `search` always looks for the first match.**
We can't find the next positions using `search`, there's just no syntax for that. But there are other methods that can.
## str.match(reg), no "g" flag
The behavior of `str.match` varies depending on the `g` flag. First let's see the case without it.
Without the `g` flag, `str.match(reg)` looks for the first match only.
The result is an array with that match and additional properties:
- `index` -- the position of the match inside the string,
- `input` -- the subject string.
For instance:
```js run
let str = "Fame is the thirst of youth";
let result = str.match( *!*/fame/i*/!* );
alert( result[0] ); // Fame (the match)
alert( result.index ); // 0 (at the zero position)
alert( result.input ); // "Fame is the thirst of youth" (the string)
```
The array may have more than one element.
**If a part of the pattern is delimited by brackets `(...)`, then it becomes a separate element of the array.**
For instance:
```js run
let str = "Javascript is a programming language";
let result = str.match( *!*/JAVA(SCRIPT)/i*/!* );
alert( result[0] ); // Javascript (the whole match)
alert( result[1] ); // script (the part of the match that corresponds to the brackets)
alert( result.index ); // 0
alert( result.input ); // Javascript is a programming language
```
Due to the `i` flag the search is case-insensitive, so it finds `match:Javascript`. The part of the match that corresponds to `pattern:SCRIPT` becomes a separate array item.
We'll be back to brackets later in the chapter [todo]. They are great for search-and-replace.
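When the pattern has several brackets, each of them gets its own array item, in the order they open. Here's a minimal sketch of that (the sample string and the variable names `full`, `name`, `surname` are just for illustration):

```js
let str = "John Smith";
let result = str.match(/(John) (Smith)/);

// result[0] is the whole match, result[1] and result[2] are the brackets
let [full, name, surname] = result;

console.log(full);          // John Smith
console.log(name);          // John
console.log(surname);       // Smith
console.log(result.length); // 3
console.log(result.index);  // 0
```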
## str.match(reg) with "g" flag
When there's a `"g"` flag, then `str.match` returns an array of all matches. There are no additional properties in that array, and brackets do not create any elements.
For instance:
```js run
let str = "HO-Ho-ho!";
let result = str.match( *!*/ho/ig*/!* );
alert( result ); // HO, Ho, ho (all matches, case-insensitive)
```
With brackets nothing changes, here we go:
```js run
let str = "HO-Ho-ho!";
let result = str.match( *!*/h(o)/ig*/!* );
alert( result ); // HO, Ho, ho
```
So, with `g` flag the `result` is a simple array of matches. No additional properties.
If we want to get information about match positions and use brackets then we should use [RegExp#exec](mdn:js/RegExp/exec) method that we'll cover below.
````warn header="If there are no matches, the call to `match` returns `null`"
Please note, that's important. If there are no matches, the result is not an empty array, but `null`.
Keep that in mind to avoid pitfalls like this:
```js run
let str = "Hey-hey-hey!";
alert( str.match(/ho/gi).length ); // error! there's no length of null
```
````
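One possible way around that, shown here as a sketch rather than the only option, is to fall back to an empty array when `match` returns `null` (the variable name `matches` is just for illustration):

```js
let str = "Hey-hey-hey!";

// match returns null when nothing is found, so substitute an empty array
let matches = str.match(/ho/gi) || [];

console.log(matches.length); // 0
```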
## str.split(regexp|substr, limit)
Splits the string using the regexp (or a substring) as a delimiter.
We already used `split` with strings, like this:
```js run
alert('12-34-56'.split('-')) // [12, 34, 56]
```
But we can also pass a regular expression:
```js run
alert('12-34-56'.split(/-/)) // [12, 34, 56]
```
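A regexp delimiter becomes useful when a plain substring is not enough. The sketch below assumes a string where items are separated by either a dash or a colon (the pattern `[-:]` means "a dash or a colon", the syntax is covered later), and also shows the optional `limit` argument:

```js
// the delimiter is either a dash or a colon
let parts = '12-34:56'.split(/[-:]/);
console.log(parts.join(', ')); // 12, 34, 56

// the optional second argument limits the number of returned items
let firstTwo = '12-34-56'.split(/-/, 2);
console.log(firstTwo.join(', ')); // 12, 34
```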
## str.replace(str|reg, str|func)
The swiss army knife for search and replace in strings.
The simplest use -- search and replace a substring, like this:
```js run
// replace a dash by a colon
alert('12-34-56'.replace("-", ":")) // 12:34-56
```
When the first argument of `replace` is a string, it only looks for the first match.
To find all dashes, we need to use not the string `"-"`, but a regexp `pattern:/-/g`, with an obligatory `g` flag:
```js run
// replace all dashes by a colon
alert( '12-34-56'.replace( *!*/-/g*/!*, ":" ) ) // 12:34:56
```
The second argument is a replacement string.
We can use special characters in it:
| Symbol | Inserts |
|--------|--------|
|`$$`|`"$"` |
|`$&`|the whole match|
|<code>$&#096;</code>|a part of the string before the match|
|`$'`|a part of the string after the match|
|`$n`|if `n` is a 1-2 digit number, then it means the contents of the n-th brackets, counting from left to right|
For instance, let's use `$&` to replace all occurrences of `"John"` with `"Mr.John"`:
```js run
let str = "John Doe, John Smith and John Bull.";
// for each John - replace it with Mr. and then John
alert(str.replace(/John/g, 'Mr.$&'));
// "Mr.John Doe, Mr.John Smith and Mr.John Bull.";
```
Brackets are very often used together with `$1`, `$2`, like this:
```js run
let str = "John Smith";
alert(str.replace(/(John) (Smith)/, '$2, $1')) // Smith, John
```
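The remaining special characters from the table work in the same way. A small sketch with made-up sample strings:

```js
// $$ inserts a literal dollar sign, $1 inserts the first bracket
let price = "10 USD".replace(/(10) USD/, "$$$1");
console.log(price); // $10

// $` is the part of the string before the match, $' is the part after it
let around = "a-b-c".replace(/b/, "[$`|$']");
console.log(around); // a-[a-|-c]-c
```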
**For situations that require "smart" replacements, the second argument can be a function.**
It will be called for each match, and its result will be inserted as a replacement.
For instance:
```js run
let i = 0;
// replace each "ho" by the result of the function
alert("HO-Ho-ho".replace(/ho/gi, function() {
  return ++i;
})); // 1-2-3
```
In the example above the function just returns the next number every time, but usually the result is based on the match.
The function is called with arguments `func(str, p1, p2, ..., pn, offset, s)`:
1. `str` -- the match,
2. `p1, p2, ..., pn` -- contents of brackets (if there are any),
3. `offset` -- position of the match,
4. `s` -- the source string.
If there are no brackets in the regexp, then the function always has 3 arguments: `func(str, offset, s)`.
Let's use it to show full information about matches:
```js run
// show and replace all matches
function replacer(str, offset, s) {
  alert(`Found ${str} at position ${offset} in string ${s}`);
  return str.toLowerCase();
}
let result = "HO-Ho-ho".replace(/ho/gi, replacer);
alert( 'Result: ' + result ); // Result: ho-ho-ho
// shows each match:
// Found HO at position 0 in string HO-Ho-ho
// Found Ho at position 3 in string HO-Ho-ho
// Found ho at position 6 in string HO-Ho-ho
```
In the example below there are two brackets, so `replacer` is called with 5 arguments: `str` is the full match, then brackets, and then `offset` and `s`:
```js run
function replacer(str, name, surname, offset, s) {
  // name is the first bracket, surname is the second one
  return surname + ", " + name;
}
let str = "John Smith";
alert(str.replace(/(John) (Smith)/, replacer)) // Smith, John
```
Using a function gives us the ultimate replacement power, because it gets all the information about the match, has access to outer variables and can do everything.
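As an illustration of a replacement that is computed from the match, here's a sketch that turns dash-separated names into camelCase. The helper name `camelize` is hypothetical, and the pattern `-([a-z])` (a dash followed by a lowercase letter in brackets) uses syntax that is covered later:

```js
function camelize(name) {
  // for every "-x" insert the uppercased letter from the brackets
  return name.replace(/-([a-z])/g, function(match, letter) {
    return letter.toUpperCase();
  });
}

console.log(camelize("background-color")); // backgroundColor
console.log(camelize("list-style-image")); // listStyleImage
```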
## regexp.test(str)
Let's move on to the methods of the `RegExp` class, which are called on regexps themselves.
The `test` method looks for any match and returns `true/false` depending on whether it found one.
So it's basically the same as `str.search(reg) != -1`, for instance:
```js run
let str = "I love Javascript";
// these two tests do the same
alert( *!*/love/i*/!*.test(str) ); // true
alert( str.search(*!*/love/i*/!*) != -1 ); // true
```
An example with the negative answer:
```js run
let str = "Bla-bla-bla";
alert( *!*/love/i*/!*.test(str) ); // false
alert( str.search(*!*/love/i*/!*) != -1 ); // false
```
## regexp.exec(str)
We've already seen these searching methods:
- `search` -- looks for the position of the match,
- `match` -- if there's no `g` flag, returns the first match with brackets,
- `match` -- if there's a `g` flag -- returns all matches, without separating brackets.
The `regexp.exec` method is a bit harder to use, but it allows us to find all matches together with their brackets and positions.
It behaves differently depending on whether the regexp has the `g` flag.
- If there's no `g`, then `regexp.exec(str)` returns the first match, exactly as `str.match(reg)`.
- If there's `g`, then `regexp.exec(str)` returns the first match and *remembers* the position after it in `regexp.lastIndex` property. The next call starts to search from `regexp.lastIndex` and returns the next match. If there are no more matches then `regexp.exec` returns `null` and `regexp.lastIndex` is set to `0`.
As we can see, the method gives us nothing new if we use it without the `g` flag, because `str.match` does exactly the same.
But the `g` flag allows us to get all matches with their positions and bracket groups.
Here's an example of how subsequent `regexp.exec` calls return matches one by one:
```js run
let str = "A lot about Javascript at https://javascript.info";
let regexp = /JAVA(SCRIPT)/ig;
*!*
// Look for the first match
*/!*
let matchOne = regexp.exec(str);
alert( matchOne[0] ); // Javascript
alert( matchOne[1] ); // script
alert( matchOne.index ); // 12 (the position of the match)
alert( matchOne.input ); // the same as str
alert( regexp.lastIndex ); // 22 (the position after the match)
*!*
// Look for the second match
*/!*
let matchTwo = regexp.exec(str); // continue searching from regexp.lastIndex
alert( matchTwo[0] ); // javascript
alert( matchTwo[1] ); // script
alert( matchTwo.index ); // 34 (the position of the match)
alert( matchTwo.input ); // the same as str
alert( regexp.lastIndex ); // 44 (the position after the match)
*!*
// Look for the third match
*/!*
let matchThree = regexp.exec(str); // continue searching from regexp.lastIndex
alert( matchThree ); // null (no match)
alert( regexp.lastIndex ); // 0 (reset)
```
As we can see, each `regexp.exec` call returns the match in a "full format": as an array with brackets, `index` and `input` properties.
The main use case for `regexp.exec` is to find all matches in a loop:
```js run
let str = 'A lot about Javascript at https://javascript.info';
let regexp = /javascript/ig;
let result;
while (result = regexp.exec(str)) {
  alert( `Found ${result[0]} at ${result.index}` );
}
```
The loop continues until `regexp.exec` returns `null`, which means "no more matches".
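The same loop can be wrapped into a helper that collects every match with its brackets and position. This is only a sketch: `findAll` is a hypothetical name, and it assumes the regexp has the `g` flag, because without it `exec` would return the first match forever.

```js
function findAll(regexp, str) {
  // assumption: the caller passes a regexp with the g flag
  if (!regexp.global) throw new Error("findAll needs a regexp with the g flag");

  regexp.lastIndex = 0; // start from the beginning of the string
  let matches = [];
  let result;

  while ((result = regexp.exec(str))) {
    matches.push({
      match: result[0],        // the whole match
      groups: result.slice(1), // contents of brackets
      index: result.index      // position of the match
    });
    // an empty match does not advance lastIndex, so move it manually
    if (result[0] === "") regexp.lastIndex++;
  }

  return matches;
}

let found = findAll(/JAVA(SCRIPT)/ig, "A lot about Javascript at https://javascript.info");

console.log(found.length);    // 2
console.log(found[0].match);  // Javascript
console.log(found[0].index);  // 12
console.log(found[1].match);  // javascript
console.log(found[1].index);  // 34
```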
````smart header="Search from the given position"
We can force `regexp.exec` to start searching from the given position by setting `lastIndex` manually:
```js run
let str = 'A lot about Javascript at https://javascript.info';
let regexp = /javascript/ig;
regexp.lastIndex = 30;
alert( regexp.exec(str).index ); // 34, the search starts from the 30th position
```
````
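Note that `lastIndex` only affects the search when the regexp has the `g` flag. The sketch below wraps the trick into a hypothetical helper `findFrom` and shows that, without `g`, the position is ignored and the search starts from the beginning:

```js
function findFrom(regexp, str, position) {
  // lastIndex is taken into account only for a regexp with the g flag
  regexp.lastIndex = position;
  return regexp.exec(str);
}

let str = 'A lot about Javascript at https://javascript.info';

console.log(findFrom(/javascript/ig, str, 30).index); // 34
console.log(findFrom(/javascript/ig, str, 0).index);  // 12
console.log(findFrom(/javascript/ig, str, 40));       // null (no match after position 40)
console.log(findFrom(/javascript/i, str, 30).index);  // 12 (no g flag, lastIndex is ignored)
```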
## Summary, recipes
Methods become much easier to understand if we separate them by their use in real-life tasks.
To search for the first match only:
: - Find the position of the first match -- `str.search(reg)`.
- Find the full match -- `str.match(reg)`.
- Check if there's a match -- `regexp.test(str)`.
- Find the match from the given position -- `regexp.exec(str)`, set `regexp.lastIndex` to position.
To search for all matches:
: - An array of matches -- `str.match(reg)`, the regexp with `g` flag.
- Get all matches with full information about each one -- `regexp.exec(str)` with `g` flag in the loop.
To search and replace:
: - Replace with another string or a function result -- `str.replace(reg, str|func)`
To split the string:
: - `str.split(str|reg)`
Now we know the methods and can use regular expressions. But we need to learn their syntax and capabilities, so let's move on.