Ilya Kantor 2019-09-04 15:44:48 +03:00
parent ef370b6ace
commit f21cb0a2f4
71 changed files with 707 additions and 727 deletions

@ -0,0 +1,29 @@
A regexp to search for a 3-digit color `#abc`: `pattern:/#[a-f0-9]{3}/i`.
We then need exactly 3 more optional hex digits, no more and no fewer: either all three are present or none are.
The simplest way to add them is to append them to the regexp: `pattern:/#[a-f0-9]{3}([a-f0-9]{3})?/i`
We can do it in a smarter way though: `pattern:/#([a-f0-9]{3}){1,2}/i`.
Here the regexp `pattern:[a-f0-9]{3}` is in parentheses to apply the quantifier `pattern:{1,2}` to it as a whole.
In action:
```js run
let reg = /#([a-f0-9]{3}){1,2}/gi;
let str = "color: #3f3; background-color: #AA00ef; and: #abcd";
alert( str.match(reg) ); // #3f3 #AA00ef #abc
```
There's a minor problem here: the pattern found `match:#abc` in `subject:#abcd`. To prevent that we can add `pattern:\b` to the end:
```js run
let reg = /#([a-f0-9]{3}){1,2}\b/gi;
let str = "color: #3f3; background-color: #AA00ef; and: #abcd";
alert( str.match(reg) ); // #3f3 #AA00ef
```
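As a quick check, the sketch below runs both variants of the pattern (each with `pattern:\b` appended) against a longer string. The test string is made up for illustration, and `console.log` is used instead of `alert` so that the snippet also runs outside a browser.

```js
// Hypothetical test string: two valid colors and three invalid ones
let str = "#3f3 #AA00ef #abcd #12345 #1234567";

let appended = /#[a-f0-9]{3}([a-f0-9]{3})?\b/gi; // 3 digits plus an optional 3 more
let grouped = /#([a-f0-9]{3}){1,2}\b/gi;         // a 3-digit group repeated 1 or 2 times

let byAppended = str.match(appended);
let byGrouped = str.match(grouped);

console.log(byAppended); // [ '#3f3', '#AA00ef' ]
console.log(byGrouped);  // [ '#3f3', '#AA00ef' ]
```

Both variants should agree on every input; the grouped one is only shorter to write.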

@ -0,0 +1,14 @@
# Find color in the format #abc or #abcdef
Write a RegExp that matches colors in the format `#abc` or `#abcdef`. That is: `#` followed by 3 or 6 hexadecimal digits.
Usage example:
```js
let reg = /your regexp/g;
let str = "color: #3f3; background-color: #AA00ef; and: #abcd";
alert( str.match(reg) ); // #3f3 #AA00ef
```
P.S. This should be exactly 3 or 6 hex digits: values like `#abcd` should not match.

@ -0,0 +1,11 @@
A positive number with an optional decimal part is (per previous task): `pattern:\d+(\.\d+)?`.
Let's add an optional `-` at the beginning:
```js run
let reg = /-?\d+(\.\d+)?/g;
let str = "-1.5 0 2 -123.4.";
alert( str.match(reg) ); // -1.5, 0, 2, -123.4
```
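The matches are strings. If numeric values are needed, one possible follow-up (not part of the task) is to convert them with `Number`. A minimal sketch, using `console.log` instead of `alert`:

```js
let reg = /-?\d+(\.\d+)?/g;
let str = "-1.5 0 2 -123.4.";

// match() with the g flag returns an array of strings; map each one to a number
let numbers = str.match(reg).map(Number);

console.log(numbers); // [ -1.5, 0, 2, -123.4 ]
```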

@ -0,0 +1,13 @@
# Find all numbers
Write a regexp that looks for all decimal numbers, including integers, numbers with a floating point, and negative ones.
An example of use:
```js
let reg = /your regexp/g;
let str = "-1.5 0 2 -123.4.";
alert( str.match(reg) ); // -1.5, 0, 2, -123.4
```

@ -0,0 +1,53 @@
A regexp for a number is: `pattern:-?\d+(\.\d+)?`. We created it in previous tasks.
An operator is `pattern:[-+*/]`.
Please note:
- Here the dash `pattern:-` goes first in the brackets, because in the middle it would mean a character range, while we just want a character `-`.
- A slash `/` must be escaped inside a JavaScript regexp literal `pattern:/.../`; we'll do that later.
We need a number, an operator, and then another number, with optional spaces between them.
The full regular expression: `pattern:-?\d+(\.\d+)?\s*[-+*/]\s*-?\d+(\.\d+)?`.
To get a result as an array let's put parentheses around the data that we need: numbers and the operator: `pattern:(-?\d+(\.\d+)?)\s*([-+*/])\s*(-?\d+(\.\d+)?)`.
In action:
```js run
let reg = /(-?\d+(\.\d+)?)\s*([-+*\/])\s*(-?\d+(\.\d+)?)/;
alert( "1.2 + 12".match(reg) );
```
The result includes:
- `result[0] == "1.2 + 12"` (full match)
- `result[1] == "1.2"` (first group `(-?\d+(\.\d+)?)` -- the first number, including the decimal part)
- `result[2] == ".2"` (second group `(\.\d+)?` -- the first decimal part)
- `result[3] == "+"` (third group `([-+*\/])` -- the operator)
- `result[4] == "12"` (fourth group `(-?\d+(\.\d+)?)` -- the second number)
- `result[5] == undefined` (fifth group `(\.\d+)?` -- the last decimal part is absent, so it's undefined)
We only want the numbers and the operator, without the full match or the decimal parts.
The full match (the array's first item) can be removed by shifting the array: `pattern:result.shift()`.
The decimal groups can be removed by making them into non-capturing groups, by adding `pattern:?:` to the beginning: `pattern:(?:\.\d+)?`.
The final solution:
```js run
function parse(expr) {
let reg = /(-?\d+(?:\.\d+)?)\s*([-+*\/])\s*(-?\d+(?:\.\d+)?)/;
let result = expr.match(reg);
if (!result) return [];
result.shift();
return result;
}
alert( parse("-1.23 * 3.45") ); // -1.23, *, 3.45
```
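To show how the returned array might be used, here is a sketch that evaluates the parsed expression. The `calculate` helper is hypothetical and not part of the task; `parse` is repeated so the snippet is self-contained, and `console.log` replaces `alert`.

```js
function parse(expr) {
  let reg = /(-?\d+(?:\.\d+)?)\s*([-+*\/])\s*(-?\d+(?:\.\d+)?)/;
  let result = expr.match(reg);
  if (!result) return [];
  result.shift();
  return result;
}

// Hypothetical helper: evaluates the expression returned by parse()
function calculate(expr) {
  let [a, op, b] = parse(expr);
  if (op === undefined) return NaN; // nothing was parsed
  a = Number(a);
  b = Number(b);
  switch (op) {
    case "+": return a + b;
    case "-": return a - b;
    case "*": return a * b;
    case "/": return a / b;
  }
}

console.log(calculate("1 + 2"));      // 3
console.log(calculate(" -3 / -6 "));  // 0.5
console.log(calculate("-2 - 2"));     // -4
console.log(calculate("no numbers")); // NaN
```

The extra spaces around `-3 / -6` need no special handling, because the regexp has no anchors and simply finds the expression inside the string.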

@ -0,0 +1,28 @@
# Parse an expression
An arithmetical expression consists of 2 numbers and an operator between them, for instance:
- `1 + 2`
- `1.2 * 3.4`
- `-3 / -6`
- `-2 - 2`
The operator is one of: `"+"`, `"-"`, `"*"` or `"/"`.
There may be extra spaces at the beginning, at the end or between the parts.
Create a function `parse(expr)` that takes an expression and returns an array of 3 items:
1. The first number.
2. The operator.
3. The second number.
For example:
```js
let [a, op, b] = parse("1.2 * 3.4");
alert(a); // 1.2
alert(op); // *
alert(b); // 3.4
```

@ -0,0 +1,242 @@
# Capturing groups
A part of a pattern can be enclosed in parentheses `pattern:(...)`. This is called a "capturing group".
That has two effects:
1. It allows us to get a part of the match as a separate item in the result array.
2. If we put a quantifier after the parentheses, it applies to the parentheses as a whole, not the last character.
## Example
In the example below the pattern `pattern:(go)+` finds one or more `match:'go'`:
```js run
alert( 'Gogogo now!'.match(/(go)+/i) ); // "Gogogo"
```
Without parentheses, the pattern `pattern:/go+/` means `subject:g`, followed by `subject:o` repeated one or more times. For instance, `match:goooo` or `match:gooooooooo`.
Parentheses group the word `pattern:(go)` together.
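The difference can be seen by running both patterns on the same string. A small sketch, with `console.log` in place of `alert`:

```js
let str = 'Gogogo now!';

let grouped = str.match(/(go)+/i)[0]; // the quantifier repeats the whole group "go"
let ungrouped = str.match(/go+/i)[0]; // the quantifier repeats only the letter "o"

console.log(grouped);   // Gogogo
console.log(ungrouped); // Go
```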
Let's make something more complex -- a regexp to match an email.
Examples of emails:
```
my@mail.com
john.smith@site.com.uk
```
The pattern: `pattern:[-.\w]+@([\w-]+\.)+[\w-]{2,20}`.
1. The first part `pattern:[-.\w]+` (before `@`) may include any alphanumeric word characters, a dot and a dash, to match `match:john.smith`.
2. Then `pattern:@`, and the domain. It may be a subdomain like `host.site.com.uk`, so we match it as "a word followed by a dot" `pattern:([\w-]+\.)` (repeated), and then the last part must be a word: `match:com` or `match:uk` (but not very long: 2-20 characters).
That regexp is not perfect, but it is good enough to catch errors and occasional mistypes.
For instance, we can find all emails in the string:
```js run
let reg = /[-.\w]+@([\w-]+\.)+[\w-]{2,20}/g;
alert("my@mail.com @ his@site.com.uk".match(reg)); // my@mail.com, his@site.com.uk
```
In this example parentheses were used to make a group for repetitions `pattern:([\w-]+\.)+`. But there are other uses too, let's see them.
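One detail worth knowing when a quantifier follows a capturing group: the group's slot in the result holds only the last repetition it matched. A small sketch with the same email pattern, written without the `g` flag so that `match` returns the group as well:

```js
let reg = /[-.\w]+@([\w-]+\.)+[\w-]{2,20}/;
let result = "john.smith@site.com.uk".match(reg);

console.log(result[0]); // john.smith@site.com.uk (full match)
console.log(result[1]); // com. (the group matched "site." and then "com."; only the last one is kept)
```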
## Contents of parentheses
Parentheses are numbered from left to right. The search engine remembers the content matched by each of them and allows us to reference it in the pattern or in the replacement string.
For instance, we'd like to find HTML tags `pattern:<.*?>`, and process them.
Let's wrap the inner content into parentheses, like this: `pattern:<(.*?)>`.
Then we'll get both the tag as a whole and its content:
```js run
let str = '<h1>Hello, world!</h1>';
let reg = /<(.*?)>/;
alert( str.match(reg) ); // Array: ["<h1>", "h1"]
```
The call to [String#match](mdn:js/String/match) returns groups only if the regexp looks for the first match only, that is, it has no `pattern:/.../g` flag.
If we need all matches with their groups then we can use `.matchAll` or `regexp.exec` as described in <info:regexp-methods>:
```js run
let str = '<h1>Hello, world!</h1>';
// two matches: opening <h1> and closing </h1> tags
let reg = /<(.*?)>/g;
let matches = Array.from( str.matchAll(reg) );
alert(matches[0]); // Array: ["<h1>", "h1"]
alert(matches[1]); // Array: ["</h1>", "/h1"]
```
Here we have two matches for `pattern:<(.*?)>`, each of them is an array with the full match and groups.
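The same result can be collected with `regexp.exec` in a loop. The sketch below is one possible way to write it; the `found` array and its field names are made up for illustration.

```js
let str = '<h1>Hello, world!</h1>';
let reg = /<(.*?)>/g;

let found = [];
let match;
// With the g flag, each exec() call continues from reg.lastIndex
// and returns null when there are no more matches
while ((match = reg.exec(str)) !== null) {
  found.push({ tag: match[0], content: match[1], index: match.index });
}

console.log(found.length);     // 2
console.log(found[0].content); // h1
console.log(found[1].content); // /h1
console.log(found[1].index);   // 17
```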
## Nested groups
Parentheses can be nested. In this case the numbering also goes from left to right.
For instance, when searching a tag in `subject:<span class="my">` we may be interested in:
1. The tag content as a whole: `match:span class="my"`.
2. The tag name: `match:span`.
3. The tag attributes: `match:class="my"`.
Let's add parentheses for them:
```js run
let str = '<span class="my">';
let reg = /<(([a-z]+)\s*([^>]*))>/;
let result = str.match(reg);
alert(result); // <span class="my">, span class="my", span, class="my"
```
Here's how groups look:
![](regexp-nested-groups.svg)
The zero index of `result` always holds the full match.
Then come the groups, numbered from left to right. Whichever opens first gives the first group `result[1]`. Here it encloses the whole tag content.
Then `result[2]` holds the group from the second opening `pattern:(` to the corresponding `pattern:)` -- the tag name. The spaces are not grouped, and the attributes go into `result[3]`.
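Since the order of the groups is fixed, array destructuring is a convenient way to give them names. A sketch with the same pattern; the variable names are arbitrary:

```js
let str = '<span class="my">';
let reg = /<(([a-z]+)\s*([^>]*))>/;

// Skip the full match (index 0) and name the three groups in opening order
let [, content, tagName, attrs] = str.match(reg);

console.log(content); // span class="my"
console.log(tagName); // span
console.log(attrs);   // class="my"
```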
**Even if a group is optional and doesn't exist in the match, the corresponding `result` array item is present (and equals `undefined`).**
For instance, let's consider the regexp `pattern:a(z)?(c)?`. It looks for `"a"` optionally followed by `"z"` optionally followed by `"c"`.
If we run it on the string with a single letter `subject:a`, then the result is:
```js run
let match = 'a'.match(/a(z)?(c)?/);
alert( match.length ); // 3
alert( match[0] ); // a (whole match)
alert( match[1] ); // undefined
alert( match[2] ); // undefined
```
The array has the length of `3`, but all groups are empty.
And here's a more complex match for the string `subject:ack`:
```js run
let match = 'ack'.match(/a(z)?(c)?/);
alert( match.length ); // 3
alert( match[0] ); // ac (whole match)
alert( match[1] ); // undefined, because there's nothing for (z)?
alert( match[2] ); // c
```
The array length stays the same: `3`. But there's nothing for the group `pattern:(z)?`, so the result is `["ac", undefined, "c"]`.
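For comparison, if the quantifier is placed inside the parentheses, the group always takes part in the match and yields an empty string rather than `undefined`. A short sketch of the difference:

```js
let optionalGroup = 'a'.match(/a(z)?(c)?/);  // the groups themselves are optional
let optionalInside = 'a'.match(/a(z?)(c?)/); // the groups always match; their contents are optional

console.log(optionalGroup[1]);       // undefined
console.log(optionalInside[1]);      // "" (an empty string)
console.log(optionalGroup.length);   // 3
console.log(optionalInside.length);  // 3
```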
## Named groups
Remembering groups by their numbers is hard. For simple patterns it's doable, but for more complex ones we can give names to parentheses.
That's done by putting `pattern:?<name>` immediately after the opening paren, like this:
```js run
*!*
let dateRegexp = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/;
*/!*
let str = "2019-04-30";
let groups = str.match(dateRegexp).groups;
alert(groups.year); // 2019
alert(groups.month); // 04
alert(groups.day); // 30
```
As you can see, the groups reside in the `.groups` property of the match.
We can also use them in the replacement string, as `pattern:$<name>` (like `$1..9`, but a name instead of a digit).
For instance, let's reformat the date into `day.month.year`:
```js run
let dateRegexp = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/;
let str = "2019-04-30";
let rearranged = str.replace(dateRegexp, '$<day>.$<month>.$<year>');
alert(rearranged); // 30.04.2019
```
If we use a function for the replacement, then the named `groups` object is always the last argument:
```js run
let dateRegexp = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/;
let str = "2019-04-30";
let rearranged = str.replace(dateRegexp,
(str, year, month, day, offset, input, groups) =>
`${groups.day}.${groups.month}.${groups.year}`
);
alert(rearranged); // 30.04.2019
```
Usually, when we intend to use named groups, we don't need positional arguments of the function. For the majority of real-life cases we only need `str` and `groups`.
So we can write it a little bit shorter:
```js
let rearranged = str.replace(dateRegexp, (str, ...args) => {
let {year, month, day} = args.pop();
alert(str); // 2019-04-30
alert(year); // 2019
alert(month); // 04
alert(day); // 30
return `${day}.${month}.${year}`; // the returned string replaces the match
});
```
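Named groups are also available on each match returned by `matchAll`. A sketch, assuming an input string with two dates that is made up for illustration:

```js
let dateRegexp = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/g;
let str = "from 2019-04-30 to 2020-01-15"; // hypothetical input

let dates = [];
for (let match of str.matchAll(dateRegexp)) {
  // every match carries its own .groups object
  let { year, month, day } = match.groups;
  dates.push(`${day}.${month}.${year}`);
}

console.log(dates); // [ '30.04.2019', '15.01.2020' ]
```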
## Non-capturing groups with ?:
Sometimes we need parentheses to correctly apply a quantifier, but we don't want their contents in results.
A group may be excluded by adding `pattern:?:` in the beginning.
For instance, if we want to find `pattern:(go)+`, but don't want to remember the contents (`go`) in a separate array item, we can write: `pattern:(?:go)+`.
In the example below we only get the name "John" as a separate member of the `result` array:
```js run
let str = "Gogo John!";
*!*
// exclude Gogo from capturing
let reg = /(?:go)+ (\w+)/i;
*/!*
let result = str.match(reg);
alert( result.length ); // 2
alert( result[1] ); // John
```
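Running the capturing and non-capturing variants side by side shows what changes in the result. A minimal sketch:

```js
let str = "Gogo John!";

let capturing = str.match(/(go)+ (\w+)/i);
let nonCapturing = str.match(/(?:go)+ (\w+)/i);

console.log(capturing.length);    // 3: full match, "go", "John"
console.log(nonCapturing.length); // 2: full match, "John"
console.log(capturing[2] === nonCapturing[1]); // true
```

The full match is the same in both cases; only the numbering of the remaining groups shifts.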
## Summary
Parentheses group together a part of the regular expression, so that the quantifier applies to it as a whole.
Parentheses groups are numbered left-to-right, and can optionally be named with `(?<name>...)`.
The content matched by a group can be referenced in the replacement string as `$1`, `$2` etc, or by name as `$<name>` if the group is named.
So, parentheses groups are called "capturing groups", as they "capture" a part of the match. We get that part separately from the result as a member of the array or in `.groups` if it's named.
We can exclude a group from remembering (make it "non-capturing") by putting `?:` at the start: `(?:...)`. That's used if we'd like to apply a quantifier to the whole group, but don't need it in the result.
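For completeness, a minimal sketch of numbered references in a replacement string. The sample name is made up.

```js
let str = "John Smith";

// $1 and $2 refer to the first and second capturing groups
let swapped = str.replace(/(\w+) (\w+)/, "$2, $1");

console.log(swapped); // Smith, John
```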

@ -0,0 +1 @@
<svg xmlns="http://www.w3.org/2000/svg" width="320" height="130" viewBox="0 0 320 130"><defs><style>@import url(https://fonts.googleapis.com/css?family=Open+Sans:bold,italic,bolditalic%7CPT+Mono);@font-face{font-family:&apos;PT Mono&apos;;font-weight:700;font-style:normal;src:local(&apos;PT MonoBold&apos;),url(/font/PTMonoBold.woff2) format(&apos;woff2&apos;),url(/font/PTMonoBold.woff) format(&apos;woff&apos;),url(/font/PTMonoBold.ttf) format(&apos;truetype&apos;)}</style></defs><g id="regexp" fill="none" fill-rule="evenodd" stroke="none" stroke-width="1"><g id="regexp-nested-groups.svg"><text id="&lt;(([a-z]+)\s*([^&gt;]*)" font-family="PTMono-Regular, PT Mono" font-size="22" font-weight="normal"><tspan x="20" y="75" fill="#8A704D">&lt;</tspan> <tspan x="33.2" y="75" fill="#DB2023">((</tspan> <tspan x="59.6" y="75" fill="#8A704D">[a-z]+</tspan> <tspan x="138.8" y="75" fill="#DB2023">)</tspan> <tspan x="152" y="75" fill="#8A704D">\s*</tspan> <tspan x="191.6" y="75" fill="#DB2023">(</tspan> <tspan x="204.8" y="75" fill="#8A704D">[^&gt;]*</tspan> <tspan x="270.8" y="75" fill="#D0021B">))</tspan> <tspan x="297.2" y="75" fill="#8A704D">&gt;</tspan></text><path id="Line" stroke="#D0021B" stroke-linecap="square" d="M42.5 45.646V29.354"/><path id="Line-2" stroke="#D0021B" stroke-linecap="square" d="M290.5 45.646V29.354"/><path id="Line" stroke="#D0021B" stroke-linecap="square" d="M42.5 28.5h248"/><path id="Line-5" stroke="#D0021B" stroke-linecap="square" d="M52.5 101.646V85.354"/><path id="Line-4" stroke="#D0021B" stroke-linecap="square" d="M145.5 101.646V85.354"/><path id="Line-3" stroke="#D0021B" stroke-linecap="square" d="M52.5 102.5h93"/><text id="1" fill="#D0021B" font-family="PTMono-Regular, PT Mono" font-size="20" font-weight="normal"><tspan x="24" y="44">1</tspan></text><text id="span-class=&quot;my&quot;" fill="#417505" font-family="PTMono-Regular, PT Mono" font-size="20" font-weight="normal"><tspan x="82" y="23">span class=&quot;my&quot;</tspan></text><text 
id="2" fill="#D0021B" font-family="PTMono-Regular, PT Mono" font-size="20" font-weight="normal"><tspan x="35" y="101">2</tspan></text><text id="span" fill="#417505" font-family="PTMono-Regular, PT Mono" font-size="20" font-weight="normal"><tspan x="73" y="119">span</tspan></text><path id="Line-8" stroke="#D0021B" stroke-linecap="square" d="M197.5 101.646V85.354"/><path id="Line-7" stroke="#D0021B" stroke-linecap="square" d="M277.5 101.646V85.354"/><path id="Line-6" stroke="#D0021B" stroke-linecap="square" d="M197.5 102.5h80"/><text id="3" fill="#D0021B" font-family="PTMono-Regular, PT Mono" font-size="20" font-weight="normal"><tspan x="182" y="101">3</tspan></text><text id="class=&quot;my&quot;" fill="#417505" font-family="PTMono-Regular, PT Mono" font-size="20" font-weight="normal"><tspan x="185" y="121">class=&quot;my&quot;</tspan></text></g></g></svg>
