regexp draft

Ilya Kantor 2019-03-02 01:02:01 +03:00
parent 1369332661
commit 65184edf76
11 changed files with 730 additions and 399 deletions


@ -1,16 +1,18 @@
# Quantifiers +, *, ? and {n}
Let's say we have a string like `+7(903)-123-45-67` and want to find all numbers in it. But unlike before, we are interested in not digits, but full numbers: `7, 903, 123, 45, 67`.
Let's say we have a string like `+7(903)-123-45-67` and want to find all numbers in it. But unlike before, we are interested not in single digits, but full numbers: `7, 903, 123, 45, 67`.
A number is a sequence of 1 or more digits `\d`. The instrument to say how many we need is called *quantifiers*.
A number is a sequence of 1 or more digits `\d`. To specify how many we need, we append a *quantifier*.
## Quantity {n}
The most obvious quantifier is a number in figure quotes: `pattern:{n}`. A quantifier is put after a character (or a character class and so on) and specifies exactly how many we need.
The simplest quantifier is a number in curly braces: `pattern:{n}`.
It also has advanced forms, here we go with examples:
A quantifier is appended to a character (or a character class, or a `[...]` set, etc.) and specifies how many we need.
Exact count: `{5}`
It has a few advanced forms; let's look at some examples:
The exact count: `{5}`
: `pattern:\d{5}` denotes exactly 5 digits, the same as `pattern:\d\d\d\d\d`.
The example below looks for a 5-digit number:
@ -21,20 +23,24 @@ Exact count: `{5}`
We can add `\b` to exclude longer numbers: `pattern:\b\d{5}\b`.
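As a minimal sketch of the difference `\b` makes, the snippet below runs both patterns against a made-up string (the string and the variable names are illustrative assumptions, not part of the original example):

```js
// Hypothetical sample string, chosen only for illustration.
const sample = "Codes: 12345, 1234567 and 98765";

// Without \b, the first 5 digits of the longer number 1234567 also match.
const loose = sample.match(/\d{5}/g);

// With \b on both sides, only standalone 5-digit numbers match.
const strict = sample.match(/\b\d{5}\b/g);

console.log(loose);  // [ '12345', '12345', '98765' ]
console.log(strict); // [ '12345', '98765' ]
```

The 7-digit number is skipped entirely by the second pattern, because there is no word boundary between its fifth and sixth digits.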
The count from-to: `{3,5}`
: To find numbers from 3 to 5 digits we can put the limits into figure brackets: `pattern:\d{3,5}`
The range: `{3,5}`, match 3-5 times
: To find numbers from 3 to 5 digits we can put the limits into curly braces: `pattern:\d{3,5}`
```js run
alert( "I'm not 12, but 1234 years old".match(/\d{3,5}/) ); // "1234"
```
We can omit the upper limit. Then a regexp `pattern:\d{3,}` looks for numbers of `3` and more digits:
We can omit the upper limit.
Then a regexp `pattern:\d{3,}` looks for sequences of digits of length `3` or more:
```js run
alert( "I'm not 12, but 345678 years old".match(/\d{3,}/) ); // "345678"
```
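To see how the upper limit affects the result, here is a small sketch that applies both forms to the same string (variable names are illustrative). With `{3,5}` the match stops after 5 digits even though more digits follow:

```js
// Same sentence as above, reused so the two patterns can be compared.
const text = "I'm not 12, but 345678 years old";

// {3,5}: at least 3 digits, at most 5. The quantifier takes as many as allowed.
const upToFive = text.match(/\d{3,5}/)[0];

// {3,}: at least 3 digits, no upper limit.
const threeOrMore = text.match(/\d{3,}/)[0];

console.log(upToFive);    // "34567"
console.log(threeOrMore); // "345678"
```

In both cases `12` is skipped, because it has fewer than 3 digits.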
In case with the string `+7(903)-123-45-67` we need numbers: one or more digits in a row. That is `pattern:\d{1,}`:
Let's return to the string `+7(903)-123-45-67`.
A number is a sequence of one or more digits in a row. So the regexp is `pattern:\d{1,}`:
```js run
let str = "+7(903)-123-45-67";
@ -46,7 +52,7 @@ alert(numbers); // 7,903,123,45,67
## Shorthands
Most often needed quantifiers have shorthands:
There are shorthands for the most frequently used quantifiers:
`+`
: Means "one or more", the same as `{1,}`.
@ -64,7 +70,7 @@ Most often needed quantifiers have shorthands:
For instance, the pattern `pattern:ou?r` looks for `match:o` followed by zero or one `match:u`, and then `match:r`.
So it can find `match:or` in the word `subject:color` and `match:our` in `subject:colour`:
So, `pattern:colou?r` finds both `match:color` and `match:colour`:
```js run
let str = "Should I write color or colour?";
@ -75,7 +81,7 @@ Most often needed quantifiers have shorthands:
`*`
: Means "zero or more", the same as `{0,}`. That is, the character may repeat any number of times or be absent.
The example below looks for a digit followed by any number of zeroes:
For example, `pattern:\d0*` looks for a digit followed by any number of zeroes:
```js run
alert( "100 10 1".match(/\d0*/g) ); // 100, 10, 1
@ -85,11 +91,12 @@ Most often needed quantifiers have shorthands:
```js run
alert( "100 10 1".match(/\d0+/g) ); // 100, 10
// 1 not matched, as 0+ requires at least one zero
```
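The following sketch checks that each shorthand behaves like its curly-brace form on the same `"100 10 1"` string. The helper `same` and the variable names are assumptions made for this illustration only:

```js
const zeros = "100 10 1";

// Hypothetical helper: compares two match arrays element by element.
const same = (a, b) => JSON.stringify(a) === JSON.stringify(b);

// "+" is {1,}: a digit followed by at least one zero.
const plus = zeros.match(/\d0+/g);       // [ '100', '10' ]
// "*" is {0,}: a digit followed by any number of zeroes, possibly none.
const star = zeros.match(/\d0*/g);       // [ '100', '10', '1' ]
// "?" is {0,1}: a digit followed by at most one zero.
const optional = zeros.match(/\d0?/g);   // [ '10', '0', '10', '1' ]

console.log(same(plus, zeros.match(/\d0{1,}/g)));      // true
console.log(same(star, zeros.match(/\d0{0,}/g)));      // true
console.log(same(optional, zeros.match(/\d0{0,1}/g))); // true
```

Note how `\d0?` splits `100` into `10` and `0`: after one optional zero is consumed, the remaining `0` is matched as a digit on its own.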
## More examples
Quantifiers are used very often. They are one of the main "building blocks" for complex regular expressions, so let's see more examples.
Quantifiers are used very often. They serve as the main "building block" of complex regular expressions, so let's see more examples.
Regexp "decimal fraction" (a number with a floating point): `pattern:\d+\.\d+`
: In action:
@ -120,12 +127,12 @@ Regexp "opening or closing HTML-tag without attributes": `pattern:/<\/?[a-z][a-z
alert( "<h1>Hi!</h1>".match(/<\/?[a-z][a-z0-9]*>/gi) ); // <h1>, </h1>
```
```smart header="More precise means more complex"
```smart header="To make a regexp more precise, we often need to make it more complex"
We can see one common rule in these examples: the more precise the regular expression is, the longer and more complex it becomes.
For instance, HTML tags could use a simpler regexp: `pattern:<\w+>`.
For instance, for HTML tags we could use a simpler regexp: `pattern:<\w+>`.
Because `pattern:\w` means any English letter or a digit or `'_'`, the regexp also matches non-tags, for instance `match:<_>`. But it's much simpler than `pattern:<[a-z][a-z0-9]*>`.
...But because `pattern:\w` means any English letter or a digit or `'_'`, the regexp also matches non-tags, for instance `match:<_>`. So it's much simpler than `pattern:<[a-z][a-z0-9]*>`, but less reliable.
Are we ok with `pattern:<\w+>` or do we need `pattern:<[a-z][a-z0-9]*>`?