regexp draft
This commit is contained in:
parent
1369332661
commit
65184edf76
11 changed files with 730 additions and 399 deletions
|
@ -1,16 +1,18 @@
|
|||
# Quantifiers +, *, ? and {n}
|
||||
|
||||
Let's say we have a string like `+7(903)-123-45-67` and want to find all numbers in it. But unlike before, we are interested in not digits, but full numbers: `7, 903, 123, 45, 67`.
|
||||
Let's say we have a string like `+7(903)-123-45-67` and want to find all numbers in it. But unlike before, we are interested not in single digits, but full numbers: `7, 903, 123, 45, 67`.
|
||||
|
||||
A number is a sequence of 1 or more digits `\d`. The instrument to say how many we need is called *quantifiers*.
|
||||
A number is a sequence of 1 or more digits `\d`. To specify how many we need, we append a *quantifier*.
|
||||
|
||||
## Quantity {n}
|
||||
|
||||
The most obvious quantifier is a number in figure quotes: `pattern:{n}`. A quantifier is put after a character (or a character class and so on) and specifies exactly how many we need.
|
||||
The simplest quantifier is a number in curly braces: `pattern:{n}`.
|
||||
|
||||
It also has advanced forms, here we go with examples:
|
||||
A quantifier is appended to a character (or a character class, a `[...]` set, etc.) and specifies how many we need.
|
||||
|
||||
Exact count: `{5}`
|
||||
It has a few advanced forms; let's see some examples:
|
||||
|
||||
The exact count: `{5}`
|
||||
: `pattern:\d{5}` denotes exactly 5 digits, the same as `pattern:\d\d\d\d\d`.
|
||||
|
||||
The example below looks for a 5-digit number:
|
||||
|
@ -21,20 +23,24 @@ Exact count: `{5}`
|
|||
|
||||
We can add `\b` to exclude longer numbers: `pattern:\b\d{5}\b`.
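
To illustrate the effect of `\b`, here is a minimal sketch on a hypothetical sample string (the string and the variable names are assumptions for illustration, not part of the original example):

```js
// Hypothetical sample string: one 5-digit number and one longer number
let str = "Codes: 12345 and 9876543";

// Without \b, the first 5 digits of the longer number match as well
let loose = str.match(/\d{5}/g);
console.log(loose.join(", ")); // 12345, 98765

// With \b on both sides, only a standalone 5-digit number matches
let exact = str.match(/\b\d{5}\b/g);
console.log(exact.join(", ")); // 12345
```

The word boundary works here because a digit counts as a word character, so there is no boundary between two adjacent digits.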
|
||||
|
||||
The count from-to: `{3,5}`
|
||||
: To find numbers from 3 to 5 digits we can put the limits into figure brackets: `pattern:\d{3,5}`
|
||||
The range: `{3,5}`, match 3-5 times
|
||||
: To find numbers of 3 to 5 digits, we can put the limits into curly braces: `pattern:\d{3,5}`
|
||||
|
||||
```js run
|
||||
alert( "I'm not 12, but 1234 years old".match(/\d{3,5}/) ); // "1234"
|
||||
```
|
||||
|
||||
We can omit the upper limit. Then a regexp `pattern:\d{3,}` looks for numbers of `3` and more digits:
|
||||
We can omit the upper limit.
|
||||
|
||||
Then a regexp `pattern:\d{3,}` looks for sequences of digits of length `3` or more:
|
||||
|
||||
```js run
|
||||
alert( "I'm not 12, but 345678 years old".match(/\d{3,}/) ); // "345678"
|
||||
```
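
As a rough comparison of the two forms, the sketch below runs both on the same string with the `g` flag (the variable names are illustrative only):

```js
let str = "I'm not 12, but 345678 years old";

// {3,5}: at most 5 digits are taken, and the leftover "8" is too short to match
let capped = str.match(/\d{3,5}/g);
console.log(capped.join(", ")); // 34567

// {3,}: no upper limit, so the whole run of digits is taken
let open = str.match(/\d{3,}/g);
console.log(open.join(", ")); // 345678
```

In both cases `12` is skipped, because it has fewer than 3 digits.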
|
||||
|
||||
In case with the string `+7(903)-123-45-67` we need numbers: one or more digits in a row. That is `pattern:\d{1,}`:
|
||||
Let's return to the string `+7(903)-123-45-67`.
|
||||
|
||||
A number is a sequence of one or more digits in a row. So the regexp is `pattern:\d{1,}`:
|
||||
|
||||
```js run
|
||||
let str = "+7(903)-123-45-67";
|
||||
|
@ -46,7 +52,7 @@ alert(numbers); // 7,903,123,45,67
|
|||
|
||||
## Shorthands
|
||||
|
||||
Most often needed quantifiers have shorthands:
|
||||
There are shorthands for the most commonly used quantifiers:
|
||||
|
||||
`+`
|
||||
: Means "one or more", the same as `{1,}`.
|
||||
|
@ -64,7 +70,7 @@ Most often needed quantifiers have shorthands:
|
|||
|
||||
For instance, the pattern `pattern:ou?r` looks for `match:o` followed by zero or one `match:u`, and then `match:r`.
|
||||
|
||||
So it can find `match:or` in the word `subject:color` and `match:our` in `subject:colour`:
|
||||
So, `pattern:colou?r` finds both `match:color` and `match:colour`:
|
||||
|
||||
```js run
|
||||
let str = "Should I write color or colour?";
|
||||
|
@ -75,7 +81,7 @@ Most often needed quantifiers have shorthands:
|
|||
`*`
|
||||
: Means "zero or more", the same as `{0,}`. That is, the character may repeat any number of times or be absent.
|
||||
|
||||
The example below looks for a digit followed by any number of zeroes:
|
||||
For example, `pattern:\d0*` looks for a digit followed by any number of zeroes:
|
||||
|
||||
```js run
|
||||
alert( "100 10 1".match(/\d0*/g) ); // 100, 10, 1
|
||||
|
@ -85,11 +91,12 @@ Most often needed quantifiers have shorthands:
|
|||
|
||||
```js run
|
||||
alert( "100 10 1".match(/\d0+/g) ); // 100, 10
|
||||
// 1 not matched, as 0+ requires at least one zero
|
||||
```
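
As a quick check that the shorthands are equivalent to their `{n,m}` forms, the sketch below compares each pair on the strings used in this chapter (the helper `sameMatches` is a hypothetical name introduced only for this illustration):

```js
// Hypothetical helper: true if both regexps find the same matches in str
function sameMatches(str, a, b) {
  return String(str.match(a)) === String(str.match(b));
}

// + is the same as {1,}
let plusOk = sameMatches("+7(903)-123-45-67", /\d+/g, /\d{1,}/g);

// ? is the same as {0,1}
let questionOk = sameMatches("Should I write color or colour?", /colou?r/g, /colou{0,1}r/g);

// * is the same as {0,}
let starOk = sameMatches("100 10 1", /\d0*/g, /\d0{0,}/g);

console.log(plusOk, questionOk, starOk); // true true true
```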
|
||||
|
||||
## More examples
|
||||
|
||||
Quantifiers are used very often. They are one of the main "building blocks" for complex regular expressions, so let's see more examples.
|
||||
Quantifiers are used very often. They serve as the main "building block" of complex regular expressions, so let's see more examples.
|
||||
|
||||
Regexp "decimal fraction" (a number with a floating point): `pattern:\d+\.\d+`
|
||||
: In action:
|
||||
|
@ -120,12 +127,12 @@ Regexp "opening or closing HTML-tag without attributes": `pattern:/<\/?[a-z][a-z
|
|||
alert( "<h1>Hi!</h1>".match(/<\/?[a-z][a-z0-9]*>/gi) ); // <h1>, </h1>
|
||||
```
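
The pattern above requires the tag name to start with a letter. As a minimal sketch of why that matters, the code below compares the opening-tag part of it with a looser `pattern:<\w+>` on a hypothetical string that contains non-tags (the string and variable names are assumptions for illustration):

```js
// Hypothetical input: one real tag plus two things that only look like tags
let str = "<h1>Hi!</h1> <_> <1>";

// Strict: a letter first, then any number of letters or digits
let strict = str.match(/<[a-z][a-z0-9]*>/gi);
console.log(strict.join(", ")); // <h1>

// Loose: one or more "word" characters, which include digits and '_'
let loose = str.match(/<\w+>/g);
console.log(loose.join(", ")); // <h1>, <_>, <1>
```

Neither pattern here matches the closing `</h1>`, because the optional `\/?` part is left out to keep the comparison short.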
|
||||
|
||||
```smart header="More precise means more complex"
|
||||
```smart header="To make a regexp more precise, we often need to make it more complex"
|
||||
We can see one common rule in these examples: the more precise the regular expression is, the longer and more complex it gets.
|
||||
|
||||
For instance, HTML tags could use a simpler regexp: `pattern:<\w+>`.
|
||||
For instance, for HTML tags we could use a simpler regexp: `pattern:<\w+>`.
|
||||
|
||||
Because `pattern:\w` means any English letter or a digit or `'_'`, the regexp also matches non-tags, for instance `match:<_>`. But it's much simpler than `pattern:<[a-z][a-z0-9]*>`.
|
||||
...But because `pattern:\w` means any English letter or a digit or `'_'`, the regexp also matches non-tags, for instance `match:<_>`. So it's much simpler than `pattern:<[a-z][a-z0-9]*>`, but less reliable.
|
||||
|
||||
Are we ok with `pattern:<\w+>`, or do we need `pattern:<[a-z][a-z0-9]*>`?
|
||||
|
||||
|
|