Ilya Kantor 2017-03-21 17:14:05 +03:00
parent ab9ab64bd5
commit 97c8f22bbb
289 changed files with 195 additions and 172 deletions


@ -0,0 +1,33 @@
The first idea can be to list the languages with `|` in-between.
But that doesn't work right:
```js run
let reg = /Java|JavaScript|PHP|C|C\+\+/g;
let str = "Java, JavaScript, PHP, C, C++";
alert( str.match(reg) ); // Java,Java,PHP,C,C
```
The regular expression engine tries the alternatives one by one, from left to right. That is: first it checks whether `match:Java` matches at the current position, otherwise it looks for `match:JavaScript`, and so on.
As a result, `match:JavaScript` can never be found, simply because `match:Java` is checked first and succeeds.
The same happens with `match:C` and `match:C++`.
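To see where each match comes from, here is a small sketch that records match positions with `exec`. The helper name `findAll` is made up for this illustration, and `console.log` stands in for `alert` so that the snippet also runs outside the browser:

```js
// Hypothetical helper: collect every match together with its position.
// The regexp must have the "g" flag, otherwise exec never advances.
function findAll(reg, str) {
  let found = [];
  let m;
  while ((m = reg.exec(str)) !== null) {
    found.push(`${m[0]}@${m.index}`);
  }
  return found;
}

let positions = findAll(/Java|JavaScript|PHP|C|C\+\+/g, "Java, JavaScript, PHP, C, C++");
console.log(positions.join(", ")); // Java@0, Java@6, PHP@18, C@23, C@26
```

The match at position 6 is the `Java` at the start of `JavaScript`: the first alternative already succeeds there, so the second one is never tried.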
There are two solutions for that problem:
1. Change the order to check the longer match first: `pattern:JavaScript|Java|C\+\+|C|PHP`.
2. Merge variants with the same start: `pattern:Java(Script)?|C(\+\+)?|PHP`.
In action:
```js run
let reg = /Java(Script)?|C(\+\+)?|PHP/g;
let str = "Java, JavaScript, PHP, C, C++";
alert( str.match(reg) ); // Java,JavaScript,PHP,C,C++
```
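The first option, reordering, can be checked the same way. A minimal sketch (the variable names are only for illustration, and `console.log` is used in place of `alert`):

```js
// Longer alternatives come first, so they get a chance to match.
let reordered = /JavaScript|Java|C\+\+|C|PHP/g;
let langs = "Java, JavaScript, PHP, C, C++".match(reordered);
console.log(langs.join(",")); // Java,JavaScript,PHP,C,C++
```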


@ -0,0 +1,11 @@
# Find programming languages
There are many programming languages, for instance Java, JavaScript, PHP, C, C++.
Create a regexp that finds them in the string `subject:Java JavaScript PHP C++ C`:
```js
let reg = /your regexp/g;
alert("Java JavaScript PHP C++ C".match(reg)); // Java,JavaScript,PHP,C++,C
```


@ -0,0 +1,23 @@
Opening tag is `pattern:\[(b|url|quote)\]`.
Then, to find everything up to the closing tag, let's use the pattern `pattern:[\s\S]*?` to match any character including the newline, followed by the closing tag with a backreference `pattern:\1` to the name captured in the opening tag.
The full pattern: `pattern:\[(b|url|quote)\][\s\S]*?\[/\1\]`.
In action:
```js run
let reg = /\[(b|url|quote)\][\s\S]*?\[\/\1\]/g;
let str = `
[b]hello![/b]
[quote]
[url]http://google.com[/url]
[/quote]
`;
alert( str.match(reg) ); // [b]hello![/b],[quote][url]http://google.com[/url][/quote]
```
Please note that we had to escape the slash in the closing tag `pattern:\[\/\1\]`, because in a regexp literal `/.../` an unescaped slash ends the pattern.
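If the tag name and its contents are needed separately, the same pattern can be walked with `exec`. This is only a sketch: the second pair of parentheses around `[\s\S]*?` is an addition made here to capture the contents, and the sample text is made up:

```js
// Group 1 is the tag name, group 2 (added for this sketch) is the contents.
let tagReg = /\[(b|url|quote)\]([\s\S]*?)\[\/\1\]/g;
let text = "[b]hello![/b] and [url]http://google.com[/url]";

let tags = [];
let m;
while ((m = tagReg.exec(text)) !== null) {
  tags.push(m[1] + ": " + m[2]);
}
console.log(tags.join("; ")); // b: hello!; url: http://google.com

// The backreference requires the same name in the closing tag.
let mismatched = "[b]text[/url]".match(tagReg);
console.log(mismatched); // null
```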


@ -0,0 +1,48 @@
# Find bbtag pairs
A "bb-tag" looks like `[tag]...[/tag]`, where `tag` is one of: `b`, `url` or `quote`.
For instance:
```
[b]text[/b]
[url]http://google.com[/url]
```
BB-tags can be nested. But a tag can't be nested into itself, for instance:
```
Allowed:
[url] [b]http://google.com[/b] [/url]
[quote] [b]text[/b] [/quote]
Not allowed:
[b][b]text[/b][/b]
```
Tags can contain line breaks, that's normal:
```
[quote]
[b]text[/b]
[/quote]
```
Create a regexp to find all BB-tags with their contents.
For instance:
```js
let reg = /your regexp/g;
let str = "..[url]http://google.com[/url]..";
alert( str.match(reg) ); // [url]http://google.com[/url]
```
If tags are nested, then we need the outer tag (if we want we can continue the search in its content):
```js
let reg = /your regexp/g;
let str = "..[url][b]http://google.com[/b][/url]..";
alert( str.match(reg) ); // [url][b]http://google.com[/b][/url]
```


@ -0,0 +1,17 @@
The solution: `pattern:/"(\\.|[^"\\])*"/g`.
Step by step:
- First we look for an opening quote `pattern:"`
- Then, if we have a backslash `pattern:\\` (it is a special character, so we have to double it in the pattern; it still stands for a single backslash), any character is fine after it (a dot).
- Otherwise we take any character except a quote (that would mean the end of the string) and a backslash (to prevent lonely backslashes, the backslash is only used with some other symbol after it): `pattern:[^"\\]`
- ...And so on till the closing quote.
In action:
```js run
let reg = /"(\\.|[^"\\])*"/g;
let str = ' .. "test me" .. "Say \\"Hello\\"!" .. "\\\\ \\"" .. ';
alert( str.match(reg) ); // "test me","Say \"Hello\"!","\\ \""
```
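For comparison, here is a sketch of what a naive lazy pattern `pattern:".*?"` does on the same string. The naive pattern is not part of the solution; it is shown only to illustrate why escaped quotes need special handling (`console.log` is used in place of `alert`):

```js
let str = ' .. "test me" .. "Say \\"Hello\\"!" .. "\\\\ \\"" .. ';

// Naive: stops at the first quote it meets, escaped or not.
let naive = str.match(/".*?"/g);
// The solution: a backslash always consumes the character after it.
let careful = str.match(/"(\\.|[^"\\])*"/g);

console.log(naive.length);   // 4
console.log(naive[1]);       // "Say \"
console.log(careful.length); // 3
```

The naive pattern cuts the second string at the escaped quote and then pairs up the remaining quotes incorrectly.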


@ -0,0 +1,32 @@
# Find quoted strings
Create a regexp to find strings in double quotes `subject:"..."`.
The important part is that strings should support escaping, in the same way as JavaScript strings do. For instance, a quote can be inserted as `subject:\"`, a newline as `subject:\n`, and the backslash itself as `subject:\\`.
```js
let str = "Just like \"here\".";
```
For us it's important that an escaped quote `subject:\"` does not end a string.
So we should look from one quote to the other ignoring escaped quotes on the way.
That's the essential part of the task, otherwise it would be trivial.
Examples of strings to match:
```js
.. *!*"test me"*/!* ..
.. *!*"Say \"Hello\"!"*/!* ... (escaped quotes inside)
.. *!*"\\"*/!* .. (double backslash inside)
.. *!*"\\ \""*/!* .. (double backslash and an escaped quote inside)
```
In JavaScript we need to double the backslashes to pass them into the string as-is, like this:
```js run
let str = ' .. "test me" .. "Say \\"Hello\\"!" .. "\\\\ \\"" .. ';
// the in-memory string
alert(str); // .. "test me" .. "Say \"Hello\"!" .. "\\ \"" ..
```


@ -0,0 +1,16 @@
The pattern start is obvious: `pattern:<style`.
...But then we can't simply write `pattern:<style.*?>`, because `match:<styler>` would match it.
After `match:<style` we need either the closing `match:>` right away, or a space, optionally followed by something else, and then `match:>`.
In the regexp language: `pattern:<style(>|\s.*?>)`.
In action:
```js run
let reg = /<style(>|\s.*?>)/g;
alert( '<style> <styler> <style test="...">'.match(reg) ); // <style>, <style test="...">
```
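To see the difference, the too-loose pattern `pattern:<style.*?>` can be run next to the solution. A small sketch, with variable names chosen only for this illustration and `console.log` in place of `alert`:

```js
let tagsStr = '<style> <styler> <style test="...">';

// Too loose: anything may follow "<style", including the "r" of "<styler>".
let loose = tagsStr.match(/<style.*?>/g);
// The solution: either ">" at once, or a space and then the rest of the tag.
let strict = tagsStr.match(/<style(>|\s.*?>)/g);

console.log(loose.length);  // 3
console.log(loose[1]);      // <styler>
console.log(strict.length); // 2
```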


@ -0,0 +1,13 @@
# Find the full tag
Write a regexp to find the tag `<style...>`. It should match the full tag: it may have no attributes `<style>` or have several of them `<style type="..." id="...">`.
...But the regexp should not match `<styler>`!
For instance:
```js
let reg = /your regexp/g;
alert( '<style> <styler> <style test="...">'.match(reg) ); // <style>, <style test="...">
```


@ -0,0 +1,72 @@
# Alternation (OR) |
Alternation is the regular expression term for a simple "OR".
In a regular expression it is denoted with a vertical line character `pattern:|`.
[cut]
For instance, we need to find programming languages: HTML, PHP, CSS, Java or JavaScript.
The corresponding regexp: `pattern:html|php|css|java(script)?`.
A usage example:
```js run
let reg = /html|php|css|java(script)?/gi;
let str = "First HTML appeared, then CSS, then JavaScript";
alert( str.match(reg) ); // 'HTML', 'CSS', 'JavaScript'
```
We already know a similar thing -- square brackets. They allow choosing between multiple characters, for instance `pattern:gr[ae]y` matches `match:gray` or `match:grey`.
Alternation works not on the character level, but on the expression level. A regexp `pattern:A|B|C` means one of the expressions `A`, `B` or `C`.
For instance:
- `pattern:gr(a|e)y` means exactly the same as `pattern:gr[ae]y`.
- `pattern:gra|ey` means "gra" or "ey".
To separate a part of the pattern for alternation we usually enclose it in parentheses, like this: `pattern:before(XXX|YYY)after`.
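The three patterns above can be compared on a short sample string. A minimal sketch (the sample string and variable names are chosen only for this illustration, and `console.log` is used in place of `alert`):

```js
let colors = "gray grey";

let withBrackets = colors.match(/gr[ae]y/g);    // character-level choice
let withGroup = colors.match(/gr(a|e)y/g);      // alternation limited by parentheses
let withoutGroup = colors.match(/gra|ey/g);     // alternation over the whole pattern

console.log(withBrackets.join(","));  // gray,grey
console.log(withGroup.join(","));     // gray,grey
console.log(withoutGroup.join(","));  // gra,ey
```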
## Regexp for time
In previous chapters there was a task to build a regexp that finds a time in the form `hh:mm`, for instance `12:00`. But a simple `pattern:\d\d:\d\d` is too vague: it accepts `25:99` as a time.
How can we make a better one?
We can apply more careful matching:
- The first digit must be `0` or `1` followed by any digit.
- Or `2` followed by `pattern:[0-3]`
As a regexp: `pattern:[01]\d|2[0-3]`.
Then we can add a colon and the minutes part.
The minutes must be from `00` to `59`. In the regexp language that means the first digit is `pattern:[0-5]`, followed by any digit `pattern:\d`.
Let's glue them together into the pattern: `pattern:[01]\d|2[0-3]:[0-5]\d`.
We're almost done, but there's a problem. The alternation `|` is between the `pattern:[01]\d` and `pattern:2[0-3]:[0-5]\d`. That's wrong, because it will match either the left or the right pattern:
```js run
let reg = /[01]\d|2[0-3]:[0-5]\d/g;
alert("12".match(reg)); // 12 (matched [01]\d)
```
That's rather obvious, but it is still a common mistake when starting to work with regular expressions.
We need to add parentheses to apply alternation exactly to hours: `[01]\d` OR `2[0-3]`.
The correct variant:
```js run
let reg = /([01]\d|2[0-3]):[0-5]\d/g;
alert("00:00 10:10 23:59 25:99 1:2".match(reg)); // 00:00,10:10,23:59
```
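Running both variants on the same string makes the effect of the parentheses visible. A short sketch (variable names are only for illustration, `console.log` replaces `alert`):

```js
let times = "00:00 10:10 23:59 25:99 1:2";

// Without parentheses: "[01]\d" alone OR "2[0-3]:[0-5]\d".
let wrong = times.match(/[01]\d|2[0-3]:[0-5]\d/g);
// With parentheses: the alternation covers only the hours.
let right = times.match(/([01]\d|2[0-3]):[0-5]\d/g);

console.log(wrong.join(",")); // 00,00,10,10,23:59
console.log(right.join(",")); // 00:00,10:10,23:59
```

Without the parentheses, every two-digit group starting with `0` or `1` matches on its own, and only hours starting with `2` get their minutes attached.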