WIP
This commit is contained in:
parent
ef370b6ace
commit
f21cb0a2f4
71 changed files with 707 additions and 727 deletions
@ -0,0 +1,33 @@
The first idea can be to list the languages with `|` in-between.

But that doesn't work right:

```js run
let reg = /Java|JavaScript|PHP|C|C\+\+/g;

let str = "Java, JavaScript, PHP, C, C++";

alert( str.match(reg) ); // Java,Java,PHP,C,C
```

The regular expression engine tries the alternatives one by one, from left to right. That is: at each position it first checks for `match:Java`, and only if that fails does it try `match:JavaScript`, and so on.

As a result, `match:JavaScript` can never be found, just because `match:Java` is checked first.

The same with `match:C` and `match:C++`.
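
A quick way to see this left-to-right behavior is to match a single word against both orderings. This is only a small sketch, and the variable names `shortFirst` and `longFirst` are made up for the illustration:

```js
// The first alternative that matches at a position wins,
// even if a later alternative would match more text.
let shortFirst = "JavaScript".match(/Java|JavaScript/);
let longFirst = "JavaScript".match(/JavaScript|Java/);

console.log( shortFirst[0] ); // Java
console.log( longFirst[0] ); // JavaScript
```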

There are two solutions for that problem:

1. Change the order to check the longer match first: `pattern:JavaScript|Java|C\+\+|C|PHP`.
2. Merge variants with the same start: `pattern:Java(Script)?|C(\+\+)?|PHP`.

In action:

```js run
let reg = /Java(Script)?|C(\+\+)?|PHP/g;

let str = "Java, JavaScript, PHP, C, C++";

alert( str.match(reg) ); // Java,JavaScript,PHP,C,C++
```
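
The first solution (reordering) can be checked in the same way. A minimal sketch that reuses the test string from the example; the names `reordered`, `languages` and `result` are chosen just for this illustration:

```js
// Longer alternatives come before their shorter prefixes
let reordered = /JavaScript|Java|C\+\+|C|PHP/g;

let languages = "Java, JavaScript, PHP, C, C++";

let result = languages.match(reordered);

console.log( result.join(",") ); // Java,JavaScript,PHP,C,C++
```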
@ -0,0 +1,11 @@
# Find programming languages

There are many programming languages, for instance Java, JavaScript, PHP, C, C++.

Create a regexp that finds them in the string `subject:Java JavaScript PHP C++ C`:

```js
let reg = /your regexp/g;

alert("Java JavaScript PHP C++ C".match(reg)); // Java,JavaScript,PHP,C++,C
```
@ -0,0 +1,23 @@
The opening tag is `pattern:\[(b|url|quote)\]`.

Then, to find everything up to the closing tag, let's use the pattern `pattern:.*?` with the flag `pattern:s`, so that the dot matches any character including a newline, and then add a backreference to the tag name in the closing tag.

The full pattern: `pattern:\[(b|url|quote)\].*?\[/\1\]`.

In action:

```js run
let reg = /\[(b|url|quote)\].*?\[\/\1\]/gs;

let str = `
[b]hello![/b]
[quote]
[url]http://google.com[/url]
[/quote]
`;

alert( str.match(reg) ); // [b]hello![/b],[quote][url]http://google.com[/url][/quote]
```

Please note that we had to escape the slash in the closing tag `pattern:[/\1]` as `pattern:\/`, because in a regexp literal an unescaped slash ends the pattern.
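
To see why the `pattern:s` flag matters here, we can run the same pattern with and without it on a tag that spans several lines. This is a sketch only; the names `text`, `withoutS` and `withS` are made up for the illustration:

```js
let text = "[quote]\n[b]text[/b]\n[/quote]";

// Without the s flag the dot stops at a newline,
// so the outer [quote] pair can't be matched as a whole
let withoutS = text.match(/\[(b|url|quote)\].*?\[\/\1\]/g);

// With the s flag the dot also matches "\n"
let withS = text.match(/\[(b|url|quote)\].*?\[\/\1\]/gs);

console.log( withoutS ); // only the inner pair: [b]text[/b]
console.log( withS ); // the whole [quote]...[/quote] block
```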
@ -0,0 +1,48 @@
# Find bbtag pairs

A "bb-tag" looks like `[tag]...[/tag]`, where `tag` is one of: `b`, `url` or `quote`.

For instance:
```
[b]text[/b]
[url]http://google.com[/url]
```

BB-tags can be nested. But a tag can't be nested into itself, for instance:

```
Normal:
[url] [b]http://google.com[/b] [/url]
[quote] [b]text[/b] [/quote]

Impossible:
[b][b]text[/b][/b]
```

Tags can contain line breaks, that's normal:

```
[quote]
[b]text[/b]
[/quote]
```

Create a regexp to find all BB-tags with their contents.

For instance:

```js
let reg = /your regexp/flags;

let str = "..[url]http://google.com[/url]..";
alert( str.match(reg) ); // [url]http://google.com[/url]
```

If tags are nested, then we need the outer tag (if we want we can continue the search in its content):

```js
let reg = /your regexp/flags;

let str = "..[url][b]http://google.com[/b][/url]..";
alert( str.match(reg) ); // [url][b]http://google.com[/b][/url]
```
@ -0,0 +1,17 @@
The solution: `pattern:/"(\\.|[^"\\])*"/g`.

Step by step:

- First we look for an opening quote `pattern:"`
- Then, if we have a backslash `pattern:\\` (it's a special character, so we have to double it in the pattern; it still stands for a single backslash), any character is fine after it (a dot).
- Otherwise we take any character except a quote (that would mean the end of the string) and a backslash (to prevent lonely backslashes, a backslash is only used together with some other symbol after it): `pattern:[^"\\]`
- ...And so on till the closing quote.

In action:

```js run
let reg = /"(\\.|[^"\\])*"/g;
let str = ' .. "test me" .. "Say \\"Hello\\"!" .. "\\\\ \\"" .. ';

alert( str.match(reg) ); // "test me","Say \"Hello\"!","\\ \""
```
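
For comparison, here is a sketch of what goes wrong with a naive lazy pattern that ignores escaping. The pattern `pattern:".*?"` and the names `naive`, `escapeAware` and `sample` are chosen for this illustration and are not part of the solution itself:

```js
let naive = /".*?"/g;
let escapeAware = /"(\\.|[^"\\])*"/g;

let sample = ' .. "test me" .. "Say \\"Hello\\"!" .. "\\\\ \\"" .. ';

// The naive pattern stops at the first quote it sees, escaped or not,
// so the second string is cut into pieces
let naiveMatches = sample.match(naive);
console.log( naiveMatches.length ); // 4
console.log( naiveMatches[1] ); // "Say \"

// The escape-aware pattern consumes a backslash together with the next character
let goodMatches = sample.match(escapeAware);
console.log( goodMatches.length ); // 3
```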
@ -0,0 +1,32 @@
# Find quoted strings

Create a regexp to find strings in double quotes `subject:"..."`.

The strings should support escaping, the same way as JavaScript strings do. For instance, quotes can be inserted as `subject:\"`, a newline as `subject:\n`, and the backslash itself as `subject:\\`.

```js
let str = "Just like \"here\".";
```

Please note, in particular, that an escaped quote `subject:\"` does not end a string.

So we should search from one quote to the other ignoring escaped quotes on the way.

That's the essential part of the task, otherwise it would be trivial.

Examples of strings to match:
```js
.. *!*"test me"*/!* ..
.. *!*"Say \"Hello\"!"*/!* ... (escaped quotes inside)
.. *!*"\\"*/!* .. (double backslash inside)
.. *!*"\\ \""*/!* .. (double backslash and an escaped quote inside)
```

In JavaScript we need to double the backslashes to pass them right into the string, like this:

```js run
let str = ' .. "test me" .. "Say \\"Hello\\"!" .. "\\\\ \\"" .. ';

// the in-memory string
alert(str); // .. "test me" .. "Say \"Hello\"!" .. "\\ \"" ..
```
@ -0,0 +1,16 @@
The pattern start is obvious: `pattern:<style`.

...But then we can't simply write `pattern:<style.*?>`, because `match:<styler>` would match it.

After `match:<style` we need either the closing `match:>` right away, or a space followed, optionally, by something else and then the closing `match:>`.

In the regexp language: `pattern:<style(>|\s.*?>)`.

In action:

```js run
let reg = /<style(>|\s.*?>)/g;

alert( '<style> <styler> <style test="...">'.match(reg) ); // <style>, <style test="...">
```
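
Running both patterns side by side shows the difference. A small sketch, with the names `tags`, `tooLoose` and `strict` made up for the illustration:

```js
let tags = '<style> <styler> <style test="...">';

// The simple pattern also accepts <styler>
let tooLoose = tags.match(/<style.*?>/g);
console.log( tooLoose.join(", ") ); // <style>, <styler>, <style test="...">

// Requiring ">" or a space right after "<style" rules it out
let strict = tags.match(/<style(>|\s.*?>)/g);
console.log( strict.join(", ") ); // <style>, <style test="...">
```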
@ -0,0 +1,13 @@
# Find the full tag

Write a regexp to find the tag `<style...>`. It should match the full tag: it may have no attributes `<style>` or have several of them `<style type="..." id="...">`.

...But the regexp should not match `<styler>`!

For instance:

```js
let reg = /your regexp/g;

alert( '<style> <styler> <style test="...">'.match(reg) ); // <style>, <style test="...">
```
59
9-regular-expressions/13-regexp-alternation/article.md
Normal file
@ -0,0 +1,59 @@
# Alternation (OR) |

Alternation is the regular expression term for a simple "OR".

In a regular expression it is denoted with a vertical line character `pattern:|`.

For instance, we need to find programming languages: HTML, PHP, CSS, Java or JavaScript.

The corresponding regexp: `pattern:html|php|css|java(script)?`.

A usage example:

```js run
let reg = /html|php|css|java(script)?/gi;

let str = "First HTML appeared, then CSS, then JavaScript";

alert( str.match(reg) ); // 'HTML', 'CSS', 'JavaScript'
```

We already know a similar thing -- square brackets. They allow choosing between multiple characters, for instance `pattern:gr[ae]y` matches `match:gray` or `match:grey`.

Square brackets allow only characters or character sets. Alternation allows any expressions. A regexp `pattern:A|B|C` means one of the expressions `A`, `B` or `C`.

For instance:

- `pattern:gr(a|e)y` matches exactly the same strings as `pattern:gr[ae]y`.
- `pattern:gra|ey` means `match:gra` or `match:ey`.

To separate a part of the pattern for alternation we usually enclose it in parentheses, like this: `pattern:before(XXX|YYY)after`.
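
The two list items can be compared directly. A minimal sketch; the test string and the names `words`, `grouped` and `ungrouped` are made up for the illustration:

```js
let words = "gray grey gra ey";

// With parentheses the alternation covers only the middle letter
let grouped = words.match(/gr(a|e)y/g);
console.log( grouped.join(",") ); // gray,grey

// Without them the whole pattern is split in two: "gra" OR "ey"
let ungrouped = words.match(/gra|ey/g);
console.log( ungrouped.join(",") ); // gra,ey,gra,ey
```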

## Regexp for time

In previous chapters there was a task to build a regexp for searching time in the form `hh:mm`, for instance `12:00`. But a simple `pattern:\d\d:\d\d` is too vague. It accepts `25:99` as the time (as 25 hours and 99 minutes match the pattern).

How can we make a better one?

We can apply more careful matching. First, the hours:

- If the first digit is `0` or `1`, then the next digit can be anything.
- Or, if the first digit is `2`, then the next must be `pattern:[0-3]`.

As a regexp: `pattern:[01]\d|2[0-3]`.

Next, the minutes must be from `0` to `59`. In the regexp language that means `pattern:[0-5]\d`: the first digit `0-5`, and then any digit.

Let's glue them together into the pattern: `pattern:[01]\d|2[0-3]:[0-5]\d`.

We're almost done, but there's a problem. The alternation `pattern:|` now happens to be between `pattern:[01]\d` and `pattern:2[0-3]:[0-5]\d`.

That's wrong, as it should be applied only to hours: `pattern:[01]\d` OR `pattern:2[0-3]`. That's a common mistake when starting to work with regular expressions.
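
To make the mistake visible, here is a sketch that runs the glued pattern as is (the names `glued`, `times` and `found` are made up for the illustration). Since `pattern:|` splits the whole pattern, the left alternative finds bare pairs of digits:

```js
let glued = /[01]\d|2[0-3]:[0-5]\d/g;

let times = "00:00 10:10 23:59 25:99 1:2";

// Left alternative: any two digits starting with 0 or 1.
// Right alternative: hours 20-23 followed by a colon and minutes.
let found = times.match(glued);

console.log( found.join(",") ); // 00,00,10,10,23:59
```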

The correct variant, with the hours alternation enclosed in parentheses:

```js run
let reg = /([01]\d|2[0-3]):[0-5]\d/g;

alert("00:00 10:10 23:59 25:99 1:2".match(reg)); // 00:00,10:10,23:59
```
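
One limitation worth keeping in mind: the pattern can still find a "time" inside a longer run of digits. The sketch below adds word boundaries `pattern:\b` as one possible refinement. That is an assumption of this example, not part of the pattern above, and the test string is made up:

```js
let input = "12:30 123:456 09:05";

// "23:45" is found inside "123:456"
let plain = input.match(/([01]\d|2[0-3]):[0-5]\d/g);
console.log( plain.join(",") ); // 12:30,23:45,09:05

// \b requires the time not to be glued to other digits or letters
let bounded = input.match(/\b([01]\d|2[0-3]):[0-5]\d\b/g);
console.log( bounded.join(",") ); // 12:30,09:05
```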