fix

2019-05-09 11:48:14 +03:00 · 2019-05-09 11:48:14 +03:00 · 5f9597d0c5
commit 5f9597d0c5
parent 7f5008e18a
1 changed files with 11 additions and 7 deletions
--- a/9-regular-expressions/03-regexp-character-classes/article.md
+++ b/9-regular-expressions/03-regexp-character-classes/article.md
@@ -43,7 +43,7 @@ Most used are:
 : A space symbol: that includes spaces, tabs, newlines.

 `\w` ("w" is from "word")
-: A "wordly" character: either a letter of English alphabet or a digit or an underscore. Non-english letters (like cyrillic or hindi) do not belong to `\w`.
+: A "wordly" character: either a letter of English alphabet or a digit or an underscore. Non-Latin letters (like cyrillic or hindi) do not belong to `\w`.

 For instance, `pattern:\d\s\w` means a "digit" followed by a "space character" followed by a "wordly character", like `"1 a"`.

@@ -115,7 +115,7 @@ alert( "Hello, Java!".match(/\bJava!\b/) ); // null (no match)

 Once again let's note that `pattern:\b` makes the searching engine to test for the boundary, so that `pattern:Java\b` finds `match:Java` only when followed by a word boundary, but it does not add a letter to the result.

-Usually we use `\b` to find standalone English words. So that if we want `"Java"` language then `pattern:\bJava\b` finds exactly a standalone word and ignores it when it's a part of `"JavaScript"`.
+Usually we use `\b` to find standalone English words. So that if we want `"Java"` language then `pattern:\bJava\b` finds exactly a standalone word and ignores it when it's a part of another word, e.g. it won't match `match:Java` in `subject:JavaScript`.

 Another example: a regexp `pattern:\b\d\d\b` looks for standalone two-digit numbers. In other words, it requires that before and after `pattern:\d\d` must be a symbol different from `\w` (or beginning/end of the string).

@@ -125,6 +125,8 @@ alert( "1 23 456 78".match(/\b\d\d\b/g) ); // 23,78

 ```warn header="Word boundary doesn't work for non-Latin alphabets"
 The word boundary check `\b` tests for a boundary between `\w` and something else. But `\w` means an English letter (or a digit or an underscore), so the test won't work for other characters (like cyrillic or hieroglyphs).
+
+Later we'll come across Unicode character classes that allow us to solve a similar task for different languages.
 ```


@@ -223,13 +225,14 @@ alert( "CS4".match(/CS.4/) ); // null, no match because there's no character for

 Usually a dot doesn't match a newline character.

-For instance, this doesn't match:
+For instance, `pattern:A.B` matches `match:A`, and then `match:B` with any character between them, except a newline.
+
+This doesn't match:

 ```js run
 alert( "A\nB".match(/A.B/) ); // null (no match)

-// a space character would match
-// or a letter, but not \n
+// a space character would match, or a letter, but not \n
 ```

 Sometimes it's inconvenient, we really want "any character", newline included.
@@ -240,7 +243,6 @@ That's what `s` flag does. If a regexp has it, then the dot `"."` match literall
 alert( "A\nB".match(/A.B/s) ); // A\nB (match!)
 ```

-
 ## Summary

 There exist following character classes:
@@ -255,7 +257,9 @@ There exist following character classes:

 ...But that's not all!

-Modern JavaScript also allows to look for characters by their Unicode properties, for instance:
+The Unicode encoding, used by JavaScript for strings, provides many properties for characters, like: which language the letter belongs to (if it's a letter), whether it's a punctuation sign, etc.
+
+Modern JavaScript allows us to use these properties in regexps to look for characters, for instance:

 - A cyrillic letter is: `pattern:\p{Script=Cyrillic}` or `pattern:\p{sc=Cyrillic}`.
 - A dash (be it a small hyphen `-` or a long dash `—`): `pattern:\p{Dash_Punctuation}` or `pattern:\p{Pd}`.
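
For reference, the behaviour this hunk describes can be sketched as follows. This is only an illustration and not part of the commit: the sample strings and variable names are made up, and it assumes an engine with ES2018 Unicode property escapes, which only work when the regexp has the `u` flag.

```js
// Hypothetical sample text mixing Cyrillic and Latin words
const text = "Привет, мир - hello";

// \w covers only Latin letters, digits and the underscore,
// so the Cyrillic words are skipped
const latinWords = text.match(/\w+/g);

// \p{Script=Cyrillic} (short form: \p{sc=Cyrillic}) needs the "u" flag
const cyrillicWords = text.match(/\p{sc=Cyrillic}+/gu);

// \p{Dash_Punctuation} matches both the hyphen and the long dash (U+2014)
const dashes = "a-b\u2014c".match(/\p{Dash_Punctuation}/gu);

console.log(latinWords);    // [ 'hello' ]
console.log(cyrillicWords); // [ 'Привет', 'мир' ]
console.log(dashes.length); // 2
```

Without the `u` flag, `\p{...}` is not treated as a property escape, so the last two patterns would not find these characters.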