Merge pull request #1101 from aruseni/patch-4

Unicode
Ilya Kantor 2019-07-01 12:48:42 +03:00 committed by GitHub
commit d303295945
GPG key ID: 4AEE18F83AFDEB23


@@ -558,7 +558,7 @@ You can skip the section if you don't plan to support them.
### Surrogate pairs
-Most symbols have a 2-byte code. Letters in most european languages, numbers, and even most hieroglyphs, have a 2-byte representation.
+All frequently used characters have 2-byte codes. Letters in most european languages, numbers, and even most hieroglyphs, have a 2-byte representation.
But 2 bytes only allow 65536 combinations and that's not enough for every possible symbol. So rare symbols are encoded with a pair of 2-byte characters called "a surrogate pair".
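
As a rough illustration of the surrogate-pair behaviour described above, here is a minimal sketch. The variable name `scriptX` and the choice of the character 𝒳 (U+1D4B3) are illustrative assumptions, not part of this commit:

```js
// 𝒳 (U+1D4B3) lies outside the range that fits into 2 bytes,
// so JavaScript stores it as two 2-byte units (a surrogate pair).
const scriptX = '\u{1D4B3}';

console.log( scriptX.length ); // 2, one symbol but two code units
console.log( scriptX.charCodeAt(0).toString(16) ); // d835, first half of the pair
console.log( scriptX.charCodeAt(1).toString(16) ); // dcb3, second half of the pair
console.log( scriptX.codePointAt(0).toString(16) ); // 1d4b3, the full code point
console.log( [...scriptX].length ); // 1, string iteration keeps the pair together
```

`charCodeAt` works with individual 2-byte units, while `codePointAt` and string iteration treat the pair as a single symbol.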
@@ -628,7 +628,7 @@ For instance:
```js run
alert( 'S\u0307\u0323' ); // Ṩ, S + dot above + dot below
-alert( 'S\u0323\u0307' ); // Ṩ, S + dot below + dot above
+alert( 'S\u0323\u0307' ); // Ṩ, S + dot below + dot above
alert( 'S\u0307\u0323' == 'S\u0323\u0307' ); // false
```
@@ -649,7 +649,7 @@ alert( "S\u0307\u0323".normalize().length ); // 1
alert( "S\u0307\u0323".normalize() == "\u1e68" ); // true
```
-In reality, this is not always the case. The reason being that the symbol `Ṩ` is "common enough", so UTF-16 creators included it in the main table and gave it the code.
+In reality, this is not always the case. The reason being that the symbol `Ṩ` is "common enough", so UTF-16 creators included it in the main table and gave it the code.
If you want to learn more about normalization rules and variants -- they are described in the appendix of the Unicode standard: [Unicode Normalization Forms](http://www.unicode.org/reports/tr15/), but for most practical purposes the information from this section is enough.
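
To tie the two hunks above together, here is a small sketch of comparing the two spellings of `Ṩ` before and after normalization. The variable names are hypothetical, and the sketch relies only on the standard `String.prototype.normalize` method:

```js
// Two visually identical strings with the combining marks in different order
const dotAboveFirst = 'S\u0307\u0323'; // S + dot above + dot below
const dotBelowFirst = 'S\u0323\u0307'; // S + dot below + dot above

console.log( dotAboveFirst === dotBelowFirst ); // false, the code units differ

// normalize() without arguments uses the NFC form (compose where possible)
console.log( dotAboveFirst.normalize() === dotBelowFirst.normalize() ); // true
console.log( dotAboveFirst.normalize() === '\u1e68' ); // true, a single precomposed character

// NFD keeps the marks separate but puts them into one canonical order
console.log( dotAboveFirst.normalize('NFD') === dotBelowFirst.normalize('NFD') ); // true
console.log( dotAboveFirst.normalize('NFD').length ); // 3
```

Normalizing both sides before comparing is the usual way to treat such strings as equal. Which form to pick (NFC or NFD) depends on the application.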