aruseni 2019-07-01 05:53:36 +03:00 committed by GitHub
parent 6bbe0b4313
commit cf641d4d41

@@ -558,7 +558,7 @@ You can skip the section if you don't plan to support them.
### Surrogate pairs
-Most symbols have a 2-byte code. Letters in most european languages, numbers, and even most hieroglyphs, have a 2-byte representation.
+All frequently used characters have 2-byte codes. Letters in most european languages, numbers, and even most hieroglyphs, have a 2-byte representation.
But 2 bytes only allow 65536 combinations and that's not enough for every possible symbol. So rare symbols are encoded with a pair of 2-byte characters called "a surrogate pair".
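The effect of a surrogate pair can be seen from the string length. The snippet below is only a minimal sketch, assuming an engine that supports ES2015 string escapes and `codePointAt`; the variable names are made up for illustration.

```js
// Illustrative sketch; the variable names are hypothetical.
const common = 'A';        // fits in a single 2-byte code unit
const rare = '\u{1D4B3}';  // MATHEMATICAL SCRIPT CAPITAL X, outside the 65536-value range

console.log(common.length); // 1
console.log(rare.length);   // 2, because the symbol is stored as a surrogate pair

// The two halves of the pair, as hex code units
const high = rare.charCodeAt(0).toString(16); // d835
const low = rare.charCodeAt(1).toString(16);  // dcb3

// codePointAt(0) reads both halves as one symbol
const codePoint = rare.codePointAt(0).toString(16); // 1d4b3
console.log(high, low, codePoint);
```

So `length` counts 2-byte units, not symbols, which is why a single rare character can report a length of 2.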
@@ -628,7 +628,7 @@ For instance:
```js run
alert( 'S\u0307\u0323' ); // Ṩ, S + dot above + dot below
alert( 'S\u0323\u0307' ); // Ṩ, S + dot below + dot above
alert( 'S\u0307\u0323' == 'S\u0323\u0307' ); // false
```
@@ -649,7 +649,7 @@
```js run
alert( "S\u0307\u0323".normalize().length ); // 1
alert( "S\u0307\u0323".normalize() == "\u1e68" ); // true
```
In reality, this is not always the case. The reason being that the symbol `Ṩ` is "common enough", so UTF-16 creators included it in the main table and gave it the code.
If you want to learn more about normalization rules and variants -- they are described in the appendix of the Unicode standard: [Unicode Normalization Forms](http://www.unicode.org/reports/tr15/), but for most practical purposes the information from this section is enough.
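To get a feel for the variants, here is a rough sketch that compares the composed form (`"NFC"`, which `normalize()` uses by default) with the decomposed form (`"NFD"`). The variable names are made up for illustration, and the `Q` example is chosen on the assumption that Unicode has no single precomposed character for that combination.

```js
// Illustrative sketch; the variable names are hypothetical.
const composed = 'S\u0307\u0323'.normalize('NFC'); // same result as normalize()
const decomposed = '\u1e68'.normalize('NFD');      // split back into S + marks

console.log(composed.length);   // 1
console.log(decomposed.length); // 3

// Both mark orders normalize to the same string
const sameAfterNormalize =
  'S\u0307\u0323'.normalize() === 'S\u0323\u0307'.normalize();
console.log(sameAfterNormalize); // true

// A letter with no precomposed variant stays as three code units
const noSingleCode = 'Q\u0307\u0323'.normalize();
console.log(noSingleCode.length); // 3
```

The last line illustrates why the result is not always a single character: normalization can only compose marks into one code when such a code exists.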