Unicode
parent 6bbe0b4313
commit cf641d4d41
1 changed file with 3 additions and 3 deletions
@@ -558,7 +558,7 @@ You can skip the section if you don't plan to support them.

 ### Surrogate pairs

-Most symbols have a 2-byte code. Letters in most european languages, numbers, and even most hieroglyphs, have a 2-byte representation.
+All frequently used characters have 2-byte codes. Letters in most european languages, numbers, and even most hieroglyphs, have a 2-byte representation.

 But 2 bytes only allow 65536 combinations and that's not enough for every possible symbol. So rare symbols are encoded with a pair of 2-byte characters called "a surrogate pair".

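As an illustration of the surrogate pair paragraph in this hunk, here is a minimal sketch. The character U+1D4B3 and the variable name `rare` are chosen here for illustration only and are not part of the commit; `console.log` is used instead of `alert` so the snippet also runs outside a browser.

```js
// U+1D4B3 (MATHEMATICAL SCRIPT CAPITAL X) does not fit into a single 2-byte code
const rare = String.fromCodePoint(0x1d4b3);

console.log(rare.length);                      // 2, two 16-bit code units
console.log(rare.charCodeAt(0).toString(16));  // d835, first half of the pair
console.log(rare.charCodeAt(1).toString(16));  // dcb3, second half of the pair
console.log(rare.codePointAt(0).toString(16)); // 1d4b3, the full code point
console.log([...rare].length);                 // 1, string iteration goes by code point
```

The `length` of 2 is the practical consequence of the paragraph above: code that counts or slices by index sees the two halves of the pair, not the single symbol.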
@@ -628,7 +628,7 @@ For instance:

 ```js run
 alert( 'S\u0307\u0323' ); // Ṩ, S + dot above + dot below
-alert( 'S\u0323\u0307' ); // Ṩ, S + dot below + dot above
+alert( 'S\u0323\u0307' ); // Ṩ, S + dot below + dot above

 alert( 'S\u0307\u0323' == 'S\u0323\u0307' ); // false
 ```

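For context on the comparison in this hunk, the following sketch shows how the two mark orders behave once both strings are normalized. The variable names are hypothetical, and `console.log` stands in for `alert` so the snippet runs outside a browser.

```js
// The same visible symbol, built with the two marks in different order
const dotAboveFirst = 'S\u0307\u0323';
const dotBelowFirst = 'S\u0323\u0307';

console.log(dotAboveFirst === dotBelowFirst); // false, the code units differ

// normalize() with no argument applies the default form, NFC
console.log(dotAboveFirst.normalize() === dotBelowFirst.normalize()); // true
console.log(dotAboveFirst.normalize().length); // 1
```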
@@ -649,7 +649,7 @@ alert( "S\u0307\u0323".normalize().length ); // 1
 alert( "S\u0307\u0323".normalize() == "\u1e68" ); // true
 ```

-In reality, this is not always the case. The reason being that the symbol `Ṩ` is "common enough", so UTF-16 creators included it in the main table and gave it the code.
+In reality, this is not always the case. The reason being that the symbol `Ṩ` is "common enough", so UTF-16 creators included it in the main table and gave it the code.

 If you want to learn more about normalization rules and variants -- they are described in the appendix of the Unicode standard: [Unicode Normalization Forms](http://www.unicode.org/reports/tr15/), but for most practical purposes the information from this section is enough.