Unicode art, grammar suggestions
This commit is contained in:
parent 18b1314af4
commit dca45f773b
1 changed files with 7 additions and 7 deletions
|
@@ -2,7 +2,7 @@
|
|||
# Unicode, String internals
|
||||
|
||||
```warn header="Advanced knowledge"
|
||||
-The section goes deeper into string internals. This knowledge will be useful for you if you plan to deal with emoji, rare mathematical or hieroglyphic characters or other rare symbols.
+The section goes deeper into string internals. This knowledge will be useful for you if you plan to deal with emoji, rare mathematical or hieroglyphic characters, or other rare symbols.
|
||||
```
|
||||
|
||||
As we already know, JavaScript strings are based on [Unicode](https://en.wikipedia.org/wiki/Unicode): each character is represented by a byte sequence of 1-4 bytes.
|
||||
|
@@ -11,11 +11,11 @@ JavaScript allows us to insert a character into a string by specifying its hexad
|
|||
|
||||
- `\xXX`
|
||||
|
||||
-`XX` must be two hexadecimal digits with value between `00` and `FF`, then it's character whose Unicode code is `XX`.
+`XX` must be two hexadecimal digits with a value between `00` and `FF`, then it's a character whose Unicode code is `XX`.
|
||||
|
||||
Because the `\xXX` notation supports only two digits, it can be used only for the first 256 Unicode characters.
|
||||
|
||||
-These first 256 characters include latin alphabet, most basic syntax characters and some others. For example, `"\x7A"` is the same as `"z"` (Unicode `U+007A`).
+These first 256 characters include the latin alphabet, most basic syntax characters, and some others. For example, `"\x7A"` is the same as `"z"` (Unicode `U+007A`).
|
||||
|
||||
```js run
|
||||
alert( "\x7A" ); // z
|
||||
|
@@ -23,9 +23,9 @@ JavaScript allows us to insert a character into a string by specifying its hexad
|
|||
```
|
||||
|
||||
- `\uXXXX`
|
||||
-`XXXX` must be exactly 4 hex digits with the value between `0000` and `FFFF`, then `\uXXXX` is a character whose Unicode code is `XXXX` .
+`XXXX` must be exactly 4 hex digits with the value between `0000` and `FFFF`, then `\uXXXX` is a character whose Unicode code is `XXXX`.
|
||||
|
||||
-Characters with Unicode value greater than `U+FFFF` can also be represented with this notation, but in this case we will need to use a so called surrogate pair (we will talk about surrogate pairs later in this chapter).
+Characters with Unicode values greater than `U+FFFF` can also be represented with this notation, but in this case, we will need to use a so called surrogate pair (we will talk about surrogate pairs later in this chapter).
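As a rough illustration of the surrogate-pair remark above (a sketch that is not part of this commit): the code point `U+1F602` lies above `U+FFFF`, so writing it with `\uXXXX` takes two escapes. The variable names here are chosen only for this example.

```js
// U+1F602 is above U+FFFF, so the \uXXXX notation needs a surrogate pair for it.
const viaSurrogates = "\uD83D\uDE02";               // high surrogate + low surrogate
const viaCodePoint = String.fromCodePoint(0x1F602); // same character built from its code point

console.log(viaSurrogates === viaCodePoint);             // true
console.log(viaSurrogates.length);                       // 2 (two 16-bit code units)
console.log(viaSurrogates.codePointAt(0).toString(16));  // 1f602
```

The `length` of 2 is the visible consequence: one character on screen, two code units in the string.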
|
||||
|
||||
```js run
|
||||
alert( "\u00A9" ); // ©, the same as \xA9, using the 4-digit hex notation
|
||||
|
@@ -120,7 +120,7 @@ For instance, the letter `a` can be the base character for these characters: `à
|
|||
|
||||
Most common "composite" characters have their own code in the Unicode table. But not all of them, because there are too many possible combinations.
|
||||
|
||||
-To support arbitrary compositions, Unicode standard allows us to use several Unicode characters: the base character followed by one or many "mark" characters that "decorate" it.
+To support arbitrary compositions, the Unicode standard allows us to use several Unicode characters: the base character followed by one or many "mark" characters that "decorate" it.
|
||||
|
||||
For instance, if we have `S` followed by the special "dot above" character (code `\u0307`), it is shown as Ṡ.
|
||||
|
||||
|
@@ -167,6 +167,6 @@ alert( "S\u0307\u0323".normalize().length ); // 1
|
|||
alert( "S\u0307\u0323".normalize() == "\u1e68" ); // true
|
||||
```
|
||||
|
||||
-In reality, this is not always the case. The reason being that the symbol `Ṩ` is "common enough", so Unicode creators included it in the main table and gave it the code.
+In reality, this is not always the case. The reason is that the symbol `Ṩ` is "common enough", so Unicode creators included it in the main table and gave it the code.
|
||||
|
||||
If you want to learn more about normalization rules and variants -- they are described in the appendix of the Unicode standard: [Unicode Normalization Forms](https://www.unicode.org/reports/tr15/), but for most practical purposes the information from this section is enough.
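To make the "variants" mentioned above a little more concrete, here is a minimal sketch (not part of this commit) that runs the same symbol through two normalization forms. It assumes the standard `"NFD"` and `"NFC"` arguments of `String.prototype.normalize`; the variable names are illustrative.

```js
const composed = "\u1e68";                     // Ṩ stored as a single code point
const decomposed = composed.normalize("NFD");  // base letter followed by two marks

console.log(decomposed.length);                         // 3
console.log(decomposed === "S\u0323\u0307");            // true (marks in canonical order)
console.log(decomposed.normalize("NFC") === composed);  // true
```

Calling `normalize()` with no argument uses the composed form, which is why the example in the diff context above reports a length of 1.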
|
||||
|
|