1-js/3-code-quality/5-polyfills/article.md

@ -0,0 +1,57 @@
|
|
# Polyfills

The JavaScript language steadily evolves. New proposals get analyzed and, if they look worthy, are appended to the list at <https://tc39.github.io/ecma262/> and then progress to the [specification](http://www.ecma-international.org/publications/standards/Ecma-262.htm).

Each JS engine has its own idea about what to implement first. It may implement proposals that are not yet approved and fail to implement things that are already in the spec, because they are less interesting or just harder to do.

So it's quite common for an engine to implement only part of the standard.

A good page to see the current state of support for language features is <https://kangax.github.io/compat-table/es6/> (remember the link and use it in the future, when you know the language).
|
|
## Babel.JS

When we use modern features of the language, some engines may fail to support such code. As said above, not all features are implemented everywhere.

Here Babel.JS comes to the rescue.

[Babel.JS](https://babeljs.io) is a [transpiler](https://en.wikipedia.org/wiki/Source-to-source_compiler). It rewrites modern JavaScript code into an older standard of the language.
|
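To illustrate the idea, here is a hand-written sketch of the kind of rewrite a transpiler performs. The "modern" version uses an arrow function, a default parameter and a template literal. The "older" version expresses the same logic with older syntax. This is only an illustration under that assumption, not Babel's literal output, which differs in details.

```js
// Modern syntax: arrow function, default parameter, template literal
const greetModern = (name = "Guest") => `Hello, ${name}!`;

// Roughly equivalent older syntax, written by hand to show the idea
var greetOld = function (name) {
  if (name === undefined) name = "Guest";
  return "Hello, " + name + "!";
};

console.log( greetModern() );     // Hello, Guest!
console.log( greetOld("John") );  // Hello, John!
```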
|
Actually, there are two parts in Babel:

1. The transpiler program, which rewrites the code.

    The transpiler runs on a developer's computer. It rewrites the code, which is then bundled by a project build system (like [webpack](http://webpack.github.io/) or [brunch](http://brunch.io/)). Most build systems can support Babel easily.

2. The polyfill.

    For some functions, we also need to add a special script that runs before our scripts and introduces modern functions that the engine may not support by itself. Such scripts are called "polyfills".

    Two interesting variants are [babel polyfill](https://babeljs.io/docs/usage/polyfill/), which supports a lot but is big, and the [polyfill.io](http://polyfill.io) service, which allows us to load/construct polyfills on demand, depending on the features we need.
|
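To make the idea concrete, here is a minimal hand-written polyfill sketch for `Math.trunc`, a function that some older engines (such as Internet Explorer) lack. It only illustrates the pattern: check whether the feature exists, and define it if it doesn't. Real polyfill libraries handle more edge cases. The name `truncPolyfill` is made up for this example.

```js
// A fallback implementation of Math.trunc (hypothetical helper name)
function truncPolyfill(x) {
  x = Number(x);
  // rounding toward zero: floor for positive numbers, ceil for negative ones
  return x < 0 ? Math.ceil(x) : Math.floor(x);
}

// Install the fallback only if the engine lacks the built-in function
if (!Math.trunc) {
  Math.trunc = truncPolyfill;
}

console.log( truncPolyfill(3.7) );  // 3
console.log( truncPolyfill(-1.1) ); // -1
```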
|
The transpiler and/or polyfill may not be needed if we target more-or-less modern engines and don't use rarely supported features.

## Examples in the tutorial

```warn header="Browser support is required"
Examples that use modern JS will work only if your browser supports them.
```
|
|
````online
Most examples are runnable in place, like here:

```js run
alert('Press the "Play" button in the upper-right corner to run');
```

...But if an example uses a feature that your browser does not support, an error is shown.

That doesn't mean that the example is wrong! It's just that the browser doesn't support certain features yet.
````
|
|
[Chrome Canary](https://www.google.com/chrome/browser/canary.html) is good for running more of the examples, as it supports many of the newest features.

Note that in production we can use Babel to translate the code into a form suitable for less recent browsers, so there will be no such limitation: the code will run everywhere.

Now we can go coding, so let's choose a good code editor.
|
|
|
@ -4,7 +4,7 @@ All numbers in JavaScript are stored in 64-bit format [IEEE-754](http://en.wikip
|
|
Let's recap what we know about them and add a little bit more.

## More ways to write a number

Imagine we need to write a billion. The obvious way is:
|
|
@ -54,18 +54,19 @@ In other words, a negative number after `e` means a division by 1 with the given

```
1.23e-6 = 1.23 / 1000000 (=0.00000123)
```
|
|
### Hex, binary and octal numbers

[Hexadecimal](https://en.wikipedia.org/wiki/Hexadecimal) numbers are widely used in JavaScript to represent colors, encode characters, and for many other things. So there exists a shorter way to write them: `0x` and then the number.

For instance:

```js run
alert( 0xff ); // 255
alert( 0xFF ); // 255 (the same, case doesn't matter)
```
|
|
Binary and octal numeral systems are rarely used, but they are also supported using the `0b` and `0o` prefixes:

```js run
let a = 0b11111111; // binary form of 255
```

@ -74,38 +75,36 @@ let b = 0o377; // octal form of 255

```js run
alert( a == b ); // true, the same number 255 at both sides
```
|
|
There are only 3 numeral systems with such support. For other numeral systems, we should use the function `parseInt` (covered later in this chapter).
|
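For instance, `parseInt` accepts the base as its second argument, so a string written in any base from `2` to `36` can be read as a number. A short sketch (the sample strings are arbitrary):

```js
// parseInt(string, base) reads the string as a number in the given base
console.log( parseInt("ff", 16) ); // 255
console.log( parseInt("377", 8) ); // 255
console.log( parseInt("12", 3) );  // 5, because 1*3 + 2 = 5
console.log( parseInt("z", 36) );  // 35, the largest single digit in base 36
```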
|
## toString(base)

The method `num.toString(base)` returns a string representation of `num` in the numeral system with the given `base`.

For example:

```js run
let num = 255;

alert( num.toString(16) );  // ff
alert( num.toString(2) );   // 11111111
```
|
|
The `base` can vary from `2` to `36`.

The most common use cases are:

- **base=16** is used for hex colors, character encodings etc.; digits can be `0..9` or `A..F`.
- **base=2** is mostly for debugging bitwise operations; digits can be `0` or `1`.
- **base=36** is the maximum; digits can be `0..9` or `A..Z`. The whole Latin alphabet is used to represent a number. A funny but useful case for `36` is when we need to turn a long numeric identifier into something shorter, for example to make a short URL. We can simply represent it in the numeral system with base `36`:

    ```js run
    alert( 123456..toString(36) ); // 2n9c
    ```
|
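The conversion can also be reversed with `parseInt(str, 36)`, which gives a simple way to shorten numeric IDs and restore them later. Here is a minimal sketch. The helper names `toShortId` and `fromShortId` are hypothetical, and the sketch assumes non-negative integer IDs within the safe integer range.

```js
// Hypothetical helpers: encode an integer ID in base 36 and decode it back
function toShortId(id) {
  return id.toString(36);
}

function fromShortId(shortId) {
  return parseInt(shortId, 36);
}

let short = toShortId(123456);
console.log( short );              // 2n9c
console.log( fromShortId(short) ); // 123456
```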
|
```warn header="Two dots to call a method"
Please note that the two dots in `123456..toString(36)` are not a typo. If we want to call a method directly on a number, like `toString` in the example above, then we need to place two dots `..` after it.

If we placed a single dot: `123456.toString(36)`, then there would be an error, because JavaScript syntax implies the decimal part after the first dot. And if we place one more dot, then JavaScript knows that the decimal part is empty and the method call follows.
```
|
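Two dots are not the only option. Wrapping the number in parentheses, or putting a space before the dot, also ends the number literal. A short sketch of the three equivalent forms:

```js
// All three calls convert 255 to base 16
console.log( 255..toString(16) );  // ff
console.log( (255).toString(16) ); // ff
console.log( 255 .toString(16) );  // ff, the space ends the number literal
```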
|
## Rounding

@ -126,12 +125,9 @@ There are following built-in functions for rounding:

`Math.trunc` (not supported by Internet Explorer)
: Removes the decimal part: `3.1` becomes `3`, `-1.1` becomes `-1`.

Here's the table to summarize the differences between them:

| | `Math.floor` | `Math.ceil` | `Math.round` | `Math.trunc` |
|---|---------|--------|---------|---------|
|`3.1`| `3` | `4` | `3` | `3` |
|`3.6`| `3` | `4` | `4` | `3` |
|
@ -139,9 +135,7 @@ Here's the table to make edge cases more obvious:

|`-1.6`| `-2` | `-1` | `-2` | `-1` |
|
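The same values can be checked in code. This short sketch applies all four functions to the sample numbers from the table, plus `-1.1` from the `Math.trunc` description:

```js
// Apply each rounding function to a few sample numbers
let rows = [3.1, 3.6, -1.1, -1.6].map(num => [
  num,
  Math.floor(num), // rounds down
  Math.ceil(num),  // rounds up
  Math.round(num), // rounds to the nearest integer
  Math.trunc(num), // removes the decimal part
]);

for (let row of rows) {
  console.log( row.join(" ") );
}
// 3.1 3 4 3 3
// 3.6 3 4 4 3
// -1.1 -2 -1 -1 -1
// -1.6 -2 -1 -2 -1
```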
|
These functions cover all possible ways to deal with the decimal part of a number. But what if we'd like to round the number to the `n-th` digit after the decimal point?

For instance, we have `1.2345` and want to round it to 2 digits, getting only `1.23`.
|
|
@ -170,77 +164,66 @@ There are two ways to do so.

```js
alert( num.toFixed(1) ); // "12.4"
```

Please note that the result of `toFixed` is a string. If the decimal part is shorter than required, zeroes are appended to the end:

```js run
let num = 12.34;
alert( num.toFixed(5) ); // "12.34000", added zeroes to make exactly 5 digits
```

We can convert it to a number using the unary plus or a `Number()` call: `+num.toFixed(5)`.
|
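Combining the two steps gives a small helper that rounds to a given number of digits and returns a number rather than a string. This is a minimal sketch, and the name `roundTo` is hypothetical. Keep in mind that `toFixed` works on the binary value actually stored. A value like `1.005` is stored as slightly less than that, so it rounds down.

```js
// Hypothetical helper: round `num` to `digits` digits after the decimal point
// and return a number (toFixed alone returns a string)
function roundTo(num, digits) {
  return +num.toFixed(digits);
}

console.log( roundTo(1.2345, 2) ); // 1.23
console.log( roundTo(12.36, 1) );  // 12.4
console.log( roundTo(12.34, 5) );  // 12.34, trailing zeroes vanish in a number
console.log( roundTo(1.005, 2) );  // 1, because 1.005 is stored as slightly less
```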
|
## Imprecise calculations

Internally, a number is represented in 64-bit format [IEEE-754](http://en.wikipedia.org/wiki/IEEE_754-1985), so there are exactly 64 bits to store a number: 52 of them are used to store the digits, 11 of them store the position of the decimal point, and 1 bit is for the sign.

If a number is too big, it would overflow the 64-bit storage, potentially giving an infinity:

```js run
alert( 1e500 ); // Infinity
```
|
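The largest finite value the format can hold is available as `Number.MAX_VALUE`. Anything that overflows it becomes `Infinity`. A short sketch:

```js
// The largest finite number representable in the 64-bit format
console.log( Number.MAX_VALUE );        // 1.7976931348623157e+308
console.log( Number.MAX_VALUE * 2 );    // Infinity, the result overflows
console.log( 1e308 * 10 === Infinity ); // true
```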
|
What happens much more often, though, is the loss of precision.

Consider this (falsy!) test:

```js run
alert( 0.1 + 0.2 == 0.3 ); // *!*false*/!*
```

Yes, indeed: if we check whether the sum of `0.1` and `0.2` is `0.3`, we get `false`.

Strange! What is it then if not `0.3`?

```js run
alert( 0.1 + 0.2 ); // 0.30000000000000004
```

Ouch! There are more consequences than an incorrect comparison here. Imagine you're making an e-shopping site and the visitor puts `$0.10` and `$0.20` goods into their cart. The order total will be `$0.30000000000000004`. That would surprise anyone.

Why does it work like that?

A number is stored in memory in its binary form, as a sequence of ones and zeroes. But fractions like `0.1` and `0.2` that look simple in the decimal numeral system are actually unending fractions in their binary form.

In other words, what is `0.1`? It is one divided by ten, `1/10`, one-tenth. In the decimal numeral system such numbers are easily representable. Compare it to one-third: `1/3`. It becomes an endless fraction `0.33333(3)`.

So, division by powers of `10` is guaranteed to work well in the decimal system, but division by `3` is not. For the same reason, in the binary numeral system, division by powers of `2` is guaranteed to work well, but `1/10` becomes an endless binary fraction.

There's just no way to store *exactly 0.1* or *exactly 0.2* in the binary system, just like there is no way to store one-third as a decimal fraction.
|
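We can peek at these binary fractions with `toString(2)`, introduced earlier. In this short sketch, powers of two give short, exact binary fractions. `0.1` produces a long repeating pattern that is cut off where the stored bits end.

```js
// Division by powers of 2 gives short binary fractions
console.log( (0.5).toString(2) );   // 0.1
console.log( (0.25).toString(2) );  // 0.01
console.log( (0.125).toString(2) ); // 0.001

// 0.1 is an endless binary fraction, so only an approximation is stored
console.log( (0.1).toString(2) );   // starts with 0.0001100110011...
```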
|
The numeric format IEEE-754 solves that by storing the nearest possible number. There are rounding rules that normally don't allow us to see that "tiny precision loss", so `0.1` is displayed as `0.1`. But the loss still exists.

We can see it like this:

```js run
alert( 0.1.toFixed(20) ); // 0.10000000000000000555
```

And when we sum two numbers, their "precision losses" add up.

That's why `0.1 + 0.2` is not exactly `0.3`.

```smart header="Not only JavaScript"
The same issue exists in many other programming languages.

PHP, Java, C, Perl and Ruby give exactly the same result, because they are based on the same numeric format.
```
|
@ -267,14 +250,15 @@ Can we work around the problem? Sure, there's a number of ways:

```js
alert( (0.1*10 + 0.2*10) / 10 ); // 0.3
```

It works because `0.1*10` and `0.2*10` round to the exact integers `1` and `2`, and such integers are stored without precision loss.

3. If it's a shop, then the most radical solution would be to store all prices in cents. No fractions at all. But what if we apply a discount of 30%? In practice, totally avoiding fractions is rarely feasible, so the solutions listed above are here to help.
|
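Another common approach, not part of the list above, is to avoid exact `==` comparisons of fractions and check whether two numbers are close enough instead. This minimal sketch uses `Number.EPSILON` (the gap between `1` and the next representable number) as the default tolerance. The helper name `nearlyEqual` is hypothetical, and a fixed tolerance like this only suits values of roughly unit magnitude.

```js
// Hypothetical helper: treat two numbers as equal if they differ by less
// than a tolerance. Number.EPSILON fits values around 1; larger values
// need a proportionally larger tolerance.
function nearlyEqual(a, b, tolerance = Number.EPSILON) {
  return Math.abs(a - b) < tolerance;
}

console.log( 0.1 + 0.2 == 0.3 );            // false
console.log( nearlyEqual(0.1 + 0.2, 0.3) ); // true
```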
|
````smart header="The funny thing"
Try running this:

```js run
// Hello! I'm a self-increasing number!
alert( 9999999999999999 ); // shows 10000000000000000
```

@ -283,31 +267,24 @@ The reason is the same: loss of precision. There are 64 bits for the number, 52

JavaScript doesn't trigger an error in such cases. It does its best to fit the number into the format, but unfortunately, the format is not big enough.
````
|
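The largest integer the format stores without such rounding is available as `Number.MAX_SAFE_INTEGER`, which equals 2^53 - 1. `Number.isSafeInteger` checks whether a value lies within that safe range. A short sketch:

```js
console.log( Number.MAX_SAFE_INTEGER );                 // 9007199254740991
console.log( Number.MAX_SAFE_INTEGER === 2 ** 53 - 1 ); // true

// Beyond the safe range, neighboring integers may collapse into one value
console.log( Number.isSafeInteger(9007199254740991) );  // true
console.log( Number.isSafeInteger(9999999999999999) );  // false
console.log( 2 ** 53 === 2 ** 53 + 1 );                 // true
```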
|
## Tests: isFinite and isNaN
|
|
Remember the two special numeric values?

- `Infinity` (and `-Infinity`) is a special numeric value that is greater (less) than anything.
- `NaN` represents an error.

They belong to the type `number`, but are not "normal" numbers, so there are special functions to check for them:

- `isNaN(value)` converts its argument to a number and then tests it for being `NaN`:

```js run
alert( isNaN(NaN) ); // true
alert( isNaN("str") ); // true
```

But do we need the function? Can't we just use the comparison `=== NaN`? Funny, but no. The value `NaN` is unique in that it does not equal anything, including itself:

```js run
alert( NaN === NaN ); // false
```

@ -327,15 +304,15 @@ Sometimes `isFinite` is used to validate the string value for being a regular nu

```js run
let num = +prompt("Enter a number", '');

// will be true unless you enter Infinity, -Infinity or not a number
alert( isFinite(num) );
```

Please note that an empty or a space-only string is converted to `0` by the numeric conversion, so `isFinite` treats it as a finite number. If that's not what's needed, then additional checks are required.
|
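One possible stricter check, as a sketch, rejects strings that are empty or contain only whitespace before calling `isFinite`. The helper name `isNumericString` is hypothetical, made up for this example.

```js
// Hypothetical helper: true only for strings that hold a finite number
function isNumericString(str) {
  // trim() removes surrounding spaces; an empty result is not a number
  return str.trim() !== "" && isFinite(str);
}

console.log( isFinite("") );            // true, "" converts to 0
console.log( isNumericString("") );     // false
console.log( isNumericString("   ") );  // false
console.log( isNumericString("12.5") ); // true
console.log( isNumericString("12px") ); // false
```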
|
## parseInt and parseFloat

Numeric conversion using a plus `+` or `Number()` is strict. If a value is not exactly a number, it fails:

```js run
alert( +"100px" ); // NaN
```

@ -404,8 +381,7 @@ A few examples:

```js run
alert( Math.pow(2, 10) ); // 2 in power 10 = 1024
```

There are more functions and constants in `Math`, including trigonometry, which you can find in the [docs for the Math](https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Math) object.
|
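For instance, here is a sketch with a few of them. Trigonometric functions take angles in radians, and their results are subject to the same precision loss as other fractions.

```js
// Math.PI is a constant; trigonometric functions take radians
console.log( Math.PI );           // 3.141592653589793
console.log( Math.cos(Math.PI) ); // -1
console.log( Math.sin(Math.PI) ); // a tiny number close to 0, but not exactly 0
console.log( Math.sqrt(16) );     // 4
```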
|
## Summary

@ -427,10 +403,10 @@ For converting values like `12pt` and `100px` to a number:

For fractions:

- Round using `Math.floor`, `Math.ceil`, `Math.trunc`, `Math.round` or `num.toFixed(precision)`.
- Remember the loss of precision when working with fractions.

More mathematical functions:

- See the [Math](https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Math) object when you need them. The library is very small, but can cover basic needs.
|
|
|
|
|
@ -47,19 +47,7 @@ let guestList = "Guests: // Error: Unexpected token ILLEGAL

```js
  * John";
```

Single and double quotes come from ancient times of language creation, when the need for multiline strings was not taken into account. Backticks appeared much later and thus are more versatile.
|
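For comparison, here is a short sketch of what backticks allow: a string spanning several lines, with an expression embedded via `${…}`. The variable names are arbitrary.

```js
let guest = "John";

// Backticks allow line breaks and embedded expressions
let guests = `Guests:
 * ${guest}
 * Pete`;

console.log( guests );                    // prints three lines
console.log( guests.split("\n").length ); // 3
```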
|
|
|
## Special characters

@ -68,14 +56,15 @@ It is still possible to create multiline strings with single quotes, using a so-

```js run
let guestList = "Guests:\n * John\n * Pete\n * Mary";

alert(guestList); // a multiline list of guests
```

So to speak, these two lines describe the same:

```js run
alert( "Hello\nWorld" ); // two lines using a "newline symbol"

// two lines using a normal newline and backticks
alert( `Hello
World` );
```

@ -92,16 +81,17 @@ There are other, less common "special" characters as well, here's the list:

|`\uNNNN`|A Unicode symbol with the hex code `NNNN`; for instance, `\u00A9` is the Unicode for the copyright symbol `©`. It must be exactly 4 hex digits. |
|`\u{NNNNNNNN}`|Some rare characters are encoded with two Unicode symbols, taking up to 4 bytes. This long Unicode requires braces around it.|

Examples with Unicode:

```js run
alert( "\u00A9" ); // ©
alert( "\u{20331}" ); // 𠌱, a rare Chinese hieroglyph (long Unicode)
alert( "\u{1F60D}" ); // a smiling face symbol (another long Unicode)
```
|
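Such "long" characters are stored as two 16-bit units (a so-called surrogate pair), and this shows up in the string length. A short sketch:

```js
console.log( "\u00A9" === "©" );   // true, the same character
console.log( "\u00A9".length );    // 1
console.log( "\u{1F60D}".length ); // 2, stored as a surrogate pair
console.log( "\u{20331}".length ); // 2
```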
|
All special characters start with a backslash character `\`. It is also called an "escape character".

We should also use it if we want to insert a quote into the string.

For instance:
|
|
@ -111,15 +101,13 @@ alert( 'I*!*\'*/!*m the Walrus!' ); // *!*I'm*/!* the Walrus!

See, we have to prepend the inner quote with the backslash `\'`, because otherwise it would mean the end of the string.

Of course, this only applies to quotes that are the same as the enclosing ones. So, as a more elegant solution, we could switch to double quotes or backticks instead:

```js run
alert( `I'm the Walrus!` ); // I'm the Walrus!
```

Note that the backslash `\` serves for the correct reading of the string by JavaScript, then disappears. The in-memory string has no `\`. You can clearly see that in `alert` from the examples above.
|
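To check that the backslash really disappears, we can look at the string's length. A short sketch:

```js
let str = 'I\'m';

console.log( str );           // I'm
console.log( str.length );    // 3, the backslash is not part of the string
console.log( str === "I'm" ); // true
```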
|
But what if we need exactly a backslash `\` in the string?

@ -129,46 +117,50 @@ That's possible, but we need to double it like `\\`:

```js
alert( `The backslash: \\` ); // The backslash: \
```
|
|
## String length

The `length` property has the string length:

```js run
alert( `My\n`.length ); // 3
```

Note that `\n` is a single "special" character, so the length is indeed `3`.

```warn header="`length` is a property"
People with a background in some other languages sometimes mistype by calling `str.length()` instead of just `str.length`. That doesn't work.

Please note that `str.length` is a numeric property, not a function. There is no need to add brackets after it.
```
|
|
## Accessing characters

To get a character at position `pos`, use square brackets `[pos]` or call the method [str.charAt(pos)](mdn:js/String/charAt). The first character starts from the zero position:

```js run
let str = `Hello`;

// the first character
alert( str[0] ); // H
alert( str.charAt(0) ); // H

// the last character
alert( str[str.length - 1] ); // o
```

The square brackets are a modern way of getting a character, while `charAt` exists mostly for historical reasons.

The only difference between them is that if no character is found, `[]` returns `undefined`, and `charAt` returns an empty string:

```js run
let str = `Hello`;

alert( str[1000] ); // undefined
alert( str.charAt(1000) ); // '' (an empty string)
```
|
|
||||||
|
|
||||||
## Strings are immutable
|
## Strings are immutable
|
||||||
|
|
||||||
Strings can't be changed in JavaScript. It is impossible to change a character.
|
Strings can't be changed in JavaScript. It is impossible to change a character.
|
||||||
@@ -211,7 +203,7 @@ Or, if we want a single character lowercased:
 alert( 'Interface'[0].toLowerCase() ); // 'i'
 ```

-## Finding substrings
+## Searching for a substring

 There are multiple ways to look for a substring in a string.

@@ -281,17 +273,17 @@ There is also a similar method [str.lastIndexOf(pos)](mdn:js/String/lastIndexOf)
 It would list the occurences in the reverse way.
 ```

-The inconvenience with `indexOf` is that we can't put it "as is" into an `if` check:
+There is a slight inconvenience with `indexOf` in the `if` test. We can't put it in the `if` like this:

 ```js run
 let str = "Widget with id";

 if (str.indexOf("Widget")) {
-alert("We found it"); // won't work
+alert("We found it"); // doesn't work!
 }
 ```

-That's because `str.indexOf("Widget")` returns `0` (found at the starting position). Right, but `if` considers that `false`.
+The `alert` in the example above doesn't show, because `str.indexOf("Widget")` returns `0` (meaning that it found the match at the starting position). Right, but `if` considers that to be `false`.

 So, we should actualy check for `-1`, like that:

@@ -306,21 +298,24 @@ if (str.indexOf("Widget") != -1) {
 ```

 ````smart header="The bitwise NOT trick"
-One of the old tricks used here is the [bitwise NOT](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Bitwise_Operators#Bitwise_NOT) `~` operator. For 32-bit integers the call `~n` is the same as `-(n+1)`.
+One of the old tricks used here is the [bitwise NOT](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Bitwise_Operators#Bitwise_NOT) `~` operator. It converts the number to 32-bit integer (removes the decimal part if exists) and then reverses all bits in its binary representation.
+
+For 32-bit integers the call `~n` means exactly the same as `-(n+1)` (due to IEEE-754 format).
+
 For instance:

 ```js run
-alert( ~2 ); // -(2+1) = -3
-alert( ~1 ); // -(1+1) = -2
-alert( ~0 ); // -(0+1) = -1
+alert( ~2 ); // -3, the same as -(2+1)
+alert( ~1 ); // -2, the same as -(1+1)
+alert( ~0 ); // -1, the same as -(0+1)
 *!*
-alert( ~-1 ); // -(-1+1) = 0
+alert( ~-1 ); // 0, the same as -(-1+1)
 */!*
 ```

 As we can see, `~n` is zero only if `n == -1`.

-So, `if ( ~str.indexOf("...") )` means that the `indexOf` result is different from `-1`.
+So, the test `if ( ~str.indexOf("...") )` is truthy that the result of `indexOf` is not `-1`. In other words, when there is a match.

 People use it to shorten `indexOf` checks:

@@ -332,7 +327,7 @@ if (~str.indexOf("Widget")) {
 }
 ```

-It is usually not recommended to use language features in a non-obvious way, but this particular trick is widely used, generally JavaScript programmers understand it.
+It is usually not recommended to use language features in a non-obvious way, but this particular trick is widely used in the old code, so we should understand it.

 Just remember: `if (~str.indexOf(...))` reads as "if found".
 ````
@@ -341,7 +336,7 @@ Just remember: `if (~str.indexOf(...))` reads as "if found".

 The more modern method [str.includes(substr)](mdn:js/String/includes) returns `true/false` depending on whether `str` has `substr` as its part.

-That's usually a simpler way to go if we don't need the exact position:
+It's the right choice if we need to test for the match, without the position:

 ```js run
 alert( "Widget with id".includes("Widget") ); // true
@@ -349,11 +344,11 @@ alert( "Widget with id".includes("Widget") ); // true
 alert( "Hello".includes("Bye") ); // false
 ```

-The methods [str.startsWith](mdn:js/String/startsWith) and [str.endsWith](mdn:js/String/endsWith) do exactly what they promise:
+The methods [str.startsWith](mdn:js/String/startsWith) and [str.endsWith](mdn:js/String/endsWith) do exactly what they say:

 ```js run
 alert( "Widget".startsWith("Wid") ); // true, "Widget" starts with "Wid"
 alert( "Widget".endsWith("get") ); // true, "Widget" ends with "get"
 ```


@@ -362,17 +357,17 @@ alert( "Widget".endsWith("get") ); // true, "Widget" ends with "get"
 There are 3 methods in JavaScript to get a substring: `substring`, `substr` and `slice`.

 `str.slice(start [, end])`
-: Returns the part of the string from `start` to, but not including, `end`.
+: Returns the part of the string from `start` to (but not including) `end`.

 For instance:

 ```js run
 let str = "stringify";
-alert( str.slice(0,5) ); // 'string', the substring from 0, but not including 5
-alert( str.slice(0,1) ); // 's', the substring from 0, but not including 1
+alert( str.slice(0,5) ); // 'string', the substring from 0 to 5 (not including 5)
+alert( str.slice(0,1) ); // 's', from 0 to 1, but not including 1, so only character at 0
 ```

-If there is no `end` argument, then `slice` goes till the end of the string:
+If there is no second argument, then `slice` goes till the end of the string:

 ```js run
 let str = "st*!*ringify*/!*";
@@ -392,22 +387,25 @@ There are 3 methods in JavaScript to get a substring: `substring`, `substr` and
 `str.substring(start [, end])`
 : Returns the part of the string *between* `start` and `end`.

-Almost the same as `slice`, but allows `start` greater than `end`. For instance:
+Almost the same as `slice`, but allows `start` to be greater than `end`.

+For instance:
+

 ```js run
 let str = "st*!*ring*/!*ify";

+// these are same for substring
 alert( str.substring(2, 6) ); // "ring"
 alert( str.substring(6, 2) ); // "ring"

-// compare with slice:
+// ...but not for slice:
 alert( str.slice(2, 6) ); // "ring" (the same)
 alert( str.slice(6, 2) ); // "" (an empty string)

 ```

-Negative arguments are treated as `0`.
+Negative arguments are (unlike slice) not supported, they are treated as `0`.


 `str.substr(start [, length])`
@@ -437,7 +435,7 @@ Let's recap the methods to avoid any confusion:


 ```smart header="Which one to choose?"
-All of them can do the job. The author of this chapter finds himself using `slice` almost all the time.
+All of them can do the job. The author finds himself using `slice` almost all the time.
 ```

 ## Comparing strings
@@ -452,15 +450,15 @@ Although, there are some oddities.
 alert( 'a' > 'Z' ); // true
 ```

-2. Letters with diacritical marks are "out of the alphabet":
+2. Letters with diacritical marks are "out of order":

 ```js run
 alert( 'Österreich' > 'Zealand' ); // true
 ```

-That may give strange results if we sort country names. Usually people would await for `Zealand` to be after `Österreich` in the list.
+That may lead to strange results if we sort country names. Usually people would await for `Zealand` to be after `Österreich` in the list.

-To understand the reasoning behind that, let's review the internal representaion of strings in JavaScript.
+To understand what happens, let's review the internal representaion of strings in JavaScript.

 All strings are encoded using [UTF-16](https://en.wikipedia.org/wiki/UTF-16). That is: each character has a corresponding numeric code. There are special methods that allow to get the character for the code and back.

@@ -487,7 +485,7 @@ All strings are encoded using [UTF-16](https://en.wikipedia.org/wiki/UTF-16). Th
 alert( '\u005a' ); // Z
 ```

-Now let's make the string from the characters with codes `65..220` (the latin alphabet and a little bit extra):
+Now let's see the characters with codes `65..220` (the latin alphabet and a little bit extra) by making a string of them:

 ```js run
 let str = '';
@@ -500,25 +498,27 @@ alert( str );
 // ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖרÙÚÛÜ
 ```

+See? Capital character go first, then few special ones, then lowercase characters.
+
 Now it becomes obvious why `a > Z`.

-The characters are compared by their numeric code. The greater code means that the character is greater.
+The characters are compared by their numeric code. The greater code means that the character is greater. The code for `a` (97) is greater than the code for `Z` (90).

-And we can easily see that:
-
-1. Lowercase letters go after uppercase letters, their codes are greater.
-2. Some letters like `Ö` stand apart from the main alphabet. Here, it's code is greater than anything from `a` to `z`.
+- All lowercase letters go after uppercase letters, their codes are greater.
+- Some letters like `Ö` stand apart from the main alphabet. Here, it's code is greater than anything from `a` to `z`.


-### The correct way
+### Correct comparisons

-The "right" comparisons are more complex than it may seem. Because the alphabets are different for different languages. The same letter may be located differently in different alphabets.
+The "right" algorithm to do string comparisons is more complex than it may seem. Because the alphabets are different for different languages. So the same letter may be located differently in different alphabets, that is -- even if it looks the same, different alphabets put it in different place.

+So, the browser needs to know the language to compare.
+
 Luckily, all modern browsers (IE10- requires the additional library [Intl.JS](https://github.com/andyearnshaw/Intl.js/)) support the internationalization standard [ECMA 402](http://www.ecma-international.org/ecma-402/1.0/ECMA-402.pdf).

 It provides a special method to compare strings in different languages, following their rules.

-[str.localeCompare(str2)](mdn:js/String/localeCompare):
+The call [str.localeCompare(str2)](mdn:js/String/localeCompare):

 - Returns `1` if `str` is greater than `str2` according to the language rules.
 - Returns `-1` if `str` is less than `str2`.
@@ -530,14 +530,14 @@ For instance:
 alert( 'Österreich'.localeCompare('Zealand') ); // -1
 ```

-The method actually has two additional arguments, allowing to specify the language (by default taken from the environment) and setup additional rules like case sensivity or should `a` and `á` be treated as the same etc. See the manual for details when you need them.
+The method actually has two additional arguments specified in [the documentation](mdn:js/String/localeCompare), that allow to specify the language (by default taken from the environment) and setup additional rules like case sensivity or should `a` and `á` be treated as the same etc.

-## Encoding
+## Internal encoding

 ```warn header="Advanced knowledge"
-The section goes deeper into string internals. The knowledge will be useful for you if you plan to deal with emoji, rare math of hieroglyphs characters and such.
+The section goes deeper into string internals. The knowledge will be useful for you if you plan to deal with emoji, rare mathematical of hieroglyphs characters or other rare symbols.

-You can skip the section if all you need is common letters and digits.
+You can skip the section if you don't plan to support them.
 ```

 ### Surrogate pairs
@@ -546,7 +546,7 @@ Most symbols have a 2-byte code. Letters of most european languages, numbers, ev

 But 2 bytes only allow 65536 combinations that's not enough for every possible symbol. So rare symbols are encoded with a pair of 2-byte characters called "a surrogate pair".

-Examples of symbols encoded this way:
+The length of such symbols is `2`:

 ```js run
 alert( '𝒳'.length ); // 2, MATHEMATICAL SCRIPT CAPITAL X
@@ -554,38 +554,40 @@ alert( '😂'.length ); // 2, FACE WITH TEARS OF JOY
 alert( '𩷶'.length ); // 2, a rare chinese hieroglyph
 ```

-Note that surrogate pairs are incorrectly processed by the language most of the time. We actually have a single symbol in each of the strings above, but the `length` shows the length of `2`.
+Note that surrogate pairs did not exist at the time when Javascript was created, and thus are not correctly processed by the language!

-`String.fromCodePoint` and `str.codePointAt` are notable exceptions that deal with surrogate pairs right. They recently appeared in the language. Before them, there were only [String.fromCharCode](mdn:js/String/fromCharCode) and [str.charCodeAt](mdn:js/String/charCodeAt) that do the same, but don't work with surrogate pairs.
+We actually have a single symbol in each of the strings above, but the `length` shows the length of `2`.

-Getting a symbol can also be tricky, because most functions treat surrogate pairs as two characters:
+`String.fromCodePoint` and `str.codePointAt` are notable exceptions that deal with surrogate pairs right. They recently appeared in the language. Before them, there were only [String.fromCharCode](mdn:js/String/fromCharCode) and [str.charCodeAt](mdn:js/String/charCodeAt). These methods are actually the same as `fromCodePoint/codePointAt`, but don't work with surrogate pairs.

+But, for instance, getting a symbol can be tricky, because surrogate pairs are treated as two characters:
+
 ```js run
 alert( '𩷶'[0] ); // some strange symbols
 alert( '𝒳'[0] ); // pieces of the surrogate pair
 ```

-Note that pieces of the surrogate pair have no meaning without each other. So, the alerts actually display garbage.
+Note that pieces of the surrogate pair have no meaning without each other. So, the alerts in the example above actually display garbage.

 How to solve this problem? First, let's make sure you have it. Not every project deals with surrogate pairs.

-But if you do, then there are libraries in the net which implement surrogate-aware versions of `slice`, `indexOf` and other functions. Surrogate pairs are detectable by their codes: the first character has the code in the interval of `0xD800..0xDBFF`, while the second is in `0xDC00..0xDFFF`. So if we see a character with the code, say, `0xD801`, then the next one must be the second part of the surrogate pair.
+But if you do, then search the internet for libraries which implement surrogate-aware versions of `slice`, `indexOf` and other functions. Technically, surrogate pairs are detectable by their codes: the first character has the code in the interval of `0xD800..0xDBFF`, while the second is in `0xDC00..0xDFFF`. So if we see a character with the code, say, `0xD801`, then the next one must be the second part of the surrogate pair. Libraries rely on that to split stirngs right. Unfortunately, there's no single well-known library to advise yet.

 ### Diacritical marks

 In many languages there are symbols that are composed of the base character and a mark above/under it.

-For instance, letter `a` can be the base character for: `àáâäãåā`. Most common "composite" character have their own code in the UTF-16 table. But not all of them.
+For instance, letter `a` can be the base character for: `àáâäãåā`. Most common "composite" character have their own code in the UTF-16 table. But not all of them, because there are too many possible combinations.

-To generate arbitrary compositions, several unicode characters are used: the base character and one or many "mark" characters.
+To support arbitrary compositions, UTF-16 allows to use several unicode characters. The base character and one or many "mark" characters that "decorate" it.

-For instance, if we have `S` followed by "dot above" character (code `\u0307`), it is shown as Ṡ.
+For instance, if we have `S` followed by the special "dot above" character (code `\u0307`), it is shown as Ṡ.

 ```js run
 alert( 'S\u0307' ); // Ṡ
 ```

-If we need a one more mark over the letter (or below it) -- no problems, just add the necessary mark character.
+If we need a one more mark over the letter (or below it) -- no problem, just add the necessary mark character.

 For instance, if we append a character "dot below" (code `\u0323`), then we'll have "S with dots above and below": `Ṩ`.

@@ -622,9 +624,9 @@ alert( "S\u0307\u0323".normalize().length ); // 1
 alert( "S\u0307\u0323".normalize() == "\u1e68" ); // true
 ```

-In real, that is not always so, but the symbol `Ṩ` was considered "common enough" by UTF-16 creators to include it into the main table.
+In real, that is not always so. It's just the symbol `Ṩ` is "common enough" so that UTF-16 creators included it into the main table and gave it the code.

-For most practical tasks that information is enough, but if you want to learn more about normalization rules and variants -- they are described in the appendix to the Unicode standard: [Unicode Normalization Forms](http://www.unicode.org/reports/tr15/).
+If you want to learn more about normalization rules and variants -- they are described in the appendix to the Unicode standard: [Unicode Normalization Forms](http://www.unicode.org/reports/tr15/), but for most practical reasons the information from this section is enough.


 ## Summary