Minor grammar/spelling edit

Double checked, should be good. Just pay attention to line 296 as I'm not 100% sure it's the correct edit in this context.
gwooly 2017-09-01 15:49:34 +01:00 committed by GitHub
parent 21d71c03d4
commit 3daed31965


@ -8,9 +8,9 @@ The internal format for strings is always [UTF-16](https://en.wikipedia.org/wiki
## Quotes
Let's remember the kinds of quotes.
Let's recall the kinds of quotes.
Strings can be enclosed either with the single, double quotes or in backticks:
Strings can be enclosed within either single quotes, double quotes or backticks:
```js
let single = 'single-quoted';
@ -19,7 +19,7 @@ let double = "double-quoted";
let backticks = `backticks`;
```
Single and double quotes are essentially the same. Backticks allow to embed any expression into the string, including function calls:
Single and double quotes are essentially the same. Backticks, however, allow us to embed any expression into the string, including function calls:
```js run
function sum(a, b) {
@ -41,20 +41,20 @@ let guestList = `Guests:
alert(guestList); // a list of guests, multiple lines
```
If we try to use single or double quotes the same way, there will be an error:
If we try to use single or double quotes in the same way, there will be an error:
```js run
let guestList = "Guests: // Error: Unexpected token ILLEGAL
* John";
```
Single and double quotes come from ancient times of language creation, and the need for multiline strings was not taken into account. Backticks appeared much later and thus are more versatile.
Single and double quotes come from ancient times of language creation when the need for multiline strings was not taken into account. Backticks appeared much later and thus are more versatile.
Backticks also allow to specify a "template function" before the first backtick, the syntax is: <code>func&#96;string&#96;</code>. The function `func` is called automatically, receives the string and embedded expressions and can process them. You can read more in the [docs](mdn:JavaScript/Reference/Template_literals#Tagged_template_literals). That is called "tagged templates". This feature makes it easier to wrap strings into custom templating or other functionality, but is rarely used.
Backticks also allow us to specify a "template function" before the first backtick. The syntax is: <code>func&#96;string&#96;</code>. The function `func` is called automatically, receives the string and embedded expressions and can process them. You can read more about it in the [docs](mdn:JavaScript/Reference/Template_literals#Tagged_template_literals). This is called "tagged templates". This feature makes it easier to wrap strings into custom templating or other functionality, but it is rarely used.
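As a rough illustration of tagged templates, here is a minimal sketch. The tag function `highlight` and the variables `guest` and `count` are made up for this example and are not part of the original chapter; it uses `console.log` so that it also runs outside the browser.

```js
// Hypothetical tag function: wraps every embedded value in square brackets.
// `strings` holds the literal parts, `values` holds the results of ${...}.
function highlight(strings, ...values) {
  let result = strings[0];
  for (let i = 0; i < values.length; i++) {
    result += `[${values[i]}]` + strings[i + 1];
  }
  return result;
}

const guest = "John";
const count = 3;

// No parentheses: the function is called with the template's parts
const message = highlight`Guest ${guest} brings ${count} friends`;

console.log(message); // Guest [John] brings [3] friends
```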
## Special characters
It is still possible to create multiline strings with single quotes, using a so-called "newline character" written as `\n`, that denotes a line break:
It is still possible to create multiline strings with single quotes by using a so-called "newline character", written as `\n`, which denotes a line break:
```js run
let guestList = "Guests:\n * John\n * Pete\n * Mary";
@ -62,7 +62,7 @@ let guestList = "Guests:\n * John\n * Pete\n * Mary";
alert(guestList); // a multiline list of guests
```
So to speak, these two lines describe the same:
For example, these two lines describe the same thing:
```js run
alert( "Hello\nWorld" ); // two lines using a "newline symbol"
@ -72,7 +72,7 @@ alert( `Hello
World` );
```
There are other, less common "special" characters as well, here's the list:
There are other, less common "special" characters as well. Here's the list:
| Character | Description |
|-----------|-------------|
@ -81,8 +81,8 @@ There are other, less common "special" characters as well, here's the list:
|`\n`|New line|
|`\r`|Carriage return|
|`\t`|Tab|
|`\uNNNN`|A unicode symbol with the hex code `NNNN`, for instance `\u00A9` -- is a unicode for the copyright symbol `©`. Must be exactly 4 hex digits. |
|`\u{NNNNNNNN}`|Some rare characters are encoded with two unicode symbols, taking up to 4 bytes. The long unicode requires braces around.|
|`\uNNNN`|A unicode symbol with the hex code `NNNN`. For instance, `\u00A9` is the unicode for the copyright symbol `©`. It must be exactly 4 hex digits. |
|`\u{NNNNNNNN}`|Some rare characters are encoded with two unicode symbols, taking up to 4 bytes. This long unicode requires braces around it.|
Examples with unicode:
@ -92,9 +92,9 @@ alert( "\u{20331}" ); // 𠌱, a rare chinese hieroglyph (long unicode)
alert( "\u{1F60D}"); // a smiling face symbol (another long unicode)
```
All special characters start with a backslash character `\`. It is also called an "escaping character".
All special characters start with a backslash character `\`. It is also called an "escape character".
We should also use it if we want to insert the quote into the string.
We also use it when we want to insert a quote into the string.
For instance:
@ -102,9 +102,9 @@ For instance:
alert( 'I*!*\'*/!*m the Walrus!' ); // *!*I'm*/!* the Walrus!
```
See, we have to prepend the inner quote by the backslash `\'`, because otherwise it would mean the string end.
As you can see, we have to prepend a backslash to the inner quote, `\'`, because otherwise it would indicate the end of the string.
Of course, that refers only for the quotes that are same as the enclosing ones. So, as a more elegant solution, we could switch to double quotes or backticks instead:
Of course, that refers only to the quotes that are the same as the enclosing ones. So, as a more elegant solution, we could switch to double quotes or backticks instead:
```js run
alert( `I'm the Walrus!` ); // I'm the Walrus!
@ -112,7 +112,7 @@ alert( `I'm the Walrus!` ); // I'm the Walrus!
Note that the backslash `\` serves for the correct reading of the string by JavaScript, then disappears. The in-memory string has no `\`. You can clearly see that in `alert` from the examples above.
But what if we need exactly a backslash `\` in the string?
But what if we need to show an actual backslash `\` within the string?
That's possible, but we need to double it like `\\`:
@ -132,7 +132,7 @@ alert( `My\n`.length ); // 3
Note that `\n` is a single "special" character, so the length is indeed `3`.
```warn header="`length` is a property"
People with background in some other languages sometimes mistype by calling `str.length()` instead of just `str.length`. That doesn't work.
People with a background in some other languages sometimes mistype by calling `str.length()` instead of just `str.length`. That doesn't work.
Please note that `str.length` is a numeric property, not a function. There is no need to add brackets after it.
```
@ -152,9 +152,9 @@ alert( str.charAt(0) ); // H
alert( str[str.length - 1] ); // o
```
The square brackets is a modern way of getting a character, while `charAt` exists mostly for historical reasons.
The square brackets are a modern way of getting a character, while `charAt` exists mostly for historical reasons.
The only difference between them is that if no character found, `[]` returns `undefined`, and `charAt` returns an empty string:
The only difference between them is that if no character is found, `[]` returns `undefined`, and `charAt` returns an empty string:
```js run
let str = `Hello`;
@ -163,7 +163,7 @@ alert( str[1000] ); // undefined
alert( str.charAt(1000) ); // '' (an empty string)
```
Also we can iterate over characters using `for..of`:
We can also iterate over characters using `for..of`:
```js run
for(let char of "Hello") {
@ -175,7 +175,7 @@ for(let char of "Hello") {
Strings can't be changed in JavaScript. It is impossible to change a character.
Let's try to see that it doesn't work:
Let's try it to show that it doesn't work:
```js run
let str = 'Hi';
@ -196,7 +196,7 @@ str = 'h' + str[1]; // replace the string
alert( str ); // hi
```
In the following sections we'll see more examples of that.
In the following sections we'll see more examples of this.
## Changing the case
@ -215,7 +215,7 @@ alert( 'Interface'[0].toLowerCase() ); // 'i'
## Searching for a substring
There are multiple ways to look for a substring in a string.
There are multiple ways to look for a substring within a string.
### str.indexOf
@ -234,9 +234,9 @@ alert( str.indexOf('widget') ); // -1, not found, the search is case-sensitive
alert( str.indexOf("id") ); // 1, "id" is found at the position 1 (..idget with id)
```
The optional second parameter allows to search starting from the given position.
The optional second parameter allows us to search starting from the given position.
For instance, the first occurence of `"id"` is at the position `1`. To look for the next occurence, let's start the search from the position `2`:
For instance, the first occurrence of `"id"` is at position `1`. To look for the next occurrence, let's start the search from position `2`:
```js run
let str = 'Widget with id';
@ -278,9 +278,9 @@ while ((pos = str.indexOf(target, pos + 1)) != -1) {
```
```smart header="`str.lastIndexOf(pos)`"
There is also a similar method [str.lastIndexOf(pos)](mdn:js/String/lastIndexOf) that searches from the end of the string to its beginning.
There is also a similar method [str.lastIndexOf(pos)](mdn:js/String/lastIndexOf) that searches from the end of a string to its beginning.
It would list the occurences in the reverse way.
It would list the occurrences in reverse order.
```
There is a slight inconvenience with `indexOf` in the `if` test. We can't put it in the `if` like this:
@ -293,9 +293,9 @@ if (str.indexOf("Widget")) {
}
```
The `alert` in the example above doesn't show, because `str.indexOf("Widget")` returns `0` (meaning that it found the match at the starting position). Right, but `if` considers that to be `false`.
The `alert` in the example above doesn't show because `str.indexOf("Widget")` returns `0` (meaning that it found the match at the starting position). Right, but `if` considers `0` to be `false`.
So, we should actualy check for `-1`, like that:
So, we should actually check for `-1`, like this:
```js run
let str = "Widget with id";
@ -308,7 +308,7 @@ if (str.indexOf("Widget") != -1) {
```
````smart header="The bitwise NOT trick"
One of the old tricks used here is the [bitwise NOT](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Bitwise_Operators#Bitwise_NOT) `~` operator. It converts the number to 32-bit integer (removes the decimal part if exists) and then reverses all bits in its binary representation.
One of the old tricks used here is the [bitwise NOT](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Bitwise_Operators#Bitwise_NOT) `~` operator. It converts the number to a 32-bit integer (removes the decimal part if it exists) and then reverses all bits in its binary representation.
For 32-bit integers the call `~n` means exactly the same as `-(n+1)` (due to the two's complement representation of integers).
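To make the identity `~n == -(n+1)` concrete, here is a small sketch. The helper name `isFound` is invented for illustration, and `console.log` is used so that the snippet also runs outside the browser.

```js
// ~n gives -(n + 1) for 32-bit integers
console.log(~2);  // -3
console.log(~0);  // -1
console.log(~-1); // 0

// Hypothetical helper: true when indexOf found a match.
// ~(-1) is 0 (falsy); every real position gives a non-zero (truthy) value.
function isFound(str, substr) {
  return Boolean(~str.indexOf(substr));
}

console.log(isFound("Widget with id", "Widget")); // true, found at position 0
console.log(isFound("Widget with id", "widget")); // false, the search is case-sensitive
```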
@ -344,7 +344,7 @@ Just remember: `if (~str.indexOf(...))` reads as "if found".
### includes, startsWith, endsWith
The more modern method [str.includes(substr, pos)](mdn:js/String/includes) returns `true/false` depending on whether `str` has `substr` as its part.
The more modern method [str.includes(substr, pos)](mdn:js/String/includes) returns `true/false` depending on whether `str` contains `substr` within.
It's the right choice if we need to test for the match, but don't need its position:
@ -403,7 +403,7 @@ There are 3 methods in JavaScript to get a substring: `substring`, `substr` and
`str.substring(start [, end])`
: Returns the part of the string *between* `start` and `end`.
Almost the same as `slice`, but allows `start` to be greater than `end`.
This is almost the same as `slice`, but it allows `start` to be greater than `end`.
For instance:
@ -427,7 +427,7 @@ There are 3 methods in JavaScript to get a substring: `substring`, `substr` and
`str.substr(start [, length])`
: Returns the part of the string from `start`, with the given `length`.
In contrast with the previous methods, this one allows to specify the `length` instead of the ending position:
In contrast with the previous methods, this one allows us to specify the `length` instead of the ending position:
```js run
let str = "st*!*ring*/!*ify";
@ -441,7 +441,7 @@ There are 3 methods in JavaScript to get a substring: `substring`, `substr` and
alert( str.substr(-4, 2) ); // gi, from the 4th position get 2 characters
```
Let's recap the methods to avoid any confusion:
Let's recap these methods to avoid any confusion:
| method | selects... | negatives |
|--------|-----------|-----------|
@ -458,7 +458,7 @@ The author finds himself using `slice` almost all the time.
## Comparing strings
As we know from the chapter <info:comparison>, strings are compared character-by-character, in the alphabet order.
As we know from the chapter <info:comparison>, strings are compared character-by-character in alphabetical order.
Although, there are some oddities.
@ -474,7 +474,7 @@ Although, there are some oddities.
alert( 'Österreich' > 'Zealand' ); // true
```
That may lead to strange results if we sort country names. Usually people would await for `Zealand` to be after `Österreich` in the list.
This may lead to strange results if we sort these country names. Usually people would expect `Zealand` to come after `Österreich` in the list.
To understand what happens, let's review the internal representation of strings in JavaScript.
@ -516,19 +516,19 @@ alert( str );
// ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖרÙÚÛÜ
```
See? Capital character go first, then few special ones, then lowercase characters.
See? Capital characters go first, then a few special ones, then lowercase characters.
Now it becomes obvious why `a > Z`.
The characters are compared by their numeric code. The greater code means that the character is greater. The code for `a` (97) is greater than the code for `Z` (90).
- All lowercase letters go after uppercase letters, their codes are greater.
- All lowercase letters go after uppercase letters because their codes are greater.
- Some letters like `Ö` stand apart from the main alphabet. Here, its code is greater than anything from `a` to `z`.
### Correct comparisons
The "right" algorithm to do string comparisons is more complex than it may seem. Because alphabets are different for different languages. The same-looking letter may be located differently in different alphabets.
The "right" algorithm to do string comparisons is more complex than it may seem, because alphabets are different for different languages. The same-looking letter may be located differently in different alphabets.
So, the browser needs to know the language to compare.
@ -548,21 +548,21 @@ For instance:
alert( 'Österreich'.localeCompare('Zealand') ); // -1
```
The method actually has two additional arguments specified in [the documentation](mdn:js/String/localeCompare), that allow to specify the language (by default taken from the environment) and setup additional rules like case sensivity or should `"a"` and `"á"` be treated as the same etc.
This method actually has two additional arguments specified in [the documentation](mdn:js/String/localeCompare), which allow us to specify the language (by default taken from the environment) and set up additional rules, such as case sensitivity or whether `"a"` and `"á"` should be treated as the same.
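A hedged sketch of those two extra arguments follows. The variable names are made up, and the results assume an environment with internationalization support (modern browsers and standard Node.js builds); the exact non-zero values may differ between engines, so only their sign is relied on here.

```js
// Second argument: the language. Third argument: an options object.
// sensitivity "base" ignores accents and case, so "a" and "á" compare as equal.
const sameBase = "a".localeCompare("á", "en", { sensitivity: "base" });
console.log(sameBase); // 0

// Without the option, the accent counts as a difference (non-zero result).
const differs = "a".localeCompare("á", "en");
console.log(differs !== 0); // true

// In German, Ö sorts next to O, so "Österreich" comes before "Zealand".
const german = "Österreich".localeCompare("Zealand", "de");
console.log(german < 0); // true
```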
## Internals, Unicode
```warn header="Advanced knowledge"
The section goes deeper into string internals. The knowledge will be useful for you if you plan to deal with emoji, rare mathematical of hieroglyphs characters or other rare symbols.
This section goes deeper into string internals. This knowledge will be useful for you if you plan to deal with emoji, rare mathematical or hieroglyphic characters, or other rare symbols.
You can skip the section if you don't plan to support them.
```
### Surrogate pairs
Most symbols have a 2-byte code. Letters of most european languages, numbers, even most hieroglyphs have a 2-byte representation.
Most symbols have a 2-byte code. Letters in most European languages, numbers, and even most hieroglyphs have a 2-byte representation.
But 2 bytes only allow 65536 combinations that's not enough for every possible symbol. So rare symbols are encoded with a pair of 2-byte characters called "a surrogate pair".
But 2 bytes only allow 65536 combinations and that's not enough for every possible symbol. So rare symbols are encoded with a pair of 2-byte characters called "a surrogate pair".
The length of such symbols is `2`:
@ -574,7 +574,7 @@ alert( '𩷶'.length ); // 2, a rare chinese hieroglyph
Note that surrogate pairs did not exist at the time when JavaScript was created, and thus are not correctly processed by the language!
We actually have a single symbol in each of the strings above, but the `length` shows the length of `2`.
We actually have a single symbol in each of the strings above, but the `length` shows a length of `2`.
`String.fromCodePoint` and `str.codePointAt` are among the few methods that deal with surrogate pairs correctly. They recently appeared in the language. Before them, there were only [String.fromCharCode](mdn:js/String/fromCharCode) and [str.charCodeAt](mdn:js/String/charCodeAt). These methods are actually the same as `fromCodePoint/codePointAt`, but don't work with surrogate pairs.
@ -585,7 +585,7 @@ alert( '𝒳'[0] ); // strange symbols...
alert( '𝒳'[1] ); // ...pieces of the surrogate pair
```
Note that pieces of the surrogate pair have no meaning without each other. So, the alerts in the example above actually display garbage.
Note that pieces of the surrogate pair have no meaning without each other. So the alerts in the example above actually display garbage.
Technically, surrogate pairs are also detectable by their codes: if a character has the code in the interval of `0xd800..0xdbff`, then it is the first part of the surrogate pair. The next character (second part) must have the code in interval `0xdc00..0xdfff`. These intervals are reserved exclusively for surrogate pairs by the standard.
@ -598,15 +598,15 @@ alert( '𝒳'.charCodeAt(0).toString(16) ); // d835, between 0xd800 and 0xdbff
alert( '𝒳'.charCodeAt(1).toString(16) ); // dcb3, between 0xdc00 and 0xdfff
```
You will find more ways to deal with surrogate pairs later in the chapter <info:iterable>. Probably, there are special libraries for that too, but nothing famous enough to suggest here.
You will find more ways to deal with surrogate pairs later in the chapter <info:iterable>. There are probably special libraries for that too, but nothing famous enough to suggest here.
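The code intervals described above can be checked by hand. The sketch below is only an illustration: the helper names `isSurrogatePair` and `countSymbols` are hypothetical and do not come from the chapter or from any library.

```js
// Hypothetical helper: do the 2-byte units at pos and pos + 1 form a surrogate pair?
function isSurrogatePair(str, pos) {
  const first = str.charCodeAt(pos);
  const second = str.charCodeAt(pos + 1); // NaN past the end, so the checks fail
  return first >= 0xd800 && first <= 0xdbff &&
         second >= 0xdc00 && second <= 0xdfff;
}

// Hypothetical helper: count symbols, treating each surrogate pair as one.
function countSymbols(str) {
  let count = 0;
  for (let i = 0; i < str.length; i++) {
    count++;
    if (isSurrogatePair(str, i)) i++; // skip the second half of the pair
  }
  return count;
}

console.log("𝒳".length);          // 2
console.log(countSymbols("𝒳"));   // 1
console.log(countSymbols("a𝒳b")); // 3
```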
### Diacritical marks and normalization
In many languages there are symbols that are composed of the base character and a mark above/under it.
In many languages there are symbols that are composed of the base character with a mark above/under it.
For instance, letter `a` can be the base character for: `àáâäãåā`. Most common "composite" character have their own code in the UTF-16 table. But not all of them, because there are too many possible combinations.
For instance, the letter `a` can be the base character for: `àáâäãåā`. Most common "composite" characters have their own code in the UTF-16 table. But not all of them, because there are too many possible combinations.
To support arbitrary compositions, UTF-16 allows to use several unicode characters. The base character and one or many "mark" characters that "decorate" it.
To support arbitrary compositions, UTF-16 allows us to use several unicode characters: the base character and one or many "mark" characters that "decorate" it.
For instance, if we have `S` followed by the special "dot above" character (code `\u0307`), it is shown as Ṡ.
@ -614,17 +614,17 @@ For instance, if we have `S` followed by the special "dot above" character (code
alert( 'S\u0307' ); // Ṡ
```
If we need a one more mark over the letter (or below it) -- no problem, just add the necessary mark character.
If we need an additional mark above the letter (or below it) -- no problem, just add the necessary mark character.
For instance, if we append a character "dot below" (code `\u0323`), then we'll have "S with dots above and below": `Ṩ`.
The example:
For example:
```js run
alert( 'S\u0307\u0323' ); // Ṩ
```
This gives great flexibility, but also an interesting problem: the same symbol visually can be represented with different unicode compositions.
This provides great flexibility, but also an interesting problem: the same symbol can be visually represented with different unicode compositions.
For instance:
@ -635,7 +635,7 @@ alert( 'S\u0323\u0307' ); // Ṩ, S + dot below + dot above
alert( 'S\u0307\u0323' == 'S\u0323\u0307' ); // false
```
To solve it, there exists a "unicode normalization" algorithm that brings each string to the single "normal" form.
To solve this, there exists a "unicode normalization" algorithm that brings each string to the single "normal" form.
It is implemented by [str.normalize()](mdn:js/String/normalize).
@ -643,7 +643,7 @@ It is implemented by [str.normalize()](mdn:js/String/normalize).
alert( "S\u0307\u0323".normalize() == "S\u0323\u0307".normalize() ); // true
```
It's funny that in our situation `normalize()` actually brings a sequence of 3 characters to one: `\u1e68` (S with two dots).
It's funny that in our situation `normalize()` actually combines a sequence of 3 characters into one: `\u1e68` (S with two dots).
```js run
alert( "S\u0307\u0323".normalize().length ); // 1
@ -651,9 +651,9 @@ alert( "S\u0307\u0323".normalize().length ); // 1
alert( "S\u0307\u0323".normalize() == "\u1e68" ); // true
```
In real, that is not always so. The reason is that symbol `Ṩ` is "common enough", so UTF-16 creators included it into the main table and gave it the code.
In reality, this is not always the case. It works here because the symbol `Ṩ` is "common enough", so the UTF-16 creators included it in the main table and gave it its own code.
If you want to learn more about normalization rules and variants -- they are described in the appendix to the Unicode standard: [Unicode Normalization Forms](http://www.unicode.org/reports/tr15/), but for most practical reasons the information from this section is enough.
If you want to learn more about normalization rules and variants -- they are described in the appendix of the Unicode standard: [Unicode Normalization Forms](http://www.unicode.org/reports/tr15/), but for most practical purposes the information from this section is enough.
## Summary
@ -661,16 +661,16 @@ If you want to learn more about normalization rules and variants -- they are des
- There are 3 types of quotes. Backticks allow a string to span multiple lines and embed expressions.
- Strings in JavaScript are encoded using UTF-16.
- We can use special characters like `\n` and insert letters by their unicode using `\u...`.
- To get a character: use `[]`.
- To get a substring: use `slice` or `substring`.
- To lowercase/uppercase a string: use `toLowerCase/toUpperCase`.
- To look for a substring: use `indexOf`, or `includes/startsWith/endsWith` for simple checks.
- To compare strings according to the language, use `localeCompare`, otherwise they are compared by character codes.
- To get a character, use: `[]`.
- To get a substring, use: `slice` or `substring`.
- To lowercase/uppercase a string, use: `toLowerCase/toUpperCase`.
- To look for a substring, use: `indexOf`, or `includes/startsWith/endsWith` for simple checks.
- To compare strings according to the language, use: `localeCompare`, otherwise they are compared by character codes.
There are several other helpful methods in strings:
- `str.trim()` -- removes ("trims") spaces from the beginning and end of the string.
- `str.repeat(n)` -- repeats the string `n` times.
- ...and others, see the [manual](mdn:js/String) for details.
- ...and more. See the [manual](mdn:js/String) for details.
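A quick sketch of the two methods named in the list above; the variable names are chosen only for this example.

```js
const padded = "  Hello  ";

// trim() removes spaces from both ends, but not from the middle
const trimmed = padded.trim();
console.log(trimmed); // Hello

// repeat(n) returns the string repeated n times
const repeated = "ab".repeat(3);
console.log(repeated); // ababab
```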
Also strings have methods for doing search/replace with regular expressions. But that topic deserves a separate chapter, so we'll return to that later.
Strings also have methods for doing search/replace with regular expressions. But that topic deserves a separate chapter, so we'll return to that later.