en.javascript.info/4-binary/02-text-decoder/article.md
Violet Bora Lee e85e48f8d0 minor fixes
2020-01-27 15:24:23 +09:00

76 lines
2.4 KiB
Markdown

# TextDecoder and TextEncoder
What if the binary data is actually a string? For instance, we received a file with textual data.
The build-in [TextDecoder](https://encoding.spec.whatwg.org/#interface-textdecoder) object allows to read the value into an actual JavaScript string, given the buffer and the encoding.
We first need to create it:
```js
let decoder = new TextDecoder([label], [options]);
```
- **`label`** -- the encoding, `utf-8` by default, but `big5`, `windows-1251` and many other are also supported.
- **`options`** -- optional object:
- **`fatal`** -- boolean, if `true` then throw an exception for invalid (non-decodable) characters, otherwise (default) replace them with character `\uFFFD`.
- **`ignoreBOM`** -- boolean, if `true` then ignore BOM (an optional byte-order unicode mark), rarely needed.
...And then decode:
```js
let str = decoder.decode([input], [options]);
```
- **`input`** -- `BufferSource` to decode.
- **`options`** -- optional object:
- **`stream`** -- true for decoding streams, when `decoder` is called repeatedly with incoming chunks of data. In that case a multi-byte character may occasionally split between chunks. This options tells `TextDecoder` to memorize "unfinished" characters and decode them when the next chunk comes.
For instance:
```js run
let uint8Array = new Uint8Array([72, 101, 108, 108, 111]);
alert( new TextDecoder().decode(uint8Array) ); // Hello
```
```js run
let uint8Array = new Uint8Array([228, 189, 160, 229, 165, 189]);
alert( new TextDecoder().decode(uint8Array) ); // 你好
```
We can decode a part of the buffer by creating a subarray view for it:
```js run
let uint8Array = new Uint8Array([0, 72, 101, 108, 108, 111, 0]);
// the string is in the middle
// create a new view over it, without copying anything
let binaryString = uint8Array.subarray(1, -1);
alert( new TextDecoder().decode(binaryString) ); // Hello
```
## TextEncoder
[TextEncoder](https://encoding.spec.whatwg.org/#interface-textencoder) does the reverse thing -- converts a string into bytes.
The syntax is:
```js
let encoder = new TextEncoder();
```
The only encoding it supports is "utf-8".
It has two methods:
- **`encode(str)`** -- returns `Uint8Array` from a string.
- **`encodeInto(str, destination)`** -- encodes `str` into `destination` that must be `Uint8Array`.
```js run
let encoder = new TextEncoder();
let uint8Array = encoder.encode("Hello");
alert(uint8Array); // 72,101,108,108,111
```