Commit e07d268

encoding: rudimentary TextDecoder support w/o ICU
Also split up the tests.
1 parent 939818f commit e07d268

8 files changed

Lines changed: 428 additions & 268 deletions

doc/api/errors.md

Lines changed: 7 additions & 0 deletions
@@ -826,6 +826,12 @@ would be possible by calling a callback more then once.
826826
Used when an attempt is made to use crypto features while Node.js is not
827827
compiled with OpenSSL crypto support.
828828

829+
<a id="ERR_NO_ICU"></a>
830+
### ERR_NO_ICU
831+
832+
Used when an attempt is made to use features that require [ICU][], while
833+
Node.js is not compiled with ICU support.
834+
829835
<a id="ERR_NO_LONGER_SUPPORTED"></a>
830836
### ERR_NO_LONGER_SUPPORTED
831837

@@ -955,6 +961,7 @@ installed.
955961
[domains]: domain.html
956962
[event emitter-based]: events.html#events_class_eventemitter
957963
[file descriptors]: https://en.wikipedia.org/wiki/File_descriptor
964+
[ICU]: intl.html#intl_internationalization_support
958965
[online]: http://man7.org/linux/man-pages/man3/errno.3.html
959966
[stream-based]: stream.html
960967
[syscall]: http://man7.org/linux/man-pages/man2/syscall.2.html
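
To illustrate how the new `ERR_NO_ICU` code might surface to callers, the sketch below feature-detects ICU and reports the error code instead of letting the constructor throw. It assumes that `process.versions.icu` is only defined on ICU-enabled builds, and that requesting `fatal: true` on a build without ICU is what raises `ERR_NO_ICU` (as the `doc/api/util.md` change in this commit suggests). `hasIcu` and `tryFatalDecoder` are hypothetical helper names, not part of the commit.

```js
'use strict';
const { TextDecoder } = require('util');

// Assumption: `process.versions.icu` is present only when Node.js was
// built with ICU support.
function hasIcu() {
  return typeof process.versions.icu === 'string';
}

// Hypothetical helper: try to construct a fatal decoder and report the
// error code (for example 'ERR_NO_ICU') instead of throwing.
function tryFatalDecoder(encoding) {
  try {
    return { decoder: new TextDecoder(encoding, { fatal: true }), code: null };
  } catch (err) {
    return { decoder: null, code: err.code };
  }
}

const result = tryFatalDecoder('utf-8');
if (result.decoder === null) {
  console.log(`fatal decoding unavailable (${result.code}), ICU: ${hasIcu()}`);
}
```

On an ICU-enabled build the constructor is expected to succeed, so nothing is logged.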

doc/api/intl.md

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@ option:
5252
| [WHATWG URL Parser][] | partial (no IDN support) | full | full | full
5353
| [`require('buffer').transcode()`][] | none (function does not exist) | full | full | full
5454
| [REPL][] | partial (inaccurate line editing) | full | full | full
55-
| [`require('util').TextDecoder`][] | none (class does not exist) | partial/full (depends on OS) | partial (Unicode-only) | full
55+
| [`require('util').TextDecoder`][] | partial (basic encodings support) | partial/full (depends on OS) | partial (Unicode-only) | full
5656

5757
*Note*: The "(not locale-aware)" designation denotes that the function carries
5858
out its operation just like the non-`Locale` version of the function, if one
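
Because the set of encodings now depends on the build configuration, a caller may want to probe for support at runtime. The following is a minimal sketch, assuming that constructing a `TextDecoder` with an unsupported label throws; `supportedEncodings` and the `candidates` list are hypothetical and chosen only for illustration.

```js
'use strict';
const { TextDecoder } = require('util');

// Hypothetical helper: return the subset of `labels` that this Node.js
// build can construct a TextDecoder for.
function supportedEncodings(labels) {
  return labels.filter((label) => {
    try {
      new TextDecoder(label);
      return true;
    } catch (err) {
      // Assumption: an unsupported label makes the constructor throw.
      return false;
    }
  });
}

// The result varies with the build: without ICU only the basic encodings
// are expected, while full ICU data should accept all four.
const candidates = ['utf-8', 'utf-16le', 'utf-16be', 'windows-1252'];
console.log(supportedEncodings(candidates));
```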

doc/api/util.md

Lines changed: 39 additions & 22 deletions
@@ -536,7 +536,7 @@ added: v8.0.0
536536
A Symbol that can be used to declare custom promisified variants of functions,
537537
see [Custom promisified functions][].
538538

539-
### Class: util.TextDecoder
539+
## Class: util.TextDecoder
540540
<!-- YAML
541541
added: REPLACEME
542542
-->
@@ -555,23 +555,33 @@ while (buffer = getNextChunkSomehow()) {
555555
string += decoder.decode(); // end-of-stream
556556
```
557557

558-
#### WHATWG Supported Encodings
558+
### WHATWG Supported Encodings
559559

560560
Per the [WHATWG Encoding Standard][], the encodings supported by the
561561
`TextDecoder` API are outlined in the tables below. For each encoding,
562-
one or more aliases may be used. Support for some encodings is enabled
563-
only when Node.js is using the full ICU data (see [Internationalization][]).
564-
`util.TextDecoder` is `undefined` when ICU is not enabled during build.
562+
one or more aliases may be used.
565563

566-
##### Encodings Supported By Default
564+
Different Node.js build configurations support different sets of encodings.
565+
While a very basic set of encodings is supported even on Node.js builds without
566+
ICU enabled, support for some encodings is provided only when Node.js is built
567+
with ICU and using the full ICU data (see [Internationalization][]).
568+
569+
#### Encodings Supported Without ICU
567570

568571
| Encoding | Aliases |
569572
| ----------- | --------------------------------- |
570-
| `'utf8'` | `'unicode-1-1-utf-8'`, `'utf-8'` |
571-
| `'utf-16be'`| |
573+
| `'utf-8'` | `'unicode-1-1-utf-8'`, `'utf8'` |
572574
| `'utf-16le'`| `'utf-16'` |
573575

574-
##### Encodings Requiring Full-ICU
576+
#### Encodings Supported by Default (With ICU)
577+
578+
| Encoding | Aliases |
579+
| ----------- | --------------------------------- |
580+
| `'utf-8'` | `'unicode-1-1-utf-8'`, `'utf8'` |
581+
| `'utf-16le'`| `'utf-16'` |
582+
| `'utf-16be'`| |
583+
584+
#### Encodings Requiring Full ICU Data
575585

576586
| Encoding | Aliases |
577587
| ----------------- | -------------------------------- |
@@ -613,13 +623,14 @@ only when Node.js is using the full ICU data (see [Internationalization][]).
613623
*Note*: The `'iso-8859-16'` encoding listed in the [WHATWG Encoding Standard][]
614624
is not supported.
615625

616-
#### new TextDecoder([encoding[, options]])
626+
### new TextDecoder([encoding[, options]])
617627

618628
* `encoding` {string} Identifies the `encoding` that this `TextDecoder` instance
619629
supports. Defaults to `'utf-8'`.
620630
* `options` {Object}
621631
* `fatal` {boolean} `true` if decoding failures are fatal. Defaults to
622-
`false`.
632+
`false`. This option is only supported when ICU is enabled (see
633+
[Internationalization][]).
623634
* `ignoreBOM` {boolean} When `true`, the `TextDecoder` will include the byte
624635
order mark in the decoded result. When `false`, the byte order mark will
625636
be removed from the output. This option is only used when `encoding` is
@@ -628,7 +639,7 @@ is not supported.
628639
Creates a new `TextDecoder` instance. The `encoding` may specify one of the
629640
supported encodings or an alias.
630641

631-
#### textDecoder.decode([input[, options]])
642+
### textDecoder.decode([input[, options]])
632643

633644
* `input` {ArrayBuffer|DataView|TypedArray} An `ArrayBuffer`, `DataView` or
634645
Typed Array instance containing the encoded data.
@@ -644,49 +655,55 @@ internally and emitted after the next call to `textDecoder.decode()`.
644655
If `textDecoder.fatal` is `true`, decoding errors that occur will result in a
645656
`TypeError` being thrown.
646657

647-
#### textDecoder.encoding
658+
### textDecoder.encoding
648659

649-
* Value: {string}
660+
* {string}
650661

651662
The encoding supported by the `TextDecoder` instance.
652663

653-
#### textDecoder.fatal
664+
### textDecoder.fatal
654665

655-
* Value: {boolean}
666+
* {boolean}
656667

657668
The value will be `true` if decoding errors result in a `TypeError` being
658669
thrown.
659670

660-
#### textDecoder.ignoreBOM
671+
### textDecoder.ignoreBOM
661672

662-
* Value: {boolean}
673+
* {boolean}
663674

664675
The value will be `true` if the decoding result will include the byte order
665676
mark.
666677

667-
### Class: util.TextEncoder
678+
## Class: util.TextEncoder
668679
<!-- YAML
669680
added: REPLACEME
670681
-->
671682

672683
> Stability: 1 - Experimental
673684
674685
An implementation of the [WHATWG Encoding Standard][] `TextEncoder` API. All
675-
instances of `TextEncoder` only support `UTF-8` encoding.
686+
instances of `TextEncoder` only support UTF-8 encoding.
676687

677688
```js
678689
const encoder = new TextEncoder();
679690
const uint8array = encoder.encode('this is some data');
680691
```
681692

682-
#### textEncoder.encode([input])
693+
### textEncoder.encode([input])
683694

684695
* `input` {string} The text to encode. Defaults to an empty string.
685696
* Returns: {Uint8Array}
686697

687-
UTF-8 Encodes the `input` string and returns a `Uint8Array` containing the
698+
UTF-8 encodes the `input` string and returns a `Uint8Array` containing the
688699
encoded bytes.
689700

701+
### textEncoder.encoding
702+
703+
* {string}
704+
705+
The encoding supported by the `TextEncoder` instance. Always set to `'utf-8'`.
706+
690707
## Deprecated APIs
691708

692709
The following APIs have been deprecated and should no longer be used. Existing
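
As a quick check of the documented `TextEncoder` and `TextDecoder` behavior, the sketch below encodes a string to UTF-8 and decodes it again in two chunks, using the `stream` option so that a multi-byte character split across chunks is buffered until complete. It uses only UTF-8, which this commit lists as supported even without ICU. The sample string and the split position are arbitrary choices for illustration.

```js
'use strict';
const { TextDecoder, TextEncoder } = require('util');

const encoder = new TextEncoder();
const bytes = encoder.encode('héllo €'); // Uint8Array of UTF-8 bytes

// Split inside the three-byte sequence for '€' to exercise streaming.
const splitAt = bytes.length - 1;
const decoder = new TextDecoder('utf-8');

let text = '';
text += decoder.decode(bytes.subarray(0, splitAt), { stream: true });
text += decoder.decode(bytes.subarray(splitAt), { stream: true });
text += decoder.decode(); // end-of-stream

console.log(text === 'héllo €', encoder.encoding, decoder.encoding);
```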
