[tor-dev] Proposal 285: Directory documents should be standardized as UTF-8
Alex Xu
alex_y_xu at yahoo.ca
Wed Jan 10 01:36:22 UTC 2018
Quoting teor (2018-01-10 00:19:54)
> These are called "Unicode Scalar Values".
> https://www.unicode.org/glossary/#unicode_scalar_value
>
> Let's reference that.
"Unicode Scalar Value" includes U+0000 (NUL), which I think we probably
want to exclude.
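For reference, a Unicode Scalar Value is any code point in U+0000..U+10FFFF other than the surrogates U+D800..U+DFFF. Below is a minimal Python sketch of that set with U+0000 removed. The function names are hypothetical and are not taken from Tor's code or spec.

```python
def is_scalar_value(cp: int) -> bool:
    """True if cp is a Unicode Scalar Value: 0..0x10FFFF minus surrogates."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def is_allowed_code_point(cp: int) -> bool:
    """Scalar value, additionally excluding U+0000 (NUL)."""
    return cp != 0 and is_scalar_value(cp)
```

For example, `is_allowed_code_point(0x41)` is true, while U+0000, U+D800 and 0x110000 are all rejected.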
> > * each encoded with the shortest possible encoding.
> > * without any BOM
> >
> > Are there other restrictions we should make? If so, how should we phrase them?
>
> These seem fine, and not tied to a particular unicode version.
>
> But I don't know enough about Unicode to know if there is anything else we should
> specify.
Skimming through
https://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt, I think it
might be good to additionally forbid the noncharacter code points listed
near the end: U+nFFFE and U+nFFFF for n = 0..10 (hexadecimal, i.e. the
last two code points of each of the 17 planes), and U+FDD0 through U+FDEF.
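The rules discussed in this thread could be checked together in one pass. Here is a minimal Python sketch. It assumes four things: the bytes must decode as strict UTF-8, "without any BOM" means no leading U+FEFF, U+0000 is forbidden, and the noncharacters listed above are forbidden. Python's strict UTF-8 codec already rejects overlong (non-shortest) forms, encoded surrogates, and values above U+10FFFF. `validate_directory_document` and `is_noncharacter` are hypothetical names, not Tor APIs.

```python
UTF8_BOM = b"\xef\xbb\xbf"  # U+FEFF encoded as UTF-8


def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF in each of the 17 planes."""
    # The mask keeps bits 1..15, so any code point whose low 16 bits
    # are FFFE or FFFF matches, whatever its plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def validate_directory_document(data: bytes) -> str:
    """Return the decoded text, or raise ValueError if a rule is broken."""
    if data.startswith(UTF8_BOM):
        raise ValueError("document starts with a BOM")
    # Raises UnicodeDecodeError (a ValueError subclass) on overlong
    # forms, encoded surrogates, and other ill-formed sequences.
    text = data.decode("utf-8", errors="strict")
    for i, ch in enumerate(text):
        cp = ord(ch)
        if cp == 0:
            raise ValueError(f"U+0000 at character {i}")
        if is_noncharacter(cp):
            raise ValueError(f"noncharacter U+{cp:04X} at character {i}")
    return text
```

A single bit mask covers all 34 plane-final noncharacters, so nothing needs to be enumerated. Only the 32 code points in U+FDD0..U+FDEF need a separate range check.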