[tor-dev] Error-Correcting Onions with Bech32
nullius
nullius at nym.zone
Sun Dec 31 02:46:00 UTC 2017
On 2017-12-31 at 00:57:49 +0000, Alec Muffett <alec.muffett at gmail.com>
wrote:
>Thanks! That's very interesting! TIL :-)
Why, if it isn’t instant feedback from the RFC 7686 co-author! In
response to what you said, in brief: I will propose that any subdomain
data (which is presumably human-readable) be transmitted in a separate
or affixed string, leaving Bech32 to deal with the pseudorandom blobs.
Technical details follow.
>What would you propose to do with subdomains, like
>www.facebookcorewwwi.onion? Or is that outside the scope of your
>proposal?
Good question. That had briefly occurred to me; but I couldn’t figure
out any feasible means to stuff subdomains into the Bech32 string, for
the following reasons:
(0) RFC 1034 DNS names may be up to 255 octets in length. But Bech32
strings are more length-limited. After subtracting an HRP of “onion” (5
chars), the required separator of “1”, and the 6 characters of ECC
checksum in the data part, the 90-character total length limit can only
spare up to 78 characters for the onion address data. For both v2 and
v3 onions, that’s more than sufficient. But even if the length limit
could be raised, an excessively long string would destroy the
human-friendliness which is the raison d’être for Bech32.
(I *infer* that this last may be one reason for the length limit.
Although of course I can’t say for certain, I’ve read Greg Maxwell
discussing some of the user testing involved in the standard’s
development; and 90 chars seems to me the extreme of what a mortal
flesh-and-blood creature could handle with such a string.)
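A quick check of the arithmetic in (0), as a minimal Python sketch. The payload sizes are assumptions for illustration. A v2 identifier is 80 bits. For v3, the sketch counts two candidates: the bare 256-bit ed25519 key, and the full 35-byte key, checksum and version blob that the current v3 .onion format carries.

```python
# Length budget for a Bech32 string with HRP "onion" (BIP 173 limits).
MAX_BECH32_LEN = 90  # total length limit from BIP 173
SEPARATOR_LEN = 1    # the "1" between HRP and data part
CHECKSUM_LEN = 6     # checksum characters at the end of the data part


def data_budget(hrp: str) -> int:
    """Data-part characters left for the payload itself."""
    return MAX_BECH32_LEN - len(hrp) - SEPARATOR_LEN - CHECKSUM_LEN


def chars_for_bits(nbits: int) -> int:
    """Bech32 characters needed for nbits of payload (5 bits each, rounded up)."""
    return -(-nbits // 5)


budget = data_budget("onion")
v2_chars = chars_for_bits(80)           # v2 onion: 80-bit identifier
v3_key_chars = chars_for_bits(256)      # v3: bare ed25519 key (assumed payload)
v3_full_chars = chars_for_bits(35 * 8)  # v3: key + 2-byte checksum + version byte

print(budget, v2_chars, v3_key_chars, v3_full_chars)  # prints: 78 16 52 56
```

Either v3 variant fits well inside the 78-character budget.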
(1) Bech32 is a base-32 encoding, only with a different alphabet than
RFC 4648. Thus, it would be necessary to design another layer of
encoding to most efficiently represent subdomain labels and the
dot-separator with an alphabet of 38 characters [-0-9a-z.]. Worse,
depending on which standards an implementation follows or ignores, that
is not really a strict limitation on names seen in the wild. How should
the Bech32 transformation deal with names containing an underscore “_”?
Or other characters? I think it would only be safe to go with full
octets. This would severely exacerbate the problem of (0) above.
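To put a number on (1), here is a sketch that packs a subdomain as full octets into Bech32's 5-bit groups. It uses the `convertbits` helper from the BIP 173 Python reference code, reproduced below. `subdomain_cost` is a hypothetical helper. The 52-character v3 payload assumes a bare 256-bit key. Even a dense custom coding of the 38-character alphabet would need about log2(38) ≈ 5.25 bits per name character, which is slightly more than one Bech32 character. Full octets need 1.6 characters each.

```python
def convertbits(data, frombits, tobits, pad=True):
    """Regroup frombits-wide values into tobits-wide values
    (power-of-2 base conversion, as in the BIP 173 reference code)."""
    acc = 0
    bits = 0
    ret = []
    maxv = (1 << tobits) - 1
    max_acc = (1 << (frombits + tobits - 1)) - 1
    for value in data:
        if value < 0 or (value >> frombits):
            return None
        acc = ((acc << frombits) | value) & max_acc
        bits += frombits
        while bits >= tobits:
            bits -= tobits
            ret.append((acc >> bits) & maxv)
    if pad:
        if bits:
            ret.append((acc << (tobits - bits)) & maxv)
    elif bits >= frombits or ((acc << (tobits - bits)) & maxv):
        return None
    return ret


ONION_BUDGET = 78  # 90 - len("onion") - len("1") - 6, as computed in (0)
V3_KEY_CHARS = 52  # assumed v3 payload: a bare 256-bit key in 5-bit groups


def subdomain_cost(name):
    """Bech32 characters needed to carry a subdomain as raw octets (hypothetical helper)."""
    return len(convertbits(name.encode("utf-8"), 8, 5))


for sub in ["www", "another-level.www", "x" * 255]:
    cost = subdomain_cost(sub)
    print(len(sub), cost, V3_KEY_CHARS + cost <= ONION_BUDGET)
```

"www" costs 5 characters. "another-level.www" costs 28, so together with a 52-character v3 key it already exceeds the 78-character budget. A 255-octet string, standing in for a maximum-length name, needs 408.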
(Aside: The special alphabet is bound to raise some eyebrows; so I will
here quote its rationale from BIP 173: “The character set is chosen to
minimize ambiguity according to [this](https://hissa.nist.gov/~black/GTLD/)
visual similarity data, and the ordering is chosen to minimize the
number of pairs of similar characters (according to the same data) that
differ in more than 1 bit. As the checksum is chosen to maximize
detection capabilities for low numbers of bit errors, this choice
improves its performance under some error models.” From what I
understand, a large amount of CPU time was spent crunching over the data
in search of the most error-resistant alphabet.)
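For comparison, the BIP 173 alphabet is `qpzry9x8gf2tvdw0s3jn54khce6mua7l`. The small sketch below lists which lowercase alphanumerics it leaves out, next to the lowercased RFC 4648 Base32 alphabet. "1" is reserved in Bech32 as the separator. The sketch does not try to reproduce the similarity-data search behind the ordering.

```python
import string

BECH32_CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # BIP 173 alphabet, in value order
RFC4648_BASE32 = "abcdefghijklmnopqrstuvwxyz234567"  # RFC 4648 alphabet, lowercased


def left_out(charset):
    """Lowercase alphanumerics that an encoding alphabet does not use."""
    return sorted(set(string.ascii_lowercase + string.digits) - set(charset))


print("Bech32 omits:", left_out(BECH32_CHARSET))    # ['1', 'b', 'i', 'o']
print("RFC 4648 omits:", left_out(RFC4648_BASE32))  # ['0', '1', '8', '9']
```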
(2) Most subdomains are human-memorable—in your example, “www”. Coding
them with Bech32 would decrease human-friendliness, which is the precise
opposite of my objective in making this suggestion. Bech32 is great for
helping humans deal with pseudorandom blobs; for those, it improves upon
RFC 4648 Base32, Base64, hexadecimal, or in Bitcoin’s case, the old
base58-based address encoding. But it is absolutely inappropriate as a
coding format for text which humans can easily read, type, and remember.
It is also important to consider relative impact in common usage. I
observe that most .onions do not use subdomains. I do think that it’s
important to support this use case; but if tradeoffs must be made, then
I would optimize more for making that pseudorandom blob less brittle in
human hands.
For the foregoing reasons, I will propose that subdomain data, if any,
be kept separate from the Bech32 coding. It may be either kept in a
separate string, or somehow affixed with a special delimiter either
before or after the Bech32 representation of the onion. Off-the-cuff,
which of these looks best to you?
www:onion19qzypww2zw3ykkkglr4tu9
onion19qzypww2zw3ykkkglr4tu9:www
another-level.www:onion19qzypww2zw3ykkkglr4tu9
(My choice of a delimiter here may be wrong if we want the browser’s
address bar to translate it. I should think more about this.)
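If one of the affixed forms above were adopted, a client would need to split off the label before decoding the Bech32 part. Here is a minimal sketch. It assumes the hypothetical `:` delimiter from the examples and treats whichever side starts with `onion1` as the Bech32 string. `split_affixed` is a hypothetical name. One likely complication for the address bar is that `:` already ends a URI scheme (RFC 3986), so `www:onion1...` would parse as a URI whose scheme is `www`.

```python
from typing import Optional, Tuple

DELIMITER = ":"           # hypothetical; the delimiter is still an open question
BECH32_PREFIX = "onion1"  # HRP "onion" followed by the Bech32 separator "1"


def split_affixed(s: str) -> Tuple[Optional[str], str]:
    """Split 'sub:onion1...' or 'onion1...:sub' into (subdomain, bech32 part).

    Returns (None, s) when no delimiter is present. Checksum validation of
    the Bech32 part is left to a separate decoder.
    """
    if DELIMITER not in s:
        return None, s
    left, right = s.split(DELIMITER, 1)
    if right.lower().startswith(BECH32_PREFIX):
        return left, right
    if left.lower().startswith(BECH32_PREFIX):
        return right, left
    raise ValueError("no onion1... part found on either side of the delimiter")
```

For example, both `split_affixed("www:onion19qzypww2zw3ykkkglr4tu9")` and `split_affixed("onion19qzypww2zw3ykkkglr4tu9:www")` yield `("www", "onion19qzypww2zw3ykkkglr4tu9")`.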
Finally, I think I should mention: Yes, “onion19qzypww2zw3ykkkglr4tu9”
is not as pretty as “facebookcorewwwi.onion”. But few .onion sites have
the compute power available to Facebook! Moreover, my proposal should
apply to v3 onions—where nobody on Earth will be able to fully
bruteforce out a human-memorable string.
I would advise users to stick to the DNS-style coding for
facebookcorewwwi.onion, and take advantage of Bech32 as an alternative
representation for http://yz7lpwfhhzcdyc5y.onion/ ,
http://5nca3wxl33tzlzj5.onion/ , and other such strings. Those are pure
pain for users now, and it will only get worse when v3 onions get uptake.
Error-correcting codes do not make the names any easier to read; but
they certainly do help with the inevitable mistakes in all the use cases
which involve voice, handwriting, manual typing, carrier pigeons, etc.
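Here is the facebookcorewwwi example worked through. A v2 onion is exactly 16 RFC 4648 Base32 characters (80 bits). Each character therefore maps one-to-one onto the Bech32 character with the same 5-bit value, and the BIP 173 checksum is appended under the HRP `onion`. The checksum functions follow the BIP 173 Python reference code. `onion_v2_to_bech32` and `bech32_is_valid` are hypothetical helpers. The sketch only demonstrates detection, which BIP 173 guarantees for any error affecting at most 4 characters.

```python
RFC4648_BASE32 = "abcdefghijklmnopqrstuvwxyz234567"  # RFC 4648 alphabet, lowercased
BECH32_CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"  # BIP 173 alphabet
HRP = "onion"


def bech32_polymod(values):
    """BCH checksum over 5-bit values (BIP 173 reference)."""
    generator = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]
    chk = 1
    for value in values:
        top = chk >> 25
        chk = (chk & 0x1FFFFFF) << 5 ^ value
        for i in range(5):
            if (top >> i) & 1:
                chk ^= generator[i]
    return chk


def bech32_hrp_expand(hrp):
    """Mix the HRP into the checksum: high bits, a 0 separator, low bits."""
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def bech32_create_checksum(hrp, data):
    """Six 5-bit checksum values for hrp + data."""
    polymod = bech32_polymod(bech32_hrp_expand(hrp) + data + [0] * 6) ^ 1
    return [(polymod >> 5 * (5 - i)) & 31 for i in range(6)]


def onion_v2_to_bech32(name):
    """Re-encode a 16-character v2 .onion label as 'onion1' + data + checksum."""
    label = name.lower()
    if label.endswith(".onion"):
        label = label[: -len(".onion")]
    if len(label) != 16:
        raise ValueError("expected a 16-character v2 onion label")
    data = [RFC4648_BASE32.index(c) for c in label]  # same 5-bit values
    checksum = bech32_create_checksum(HRP, data)
    return HRP + "1" + "".join(BECH32_CHARSET[v] for v in data + checksum)


def bech32_is_valid(s):
    """True if s is a single-case Bech32 string with a correct checksum."""
    if s != s.lower() and s != s.upper():
        return False  # BIP 173 forbids mixed case
    hrp, sep, rest = s.lower().rpartition("1")
    if not sep or not hrp or len(rest) < 6:
        return False
    if any(c not in BECH32_CHARSET for c in rest):
        return False
    data = [BECH32_CHARSET.index(c) for c in rest]
    return bech32_polymod(bech32_hrp_expand(hrp) + data) == 1


encoded = onion_v2_to_bech32("facebookcorewwwi.onion")
typo = encoded[:10] + ("q" if encoded[10] != "q" else "p") + encoded[11:]
print(encoded, bech32_is_valid(encoded), bech32_is_valid(typo))
```

The 16 data characters come out as `9qzypww2zw3ykkkg`, matching the example above, and the single-character typo fails validation.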
--
nullius at nym.zone | PGP ECC: 0xC2E91CD74A4C57A105F6C21B5A00591B2F307E0C
Bitcoin: bc1qcash96s5jqppzsp8hy8swkggf7f6agex98an7h | (Segwit nested:
3NULL3ZCUXr7RDLxXeLPDMZDZYxuaYkCnG) (PGP RSA: 0x36EBB4AB699A10EE)
“‘If you’re not doing anything wrong, you have nothing to hide.’
No! Because I do nothing wrong, I have nothing to show.” — nullius