[tor-project] Ethics Guidelines; crawling .onion

Thu Jul 7 04:40:21 UTC 2016

Hello all.  Back in June Griffin asked for this conversation to be
temporarily tabled, and it's been a month!

Let us discuss robots.txt and crawling of .onion.  Right now we have
*three* different positions among respected members of the Tor community.
They are:

(A) isis et al: robots.txt is insufficient
--- "Consent is not the absence of saying 'no' — it is explicitly saying
'yes'."

(B) onionlink/ahmia/notevil/grams: we respect robots.txt
--- "Default is yes, but you can always opt-out."

(C) onionstats/memex: we ignore robots.txt
--- "Don't care even if you opt-out." (see
https://onionscan.org/reports/may2016.html)
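
To make the three positions concrete, here is a minimal sketch in Python of
how a crawler might decide whether to fetch a page under each one.  It is an
illustration only, not how any of the projects named above actually work.
The function name may_crawl, the user agent "examplebot", the opted_in flag
and the example.onion URL are all hypothetical; the only real API used is
the standard library's urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser


def may_crawl(policy, robots_txt, url, agent="examplebot", opted_in=False):
    """Decide whether to fetch `url` under position 'A', 'B' or 'C'.

    robots_txt is the text of the site's robots.txt ("" if it has none).
    opted_in is a hypothetical flag meaning the operator explicitly said yes.
    """
    if policy == "A":
        # (A): consent must be explicit; robots.txt alone is not enough.
        return opted_in
    if policy == "C":
        # (C): robots.txt is ignored entirely.
        return True
    if policy == "B":
        # (B): default is yes, unless robots.txt opts this agent out.
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        return parser.can_fetch(agent, url)
    raise ValueError("policy must be 'A', 'B' or 'C'")


if __name__ == "__main__":
    opt_out = "User-agent: *\nDisallow: /\n"
    page = "http://example.onion/page"  # placeholder address
    for policy in "ABC":
        print(policy, may_crawl(policy, opt_out, page))
```

The only difference between (B) and (C) in this sketch is whether the
operator's robots.txt is consulted at all; (A) never looks at it.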

Isis did a good job arguing for (A), claiming that (B) and (C) are
"blatant and disgusting workaround[s] to the trust and expectations
which onion service operators place in the network."
https://lists.torproject.org/pipermail/tor-project/2016-May/000356.html

This is me arguing for (B):
https://lists.torproject.org/pipermail/tor-project/2016-May/000411.html

I have no link arguing for (C).

I had tried to get this conversation moving before.  So to poke this
discussion forward this time, I have republished the onion2bitcoin and
bitcoin2onion lists, anonymizing only the final 4 characters of the
.onion address instead of the final 8.  Under (A), compiling this list is
deeply heretical.  Under either (B) or (C), .onion content is by default
public (presumably running regexes over it is fine), so compiling such
data is a perfectly fine thing to do.
-- http://virgil.gr/wp-content/uploads/2016/06/onion2btc.html
-- http://virgil.gr/wp-content/uploads/2016/06/btc2onion.html
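
For readers unsure what "anonymizing the final 4 characters" means in
practice, here is a minimal sketch in Python.  The function name mask_onion
and the "x" placeholder character are assumptions made for illustration;
the published lists may mask addresses differently.

```python
def mask_onion(address, hidden=4, placeholder="x"):
    """Replace the last `hidden` characters of the .onion name with a placeholder."""
    name, dot, suffix = address.partition(".")
    if hidden < 0 or hidden > len(name):
        raise ValueError("hidden must be between 0 and the length of the name")
    kept = name[:len(name) - hidden]
    return kept + placeholder * hidden + dot + suffix


if __name__ == "__main__":
    example = "abcdefghijklmnop.onion"  # made-up address, not a real service
    print(mask_onion(example, hidden=8))
    print(mask_onion(example, hidden=4))
```

Masking fewer characters leaves far fewer candidate addresses consistent
with each entry, which is why the change from 8 to 4 matters here.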

Let's discuss!

-V