[tor-project] Ethics Guidelines; crawling .onion
Griffin Boyce
griffin at cryptolab.net
Wed Jun 8 07:55:54 UTC 2016
Hey Virgil,
While I know you and I have talked about this in private recently, it
seems like a good time to table this discussion for a couple of weeks.
Considering everything else that's going on, this might not be the ideal
time for everyone to contribute to the discussion.
<3
Griffin
Virgil Griffith wrote:
> Here's yet another data point indicating the policy on crawling .onion
> needs to be clarified. The new and popular OnionStats tool doesn't
> even respect /robots.txt, see:
> https://onionscan.org/reports/may2016.html
>
> So now we have *three* different positions among respected members of
> the Tor community.
>
> (1) isis et al: robots.txt is insufficient
> --- "Consent is not the absence of saying 'no' — it is explicitly
> saying 'yes'."
>
> (2) onionlink/ahmia/notevil/grams: we respect robots.txt
> --- "Default is yes, but you can always opt-out."
>
> (3) onionstats/memex: we ignore robots.txt
> --- "Don't care even if you opt-out."
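For readers comparing the three positions, here is a minimal sketch of how each one would translate into a crawler's indexing decision. It uses Python's standard `urllib.robotparser`. The function name `may_index`, the policy labels, the `example-crawler` user agent, and the `opted_in` flag are all hypothetical; in particular, no affirmative-consent mechanism for .onion sites is specified anywhere in this thread.

```python
from urllib.robotparser import RobotFileParser


def may_index(robots_txt, url, policy, agent="example-crawler", opted_in=False):
    """Return True if a crawler following `policy` would index `url`.

    robots_txt -- text of the site's /robots.txt ("" if the site has none)
    policy     -- "opt-in"  : position (1), an explicit yes is required
                  "opt-out" : position (2), default yes, robots.txt may say no
                  "ignore"  : position (3), robots.txt is never consulted
    opted_in   -- hypothetical flag standing in for an explicit-consent signal
    """
    if policy == "opt-in":
        return opted_in
    if policy == "ignore":
        return True
    if policy == "opt-out":
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        return parser.can_fetch(agent, url)
    raise ValueError("unknown policy: %r" % (policy,))


if __name__ == "__main__":
    deny_all = "User-agent: *\nDisallow: /\n"
    page = "http://exampleexampleex.onion/index.html"  # placeholder address
    for policy in ("opt-in", "opt-out", "ignore"):
        print(policy, may_index(deny_all, page, policy))
```

Only the "opt-out" branch ever reads the site's robots.txt, which is the practical difference between position (2) and the other two.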
>
> -V
>
> On Wed, Jun 8, 2016 at 1:34 AM, Virgil Griffith <i at virgil.gr> wrote:
>
>> Hello all.
>>
>> I wrote on this topic earlier at:
>>
>> https://lists.torproject.org/pipermail/tor-project/2016-May/000411.html
>>
>> This is me again asking for clarification. I choose this issue
>> because it is the most self-contained of the various ones raised by
>> isis et al, and it seemed wise to clarify this before opening up a
>> new one. If someone from Tor management writes me that social
>> reasons prohibit search engines from being addressed at this time, I
>> will drop it.
>>
>> Given the lack of prior reaction as well as ahmia.fi [1] getting
>> funded for GSoC (ahmia has followed /robots.txt from day zero), I
>> tentatively conclude that crawling .onion is non-controversial,
>> i.e., "Per Tor community standards, search engines obeying
>> robots.txt are a-okay. Equivalently, indexing .onion content is
>> treated the same as indexing any other part of the web."
>>
>> But, to motivate discussion as well as give any concerned parties an
>> opportunity to be heard, I have republished the onion2bitcoin and
>> bitcoin2onion lists, anonymizing only the final 4 characters of each
>> .onion address instead of the final 8.
>>
>> -- http://virgil.gr/wp-content/uploads/2016/06/onion2btc.html
>> -- http://virgil.gr/wp-content/uploads/2016/06/btc2onion.html
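The anonymization described above (hiding the final 4 rather than the final 8 characters of each address) could look roughly like the sketch below. The function name `mask_onion` and the `x` placeholder character are assumptions; the thread does not say how the published lists actually render the hidden characters.

```python
def mask_onion(address, hidden=4, mask_char="x"):
    """Replace the last `hidden` characters of an .onion label with `mask_char`.

    `address` is expected to look like "<label>.onion". The label is the
    part before the final dot; its length is not otherwise validated.
    """
    label, _, suffix = address.rpartition(".")
    if suffix != "onion" or not label:
        raise ValueError("not an .onion address: %r" % (address,))
    if not 0 <= hidden <= len(label):
        raise ValueError("cannot hide %d of %d characters" % (hidden, len(label)))
    visible = label[:len(label) - hidden]
    return visible + mask_char * hidden + ".onion"


if __name__ == "__main__":
    example = "abcdefghijklmnop.onion"  # made-up 16-character label
    print(mask_onion(example, hidden=8))
    print(mask_onion(example, hidden=4))
```

Halving the number of hidden characters sharply reduces how many candidate addresses each masked entry could correspond to, which is why the change from 8 to 4 matters to anyone listed.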
>>
>> -V
>>
>> On Tue, May 31, 2016 at 10:05 PM, Virgil Griffith <i at virgil.gr>
>> wrote:
>> This seems like something people would have opinions on. Anyone?
>>
>> -V
>>
>> On Monday, 30 May 2016, Virgil Griffith <i at virgil.gr> wrote:
>>
>> Hello all.
>>
>> I am preparing a longer response to the issues Isis et al mentioned.
>> Most are interrelated, but this one is not. And I wanted to get
>> clarification on it.
>>
>> Isis expressed a concern about making a list of bitcoin addresses
>> from .onion, citing, "Consent is not the absence of saying 'no' —
>> it is explicitly saying 'yes'."
>>
>> For what it's worth, ahmia.fi [1] actually supports regex searching
>> right out of the box. In fact, a single line of JSON spits out all
>> known bitcoin addresses ahmia knows about.
>>
>> For example, here's an anonymized list going .onion -> BTC which I
>> mined from Ahmia,
>> * http://virgil.gr/wp-content/uploads/2016/05/btc-on-dot-onion.html
>> [6MB]
>>
>> And here's the same information going BTC -> .onion
>> * http://virgil.gr/wp-content/uploads/2016/05/btc2domains.v2.txt
>> [2MB]
>>
>> If you want to check the results you can ask Juha for the JSON query
>> to do this.
>>
>> Let's go out on a limb and assume that regexes are okay. Is the issue
>> then .onion search engines? I understand Isis's preference for
>> there to always be affirmative consent, but does that mean that, until
>> such a standard exists, all search engines from onion.link, ahmia.fi
>> [1], MEMEX, NotEvil, and Grams are violating official Tor community
>> policy?
>>
>> ----
>> Here's how I currently see this. I put on my amateur legal hat and
>> say, "Well, the Internet/world-wide-web is considered a public
>> space. Onion-sites are like the web, but with masked speakers."
>>
>> * https://www.hks.harvard.edu/m-rcbg/research/j.camp_acm.computer_internet.as.public.space.pdf
>> * http://aims.muohio.edu/2011/02/01/is-the-internet-a-public-space/
>>
>> Ergo, I would argue that, by default, content on .onion is public
>> the same way everything else on the web is. If you don't want to be
>> "indexed", for physical spaces you go indoors, or for the web you
>> put up a login. As an aside, the web standard is actually *kinder*
>> than physical public spaces because on the web one can have an
>> unobtrusive /robots.txt saying, "please don't index me". Which is
>> a great thing.
>>
>> Whereas some would say Tor users are "anonymous", others would
>> instead say anything and everything Tor is "private". I believe this
>> needs to be clarified. I once proposed to Roger that he delineate
>> the sub-types of privacy in the same way Stallman delineated his
>> "Four Freedoms". Roger replied that he preferred using the broad
>> catch-all term "Privacy". These confusions may be a side effect of
>> using a broad catch-all term. Interpreting broadly, Isis is correct.
>> However, this conclusion has a lot of unpleasant ramifications.
>>
>> Comments appreciated,
>> -V
>>
>> P.S. Mildly related, I saw this today involving DARPA and Tor.
>> http://thehackernews.com/2016/05/darpa-trace-hacker.html
>>
>> """
>> The aim of Enhanced Attribution program is to track personas
>> continuously and create “algorithms for developing predictive
>> behavioral profiles.”
>> """
>>
>> I hope you all are aware this flows directly from MEMEX. Right?
>> This, and MEMEX, seem a much more appropriate target for outrage.
>> A lot of this work that numerous community members have worked on
>> gives even me pause.
>
>
>
> Links:
> ------
> [1] http://ahmia.fi
>
--
There are 10 kinds of people in the world: those who understand binary,
those who don't, and people who didn't expect a base 3 joke.