[tor-project] Ethics Guidelines; crawling .onion

Griffin Boyce griffin at cryptolab.net
Wed Jun 8 07:55:54 UTC 2016


Hey Virgil,

   While I know you and I have talked about this in private recently, it 
seems like a good time to table this discussion for a couple of weeks.  
Considering everything else that's going on, this might not be the ideal 
time for everyone to contribute to the discussion.

<3
Griffin




Virgil Griffith wrote:
> Here's yet another data point indicating the policy on crawling .onion
> needs to be clarified.  The new and popular OnionStats tool doesn't
> even respect /robots.txt, see:
> https://onionscan.org/reports/may2016.html
> 
> So now we have *three* different positions among respected members of
> the Tor community.
> 
> (1) isis et al: robots.txt is insufficient
> --- "Consent is not the absence of saying 'no' — it is explicitly
> saying 'yes'."
> 
> (2) onionlink/ahmia/notevil/grams: we respect robots.txt
> --- "Default is yes, but you can always opt-out."
> 
> (3) onionstats/memex: we ignore robots.txt
> --- "Don't care even if you opt-out."
> 
> -V
> 
> On Wed, Jun 8, 2016 at 1:34 AM, Virgil Griffith <i at virgil.gr> wrote:
> 
>> Hello all.
>> 
>> I wrote on this topic earlier at:
>> 
>> 
>> https://lists.torproject.org/pipermail/tor-project/2016-May/000411.html
>> 
>> This is me again asking for clarification.  I choose this issue
>> because it is the most self-contained of the various ones raised by
>> isis et al, and it seemed wise to clarify this before opening up a
>> new one.  If someone from Tor management writes me that social
>> reasons prohibit search engines from being addressed at this time, I
>> will drop it.
>> 
>> Given the lack of prior reaction as well as ahmia.fi [1] getting
>> funded for GSoC (ahmia has followed /robots.txt from day zero), I
>> tentatively conclude that crawling .onion is non-controversial,
>> i.e., "Per Tor community standards, search engines obeying
>> robots.txt are a-okay.  Equivalently, indexing .onion content is
>> treated the same as indexing any other part of the web."
>> 
>> But, to motivate as well as give any concerned parties an
>> opportunity to be heard, I have republished the onion2bitcoin as
>> well as the bitcoin2onion lists, anonymizing only the final 4
>> characters of the .onion address instead of the final 8.
>> 
>> -- http://virgil.gr/wp-content/uploads/2016/06/onion2btc.html
>> -- http://virgil.gr/wp-content/uploads/2016/06/btc2onion.html
>> 
>> -V
>> 
>> On Tue, May 31, 2016 at 10:05 PM, Virgil Griffith <i at virgil.gr>
>> wrote:
>> This seems like something people would have opinions on.  Anyone?
>> 
>> -V
>> 
>> On Monday, 30 May 2016, Virgil Griffith <i at virgil.gr> wrote:
>> 
>> Hello all.
>> 
>> I am preparing a longer response to the issues Isis et al mentioned.
>> Most are interrelated, but this one is not.  And I wanted to get
>> clarification on it.
>> 
>> Isis expressed a concern about making a list of bitcoin addresses
>> from .onion, citing, "Consent is not the absence of saying 'no' —
>> it is explicitly saying 'yes'."
>> 
>> For what it's worth, ahmia.fi [1] actually supports regex searching
>> right out of the box.  In fact, a single line of JSON spits out all
>> known bitcoin addresses ahmia knows about.
>> 
>> For example, here's an anonymized list going .onion -> BTC which I
>> mined from Ahmia,
>> * http://virgil.gr/wp-content/uploads/2016/05/btc-on-dot-onion.html
>> [6MB]
>> 
>> And here's the same information going BTC -> .onion
>> * http://virgil.gr/wp-content/uploads/2016/05/btc2domains.v2.txt
>> [2MB]
>> 
>> If you want to check the results you can ask Juha for the JSON query
>> to do this.
>> 
>> Let's go out on a limb and assume that regexes are okay.  Is the issue
>> then .onion search-engines?  I understand Isis's preference for
>> there to always be affirmative consent but does that mean that until
>> such a standard exists all search engines from onion.link, ahmia.fi
>> [1], MEMEX, NotEvil, and Grams are violating official Tor community
>> policy?
>> 
>> ----
>> Here's how I currently see this.  I put on my amateur legal hat and
>> say, "Well, the Internet/world-wide-web is considered a public
>> space.  Onion-sites are like the web, but with masked speakers."
>> 
>> * https://www.hks.harvard.edu/m-rcbg/research/j.camp_acm.computer_internet.as.public.space.pdf
>> * http://aims.muohio.edu/2011/02/01/is-the-internet-a-public-space/
>> 
>> Ergo, I would argue that, by default, content on .onion is public
>> the same way everything else on the web is.  If you don't want to be
>> "indexed", for physical spaces you go in-doors, or for the web you
>> put up a login.  As an aside, the web-standard is actually *kinder*
>> than physical public spaces because on the web one can have an
>> unobtrusive /robots.txt saying, "please don't index me".  Which is
>> a great thing.
>> 
>> Whereas some would say Tor users are "anonymous", others would
>> instead say any and everything Tor is "private".  I believe this
>> needs to be clarified.  I once proposed to Roger that he delineate
>> the sub-types of privacy in the same way Stallman delineated his
>> "Four Freedoms".  Roger replied that he preferred using the broad
>> catch-all term "Privacy".  These confusions may be a drawback of using
>> a broad catch-all term.  Interpreting broadly, Isis is correct.
>> However, this conclusion has a lot of unpleasant ramifications.
>> 
>> Comments appreciated,
>> -V
>> 
>> P.S. Mildly related, I saw this today involving DARPA and Tor:
>> http://thehackernews.com/2016/05/darpa-trace-hacker.html
>> 
>> """
>> The aim of Enhanced Attribution program is to track personas
>> continuously and create “algorithms for developing predictive
>> behavioral profiles.”
>> """
>> 
>> I hope you all are aware this flows directly from MEMEX.  Right?
>> This, and MEMEX, seem a much more appropriate target for outrage.
>> A lot of this work that numerous community members have worked on
>> gives even me pause.
> 
> 
> 
> Links:
> ------
> [1] http://ahmia.fi
> 
> _______________________________________________
> tor-project mailing list
> tor-project at lists.torproject.org
> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-project

-- 
There are 10 kinds of people in the world: those who understand binary, 
those who don't, and people who didn't expect a base 3 joke.

