[tor-talk] Funded search engine for onionspace?
Virgil Griffith
i at virgil.gr
Fri Feb 13 22:19:24 UTC 2015
- How does the custom Google search thing work? Where does it get its
index? You expose all the tor2web onions on your sitemap, so google
crawls them and generates an index?
Correct :) Everything available on the Google Custom Search is also
available on a regular google search with the qualifier: "site:onion.city"
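Here is a minimal sketch of how a list of onion addresses could be turned into a sitemap for Google to crawl. It assumes the standard sitemaps.org XML format and the `<name>.onion` to `<name>.onion.city` hostname rewriting seen in the onion.city URLs quoted in this message. `onion_to_gateway` and `build_sitemap` are hypothetical names, not OnionCity's actual code.

```python
from typing import Iterable
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def onion_to_gateway(onion_host: str, gateway: str = "onion.city") -> str:
    """Rewrite 'name.onion' into the tor2web-style URL 'http://name.onion.city/'.

    Assumption: the gateway hostname is the onion label followed by the
    gateway domain, matching the onion.city URLs quoted in this thread.
    """
    host = onion_host.strip().lower()
    if not host.endswith(".onion"):
        raise ValueError(f"not an onion address: {onion_host!r}")
    label = host[: -len(".onion")]
    return f"http://{label}.{gateway}/"


def build_sitemap(onion_hosts: Iterable[str]) -> str:
    """Return a sitemaps.org <urlset> with one gateway URL per onion."""
    # Normalize first, then de-duplicate, so case variants collapse to one URL.
    urls = sorted({onion_to_gateway(h) for h in onion_hosts})
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', f'<urlset xmlns="{SITEMAP_NS}">']
    for url in urls:
        lines.append(f"  <url><loc>{escape(url)}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines)


print(build_sitemap(["xlmvhk3rpdux26dz.onion", "kkkkkku5juzqh33a.onion"]))
```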
- I'm a bit concerned that clients connect directly to Google. Can
this be avoided and still keep the custom google search functionality?
Alas no. I'm aware this is suboptimal. I see the GOOG search engine as a
temporary ladder, just to get the ball rolling. I am open to using any
other index. For what it's worth, I'm very pleased with GOOG's
performance: right now it's searching an index of 650k onion pages, and the
number grows every day.
- I don't like that the default link is through onion.city. This means
that onion.city watches *both* the search query *and* the content of
the communication. That's crazy.
In short, yes. However, you can prevent OnionCity from seeing the search
terms by using "site:onion.city" on GOOG. I.e.,
https://www.google.com/webhp?safe=off&q=site%3Aonion.city&q=site:onion.city
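This works because the search terms go only to Google. OnionCity never receives them and only sees the pages you then open through it. Here is a minimal sketch of building such a query URL. It assumes Google's standard `/search` endpoint with the `q` and `safe` parameters (the link above uses `/webhp`), and `google_site_query` is a hypothetical helper.

```python
from urllib.parse import urlencode


def google_site_query(terms: str, site: str = "onion.city") -> str:
    """Build a regular Google search URL restricted to one site.

    The search terms travel only to Google; the gateway (here onion.city)
    sees nothing until a result is opened through it.
    """
    query = f"site:{site} {terms}".strip()
    # urlencode percent-escapes ':' and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query, "safe": "off"})


print(google_site_query("bitcoin wallet"))
# https://www.google.com/search?q=site%3Aonion.city+bitcoin+wallet&safe=off
```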
- It's especially crazy if you allow your clients to submit HTTP forms
over onion.city, since it basically means that onion.city gets to
see *all* the usernames and passwords. I bet there are many people
out there who don't really get the tor2web threat model, and it's
nasty to read their passwords.
Although we technically could read submitted passwords, we don't keep logs
of proxied traffic. However, I understand that many users don't understand
the tor2web threat model. But this is the same for all Tor2web nodes, yes?
This is not at all unique to OnionCity. As far as I know, all Tor2web nodes
allow form submissions.
- There are various ways to solve or semi-solve this problem. My
preference is to *always* default to the onion link (and maybe also
have an option for a tor2web alternative). Combined with a nice
guide on how to download Tor, this might help user education and IMO
it's the responsible thing to do.
Currently the guide for downloading Tor is http://onion.city/security.html .
Can you suggest something better or more explicit?
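The suggestion to always default to the onion link could look roughly like this when rendering a result. The `.onion` URL is always the primary link, and the tor2web URL is only an optional, clearly secondary alternative. This is a sketch of the proposal in this thread, not how OnionCity currently renders results. `ResultLinks` and `result_links` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResultLinks:
    primary: str            # native .onion URL; needs Tor Browser
    tor2web: Optional[str]  # gateway URL, offered only as an opt-in fallback


def result_links(onion_host: str, path: str = "/", offer_gateway: bool = True,
                 gateway: str = "onion.city") -> ResultLinks:
    """Default to the .onion link; optionally attach a tor2web alternative."""
    host = onion_host.strip().lower()
    if not host.endswith(".onion"):
        raise ValueError(f"not an onion address: {onion_host!r}")
    if not path.startswith("/"):
        path = "/" + path
    primary = f"http://{host}{path}"
    # Gateway form 'name.<gateway>', e.g. 'name.onion.city' for the default.
    label = host[: -len(".onion")]
    alternative = f"http://{label}.{gateway}{path}" if offer_gateway else None
    return ResultLinks(primary=primary, tor2web=alternative)


links = result_links("xlmvhk3rpdux26dz.onion", "index.html")
print(links.primary)  # http://xlmvhk3rpdux26dz.onion/index.html
print(links.tor2web)  # http://xlmvhk3rpdux26dz.onion.city/index.html
```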
You mentioned it'd be better to have it randomly pick among the available
Tor2web nodes instead of sending everything through OnionCity. This breaks
the GOOG search engine, which only wants to return "canonical" URLs. We
could talk about making OnionCity a DNS round-robin akin to how Tor2web.org
currently works, but then I'm just replicating Tor2web. We've discussed
merging OnionCity into Tor2web, but it was discouraged because OnionCity does
aggressive behind-the-scenes caching, which made Tor2web uncomfortable. I
respect Tor2web's collective wishes.
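A small simulation shows the canonical-URL problem. If every request picks a random gateway, the same hidden-service page shows up under several hostnames. A single gateway always yields one URL that a search engine can treat as canonical. The gateway list here is illustrative (only onion.city comes from this thread), and `gateway_url` and `urls_seen` are hypothetical helpers.

```python
import random

# Hypothetical gateway domains for illustration; only onion.city is real here.
GATEWAYS = ["onion.city", "onion.example.net", "onion.example.org"]


def gateway_url(onion_host: str, gateway: str, path: str = "/") -> str:
    """Map 'name.onion' plus a gateway domain to 'http://name.<gateway><path>'."""
    host = onion_host.lower()
    if not host.endswith(".onion"):
        raise ValueError(f"not an onion address: {onion_host!r}")
    return f"http://{host[:-len('.onion')]}.{gateway}{path}"


def urls_seen(onion_host: str, requests: int, pick_random: bool, seed: int = 0) -> set:
    """Distinct URLs a crawler would record for one page across many requests."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    seen = set()
    for _ in range(requests):
        gw = rng.choice(GATEWAYS) if pick_random else GATEWAYS[0]
        seen.add(gateway_url(onion_host, gw))
    return seen


print(len(urls_seen("xlmvhk3rpdux26dz.onion", 100, pick_random=False)))  # 1
print(len(urls_seen("xlmvhk3rpdux26dz.onion", 100, pick_random=True)))   # several URLs for one page
```

A DNS round-robin avoids this particular problem because the hostname stays the same while traffic is spread across servers.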
- How do you crawl for more onions?
Right now I aggregate existing lists of onion sites and put them into the
site map.
* https://ahmia.fi/onions/
* http://skunksworkedp2cg.onion.city/sites.txt
* http://xlmvhk3rpdux26dz.onion.city/
* http://kkkkkku5juzqh33a.onion.city/
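Aggregating those lists mostly means pulling onion addresses out of each source and de-duplicating them before they go into the sitemap. Here is a minimal sketch. It assumes the sources have already been downloaded as text and that addresses use the 16-character base32 form seen in these URLs. It also picks up tor2web forms such as `name.onion.city`. `collect_onions` is a hypothetical name.

```python
import re
from typing import Iterable, List

# 16-character base32 onion labels, as in the addresses quoted in this thread.
# Also matches tor2web forms such as 'name.onion.city', since '.onion' is
# followed by a word boundary there too.
ONION_RE = re.compile(r"\b([a-z2-7]{16})\.onion\b", re.IGNORECASE)


def collect_onions(sources: Iterable[str]) -> List[str]:
    """Extract, normalize, and de-duplicate onion hosts from several text dumps."""
    found = set()
    for text in sources:
        for label in ONION_RE.findall(text):
            found.add(label.lower() + ".onion")
    return sorted(found)


lists = [
    "http://xlmvhk3rpdux26dz.onion/\nhttp://kkkkkku5juzqh33a.onion.city/",
    "XLMVHK3RPDUX26DZ.onion\nnot-an-address.onion",
]
print(collect_onions(lists))  # ['kkkkkku5juzqh33a.onion', 'xlmvhk3rpdux26dz.onion']
```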
As it stands, GOOG has indexed only 34% of the domains in the sitemap.
Crawling for more onions can be revisited once GOOG has indexed >90%.
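The coverage figure is a simple ratio of indexed domains to sitemap domains. Here is a sketch of that check. It assumes the set of domains Google has indexed is obtained by some separate means, which is not shown. The function names are hypothetical, and the 0.90 threshold is the one mentioned above.

```python
from typing import Iterable


def index_coverage(sitemap_domains: Iterable[str], indexed_domains: Iterable[str]) -> float:
    """Fraction of sitemap domains that also appear among the indexed domains."""
    sitemap = set(sitemap_domains)
    if not sitemap:
        return 0.0
    return len(sitemap & set(indexed_domains)) / len(sitemap)


def ready_to_expand(coverage: float, threshold: float = 0.90) -> bool:
    """Revisit crawling for more onions only once coverage exceeds the threshold."""
    return coverage > threshold


cov = index_coverage(["a.onion", "b.onion", "c.onion", "d.onion"], ["a.onion", "z.onion"])
print(cov, ready_to_expand(cov))  # 0.25 False
```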
- It really needs HTTPS!
Agreed 110%. It would already be there, but unfortunately providing HTTPS for
the CDN is currently beyond my budget. This is me inquiring about some of
that MEMEX funding :P
- Are you planning to also index non-HTTP services?
HTTP and HTTPS. That's probably it. Open to others, but then you get
into diminishing returns per unit of effort.
- Serving onion.city as a hidden service would be nice.
Open to this idea. Right now I'm focusing on keeping response times low and
on legal hardening. After that comes rolling out HTTPS. After that, however, a
hidden service would be a fine idea.
- Curious on the funding model here. Will there be ads?
Currently no funding model. Have considered putting ads on the search
results.
- Thanks and best of luck with your project!
Looking forward to Tor 0.2.6! Will be able to provide much more
informative error messages and diagnostics then! <3
-V