[tor-dev] Hidden service search engine (GSoC)
George Kadianakis
desnacked at riseup.net
Sat Mar 8 12:34:58 UTC 2014
Rémi <remi.py at yandex.com> writes:
> Hi,
>
> I am currently a master's student with a focus on natural language
> processing, machine learning, information retrieval and data mining.
>
> The Tor website lists a bunch of ideas, one of which is "Search Engine
> for Hidden Services"[1]. This project suits me well given my education
> and skill set and I would really enjoy it.
> Does tor-dev think this would be a good project? There are already many
> hidden search engines, although none are open source.
>
> I have done two smaller information retrieval projects in university
> this year, and I have a strong background in search engine algorithms.
> The components of the system that I am currently thinking of are:
> - Index and features in a NoSQL database (possibly CodernityDB).
> - Hidden service crawler.
> - Simple search using BM25, but recording click-through and many
>   features other than BM25.
> - Basic front-end.
> - A component for 'Learning to rank' based on more features, which
>   should be used once there is significant click-through data. This should
>   be an easy-to-use program that performs search engine optimization.
>
> Click-through is recorded so that the engine can learn to rank
> better. This is important because there is no known search ranker that
> will give excellent results out of the box. Click-through recording can
> be done by only recording feature weights.
> I would work in Python because I am very comfortable working with it.
>
>
> What are your thoughts?
>
You look like a reasonable candidate for this project.
The summer doesn't look like enough time to implement all the above
from scratch. You will probably need to use and extend some already
existing tools.
Feel free to submit an application for this project. However, be
warned that we've already received 4+ applications for this project so
it's going to be a tough competition. You are encouraged to submit to
other projects too.
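
For reference, the "simple search using BM25" item in the quoted proposal
could look roughly like the sketch below. This is only an illustration, not
anything the proposal specifies: the BM25Index class, the whitespace
tokenizer, and the K1/B values (commonly used defaults) are all assumptions,
and a real index would live in a database rather than in memory.

```python
import math
from collections import Counter

K1 = 1.5   # term-frequency saturation (assumed, a commonly used default)
B = 0.75   # document-length normalization (assumed, a commonly used default)


def tokenize(text):
    """Naive lowercase whitespace tokenizer (an assumption for this sketch)."""
    return text.lower().split()


class BM25Index:
    """Hypothetical in-memory index that scores documents with Okapi BM25."""

    def __init__(self, docs):
        tokens = [tokenize(d) for d in docs]
        self.term_freqs = [Counter(t) for t in tokens]
        self.doc_lens = [len(t) for t in tokens]
        self.n_docs = len(docs)
        self.avg_len = sum(self.doc_lens) / self.n_docs if self.n_docs else 0.0
        # Number of documents that contain each term at least once.
        self.doc_freq = Counter()
        for tf in self.term_freqs:
            self.doc_freq.update(tf.keys())

    def idf(self, term):
        # The "+ 1" inside the log keeps the IDF non-negative.
        df = self.doc_freq.get(term, 0)
        return math.log(1 + (self.n_docs - df + 0.5) / (df + 0.5))

    def score(self, query, doc_id):
        tf = self.term_freqs[doc_id]
        ratio = self.doc_lens[doc_id] / self.avg_len if self.avg_len else 0.0
        norm = K1 * (1 - B + B * ratio)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                total += self.idf(term) * f * (K1 + 1) / (f + norm)
        return total

    def search(self, query, top_k=10):
        """Return up to top_k (doc_id, score) pairs, best score first."""
        scored = [(self.score(query, i), i) for i in range(self.n_docs)]
        scored = [(s, i) for s, i in scored if s > 0]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [(i, s) for s, i in scored[:top_k]]


if __name__ == "__main__":
    index = BM25Index([
        "hidden service directory",
        "tor hidden service search engine",
        "cooking recipes",
    ])
    for doc_id, score in index.search("hidden service"):
        print(doc_id, round(score, 3))
```

Documents that match none of the query terms are left out of the results,
and of two documents matching the same terms equally often, the shorter one
ranks higher because of the length normalization.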