[tor-dev] GSoC: Ahmia.fi - Search Engine for Hidden Services
Juha Nurmi
juha.nurmi at ahmia.fi
Thu Apr 24 06:00:21 UTC 2014
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 22.04.2014 17:35, George Kadianakis wrote:
> Enjoy GSoC :)
I will :)
> BTW, looking again at your proposal, I see that you are going to
> do both popularity tracking and backlinks.
Yes. Another crawler gathers backlinks from the public WWW, and I will
start recording which result URLs users click.
> How are these two technologies going to interact with each other?
> That is, how will the indexer consider the output of those two
> features?
The Django front end re-sorts the results returned by the YaCy back end.
See https://ahmia.fi/static/gsoc/re_sort.jpg
I have this idea in mind: https://ahmia.fi/static/gsoc/sorter.py
The results are sorted by a combination of three scaled values: the YaCy
result index, the number of backlinks and the number of clicks.
Note the scaling: p_info.backlinks = 1 / (float(index) + 1), and
similarly for the other values.
sum_function = 3.0*self.yacy + 2.0*self.backlinks + 1.0*self.clicks
where 3, 2 and 1 are test coefficients. I will optimize these and make
a better model if necessary. However, clicks are easily spoofed, so
they must have a small coefficient.
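A minimal Python sketch of the re-sorting idea, assuming the YaCy results arrive as a list of unique URLs in YaCy's order and that backlink and click counts are available as dicts keyed by URL. This is not the actual sorter.py: `PageInfo`, `rank_scale`, `scaled_ranks` and `re_sort` are hypothetical names, and it is an assumption that each signal's index is the position in a list sorted by that signal. Only the 1 / (index + 1) scaling and the 3/2/1 test coefficients come from the message.

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    """Hypothetical per-result record; field names mirror the email, not sorter.py."""
    url: str
    yacy: float = 0.0       # scaled position in YaCy's own ranking
    backlinks: float = 0.0  # scaled position when ranked by backlink count
    clicks: float = 0.0     # scaled position when ranked by click count

    def sum_function(self) -> float:
        # Test coefficients from the email; clicks get the smallest weight
        # because they are easy to spoof.
        return 3.0 * self.yacy + 2.0 * self.backlinks + 1.0 * self.clicks


def rank_scale(index: int) -> float:
    """Map a 0-based rank position to a score: 0 -> 1.0, 1 -> 0.5, 2 -> 0.333..."""
    return 1 / (float(index) + 1)


def scaled_ranks(urls: list[str], counts: dict[str, int]) -> dict[str, float]:
    """Rank URLs by count (highest first) and scale each position.

    sorted() stays stable with reverse=True, so URLs with equal counts
    keep their YaCy order (an assumed tie-breaking rule).
    """
    ordered = sorted(urls, key=lambda u: counts.get(u, 0), reverse=True)
    return {url: rank_scale(i) for i, url in enumerate(ordered)}


def re_sort(yacy_urls: list[str],
            backlink_counts: dict[str, int],
            click_counts: dict[str, int]) -> list[PageInfo]:
    """Re-sort YaCy results (unique URLs, best first) by the weighted sum."""
    backlink_scores = scaled_ranks(yacy_urls, backlink_counts)
    click_scores = scaled_ranks(yacy_urls, click_counts)
    pages = [
        PageInfo(url=url,
                 yacy=rank_scale(i),
                 backlinks=backlink_scores[url],
                 clicks=click_scores[url])
        for i, url in enumerate(yacy_urls)
    ]
    # Highest weighted sum first.
    return sorted(pages, key=lambda p: p.sum_function(), reverse=True)


# Usage example with made-up .onion names and counts.
results = re_sort(["a.onion", "b.onion", "c.onion"],
                  backlink_counts={"c.onion": 10, "b.onion": 2},
                  click_counts={"c.onion": 5})
print([p.url for p in results])  # ['a.onion', 'c.onion', 'b.onion']
```

In this example c.onion has the most backlinks and clicks, so it overtakes b.onion (sum 4.0 vs. about 2.83), but not a.onion (about 4.17), because the YaCy term carries the largest weight.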
> Also, with your newly acquired knowledge about backlinks, how long
> is it going to take you to incorporate them in ahmia? Are you
> actually going to do it during the "Use an another crawler to
> search .onion pages from the public Internet" phase?
We can test it once popularity tracking and the backlink crawler are both working.
- -Juha
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
iQEcBAEBAgAGBQJTWKhsAAoJELGTs54GL8vA+WAH/1i4sCvvcwotn5b39Ox8yldn
Wv6mBxqlIiaoeBj1Eeu+A92QfGvvpxdWDb7Kn3+3u0IO0wXcZlf0SrIri11IgprW
1f8x5BMDYiaFl12dVO/3jfXSmdfKQ24AdKknfK9wuD63266L2Tks/DVURHQKrYaM
zTfYJKZNWJtOPxUj45lHknHxDWVzRlmqiksRn1aPwx2EW5dpKCCVkV9ySnJdZW74
DWs1es1rLKj6UVmVl6w88PJ/C1COWhMQspXtYIZ8paZQfMHtEgDxLuifITIHgdBh
TdGLUEVteUl5wyCNjDh1Q+ZEkdbMvcpNZuP5D3lUYweHz0cMMOGHC0oaLlJS4KE=
=48jK
-----END PGP SIGNATURE-----