[tor-dev] GSoC: Ahmia.fi - Search Engine for Hidden Services
Juha Nurmi
juha.nurmi at ahmia.fi
Sun Apr 27 07:15:00 UTC 2014
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 25.04.2014 17:27, George Kadianakis wrote:
> Juha Nurmi <juha.nurmi at ahmia.fi> writes:
>
>> On 22.04.2014 17:35, George Kadianakis wrote:
>>> Enjoy GSoC :)
>>
>> I will :)
>>
>>> BTW, looking again at your proposal, I see that you are going
>>> to do both popularity tracking and backlinks.
>>
>> Yes, another crawler gathers backlinks from the public WWW and I
>> will start gathering the URL clicks from the users.
>>
>>> How are these two technologies going to interact with each
>>> other? That is, how will the indexer consider the output of
>>> those two features?
>>
>> The Django front-end re-sorts the answers from the YaCy back-end.
>>
>> See https://ahmia.fi/static/gsoc/re_sort.jpg
>>
>> I have this idea in mind: https://ahmia.fi/static/gsoc/sorter.py
>>
>> The results are sorted by YaCy result index, number of backlinks
>> and number of clicks, each of which is scaled.
>>
>> Note the scaling: p_info.backlinks = 1 / (float(index) + 1)
>> etc.
>>
>> sum_function = 3.0*self.yacy + 2.0*self.backlinks +
>> 1.0*self.clicks
>>
>> where 3, 2 and 1 are test coefficients. I will optimize these and
>> make a better model if necessary. However, clicks are easily
>> spoofed, so their coefficient has to be small.
>>
>
> That makes sense.
>
> BTW, what is the 'yacy' score? Is it just the order that YaCy's
> indexer chose for each result? Or does YaCy actually expose a
> score for each result? How is the score derived? Or do you treat it
> as a black box and assume it's more accurate than backlinks and
> popularity?
>
I am using only the order information.
BTW, we are migrating the YaCy servers (Mikko installed new ones) and
have taken down the old system. There should be a working crawler +
fresh full text search results soon :)
- -Juha
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
iQEcBAEBAgAGBQJTXK5uAAoJELGTs54GL8vA1bcH/R/8xYJMCk7rc296/UBWBlaX
SDGYO/85EjbdBUokleQAZ8odxrV+rNCbsWMbncddo8QLxl6w99tS9Wz1ehZ+KOI2
beSCSEdS46gnztoGTRrRos4YFxEfbq708wFUh0CDQbzeT9doBX6dAV62FXhP8Fgm
sY/YvqNMJSBnqqlojsAfHV70IorjveEJ23pnktX8fcfkTqM+xBIVk0Ul2zggQNW+
c/d9SuaZLDB2Fdbsch4Ip3Tln8C/tLF7HC1cyRh7QDwU1zmr8UUe0N3mmzwEqUWA
h/uD/U3yZSNQfGrSI8/19QjvsDqCdoWIP/i78B90iIZhJ8YNlyN+cydb1O+cj9A=
=Dfu/
-----END PGP SIGNATURE-----