[tor-dev] [GSOC 16] Ahmia status update #6
Nurmi, Juha
juha.nurmi at ahmia.fi
Sat Aug 13 06:39:38 UTC 2016
Thanks Ismael!
Great work. You are very productive.
-Juha
On Sat, Aug 13, 2016 at 1:02 AM, Ismael R <zma at riseup.net> wrote:
> Hi everyone,
>
> I'm working on ahmia.fi, the hidden service search engine, and you're
> reading status update #6.
>
> During the last two weeks, I finished porting the Django app to the new
> structure. I'm also working on last-minute things before putting the new
> site online.
>
> I will continue updating the documentation and adding unit tests to the
> project.
>
> The code is not merged yet, but you're welcome to check it out on my
> forks [1] [2].
>
>
> Since this status report is short, here is a list of the goals I had in my
> initial project proposal and the work that has been done on each.
>
> Review code and infrastructure:
> - Split the project into several repositories
> - Improve documentation
> - Automate testing (Travis CI)
> - Track code quality (Landscape.io)
> - Track requirements (Requires.io)
> - Refactor each subproject
>
> Improve search results:
> - Better use of Elasticsearch (stemmers, shingles, term-centric search)
> - Search results are now pages instead of domains.
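
The stemmer and shingle setup mentioned above could be expressed roughly as follows. This is only a sketch of Elasticsearch index settings written as a Python dict, not the configuration Ahmia actually uses: the names `english_stemmer`, `word_shingles` and `page_text`, as well as the shingle sizes, are assumptions made for illustration.

```python
def build_index_settings(min_shingle=2, max_shingle=3):
    """Return an index settings body with a stemming + shingle analyzer.

    All filter and analyzer names here are hypothetical; only the
    "stemmer" and "shingle" token filter types come from Elasticsearch.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Reduces words to their stem ("searching" -> "search").
                    "english_stemmer": {"type": "stemmer", "language": "english"},
                    # Emits word n-grams so that word order is indexed too.
                    "word_shingles": {
                        "type": "shingle",
                        "min_shingle_size": min_shingle,
                        "max_shingle_size": max_shingle,
                        "output_unigrams": True,
                    },
                },
                "analyzer": {
                    "page_text": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "english_stemmer", "word_shingles"],
                    }
                },
            }
        }
    }


settings = build_index_settings()
```

The resulting dict would be sent as the body of an index creation request; which fields use the analyzer is left out here because the source does not describe the mapping.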
>
> Improve UI/UX:
> Not much work has been done on this goal. The website was in the process
> of moving old pages to a new design; all pages now use the new design.
>
> Gather more statistics:
> - PageRank is now used to compute an authority score for each page
> - I suggested that we could use a self-hosted statistics framework like
> Piwik [3], but no decision has been made.
>
> Use stats to better rank search results:
> - Results are ranked by authority score.
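
To illustrate the idea of a PageRank-based authority score, here is a minimal power-iteration sketch in plain Python. It is not the code used in ahmia-crawler; the link graph, the damping factor of 0.85, the iteration count and the function names are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank scores for a {page: [linked pages]} graph."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def rank_results(hits, scores):
    """Order search hits by authority score, highest first."""
    return sorted(hits, key=lambda page: scores.get(page, 0.0), reverse=True)


# Hypothetical link graph: pages "a" and "c" link to "b", "b" links to "a".
graph = {"a": ["b"], "c": ["b"], "b": ["a"]}
scores = pagerank(graph)
ordered = rank_results(["a", "b", "c"], scores)
```

A real ranking would presumably combine this score with the text relevance returned by the search backend; the sketch sorts on authority alone to keep the idea visible.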
>
> Make sense of the indexed info to understand a search meaning:
> - Shingles enable us to differentiate these two queries: "i'm not happy i'm
> working" and "i'm happy i'm not working".
> - Synonyms could be used by the search algorithm if we provided a synonym
> dictionary. No work has been done on this yet.
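
The shingle example above can be checked with a few lines of plain Python. This is only an illustration of the principle using two-word shingles and whitespace tokenization, which is an assumption; Elasticsearch's own shingle filter works on the output of its tokenizer.

```python
def shingles(text, size=2):
    """Return the list of word n-grams (shingles) of the given size."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


q1 = "i'm not happy i'm working"
q2 = "i'm happy i'm not working"

# Both queries contain exactly the same words...
same_words = sorted(q1.split()) == sorted(q2.split())

# ...but their shingles differ, so word order becomes searchable.
only_in_q1 = set(shingles(q1)) - set(shingles(q2))
only_in_q2 = set(shingles(q2)) - set(shingles(q1))
```

Here `only_in_q1` holds "not happy" and "i'm working", while `only_in_q2` holds "i'm happy" and "not working", which is what lets the two queries match different pages.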
>
> Make a Google Trends-like interface to visualize searches over time:
> No work has been done to reach this optional goal. Some stats features
> were even dropped from the new site because they were "domain-centric",
> whereas a search engine needs to be "page-centric". We could probably
> index searches in Elasticsearch and use Date Histogram Aggregation [4] to
> display trends.
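
As a rough sketch of the Date Histogram Aggregation idea, the request body below counts logged searches per time bucket. The field names `query_text` and `timestamp` and the aggregation name are hypothetical, since no search log index exists yet. Recent Elasticsearch versions name the bucket size parameter `calendar_interval`; older releases, including those current in 2016, used `interval` instead.

```python
def build_trend_query(term, interval="day"):
    """Build a search body that buckets logged searches for `term` by date.

    Assumes each logged search is a document with a text field
    "query_text" and a date field "timestamp" (both hypothetical).
    """
    return {
        "size": 0,  # Only the aggregation buckets are needed, not the hits.
        "query": {"match": {"query_text": term}},
        "aggs": {
            "searches_over_time": {
                "date_histogram": {
                    "field": "timestamp",
                    "calendar_interval": interval,
                }
            }
        },
    }


body = build_trend_query("bitcoin", interval="week")
```

Each bucket in the response carries a date key and a document count, which is the series a trend chart would plot.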
>
> Make stats available through the API:
> No work has been done to reach this optional goal. Some API endpoints were
> also dropped because they were domain-centric. It would be great to have an
> API with a coherent URL scheme. I think Django REST Framework [5] can help
> design that API while keeping the code simple.
>
>
> That's it for this week,
> Have a nice weekend.
>
> Ismael R.
>
>
> [1] https://github.com/iriahi/ahmia-site
> [2] https://github.com/iriahi/ahmia-crawler
> [3] https://piwik.org/
> [4] https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html
> [5] http://www.django-rest-framework.org/