[tor-bugs] #6232 [Analysis]: Make entropy-over-time graph
Tor Bug Tracker & Wiki
torproject-admin at torproject.org
Tue Jul 17 13:30:30 UTC 2012
#6232: Make entropy-over-time graph
-------------------------+--------------------------------------------------
Reporter:   arma         |  Owner:
Type:       enhancement  |  Status:        needs_revision
Priority:   normal       |  Milestone:
Component:  Analysis     |  Version:
Keywords:                |  Parent:
Points:                  |  Actualpoints:
-------------------------+--------------------------------------------------
Changes (by karsten):
* status: needs_review => needs_revision
Comment:
A few comments after re-reading the whole ticket:
- I merged George's patch (thanks!) that outputs degree of anonymity
instead of plain entropy. I'll run it shortly and will post the resulting
graph once I have it.
- Should we add a second graph that plots entropy and maximum entropy as
two lines, as Rob suggested above? That graph should probably consist of
2 x 2 subgraphs for the four cases we distinguish. It should be trivial to
extend the script to output entropy and max_entropy along with their
quotient. I'll look into that and write the graphing code in a bit.
- I wonder if entropies based on subsets of Exit- and Guard-flagged relays
are correct. I spent yesterday afternoon trying to learn how path
selection really works
([https://trac.torproject.org/projects/tor/ticket/5755#comment:11 #5755]).
I think we'll have to take bandwidth weights as reported in the footer
section of a consensus into account, too. Those bandwidth weights
influence, for example, how to weight the consensus weight of a relay with
the Exit flag and a relay with Exit ''and'' Guard flag for the exit
position. In a consensus published yesterday, the former was weighted
with Wee=1.0, whereas the latter was weighted with Wed=0.4272. Similarly,
bandwidth weights for the guard position were Wgd=0.2864 and Wgg=0.6446,
so quite different. If we only look at the Exit ''or'' Guard flag of a
relay, we might be quite off. But before we change anything here, I want
to hear back from Mike or Roger on whether my understanding of path
selection is correct.
- The GeoIP database is part of the sources in metrics-tasks.git, right?
Can we change that and have users provide their own geoip file? I'm
worried that the current "a1" madness influences the results, and I'd like
to swap the current database with the one from February which didn't have
"a1" relays all over.
- Can we add AS-based entropy values, too? There's an AS database from
Maxmind that we could use here. Again, users could provide that database
file, so there's no need to commit it to the Git repo.
- In the longer term, do we want to include family diversity? That
metric would consider all relays in the same relay family as one entity,
similar to how we consider all relays in the same country as one entity in
the country diversity metric. I admit that it's hard to extract families
using the current code, because we'd have to parse server descriptors for
that, too. I'm also not certain that the results will be meaningful. So,
longer-term.
- A shorter-term goal could be to compute bandwidth diversity based on
the relays' advertised bandwidths, not based on their consensus weights.
Relays report their advertised bandwidth in their server descriptor; it's
the minimum of bandwidth rate, burst, and observed bandwidth. We'll want
to compute bandwidth diversity for all relays and for exit/guard subsets
as well as location diversity. This is what Roger was referring to in the
last but one paragraph of the ticket description. Again, I admit that
it's non-trivial to extract advertised bandwidths, because we'll have to
parse server descriptors. But it's easier to compute than relay families.
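To make the entropy, max_entropy and quotient discussion in the list above concrete, here is a minimal sketch that also applies the position-dependent bandwidth weights for the exit position. It is not the script from metrics-tasks.git: the function names and the (consensus_weight, flags) tuple layout are assumptions made for illustration, max_entropy is assumed to be log2 of the number of relays with non-zero weight, and relays without the Exit flag are simply left out. Whether this weighting matches real path selection is exactly the open question raised above.

```python
import math


def entropy(weights):
    """Shannon entropy in bits of the distribution implied by the weights."""
    positive = [w for w in weights if w > 0]
    total = float(sum(positive))
    if total == 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total) for w in positive)


def degree_of_anonymity(weights):
    """Return (entropy, max_entropy, entropy / max_entropy).

    Assumption: max_entropy is log2(N), where N counts relays with
    non-zero weight, i.e. the entropy of a uniform choice among them.
    """
    positive = [w for w in weights if w > 0]
    if not positive:
        return 0.0, 0.0, 0.0
    h = entropy(positive)
    h_max = math.log2(len(positive))
    quotient = h / h_max if h_max > 0 else 0.0
    return h, h_max, quotient


def exit_position_weights(relays, wee, wed):
    """Scale consensus weights for the exit position.

    relays: iterable of (consensus_weight, set_of_flags); hypothetical layout.
    wee:    bandwidth weight for relays with the Exit flag only.
    wed:    bandwidth weight for relays with both Exit and Guard flags.
    Relays without the Exit flag are skipped (a simplification).
    """
    weights = []
    for consensus_weight, flags in relays:
        if "Exit" not in flags:
            continue
        factor = wed if "Guard" in flags else wee
        weights.append(consensus_weight * factor)
    return weights
```

With the values quoted above (Wee=1.0, Wed=0.4272), two Exit-flagged relays of equal consensus weight, one of which also has the Guard flag, no longer look equally likely in the exit position, so the resulting degree of anonymity drops below what a flag-only subset would report.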
gsathya, are you up for more coding fun? Didn't you worry that this task
might be too trivial for a thesis? Hah! :)
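For the advertised-bandwidth and per-entity items in the list above (country, AS, family), a rough sketch of the two building blocks might look like the following. The dictionary keys and function names are hypothetical, and parsing server descriptors or GeoIP/AS database files is left out entirely.

```python
import math
from collections import defaultdict


def advertised_bandwidth(rate, burst, observed):
    """Advertised bandwidth as described above: the minimum of
    bandwidth rate, bandwidth burst, and observed bandwidth."""
    return min(rate, burst, observed)


def grouped_entropy(relays, key, weight_key="bandwidth"):
    """Entropy in bits over entities rather than single relays.

    relays:     iterable of dicts (hypothetical layout), e.g.
                {"country": "de", "bandwidth": 1000}
    key:        field that names the entity (country, AS, family, ...)
    weight_key: field holding the relay's weight

    All relays sharing the same entity are treated as one, by summing
    their weights before computing the entropy.
    """
    totals = defaultdict(float)
    for relay in relays:
        totals[relay[key]] += relay[weight_key]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return 0.0
    return -sum(
        (t / grand_total) * math.log2(t / grand_total)
        for t in totals.values()
        if t > 0
    )
```

The same grouped_entropy function would serve country, AS and family diversity alike; only the key changes, which is why the input databases can stay outside the Git repo and be supplied by the user.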
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/6232#comment:32>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online