[metrics-bugs] #32265 [Metrics/Exit Scanner]: MS: Format an exit list from a previous exit list and exitmap output
Tor Bug Tracker & Wiki
blackhole at torproject.org
Wed Nov 20 13:42:44 UTC 2019
#32265: MS: Format an exit list from a previous exit list and exitmap output
----------------------------------+------------------------------
 Reporter:  irl                   |          Owner:  irl
     Type:  task                  |         Status:  needs_review
 Priority:  Medium                |      Milestone:
Component:  Metrics/Exit Scanner  |        Version:
 Severity:  Normal                |     Resolution:
 Keywords:                        |  Actual Points:
Parent ID:  #29654                |         Points:
 Reviewer:  karsten               |        Sponsor:
----------------------------------+------------------------------
Comment (by karsten):
Replying to [comment:6 irl]:
> Replying to [comment:5 karsten]:
> > Glad to see that the rewrite is progressing so quickly!
> >
> > Couple remarks/questions:
> > - Why 48 hours and not 24 hours? Doesn't the current exit scanner
> >   keep scan results for 24 hours? I might be wrong, though. Let's use
> >   whatever the current scanner does.
>
> https://2019.www.torproject.org/tordnsel/exitlist-spec.txt
>
> It discards relays that were not seen in the last 48 hours in a
> consensus.
Okay, let's use 48 hours then.
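
To make the 48-hour rule concrete, here is a minimal sketch of the pruning
step. The in-memory layout (a dict from fingerprint to an entry with a
`last_status` datetime) and the name `prune_stale` are assumptions made for
illustration, not code from the scanner.

```python
from datetime import datetime, timedelta

# 48 hours, as settled on above.
MAX_AGE = timedelta(hours=48)


def prune_stale(entries, now):
    """Return only the entries last seen in a consensus within MAX_AGE.

    `entries` maps a relay fingerprint to a dict holding a `last_status`
    datetime. This layout is hypothetical, chosen for the example.
    """
    return {
        fingerprint: entry
        for fingerprint, entry in entries.items()
        if now - entry["last_status"] <= MAX_AGE
    }
```

An entry that is exactly 48 hours old is kept in this sketch; whether the
boundary should be inclusive is a detail to check against the spec.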
> > - Rather than downloading exit lists from CollecTor, wouldn't it be
> >   sufficient to just read the latest exit list previously written by
> >   this scanner? And if there's none, just assume that no previous scans
> >   have happened. In theory, this should be all we need to learn.
>
> Probably, but this was a handy way to get test data and I wanted to try
> out the new Stem functionality. It would be nice to have a method to
> bootstrap a new scanner but this could just mean manually downloading the
> latest exit list and putting it in the right place.
Actually, I think it's harmful to download exit lists from CollecTor and
merge them with the scanner's own measurements. We should instead merge
new scan results with previous local results. Downloading from CollecTor
is also yet another dependency that is not really needed. I'd say kill
this code.
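
A rough sketch of what I mean by merging with previous local results,
assuming the previous list is a local file in the usual exit list format
(`ExitNode`, `Published`, `LastStatus`, `ExitAddress` lines). The function
names and the dict layout are made up for this example, and a missing file
is treated as "no previous scans have happened".

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_exit_list(text):
    """Parse exit list text into {fingerprint: entry} (hypothetical layout)."""
    entries = {}
    current = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        keyword = parts[0]
        if keyword == "ExitNode" and len(parts) == 2:
            current = {"published": None, "last_status": None, "addresses": {}}
            entries[parts[1]] = current
        elif current is None:
            continue  # ignore anything before the first ExitNode line
        elif keyword in ("Published", "LastStatus") and len(parts) == 3:
            key = "published" if keyword == "Published" else "last_status"
            current[key] = datetime.strptime(" ".join(parts[1:3]), TIME_FORMAT)
        elif keyword == "ExitAddress" and len(parts) == 4:
            seen = datetime.strptime(" ".join(parts[2:4]), TIME_FORMAT)
            current["addresses"][parts[1]] = seen
    return entries


def load_previous(path):
    """Read the scanner's own previous exit list; empty if there is none."""
    try:
        with open(path) as handle:
            return parse_exit_list(handle.read())
    except FileNotFoundError:
        return {}


def merge_results(previous, new):
    """Merge new scan results into previous ones, keeping the newest times."""
    merged = {
        fp: {**entry, "addresses": dict(entry["addresses"])}
        for fp, entry in previous.items()
    }
    for fp, entry in new.items():
        if fp not in merged:
            merged[fp] = {**entry, "addresses": dict(entry["addresses"])}
            continue
        target = merged[fp]
        for key in ("published", "last_status"):
            if entry[key] is not None and (
                target[key] is None or entry[key] > target[key]
            ):
                target[key] = entry[key]
        for address, seen in entry["addresses"].items():
            known = target["addresses"].get(address)
            if known is None or seen > known:
                target["addresses"][address] = seen
    return merged
```

Bootstrapping a new scanner then just means placing an existing exit list
at the path passed to `load_previous`.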
> > - It seems that `LastStatus` is only taken from exit lists downloaded
> >   from CollecTor but never set by new measurements. We should make a
> >   plan what to do with this field. Take it out? Populate it with
> >   consensus valid-after times?
>
> Right, this is the tricky bit. Do you know if anything consumes the
> LastStatus or Published timestamps? Ideally we could just drop these but
> for now I'm synthesizing them from the timestamp of the last measurement
> which could be close enough for the consumers.
Well, the spec says what these fields are being used for: `Published` is
used to skip relays that haven't published a new descriptor since the one
in the current consensus, and `LastStatus` is used to know when to throw
out relays from the list. This is all under the assumption that the
scanner reads its previous exit list from disk before making measurements.
My suggestion would be to use the consensus valid-after time as
`LastStatus` time. It's pretty much the same as the `published` time in a
version 2 status, and it would work for this purpose.
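
As a sketch of that suggestion: after loading the previous list, set
`LastStatus` to the consensus valid-after time for every known relay that
is listed in the current consensus. The dict layout and the name
`refresh_last_status` are assumptions for illustration only.

```python
from datetime import datetime


def refresh_last_status(entries, consensus_fingerprints, valid_after):
    """Stamp known relays found in the consensus with its valid-after time.

    `entries` maps fingerprints to dicts with a `last_status` datetime
    (hypothetical layout). Relays absent from the consensus keep their old
    value, so they age out once it is too far in the past.
    """
    for fingerprint in consensus_fingerprints:
        entry = entries.get(fingerprint)
        if entry is None:
            continue  # never measured, so there is nothing to update
        # Never move the timestamp backwards if an older consensus is fed in.
        if entry["last_status"] is None or valid_after > entry["last_status"]:
            entry["last_status"] = valid_after
    return entries
```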
> > - Does exitmap with the plugin use previous scans as input to decide
> >   which relays to scan? I believe that it uses some logic to avoid
> >   scanning relays too frequently. This has two effects: it doesn't
> >   generate more load on the network and on single relays than necessary,
> >   and it ensures that new relays are scanned sooner. As a result, the
> >   new scanner could be run once or twice per hour, rather than every 2
> >   or 3 hours (at 45 minutes runtime).
>
> No. It scans the entire network every time. It does this asynchronously,
> and doesn't try to prioritize anything. Just whichever circuits are built
> first will be tested first. I was even thinking it could run continuously.
> If exit relays cannot cope with two HTTP requests an hour, perhaps they
> shouldn't be exit relays.
Ideally, we would change as few variables at the same time as possible, in
order to compare the new results with the old ones. Changing the
scheduling from "only scan relays with changed descriptors" to "scan all
relays once per hour" seems like a major design change that we could make
at a later time.
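
For reference, the "only scan relays with changed descriptors" scheduling
could look roughly like the sketch below. This is my reading of how the
`Published` field is meant to be used, not the current scanner's actual
code; the input layouts and the name `relays_to_scan` are hypothetical.

```python
from datetime import datetime


def relays_to_scan(consensus_published, previous):
    """Pick relays that are new or have published a newer descriptor.

    `consensus_published` maps fingerprint -> descriptor publication time
    for relays in the current consensus; `previous` maps fingerprint -> an
    entry dict with a `published` datetime from the last local exit list.
    Both layouts are assumptions made for this example.
    """
    selected = []
    for fingerprint, published in sorted(consensus_published.items()):
        entry = previous.get(fingerprint)
        if entry is None or entry["published"] is None:
            selected.append(fingerprint)  # never scanned before
        elif published > entry["published"]:
            selected.append(fingerprint)  # descriptor changed since last scan
    return selected
```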
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/32265#comment:8>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online