[metrics-bugs] #21315 [Obfuscation/Snowflake]: publish some realtime stats from the broker?
Tor Bug Tracker & Wiki
blackhole at torproject.org
Thu Apr 11 20:39:04 UTC 2019
#21315: publish some realtime stats from the broker?
-----------------------------------+---------------------------
Reporter: arma | Owner: (none)
Type: enhancement | Status: new
Priority: Medium | Milestone:
Component: Obfuscation/Snowflake | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: #29461 | Points:
Reviewer: | Sponsor: Sponsor19
-----------------------------------+---------------------------
Comment (by irl):
Replying to [comment:5 cohosh]:
> It sounds like we have a few things we want to achieve/learn from
> collected metrics:
> - Detect censorship events
> - Allow current or potential proxies to see if they are needed
> - Allow clients to see whether their connection issues are due to
> censorship or proxy availability
> - Help us figure out whether we should be doing something different in
> distributing proxies to clients
These all seem like good goals.
> We currently collect and "publish" information on:
> - how many snowflakes are currently available along with their SIDs
> (available at broker /debug handler). This is good for more detailed
> monitoring of censorship events. Even though we collect bridge usage
> metrics, collecting broker usage metrics will narrow down where the
> censorship is happening.
> - country stats of domain-fronted client connections (logged, most
> recent snapshot at broker /debug)
> - the roundtrip time it takes for a client to connect to get a snowflake
> proxy answer (available at broker /debug)
Should we already be archiving this data?
> Some of the metrics mentioned above will be easier to implement than
> others. The best place to collect statistics is at the broker, but some of
> the data mentioned would require proxies to report metrics to the broker
> for collection. We have to be a bit careful with this since anyone can run
> a proxy. It will also impact the decisions we make for #29207.
We collect a lot of statistics at relays and bridges, which anyone can
run. We are working on methods of improving robustness against these
statistics being manipulated, but so far have not detected anyone
reporting values that are not normal. It is good to have criteria for
determining, based on the stats others report, what values to expect, so
that anomalies can be detected. For example, we would expect bandwidth
usage among relays to be proportional to consensus weight.
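To make the consensus-weight example concrete, here is a minimal sketch
of such a check. It is only an illustration, not something Tor Metrics
runs: the function name, the input layout, and the tolerance factor are
all assumptions. It compares each relay's reported bytes per unit of
consensus weight against the median across relays.

```python
from statistics import median


def flag_bandwidth_anomalies(relays, tolerance=3.0):
    """Return fingerprints whose reported bandwidth looks out of proportion.

    relays maps a fingerprint to (consensus_weight, reported_bytes).
    tolerance is a hypothetical factor: a relay is flagged when its
    bytes-per-weight rate is more than `tolerance` times above or below
    the median rate.
    """
    # Relays with zero weight cannot be compared this way, so skip them.
    rates = {
        fp: reported / weight
        for fp, (weight, reported) in relays.items()
        if weight > 0
    }
    if not rates:
        return []
    baseline = median(rates.values())
    if baseline <= 0:
        return []
    return sorted(
        fp
        for fp, rate in rates.items()
        if rate > baseline * tolerance or rate < baseline / tolerance
    )
```

Using the median rather than the mean keeps a single misreporting relay
from shifting the baseline it is judged against.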
> > I would also be interested in stats about users and usage (including
> > e.g. number of users being handled divided by number of snowflakes
> > handling them)
>
> This is a bit tricky. The broker knows which proxies it hands out to
> users but doesn't know the state of the clients' connections to those
> proxies (e.g., when they have been closed). It's also worth noting that
> different "types" of proxies (standalone vs. browser-based) can handle a
> different number of users at once. Perhaps a more useful metric would be
> for snowflake proxies to advertise to the broker how many available
> "slots/tokens" they have when they poll for clients. This could be added
> to the broker--proxy WebSocket protocol. It would also avoid collecting
> more data on clients, which is generally safer.
This sounds like a reasonable approach. You might want to take a look at:
* https://research.torproject.org/techreports/countingusers-2010-11-30.pdf
* https://research.torproject.org/techreports/counting-daily-bridge-users-2012-10-24.pdf
This will give you an idea of how we do this for other parts of Tor.
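As a rough illustration of the slots/tokens idea quoted above, the broker
could sum the advertised capacity per proxy type. This is a sketch under
stated assumptions, not the actual broker--proxy protocol: the JSON
message layout, the field names "type" and "free_slots", and the
per-proxy cap are all hypothetical. The cap is there because anyone can
run a proxy and report an arbitrary number.

```python
import json
from collections import Counter

# Hypothetical upper bound on what a single proxy may advertise.
MAX_SLOTS_PER_PROXY = 100


def make_poll(proxy_type, free_slots):
    """Build a hypothetical poll message carrying the proxy's free slots."""
    return json.dumps({"type": proxy_type, "free_slots": free_slots})


def tally_capacity(poll_messages):
    """Sum advertised free slots per proxy type, clamping each report."""
    capacity = Counter()
    for raw in poll_messages:
        msg = json.loads(raw)
        slots = msg.get("free_slots", 0)
        # Ignore reports that are not non-negative integers.
        if isinstance(slots, bool) or not isinstance(slots, int) or slots < 0:
            continue
        capacity[msg.get("type", "unknown")] += min(slots, MAX_SLOTS_PER_PROXY)
    return dict(capacity)
```

Only aggregate capacity is kept here, so nothing about individual
clients needs to be recorded.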
> > how many times are you giving snowflakes out? How many times did you
> > stop giving a snowflake out because you've given it out so many times
> > already? These questions tie into the address distribution algorithm
> > question
Can this also be an indirect measurement of number of users?
> The above comment addresses this as well. The broker doesn't really
> decide whether or not they've given a snowflake out too many times. I
> think more important to deciding whether we are giving out proxies in a
> good way is to try to measure how "reliable" individual proxies have been
> in the past. This is related to setting up persistent identifiers
> (#29260).
For relays, directory authorities track the mean time between failures,
and we track this in Tor Metrics too.
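For a proxy with a persistent identifier, the plain (unweighted) form of
that figure could be computed as in the sketch below. This is an
assumption-laden illustration and not the exact computation the
directory authorities use; the input layout and function name are made
up for the example.

```python
def mean_time_between_failures(uptime_runs):
    """Return total observed uptime divided by the number of failures.

    uptime_runs is a list of (seconds_up, ended_in_failure) tuples, one
    per observed run. Returns None when no failure has been observed,
    since the figure is undefined in that case.
    """
    total_up = sum(seconds for seconds, _ in uptime_runs)
    failures = sum(1 for _, failed in uptime_runs if failed)
    if failures == 0:
        return None
    return total_up / failures
```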
> It might also be interesting to have some kind of proxy diversity metric
> (e.g., whether 90% of all connections are handled by the same proxy). We
> can get some idea with persistent identifiers (#29260), but of course
> using a persistent identifier will always be optional. We can also do
> collection of geoip country stats of proxies.
We don't really have this metric for relays yet, so if you have ideas that
would be applicable to relays too then that would be great. We know about
country/AS distribution, but we haven't quantified the diversity using any
particular formula.
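One candidate, offered only as a sketch and not as a formula anyone has
settled on, is the inverse Simpson index: the "effective number" of
proxies (or relays, countries, ASes) carrying the connections. The
function names and input layout below are assumptions for illustration.

```python
def effective_count(connections_by_id):
    """Inverse Simpson index: 1 / sum(share_i ** 2).

    connections_by_id maps an identifier to a connection count. The
    result equals the number of identifiers when load is spread evenly
    and approaches 1 when a single identifier handles almost everything.
    """
    total = sum(connections_by_id.values())
    if total == 0:
        return 0.0
    return 1.0 / sum((n / total) ** 2 for n in connections_by_id.values())


def top_share(connections_by_id):
    """Fraction of all connections handled by the busiest identifier."""
    total = sum(connections_by_id.values())
    if total == 0:
        return 0.0
    return max(connections_by_id.values()) / total
```

With three proxies handling 90, 5, and 5 connections, top_share reports
0.9 and effective_count comes out a little above 1.2, which matches the
"90% handled by the same proxy" case quoted above.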
> - Log all of the statistics in a reasonable format
This would ideally be a format that Tor Metrics is already handling. If it
could be based on the Tor directory protocol meta-format (§1.2 dir-spec)
then that would be great. We don't want to bring in dependencies for
parsing yaml/toml/etc. if we can help it.
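To show roughly what that could look like, here is a small sketch that
writes and reads keyword lines ("keyword SP arguments NL") in the spirit
of the meta-format. It is a simplification: it does not handle objects
or any of the other rules in dir-spec, and the keyword names used in the
usage note are hypothetical, not a proposed format.

```python
import re

# Keywords are limited to letters, digits, and '-'.
KEYWORD_RE = re.compile(r"^[A-Za-z0-9-]+$")


def format_stats(items):
    """Render (keyword, [arguments]) pairs as one keyword line each."""
    lines = []
    for keyword, args in items:
        if not KEYWORD_RE.match(keyword):
            raise ValueError("invalid keyword: %r" % (keyword,))
        parts = [keyword]
        for arg in args:
            text = str(arg)
            # Arguments are space-separated, so they must not contain
            # whitespace themselves in this simplified sketch.
            if not text or any(ch.isspace() for ch in text):
                raise ValueError("invalid argument: %r" % (text,))
            parts.append(text)
        lines.append(" ".join(parts))
    return "\n".join(lines) + "\n"


def parse_stats(text):
    """Parse keyword lines back into (keyword, [arguments]) pairs."""
    parsed = []
    for line in text.splitlines():
        if not line:
            continue
        keyword, _, rest = line.partition(" ")
        parsed.append((keyword, rest.split()))
    return parsed
```

For example, format_stats([("example-proxy-count", [42]),
("example-client-countries", ["de=8,us=16"])]) yields two lines that
parse_stats reads back with no parsing dependency beyond the standard
library.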
> - coordinate with the metrics team to get these metrics collected and
> visualized somewhere
Please also coordinate on what you want to collect, so we can consider if
that information already comes from somewhere, if we already had a plan
for it, and if it is safe or not.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/21315#comment:9>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online