[tor-bugs] #29734 [Obfuscation/Snowflake]: Broker should receive country stats information from Proxy and Client
Tor Bug Tracker & Wiki
blackhole at torproject.org
Thu Mar 28 19:40:56 UTC 2019
#29734: Broker should receive country stats information from Proxy and Client
-------------------------------------+-----------------------------
Reporter: cohosh | Owner: cohosh
Type: enhancement | Status: merge_ready
Priority: Medium | Milestone:
Component: Obfuscation/Snowflake | Version:
Severity: Normal | Resolution:
Keywords: snowflake, geoip, stats | Actual Points: 2
Parent ID: #29207 | Points: 1
Reviewer: ahf | Sponsor: Sponsor19
-------------------------------------+-----------------------------
Comment (by dcf):
I have many comments, but overall my impression is good and I think you
can move ahead with this.
My big-picture question is: what do we plan to do with this data? If it's
to detect blocking events by comparing the broker's statistics against the
bridge's, I think we should at least sketch out those analysis scripts, in
order to see whether the client geoip data we will be collecting is suited
to the requirements. My main point is that we shouldn't collect data just
because it may be useful; instead we should design safe data collection
around some question we want to answer. As it stands, the branch will
collect more precise client data than Tor Metrics does (Tor Metrics
doesn't publish raw numbers but applies some fuzzing and binning). Having
/debug display precise counts is a danger in the following scenario: an
observer wants to determine whether a particular client is accessing the
Snowflake broker. Whenever the observer sees a suspected connection, it
checks the /debug output to see whether the count has incremented.
Perhaps we could do a test deployment for a few days, to get an idea of
what the data looks like. In fact, I think it's a good idea to try that,
before merging. If there's a research question that we think this data
could help us answer, we could ask the
[https://research.torproject.org/safetyboard.html Safety Board] to
evaluate it.
----
* This branch only tracks client accesses, not proxy accesses. What will
happen with proxy accesses, are they planned to go in the same metrics.log
file?
*
[https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/broker.go#L271
broker.go]: The SIGHUP handler introduces a race condition because other
goroutines may be using the tables while they are being replaced. Even
though the table is replaced
[https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/metrics.go#L64
all at once with an assignment], I don't think that even that assignment
is guaranteed to be atomic. The table really needs a lock—not only while
being reloaded from a file but on every access (because SIGHUP means the
data structure can change at any time). Alternately, consider removing the
SIGHUP feature and only loading the database once at startup.
*
[https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/broker.go#L243
broker.go]
{{{
flag.StringVar(&geoipDatabase, "geoipdb", "/usr/share/tor/geoip", "path to
correctly formatted geoip database mapping IPv4 address ranges to country
codes")
flag.StringVar(&geoip6Database, "geoip6db", "/usr/share/tor/geoip6", "path
to correctly formatted geoip database mapping IPv6 address ranges to
country codes")
}}}
Is there any way to disable the geoip feature, if I don't have the
necessary files, other than by providing an invalid path? Maybe we don't
care to provide a way to disable the feature and it's the operator's
responsibility to provide some kind of geoip files; but in that case, it
may be better to exit with an error if the files cannot be read, rather
than logging the error and continuing on. It is perhaps surprising that
you can explicitly ask for a certain file on the command line with
`-geoipdb path`, and the broker will run even if the path cannot be read.
*
[https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/metrics.go#L47
UpdateCountryStats]: It looks to me like if a geoip lookup fails, it is
silently ignored—this could be misleading. It would be good to distinguish
three cases: geoip file present and lookup found a country code; geoip
file present but lookup found nothing; geoip file absent so no lookup is
possible.
* Adding to that, tor records the hashes of its geoip files:
[https://gitweb.torproject.org/torspec.git/tree/dir-
spec.txt?id=6e58fa857a95e18c5fda0ecf17955da7919b260b#n855 geoip-db-digest]
and [https://gitweb.torproject.org/torspec.git/tree/dir-
spec.txt?id=6e58fa857a95e18c5fda0ecf17955da7919b260b#n861 geoip6-db-
digest]. It would be good to record that information so we can
retroactively determine if geoip files were out of date, or diagnose geoip
failures.
* I think that some of the logging is happening at too low a level. The
most basic functions should return errors and not log. For example
[https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/geoip.go#L68
geoipStringToIP] should just return the `err` from `strconv.ParseUint`.
[https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/metrics.go#L78
NewMetrics], too, should return any error resulting from filesystem
operations.
*
[https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/geoip.go#L164
GeoIPRangeClosure]: I don't understand why this is called "closure"; it's
just a function.
*
[https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/geoip.go#L15
geoip.go]
[https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/geoip.go#L67
geoip.go] "4-byte unsigned integers": Please document whether it's big-
endian or little-endian. "INTIPLOW,INTIPHIGH,CC": Similarly, document
whether the ranges are inclusive of the high element or exclusive (I guess
it would have to be inclusive, in order to represent 255.255.255.255?).
* The `parseEntry` functions need error checking on `geoipStringToIP` and
`net.ParseIP`, or else they will store a `nil` that will result in a panic
later on.
[https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/geoip.go#L93
This error] will be more useful if it includes a line number. Let me
suggest a different factoring of the geoip parsing code. Have `parseEntry`
be a plain function, not a method on `GeoIPv4Table`, and have it return
`(GeoIPEntry, error)` rather than inserting into the table itself.
GeoIPLoadFile can do the entry insertion, count lines and annotate the
error result from `parseEntry`.
*
[https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/geoip.go#L148
geoip.go]: Check [https://golang.org/pkg/bufio/#Scanner.Err scanner.Err()]
after the loop.
*
[https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/geoip.go#L53
GeoipError]: Unless you need to do type matching on errors, you can just
use `errors.New` or `fmt.Errorf` instead of making a new type.
*
[https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/geoip.go#L192
GetCountryByAddr]: I think a more natural API for this function would be
to have it take a `net.IP` instead of a `string` (seeing as all it does
with the string is immediately parse it, and
[https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/metrics.go#L29
UpdateCountryStats] is already parsing it).
* Can
[https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/geoip.go#L164
GeoIPRangeClosure] be written as `bytes.Compare(key.To16(),
entry.ipHigh.To16()) <= 0`, and
[https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/geoip.go#L177
GeoIPCheckRange] as `bytes.Compare(key.To16(), entry.ipLow.To16()) >= 0 &&
bytes.Compare(key.To16(), entry.ipHigh.To16()) <= 0`?
*
[https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/metrics.go#L48
metrics.go]
{{{
m.countryStats.counts[country] = m.countryStats.counts[country] + 1
}}}
You can do this as `m.countryStats.counts[country]++` or
`m.countryStats.counts[country] += 1`.
*
[https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/metrics.go#L63-L64
metrics.go]: I don't think the initial `nil` assigment does anything.
{{{
m.tablev4 = nil // garbage collect previous table, if applicable
m.tablev4 = tablev4
}}}
*
[https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker/metrics.go#L97-L98
metrics.go]
{{{
for {
<-heartbeat
}}}
Consider doing this as `for range heartbeat {`. That way, the loop will
terminate if the `heartbeat` channel gets closed.
* If `NewMetrics` is called more than once, there will be multiple open
file handles to metrics.log (and only the first one will actually work, I
think). Is it possible that `NewMetrics` could be called more than once?
*
[https://github.com/cohosh/snowflake/blob/4de81ed039ce28264a98a040a0fde2237844e779/broker
/snowflake-broker_test.go#L274-L279 snowflake-broker_test.go]: I'm not
crazy about having the tests depend on external files, especially ones
that may change. Ideally the test code would include its own small geoip
files, either as separate files or as literals in the source code. But at
least, the test should [https://golang.org/pkg/testing/#hdr-Skipping
t.Skip] if a file is missing, rather than report spurious failure. I
suggest adding a layer of abstraction: a `GeoIPLoad(io.Reader)` that
parses any `io.Reader` (for examples a `bytes.Reader`), and write
`GeoIPLoadFile` in terms of it.
* It would be nice to also have tests that exercise the error paths in
geoip parsing.
* I would like to see unit tests for basic functions like
`GeoIPRangeClosure` and `GeoIPCheckRange`. They don't have to be long,
just test the edge conditions of an instance.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/29734#comment:18>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
More information about the tor-bugs
mailing list