[tor-dev] Issues With Ticket #7532 - "Count unique IPs in an anonymous way"
samir menon
menon.samir at gmail.com
Tue Mar 28 20:24:15 UTC 2017
This ticket [1] was suggested as a GSoC project, but I think there might be
an issue with the security model/perceived threat.
To summarize the ticket and its child [1], basically, we currently store
all the IP's seen by a node so that we can count unique IP's. The idea is
that this is dangerous; if a node is compromised, then all of those IP
addresses can be retrieved from memory. Therefore, a variety of mitigation
methods have been proposed (most prominently, the 'Probabilistic Counting
Algorithm' from [2])
Here's my issue: what about brute force?
No matter what method we use, we will arrive at a data structure that
should be able to, given an IP address, tell us whether it is new (and we
should increment the unique counter) or old (and we should leave the unique
counter the same), with some reasonably small false positive rate.
Basically, we're supposed to use some kind of Bloom filter like structure.
Then can't that structure then be brute-forced, offline, by an attacker?
IPv4 addresses are 32-bits (~4.3 billion of them), so an attacker could
just run whatever method we use to check membership over and over, and then
recover the set of IP's. The same happens if we hash the IP's beforehand.
So, is this attack acceptable? The only mitigation I've seen is the one
referenced by 'Aaron' in the ticket, which is the system that git uses,
cryptolog; there, they have a random salt that changes daily. Then, an
attacker can only learn the IP's for one day. This sounds like a reasonable
compromise to me, but then the implementation becomes rather simple; just
hash the IP's with a random salt that changes daily before putting them in
the set.
IPv6 also solves this (128 bits), but there again, the solution is just to
hash the IP's before storing them - the Bloom filter/'Probabilistic
Counting Algorithm' is unnecessary.
I think I must be missing something about how the 'Probabilistic Counting
Algorithm' works - somehow, it needs to keep track of the # of unique IP's
without knowing (with a high probability) whether any 1 individual IP has
been seen.
Any help/pointing out of errors in my reasoning would be useful.
Thanks,
Samir Menon
menon.samir at gmail.com
samir2 at stanford.edu
[1] https://trac.torproject.org/projects/tor/ticket/7532
[2]
http://www.mathcs.emory.edu/~cheung/papers/StreamDB/Probab/1985-Flajolet-Probabilistic-counting.pdf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20170328/ee342766/attachment.html>
More information about the tor-dev
mailing list