[tor-dev] Feedback on obfuscating hidden-service statistics

Tue Dec 2 15:31:52 UTC 2014

Comments on proposal 238:
1. I’m not convinced that the proposed amount of obfuscation is sufficient for the HS descriptor count. Adding noise to cover the contribution of any single HS in a single period doesn’t cover its vector of contributions across periods. Thus, if the number of HSes stays the same over time (or follows some other pattern that the adversary can guess), then the randomness of the noise in the descriptor counts can effectively be removed by, say, taking the average. The best solution to this that I can think of is to bin every k consecutive integers and report the bin of the count after noise has been added. Then over time an adversary can at worst determine that the number of HSes lies within a range of width k. This applies to the cell counts also.
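
To make the averaging concern and the binning idea concrete, here is a rough Python sketch. It is only an illustration: the Laplace noise, the scale, the bin width k, the true count and the number of periods are all made-up values, not parameters from the proposal.

```python
import math
import random

def laplace(rng, scale):
    # The difference of two iid exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(rng, true_count, scale):
    # Report the count with fresh additive noise each period.
    return true_count + laplace(rng, scale)

def binned_noisy_count(rng, true_count, scale, k):
    # Add the noise first, then report only the lower edge of the
    # width-k bin that the noisy value falls into.
    return k * math.floor(noisy_count(rng, true_count, scale) / k)

# Hypothetical values, chosen only for illustration.
rng = random.Random(0)
true_count, scale, k, periods = 1234, 8.0, 100, 5000

raw = [noisy_count(rng, true_count, scale) for _ in range(periods)]
binned = [binned_noisy_count(rng, true_count, scale, k) for _ in range(periods)]

# With a stable true count, the mean of the raw reports converges on it.
avg_raw = sum(raw) / periods
print("mean of raw noisy reports:", avg_raw)

# Every binned report is a multiple of k, so the reports themselves
# never carry more resolution than the bin width.
print("distinct binned reports:", sorted(set(binned)))
```

With a stable count the mean of the raw reports lands very close to the true value, while the binned reports are only ever multiples of k.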

2. In 2.3, what exactly are “unique hidden-service identities”? .onion addresses?

3. It would hugely improve statistics accuracy to aggregate the statistics and only add noise once. However, this would require that the relays participate in a distributed protocol (e.g. [0]) rather than stick numbers in their extra-info docs.
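
A quick way to see the accuracy gain is to compare the error of a total built from per-relay noisy counts against a total that gets noise added once. The Python sketch below assumes Laplace noise with the same scale in both cases and uses invented numbers for the relay count and scale; it does not implement the distributed protocol from [0], it only simulates the resulting noise.

```python
import random
import statistics

def laplace(rng, scale):
    # The difference of two iid exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

# Hypothetical values, chosen only for illustration.
rng = random.Random(1)
n_relays, scale, trials = 100, 8.0, 2000

# Error in the network-wide total when every relay adds its own noise.
per_relay_errors = [
    sum(laplace(rng, scale) for _ in range(n_relays)) for _ in range(trials)
]

# Error in the total when noise is added once, after aggregation.
once_errors = [laplace(rng, scale) for _ in range(trials)]

std_per_relay = statistics.pstdev(per_relay_errors)
std_once = statistics.pstdev(once_errors)

print("std of total error, noise per relay:", std_per_relay)
print("std of total error, noise added once:", std_once)
```

Since the variances of independent noise terms add, the per-relay error should be larger by a factor of about sqrt(n_relays) under these assumptions.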

4. Some possible privacy issues with revealing descriptor publication counts:
  - You wish to use hidden services in a way that involves a lot of .onion addresses for your service. This will blow past our noise, which I am assuming is calibrated to hide any single publication (or a small constant number of them). Then the total count could reveal when this new service appeared and is active (assuming the number of other descriptor publications is stable or otherwise predictable, say because they correspond to public HSes whose status can be determined via a connection attempt).
  - You can factor out the noise over time if the total count is stable or otherwise predictable. This is the same issue as #1 above and using bins could work here as well.

[0] Our Data, Ourselves: Privacy via Distributed Noise Generation
  by Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor
  EUROCRYPT 2006
  <http://research.microsoft.com/pubs/65086/odo.pdf>

On Nov 25, 2014, at 5:14 PM, A. Johnson <aaron.m.johnson at nrl.navy.mil> wrote:

> Hi George,
> 
>> I posted an initial draft of the proposal here:
>> https://lists.torproject.org/pipermail/tor-dev/2014-November/007863.html
>> Any feedback would be awesome.
> 
> OK, I’ll have a chance to look at this in the next few days.
> 
>> Specifically, I would be interested in understanding the concept of
>> additive noise a bit better. As you can see the proposal draft is
>> still using multiplicative noise, and if you think that additive is
>> better we should change it. Unfortunately, I couldn't find any good
>> resources on the Internet explaining the difference between additive
>> and multiplicative noise. Could you expand a bit on what you said
>> above? Or link to a paper that explains more? Or link to some other
>> system that is doing additive noise (or even better its implementation)?
> 
> The technical argument for differential privacy is explained in <http://research.microsoft.com/en-us/projects/databaseprivacy/dwork.pdf>.  The definition appears in Def. 2, the Laplace mechanism is given in Eq. 3 of Sec. 5, and Thm. 4 shows why that mechanism achieves differential privacy.
> 
> But that stuff is pretty dry. The basic idea is that you’re trying to cover the contribution of any one sensitive input (e.g. a single user’s data or a single component of a single user’s data). The noise that you need to cover that doesn’t scale with the number of other users, and so you use additive noise.
> 
> Hope that helps,
> Aaron