[tor-dev] Feedback on obfuscating hidden-service statistics
George Kadianakis
desnacked at riseup.net
Wed Nov 26 12:45:59 UTC 2014
"A. Johnson" <aaron.m.johnson at nrl.navy.mil> writes:
> Hi George,
>
>> I posted an initial draft of the proposal here:
>> https://lists.torproject.org/pipermail/tor-dev/2014-November/007863.html
>> Any feedback would be awesome.
>
> OK, I’ll have a chance to look at this in the next few days.
>
>> Specifically, I would be interested in undertanding the concept of
>> additive noise a bit better. As you can see the proposal draft is
>> still using multiplicative noise, and if you think that additive is
>> better we should change it. Unfortunately, I couldn't find any good
>> resources on the Internet explaining the difference between additive
>> and multiplicative noise. Could you expand a bit on what you said
>> above? Or link to a paper that explains more? Or link to some other
>> system that is doing additive noise (or even better its implementation)?
>
> The technical argument for differential privacy is explained in
> <http://research.microsoft.com/en-us/projects/databaseprivacy/dwork.pdf>.
> The definition appears in Def. 2, the Laplace mechanism is given in
> Eq. 3 of Sec. 5, and Thm. 4 shows why that mechanism achieves
> differential privacy.
>
> But that stuff is pretty dry. The basic idea is that you’re trying to
> the contribution of any one sensitive input (e.g. a single user’s data
> or a single component of a single user’s data). The noise that you
> need to cover that doesn’t scale with the number of other users, and
> so you use additive noise.
>
Thanks for the resources!
I think I now get the general idea. I don't really understand why it
works or why Laplace is the best distribution for this job, but maybe
it doesn't matter too much for now.
The next problem is how to find the proper parameters for the Laplace
distribution. I guess the mean μ needs to be 0, but the hard part is
'b'. In a few papers I read, they set 'b' to (Δf/ε).
In the above, Δf is the "largest change a single participant could
have on the output" of the query. Trying to fit this database paradigm
to our use case, the largest change a single HS could cause to the
HSDir HS counting stats is change the result by 1. So Δf is 1, and I
think that ε is some kind of security (sensitivity) parameter, let's
set that to 0.3 or something.
So this gives us approx Laplace(0, 4) which can be seen with blue color here:
https://upload.wikimedia.org/wikipedia/commons/0/0a/Laplace_pdf_mod.svg
In the end of this post, I put a few samples from this distribution [0].
The generated noise seems reasonable for this job.
Now, I'm wondering how to do the same thing for the RP cell
statistics. In this case, Δf would have to be the largest amount of
cells we hope to obfuscate in an RP circuit. This is a chicken-and-egg
situation, since we don't really know how many cells we usually get
without doing these stats first.
Maybe we can use the preliminary stats from #13192, which contain both
RP and IP cells (but IP cells will probably be a minority). Or maybe
we can fit the distribution dynamically based on the amount of cells
we receive every day (does this even make sense)? Or what?
BTW, I plan to start implementantion of this early next week, so that
it's ready by mid-December. I hope we have a good solution to this by
then, otherwise I will have to do something else (round up the stats
to the nearest multiple or something) :/
Thanks!
[0]:for i in xrange(100):
print numpy.random.laplace(0,4)
....:
3.75587440621
-4.28136229035
4.76311443928
4.05142557505
1.70198910055
-3.37374208295
1.12837234927
-0.905282823974
7.66083097188
0.246385660561
-3.52939581339
-1.3368353768
-1.7482807282
2.98489896819
2.87155179984
-2.72961210143
-3.04409210121
-1.1975804202
-0.34861261134
0.953918739146
-14.3586324803
0.272984575989
3.41377347603
6.48752681038
-4.74036696099
0.668294672995
3.15847434594
-1.58855932489
1.65921624515
-0.529373859224
1.1739048689
-2.2201602699
-0.510111160097
2.58474424973
-7.4773321899
-13.4406958005
-1.34083931335
0.34051030906
-1.09939649788
0.647560027442
2.05240761873
-0.275439053432
10.1238334205
9.0960448449
3.20236196087
-2.27093832694
-19.187310803
0.894898545361
3.62459774003
2.10979313978
-0.633823085078
3.32591049399
3.11206489604
6.52626921692
2.68590966921
1.64033470377
0.997911309606
2.39357922671
0.308907976786
-3.02768280735
3.07096999256
-0.907608650976
1.72587291595
-0.838153001361
-1.23764100768
-9.56662634071
7.89275256421
10.4346665539
-0.522605672578
1.88585708734
1.77708545023
-0.301420228241
8.69964251692
-3.35490635732
-3.14148766097
1.73070057195
0.0426008469217
-1.74373108092
4.18116416817
0.139266645962
-1.32024236062
-2.40639481448
-0.364266143555
-3.6882489347
5.79025078063
0.386467380832
2.5388775702
1.60630747885
3.53930934459
-3.2270856708
4.15611732496
6.53669582267
3.83838409062
-2.62835636891
-1.36484455975
5.02827935505
0.693370215176
1.91312352565
1.93007931702
3.24710666718
More information about the tor-dev
mailing list