[metrics-team] PrivCount in Tor session in Rome

teor teor2345 at gmail.com
Tue Mar 13 12:33:18 UTC 2018


> On 13 Mar 2018, at 12:57, Karsten Loesing <karsten at torproject.org> wrote:
> 
>> On 2018-03-13 12:06, teor wrote:
>> 
>>> On 13 Mar 2018, at 11:41, Karsten Loesing <karsten at torproject.org> wrote:
>>> 
>>> Hi teor,
>>> 
>>>> On 2018-03-13 09:00, teor wrote:
>>>>>> 2. What analysis can the metrics team do to help with PrivCount
>>>>>> design/development? There's something in the notes about flags changing
>>>>>> in 24 hour periods or possible partition of relays. Can you elaborate
>>>>>> and make these questions a lot more concrete? Maybe this is something I
>>>>>> can do in the next few days, with enough time for you to discuss more
>>>>>> with irl while you're in Rome?
>>>>> 
>>>>> We want to partition the reporting relays into 3 groups at random.
>>>>> (Or maybe some other number: there is a tradeoff between the number of
>>>>> groups, which resists manipulation by a single relay, and the quality of the
>>>>> resulting statistic.)
>>>>> 
>>>>> If we select relays from the consensus at random, do we get a roughly
>>>>> even distribution of consensus weight, guard weight, middle weight, and
>>>>> exit weight?
>>>>> 
>>>>> What if we only have 5% of relays reporting statistics?
>>>>> Can we still get roughly even total partition weights at random?
>>>>> (Please choose relays on the latest tor versions, because they will be the
>>>>> first to deploy PrivCount.)
>>> 
>>> Here's a graph (with and without annotations):
>>> 
>>> https://people.torproject.org/~karsten/volatile/partitions-2018-03-13.pdf
>>> 
>>> https://people.torproject.org/~karsten/volatile/partitions-2018-03-13-annotated.pdf
>> 
>> 0.3.2 has the expected consensus weight distribution.
>> And it's 2 months since 0.3.2 became stable:
>> https://trac.torproject.org/projects/tor/wiki/org/teams/NetworkTeam/CoreTorReleases
>> 
>> I would be happy to wait 2 months after a stable release for good statistics.
>> 
>>> Let me know if this makes sense, or which parameters I should tweak.
>> 
>> Can we focus on 0.3.2, and all relays?
> 
> That would be 0.3.2 or higher then. And all relays for comparison. Sure!
> 
>>> For
>>> example:
>>> 
>>> - Different number of groups (currently 3).
>> 
>> Can we try 3 and 5?
> 
> Yep!
> 
>>> - Different number of simulations (currently 1000).
>> 
>> That's fine.
> 
> Or, 40 simulations per consensus = 40 * 24 = 960 simulations in total.
> 
>>> - Different number of consensuses as input (currently 1).
>> 
>> We'll be collecting over a day, so please use 24 consensuses.
> 
> Okay. Note that I'm simply taking 24 consensuses rather than 1 and
> running simulations on that. I'm not tracking how relays stay online
> over these 24 hours. That would be a different simulation.
> 
>>>>> If we can't get even partitions by choosing relays at random, we will need
>>>>> to choose partitions weighted by consensus weight. Let's decide if we
>>>>> want to do that analysis after we see the initial results.
>>> 
>>> Let me know if you want me to try out a different algorithm. The current
>>> algorithm simply assigns relays to groups at random.
>> 
>> That seems to get us what we want, let's keep selecting at random.
> 
> Alright.
> 
> New graph:
> 
> https://people.torproject.org/~karsten/volatile/partitions-2018-03-13a.pdf

All these look fine.

But I'm having a bit of trouble seeing differences in the cumulative sum
graphs. Can we do a distribution of total consensus weights for the next
set of graphs? (That is, a graph that looks like this: _/\_, not this: _/-- )

When we choose a set of statistics to move to PrivCount, let's run a
simulation on historical relay stats, to check that we can add noise,
partition, aggregate, and bin the data, and still interpret the results
(rough sketch of those steps below).

We'll need to do some more work before we're ready to run that simulation,
such as estimating individual client usage.
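To make the noise/aggregate/bin step concrete, here is a minimal sketch (not
PrivCount code): each relay adds Laplace noise to its local count, the noisy
counts are summed per partition, and the total is rounded down to a bin edge.
The noise scale and bin width are illustrative only, not the values PrivCount
would actually use.

    import random

    def laplace_noise(scale):
        # The difference of two i.i.d. exponentials is Laplace-distributed
        # with the given scale.
        return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

    def aggregate_and_bin(counts, noise_scale, bin_width):
        """Add per-relay noise, sum the noisy counts for one partition,
        and round the total down to a bin edge."""
        noisy_total = sum(c + laplace_noise(noise_scale) for c in counts)
        return bin_width * int(noisy_total // bin_width)

    # Placeholder per-relay counts for one partition.
    counts = [120, 90, 300, 45]
    print(aggregate_and_bin(counts, noise_scale=10.0, bin_width=100))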

T

