[tor-bugs] #7009 [Tor]: Handle unstable relays better
Tor Bug Tracker & Wiki
blackhole at torproject.org
Mon Jul 1 01:40:23 UTC 2013
#7009: Handle unstable relays better
---------------------------------+------------------------------------------
 Reporter:  arma                 |  Owner:
 Type:      project              |  Status:        new
 Priority:  normal               |  Milestone:     Tor: unspecified
 Component: Tor                  |  Version:
 Keywords:  tor-relay, SponsorJ  |  Parent:
 Points:                         |  Actualpoints:
---------------------------------+------------------------------------------
Comment(by nickm):
I've been doing some experiments on consensus diffs. These are performed
against the current June archive of consensuses from metrics.tpo.
Unfortunately, it seems that microdescriptor consensuses are not included
in these archives, so my next step is to reconstruct/fake those, and run
my analysis on them.
Here are the average compressed and uncompressed consensus sizes, for
reference:
{{{
gz: mean 276414. median 276649
bz2: mean 229236. median 229452
xz: mean 248717. median 248908
uncompressed: mean 854707. median 856168
}}}
Note that we could save over 15% of our consensus traffic (about 17%,
going by the mean gz and bz2 sizes above) just by switching our
compression to bzip2.
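As a rough sketch of how sizes like those above could be measured, and of
the arithmetic behind the savings figure, here is some Python. The
function name compressed_sizes and the compression settings are
assumptions for illustration; this is not the script that produced the
numbers.

```python
import bz2
import gzip
import lzma


def compressed_sizes(data: bytes) -> dict:
    """Return the size in bytes of `data` under each compression scheme."""
    return {
        "gz": len(gzip.compress(data, compresslevel=9)),
        "bz2": len(bz2.compress(data, compresslevel=9)),
        "xz": len(lzma.compress(data)),  # library default preset (assumption)
        "uncompressed": len(data),
    }


# Relative saving implied by the mean sizes reported above.
GZ_MEAN, BZ2_MEAN = 276414, 229236
saving = (GZ_MEAN - BZ2_MEAN) / GZ_MEAN
print(f"bzip2 saves {saving:.1%} relative to gzip")
```

Running compressed_sizes over every consensus in an archive and taking the
mean and median per key would give a table of the same shape as the one
above.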
First, I experimented with different diff formats, each compressed with
both "gzip -9" and "bzip2 -9". I tried standard diff, unified diff
(expected to suck), and ed-style diff. Predictably, the ed-style diff was
the smallest. I did these comparisons between consensuses that were 1, 2,
4, 6, and 8 hours apart (the "lag" values below).
{{{
EXPERIMENT 1: REGULAR CONSENSUSES
diff_gz: lag 1: mean 56833. median 55956
diff_gz: lag 2: mean 94252. median 93464
diff_gz: lag 4: mean 155933. median 155169
diff_gz: lag 6: mean 210275. median 209497
diff_gz: lag 8: mean 257186. median 257104
diff_bz2: lag 1: mean 54055. median 53210
diff_bz2: lag 2: mean 88863. median 88298
diff_bz2: lag 4: mean 146734. median 145987
diff_bz2: lag 6: mean 197943. median 197360
diff_bz2: lag 8: mean 242272. median 242403
diff_u_gz: lag 1: mean 173493. median 170642
diff_u_gz: lag 2: mean 233502. median 231120
diff_u_gz: lag 4: mean 296600. median 294960
diff_u_gz: lag 6: mean 335361. median 334571
diff_u_gz: lag 8: mean 361879. median 361714
diff_u_bz2: lag 1: mean 150857. median 148773
diff_u_bz2: lag 2: mean 202537. median 200466
diff_u_bz2: lag 4: mean 259363. median 257591
diff_u_bz2: lag 6: mean 299156. median 298317
diff_u_bz2: lag 8: mean 326355. median 326309
diff_e_gz: lag 1: mean 32380. median 31956
diff_e_gz: lag 2: mean 54258. median 53950
diff_e_gz: lag 4: mean 90962. median 90803
diff_e_gz: lag 6: mean 123734. median 123457
diff_e_gz: lag 8: mean 152219. median 152158
diff_e_bz2: lag 1: mean 29329. median 29017
diff_e_bz2: lag 2: mean 48470. median 48333
diff_e_bz2: lag 4: mean 80394. median 80229
diff_e_bz2: lag 6: mean 108689. median 108410
diff_e_bz2: lag 8: mean 133337. median 133124
condiff_gz: lag 1: mean 43806. median 35518
condiff_gz: lag 2: mean 70634. median 60643
condiff_gz: lag 4: mean 109675. median 112926
condiff_gz: lag 6: mean 140493. median 144589
condiff_gz: lag 8: mean 165163. median 167669
condiff_bz2: lag 1: mean 39425. median 32417
condiff_bz2: lag 2: mean 62459. median 53976
condiff_bz2: lag 4: mean 95757. median 98261
condiff_bz2: lag 6: mean 121948. median 125166
condiff_bz2: lag 8: mean 142900. median 144902
}}}
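To illustrate why the ed-style format comes out smallest, here is a
minimal sketch of an ed-script generator built on Python's difflib. It
only approximates what "diff -e" prints; the function name ed_diff is
made up for this example, and it assumes no input line consists of a
single "." (which would end an ed text block early).

```python
import difflib


def ed_diff(old_lines, new_lines):
    """Build an ed-style script that turns old_lines into new_lines.

    Commands are emitted from the end of the file backwards, so line
    numbers in later commands are not shifted by earlier ones.
    """
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in reversed(matcher.get_opcodes()):
        if tag == "equal":
            continue
        # ed addresses are 1-based and inclusive.
        span = str(i1 + 1) if i2 - i1 == 1 else f"{i1 + 1},{i2}"
        if tag == "delete":
            out.append(span + "d")
            continue
        # "insert" appends after old line i1; "replace" changes a span.
        out.append(f"{i1}a" if tag == "insert" else span + "c")
        out.extend(new_lines[j1:j2])
        out.append(".")
    return "".join(line + "\n" for line in out)


old = ["a", "b", "c"]
new = ["a", "B", "c", "d"]
ed_script = ed_diff(old, new)
unified = "\n".join(difflib.unified_diff(old, new, lineterm="")) + "\n"
print(len(ed_script), len(unified))
```

Unlike the unified format, the ed script carries neither context lines
nor the text of removed lines, only line numbers and new content, which
is consistent with it being the smallest of the three in the results.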
The "condiff" algorithm is a quick hack I wrote to answer the question,
"How bad would performance be if I hand-rolled a special-purpose
solution?" It works by copying the header and the footer verbatim, and
between them doing a router-by-router comparison. If a router has been
removed, it emits "-\n". If a router has been added, it emits "* " and
the new router entry. If a router has had lines change, but not its first
line, it emits ".\n" and then all changed lines. If a router has had lines
change including the first line, it emits the new first line, followed by
all changed lines.
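The rules above can be sketched in Python roughly as follows. This is a
reconstruction from the description, not the actual hack, and it makes
several assumptions: routers are matched by the identity field of their
"r" line, the footer starts at "directory-footer" (or
"directory-signature"), the header and footer are taken from the newer
consensus, "changed lines" means lines of the new entry that are absent
from the old one, and nothing is emitted for an unchanged router. The
function names are hypothetical.

```python
def split_consensus(text):
    """Split a consensus into (header lines, {identity: entry lines}, footer lines)."""
    header, footer, routers = [], [], {}
    current = None
    in_footer = False
    for line in text.splitlines(keepends=True):
        if in_footer or line.startswith(("directory-footer", "directory-signature")):
            in_footer = True
            footer.append(line)
        elif line.startswith("r "):
            # "r nickname identity digest ..."; key on the identity field.
            current = [line]
            routers[line.split()[2]] = current
        elif current is not None:
            current.append(line)
        else:
            header.append(line)
    return header, routers, footer


def condiff(old_text, new_text):
    """Emit a condiff-like encoding of new_text relative to old_text."""
    _, old, _ = split_consensus(old_text)
    header, new, footer = split_consensus(new_text)
    out = list(header)
    for key in sorted(old.keys() | new.keys()):
        if key not in new:
            out.append("-\n")                       # router removed
        elif key not in old:
            out.append("* " + "".join(new[key]))    # router added
        elif old[key] != new[key]:
            o, n = old[key], new[key]
            changed = [line for line in n[1:] if line not in o[1:]]
            # ".\n" if the first ("r") line is unchanged, else the new first line.
            out.append(".\n" if o[0] == n[0] else n[0])
            out.extend(changed)
    out.extend(footer)
    return "".join(out)
```

Because unchanged routers produce no output here, the result is only good
for estimating sizes; a decoder could not realign it with the old
consensus without some additional marker.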
Next, I looked at the results. Where were most of the lines in the diffs?
Which lines and fields changed the most? What I found was that the most
commonly changed line was "w", followed by "r", followed by "s". When the
r lines changed, they most frequently changed in their descriptor digest
and published fields.
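One way to get this kind of tally is to count the leading keyword of
every line that differs between two consensuses. The sketch below does
that with difflib; the function name changed_keywords is an assumption,
and this is not necessarily how the counts above were obtained.

```python
import difflib
from collections import Counter


def changed_keywords(old_text, new_text):
    """Count changed lines by their leading keyword ("r", "s", "w", ...)."""
    old, new = old_text.splitlines(), new_text.splitlines()
    counts = Counter()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Count both the removed (old) and the added (new) side of each change.
        for line in old[i1:i2] + new[j1:j2]:
            if line:
                counts[line.split(" ", 1)[0]] += 1
    return counts
```

Calling counts.most_common() on the result lists the keywords from most
to least frequently changed.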
To experiment with the effects of quantizing w a little harder (yes, I
know that presents problems for bw measurement), I tried a variant
experiment where I rounded all weights under 8 to 8, and retained only the
1 most significant bit of bandwidths under 128, only the 2 msb of
bandwidths under 1024, and only the 3 msb of all other bandwidths.
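A minimal sketch of that quantization, assuming "retaining the N most
significant bits" means truncating the lower bits to zero (rather than
rounding to nearest) and that only the Bandwidth= value on "w" lines is
touched. The function names are made up for the example.

```python
import re


def quantize_bandwidth(bw: int) -> int:
    """Coarsen a bandwidth weight as described above."""
    if bw < 8:
        return 8
    if bw < 128:
        keep = 1
    elif bw < 1024:
        keep = 2
    else:
        keep = 3
    # Zero out everything below the `keep` most significant bits.
    shift = max(bw.bit_length() - keep, 0)
    return (bw >> shift) << shift


_W_LINE = re.compile(r"^(w Bandwidth=)(\d+)", re.MULTILINE)


def quantize_consensus(text: str) -> str:
    """Rewrite every "w Bandwidth=N" line in a consensus with a quantized N."""
    return _W_LINE.sub(
        lambda m: m.group(1) + str(quantize_bandwidth(int(m.group(2)))), text
    )
```

For example, under these assumptions 100 becomes 64, 1000 becomes 768,
and 5000 becomes 4096.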
{{{
EXPERIMENT 2: ROUNDED BANDWIDTHS
gz: mean 273004. median 273213
bz2: mean 225879. median 226083
xz: mean 246151. median 246396
uncompressed: mean 854437. median 855888
diff_gz: lag 1: mean 47681. median 46928
diff_gz: lag 2: mean 82396. median 81765
diff_gz: lag 4: mean 141794. median 140603
diff_gz: lag 6: mean 194732. median 194116
diff_gz: lag 8: mean 240577. median 240687
diff_bz2: lag 1: mean 46906. median 46173
diff_bz2: lag 2: mean 79749. median 79272
diff_bz2: lag 4: mean 135840. median 134990
diff_bz2: lag 6: mean 185960. median 185280
diff_bz2: lag 8: mean 229477. median 229776
diff_u_gz: lag 1: mean 95696. median 90798
diff_u_gz: lag 2: mean 151455. median 147334
diff_u_gz: lag 4: mean 227687. median 224847
diff_u_gz: lag 6: mean 280858. median 278609
diff_u_gz: lag 8: mean 319106. median 318876
diff_u_bz2: lag 1: mean 86029. median 81964
diff_u_bz2: lag 2: mean 134455. median 131198
diff_u_bz2: lag 4: mean 201620. median 199355
diff_u_bz2: lag 6: mean 249981. median 247689
diff_u_bz2: lag 8: mean 288642. median 288522
diff_e_gz: lag 1: mean 27796. median 27497
diff_e_gz: lag 2: mean 48344. median 48051
diff_e_gz: lag 4: mean 83978. median 83712
diff_e_gz: lag 6: mean 116101. median 115716
diff_e_gz: lag 8: mean 144135. median 143900
diff_e_bz2: lag 1: mean 25905. median 25665
diff_e_bz2: lag 2: mean 44093. median 43915
diff_e_bz2: lag 4: mean 75155. median 74913
diff_e_bz2: lag 6: mean 102904. median 102554
diff_e_bz2: lag 8: mean 127088. median 126825
condiff_gz: lag 1: mean 41209. median 33046
condiff_gz: lag 2: mean 67233. median 57367
condiff_gz: lag 4: mean 105669. median 108934
condiff_gz: lag 6: mean 136169. median 140411
condiff_gz: lag 8: mean 160624. median 163230
condiff_bz2: lag 1: mean 37453. median 30484
condiff_bz2: lag 2: mean 59874. median 51744
condiff_bz2: lag 4: mean 92699. median 95425
condiff_bz2: lag 6: mean 118630. median 121854
condiff_bz2: lag 8: mean 139430. median 141450
}}}
That helps a little, but it's sure not an obviously wonderful idea.
Next step: ersatz microdesc consensuses.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/7009#comment:3>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online