[tor-bugs] #24628 [Metrics/Consensus Health]: bwauth= bug in consensus health
Tor Bug Tracker & Wiki
blackhole at torproject.org
Fri Jan 26 03:38:30 UTC 2018
#24628: bwauth= bug in consensus health
--------------------------------------+---------------------
Reporter: tom                         | Owner: tom
Type: defect                          | Status: new
Priority: Medium                      | Milestone:
Component: Metrics/Consensus Health   | Version:
Severity: Normal                      | Resolution:
Keywords:                             | Actual Points:
Parent ID:                            | Points:
Reviewer:                             | Sponsor:
--------------------------------------+---------------------
Comment (by tom):
Started digging into this using the example in #24877. Really weird.
{{{
     bastet  maatuska  moria  gabelmoo  fara
Pg   4800    6080      5100   5270      4160
21   4770    6080      5110   5240      4160
20   4760    6070      5070   5230      4160
19   4750    6060      5060   5190      4160
18   4750    6050      5050   5190      4150
}}}
The Pg row is what the page shows (for 2018-01-11-20-00); the other rows
are the vote values (from collector) for the surrounding hours. The
consensus document says 5070, and the page also says 5070 for the
consensus. So it's the vote values on the page that are wrong.
I parsed the moria vote with stem, and it gave me 5070.
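As a cross-check on the numbers in the table: dir-spec says that when at
least three authorities vote a Measured= value for a relay, the consensus
bandwidth is the median of those values. A minimal sketch, assuming the
five columns above are the only Measured votes for this relay (the
variable names are made up for illustration):

```python
from statistics import median

# Measured values copied from the table above, in column order:
# bastet, maatuska, moria, gabelmoo, fara.
votes_hour_20 = [4760, 6070, 5070, 5230, 4160]  # row "20"
page_row = [4800, 6080, 5100, 5270, 4160]       # row "Pg"

# Under the assumption above, the consensus value is the median vote.
print(median(votes_hour_20))  # 5070
print(median(page_row))       # 5100
```

Under that assumption the hour-20 votes reproduce the consensus value of
5070, while the values shown on the page would have produced 5100.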
I searched for any vote made by moria in January that had a Measured
value of 5100. I got the following:
* ./11/2018-01-11-23-00-00-vote-D586D18309DED4CD6D57C18FDB97EFA96D330566-A15ABFB2A6F993F16E8645C9C3AF16E13EA7934A-r ForEdSnowden AaHRX5/GftBfopZv+IMGftUCzwg F+hQq2QhOd7rM7N7z+K+cBw/HCA 2018-01-11 14:52:14 51.15.133.16 9001 0
* ./24/2018-01-24-17-00-00-vote-D586D18309DED4CD6D57C18FDB97EFA96D330566-5391D5738EF960227FDBA4776D914150BFEA1EDF-r ForEdSnowden AaHRX5/GftBfopZv+IMGftUCzwg lmIBHy2xp+nAsZ9PuZMBPDLm660 2018-01-24 14:58:17 51.15.133.16 9001 0
* ./24/2018-01-24-16-00-00-vote-D586D18309DED4CD6D57C18FDB97EFA96D330566-50F9CBB890A5BF2E4919DDB8FB5577FE48C34517-r ForEdSnowden AaHRX5/GftBfopZv+IMGftUCzwg lmIBHy2xp+nAsZ9PuZMBPDLm660 2018-01-24 14:58:17 51.15.133.16 9001 0
Okay, so 23:00 is nearby.
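A search like the one above can be sketched in plain Python. This is not
the command I actually ran; it assumes the usual vote layout, where each
relay's "r" line is followed a few lines later by its "w" line carrying
Measured=. The function name and the sample text are made up for
illustration:

```python
import re


def find_measured(vote_text, nickname, measured):
    """Return the "r" lines for `nickname` whose "w" line has Measured=<measured>."""
    hits = []
    current_r = None  # the "r" line of the relay we are inside, if it matches
    for line in vote_text.splitlines():
        if line.startswith("r "):
            fields = line.split()
            matches = len(fields) > 1 and fields[1] == nickname
            current_r = line if matches else None
        elif line.startswith("w ") and current_r is not None:
            m = re.search(r"\bMeasured=(\d+)\b", line)
            if m and int(m.group(1)) == measured:
                hits.append(current_r)
            current_r = None
    return hits


# Synthetic sample, not real vote data.
sample = """\
r ExampleRelay AAAA BBBB 2018-01-11 14:52:14 192.0.2.1 9001 0
s Fast Running Valid
w Bandwidth=100 Measured=5100
r OtherRelay CCCC DDDD 2018-01-11 14:52:14 192.0.2.2 9001 0
s Running Valid
w Bandwidth=100 Measured=5100
"""

print(find_measured(sample, "ExampleRelay", 5100))
```

To cover a month, one would read each vote file whose name contains the
authority's fingerprint and pass its contents to the function.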
If I expand the table:
{{{
     bastet  maatuska  moria  gabelmoo  fara
Pg   4800    6080      5100   5270      4160
0    4800    6110      5130   5270      4160
23   4800    6110      5100   5270      4160
22   4770    6110      5110   5250      4160
21   4770    6080      5110   5240      4160
20   4760    6070      5070   5230      4160
19   4750    6060      5060   5190      4160
18   4750    6050      5050   5190      4150
}}}
I checked henryi's timezone, and it's in UTC. The filename is written out
based on the consensus's time in the file.
Then I ran ps and found 3 processes running: one that had been running
for 30 minutes, one for 2.5 hours, and one for 3.5 hours.
Things are starting to come together. Maybe.
I already know the script sometimes dies due to out-of-memory errors. Now
I think I see why. I call subprocess.call at the end as a convenience.
This invokes fork, which (at least as far as the kernel's memory
accounting goes) doubles the amount of memory I've used. (And it's a
lot.)
I'm going to replace those calls and hopefully it will resolve ALL of the
weird-ass errors we've been seeing with consensus-health.
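What replacing those calls could look like, as a rough sketch. It assumes
the subprocess.call invocations are doing simple file operations such as
a copy (that is an assumption for the example; the calls themselves are
not shown here), and the function name and file names are hypothetical:

```python
import os
import shutil
import tempfile


def publish(src, dst):
    """Copy src to dst inside this process, without fork or a child process.

    Hypothetical replacement for something like
    subprocess.call(["cp", src, dst]).
    """
    tmp = dst + ".tmp"
    shutil.copyfile(src, tmp)  # copies the bytes within this process
    os.replace(tmp, dst)       # rename into place (atomic on POSIX)


# Small demonstration with throwaway files.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "index.html.new")
dst = os.path.join(workdir, "index.html")
with open(src, "w") as f:
    f.write("consensus-health output\n")
publish(src, dst)
```

Since nothing is forked, a large parent process never has to be
duplicated just to move a file around.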
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/24628#comment:3>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online