[tor-bugs] #5232 [BridgeDB]: Import bridges into BridgeDB in a separate thread and database transaction

Fri Mar 21 12:47:22 UTC 2014

#5232: Import bridges into BridgeDB in a separate thread and database transaction
--------------------------+-----------------------------------------------
     Reporter:  karsten   |      Owner:  sysrqb
         Type:  defect    |     Status:  needs_revision
     Priority:  major     |  Milestone:
    Component:  BridgeDB  |    Version:
   Resolution:            |   Keywords:  bridgedb-email, bridgedb-db,
Actual Points:            |              bridgedb-https, bridgedb-0.1.x
       Points:            |  Parent ID:
--------------------------+-----------------------------------------------

Comment (by sysrqb):

 Replying to [comment:15 isis]:
 > Sweet. I had to deal with a bit of merge conflicts to get it into
 master... do you mind if I separate the additions to the unittest in
 2a21dfcb55e659775fcde9dd4f668b98f41d0fd6 into another unittest? If I do
 it, then you won't have to deal with the merge conflicts too.
 >
 Nope, that's fine by me. I have some more, but I've made them shorter this
 time.

 > So, this seems to work great, the parsing is done in a separate thread!
 However, the call which takes longer, especially at start up time, is the
 call to `bridgedb.Stability.addOrUpdateBridgeHistory()`. However, after
 start up, the HTTPS distributor continues to function and hand out bridges
 while the new descriptors are being parsed.
 >

 Indeed. This was a tradeoff between complexity and availability. In theory,
 this branch should significantly increase the latter at the cost of a small
 increase in the former.

 > For 10,000 bridge descriptors, with `addOrUpdateBridges()`:
 > {{{
 >  * Starting the servers took:      1h 6m 58s
 >  * Restarting (SIGHUP) took:          2m 13s
 >  * Dumping buckets (SIGUSR1) took:       11s
 > }}}
 >

 10,000 bridges, time taken until normal operation resumed (times in
 parentheses are the additional time taken for stability calculations):
 {{{
   * Starting the server took:      32s (3s + 11s (for cleanup))
   * Restarting (SIGHUP) took:      43s (4s + 10s (for cleanup))
 }}}

 I also checked the availability of the email and HTTPS distributors during
 reload and found that they do become unavailable for a few seconds (which
 is much better than the current situation, but still not good). The only
 blocking operation during reload is in the main thread, when we overwrite
 the current data structures with the new ones we just created in a
 background thread, so this seems like the obvious place to look for the
 bottleneck.
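
 To illustrate the reload pattern described above, here is a minimal sketch
 using only the Python standard library. It is not the code from this
 branch: `Distributor`, `parse_descriptors()`, `start_reload()` and
 `apply_pending()` are hypothetical names, and the one-line "fingerprint
 address" descriptor format is invented for the example. The slow parsing
 runs in a worker thread, and the main thread only performs the final
 overwrite.

```python
import queue
import threading


def parse_descriptors(lines):
    """Hypothetical parser: each line is 'fingerprint address'."""
    parsed = {}
    for line in lines:
        fingerprint, address = line.split()
        parsed[fingerprint] = address
    return parsed


class Distributor:
    """Hypothetical stand-in for a distributor holding bridge data."""

    def __init__(self):
        self.bridges = {}
        # Finished results are handed from the worker to the main thread here.
        self._pending = queue.Queue()

    def start_reload(self, lines):
        """Parse new descriptors in a background thread (the slow part)."""
        def worker():
            self._pending.put(parse_descriptors(lines))

        thread = threading.Thread(target=worker)
        thread.start()
        return thread

    def apply_pending(self):
        """Call from the main thread: swap in the newest parsed data, if any.

        Returns True if the current data was overwritten.
        """
        swapped = False
        while True:
            try:
                new_bridges = self._pending.get_nowait()
            except queue.Empty:
                return swapped
            # The only step that touches live data: one reference assignment.
            self.bridges = new_bridges
            swapped = True
```

 In this sketch the old data keeps being served until `apply_pending()`
 runs, and the overwrite itself is a single assignment. If the real
 overwrite copies or rebuilds structures entry by entry instead, that might
 account for the few seconds of unavailability, but that is a guess until
 it has been measured.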

 More testing, and a comparison of startup timing against master, will
 follow.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/5232#comment:16>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online