[tor-bugs] #4405 [BridgeDB]: bridgedb's list of tor exit relays is down since bulk exit list is down
Tor Bug Tracker & Wiki
blackhole at torproject.org
Wed Feb 11 21:35:58 UTC 2015
#4405: bridgedb's list of tor exit relays is down since bulk exit list is down
-------------------------+-------------------------------------------------
Reporter: arma | Owner: isis
Type: defect | Status: closed
Priority: normal | Milestone:
Component: BridgeDB | Version:
Resolution: fixed | Keywords: bridgedb-0.2.5, isis2015Q1Q2, isisExB, isisExC
Actual Points: | Parent ID:
Points: |
-------------------------+-------------------------------------------------
Comment (by isis):
Replying to [comment:12 arma]:
> (For those following along and not wanting to paw through all the git
> commits: it looks like isis opted to resume fetching from TorBulkExitList
> now that it's more consistently back up.)
Oh, sorry to make you paw through them! I should have summarised better.
Basically, BridgeDB for the past several years had a cronjob to download
the exitlist via
https://check.torproject.org/cgi-bin/TorBulkExitList.py?ip=38.229.72.19&port=443.
The issue wasn't that the
TorBulkExitList.py script was down (or at least, that hasn't been an issue
for quite a while). Rather, the issue was that BridgeDB had all these
cronjobs running to download new versions, but the new ones weren't getting
loaded into BridgeDB. Because BridgeDB tends to run for at least a couple of
months at a time without the process being restarted, BridgeDB's notion of
which IPs are Tor exits could go stale. Someone could then use the exits
that BridgeDB doesn't know about to bypass its various rate-limiting
mechanisms more effectively and gain more information on bridge nodes.
Additionally, I personally think it's messy to have cronjobs downloading
files that are supposed to be in a certain directory and then get loaded
into the process… (I'd rather have BridgeDB do its own thing, as much as
possible by itself, so that it's easier for others to someday run their
own BridgeDBs.) So I changed this to no longer need the external cronjob,
but rather to use `twisted.internet.task`. Because of this, there is now
infrastructure to have BridgeDB run other repeating tasks without writing
much code:
In `bridgedb.conf`:
{{{
# TASKS is a dictionary mapping the names of tasks to the frequency with which
# they should be run (in seconds). If a task's value is set to 0, it will not
# be scheduled to run.
TASKS = {
    # Download a list of Tor exit relays once every three hours (by running
    # scripts/get-exit-list) and add those exit relays to the list of proxies
    # loaded from the PROXY_LIST_FILES:
    'GET_TOR_EXIT_LIST': 3 * 60 * 60,
}
}}}
And in `lib/bridgedb/Main.py`:
{{{
tasks = {}

# Setup all our repeating tasks:
if config.TASKS['GET_TOR_EXIT_LIST']:
    tasks['GET_TOR_EXIT_LIST'] = task.LoopingCall(
        proxy.downloadTorExits,
        proxyList,
        config.SERVER_PUBLIC_EXTERNAL_IP)

# Schedule all configured repeating tasks:
for name, seconds in config.TASKS.items():
    if seconds:
        try:
            tasks[name].start(abs(seconds))
        except KeyError:
            logging.info("Task %s is disabled and will not run." % name)
        else:
            logging.info("Scheduled task %s to run every %s seconds."
                         % (name, seconds))

# Actually run the servers.
try:
    logging.info("Starting reactors.")
    reactor.run()
# (excerpt ends here)
}}}
Also, the entire process of retrieving, parsing, and loading the exit list
is now async, and data is handled as it arrives off the wire, so there's
no stalling while writing to disk.
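To illustrate what "handled as it arrives off the wire" can look like, here is a minimal sketch of an incremental parser using only the standard library. It is not the parser from the branch: `ExitListParser` is a hypothetical name, and it assumes the exit list is plain text with one IP address per line and `#` for comment lines. In a Twisted program, something like `feed()` would be called from a protocol's `dataReceived()`.

```python
import ipaddress


class ExitListParser:
    """Collect exit-relay IPs from data that arrives in arbitrary chunks."""

    def __init__(self):
        self.exits = set()
        self._buffer = b''

    def feed(self, chunk):
        """Consume one chunk and parse every line that it completes."""
        self._buffer += chunk
        # Everything before the last newline is complete; keep the rest.
        *lines, self._buffer = self._buffer.split(b'\n')
        for line in lines:
            self._parse_line(line)

    def close(self):
        """Parse whatever is left over once the transfer has finished."""
        self._parse_line(self._buffer)
        self._buffer = b''

    def _parse_line(self, line):
        text = line.decode('ascii', 'replace').strip()
        if not text or text.startswith('#'):
            return  # blank line or comment
        try:
            self.exits.add(str(ipaddress.ip_address(text)))
        except ValueError:
            pass  # skip anything that is not an IP address


parser = ExitListParser()
# Chunk boundaries deliberately fall in the middle of addresses.
for chunk in (b'# header comment\n192.0.',
              b'2.1\n198.51.100.7\nbogus\n203.0.113',
              b'.9'):
    parser.feed(chunk)
parser.close()
```

Because only the unfinished tail of the last line is buffered, nothing has to be written to disk or held in memory as a whole before parsing starts.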
Lastly, I wrote those patches two years ago. This ticket somehow got lost
in the bug tracker, and the branch got buried by other branches. :(
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/4405#comment:13>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online